DOI: 10.1145/1281192.1281293
Article

Relational data pre-processing techniques for improved securities fraud detection

Published: 12 August 2007

ABSTRACT

Commercial datasets are often large, relational, and dynamic. They contain many records of people, places, things, events, and their interactions over time. Such datasets are rarely structured appropriately for knowledge discovery, and they often contain variables whose meanings change across different subsets of the data. We describe how these challenges were addressed in a collaborative analysis project undertaken by the University of Massachusetts Amherst and the National Association of Securities Dealers (NASD). We describe several methods for data pre-processing that we applied to transform a large, dynamic, and relational dataset describing nearly the entirety of the U.S. securities industry, and we show how these methods made the dataset suitable for learning statistical relational models. To better utilize social structure, we first applied known consolidation and link formation techniques to associate individuals with branch office locations. In addition, we developed an innovative technique to infer professional associations by exploiting dynamic employment histories. Finally, we applied normalization techniques to create a suitable class label that adjusts for spatial, temporal, and other heterogeneity within the data. We show how these pre-processing techniques combine to provide the necessary foundation for learning high-performing statistical models of fraudulent activity.
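
The abstract does not spell out how professional associations are inferred from employment histories, so the following is only a minimal illustrative sketch, not the authors' technique. It assumes a hypothetical record layout of `(person_id, branch_id, start, end)` and a hypothetical rule: link two individuals when their employment periods overlap at two or more distinct branches. The names `shared_branch_counts`, `infer_links`, and `min_shared` are invented for this example.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations

# Hypothetical employment records: (person_id, branch_id, start, end).
# These values are made up for illustration; they are not from the paper.
records = [
    ("rep_a", "branch_1", date(2000, 1, 1), date(2002, 12, 31)),
    ("rep_b", "branch_1", date(2001, 6, 1), date(2003, 5, 31)),
    ("rep_a", "branch_2", date(2003, 1, 1), date(2005, 12, 31)),
    ("rep_b", "branch_2", date(2003, 6, 1), date(2004, 12, 31)),
    ("rep_c", "branch_2", date(2006, 1, 1), date(2007, 1, 1)),
]


def shared_branch_counts(records):
    """Count, per pair of people, the distinct branches where their stints overlap in time."""
    by_branch = defaultdict(list)
    for person, branch, start, end in records:
        by_branch[branch].append((person, start, end))

    shared = defaultdict(set)
    for branch, stints in by_branch.items():
        for (p1, s1, e1), (p2, s2, e2) in combinations(stints, 2):
            # Two closed intervals overlap when each starts before the other ends.
            if p1 != p2 and s1 <= e2 and s2 <= e1:
                shared[tuple(sorted((p1, p2)))].add(branch)
    return {pair: len(branches) for pair, branches in shared.items()}


def infer_links(records, min_shared=2):
    """Return pairs who overlapped at >= min_shared branches (assumed threshold)."""
    counts = shared_branch_counts(records)
    return sorted(pair for pair, n in counts.items() if n >= min_shared)


links = infer_links(records)
print(links)
```

Counting distinct branches rather than total overlapping days is a deliberate simplification: repeatedly moving between offices together is arguably a stronger hint of association than one long shared stint, but a real analysis would also need to account for how common each move is.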


Published in

KDD '07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2007
1080 pages
ISBN: 9781595936097
DOI: 10.1145/1281192

Copyright © 2007 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

KDD '07 paper acceptance rate: 111 of 573 submissions, 19%
Overall acceptance rate: 1,133 of 8,635 submissions, 13%
