Abstract
Authorship attribution, the science of inferring characteristics of the author from the characteristics of documents written by that author, is a problem with a long history and a wide range of applications. Recent work in "non-traditional" authorship attribution demonstrates the practicality of automatically analyzing documents based on authorial style, but the state of the art is confusing. Analyses are difficult to apply, little is known about the types or rates of errors, and few "best practices" are available. In part because of this confusion, the field has perhaps had less uptake and general acceptance than is its due.
This review surveys the history and present state of the discipline, presenting comparative results where available. It shows, first, that the discipline is quite successful, even in difficult cases involving small documents in unfamiliar and less-studied languages. It then examines the types of analysis and features used, tries to determine the characteristics of well-performing systems, and finally formulates these as a set of recommendations for best practices.