ABSTRACT
Using a novel evaluation toolkit that simulates a human reviewer in the loop, we compare the effectiveness of three machine-learning protocols for technology-assisted review, as used in document review for discovery in legal proceedings. Our comparison addresses a central question in the deployment of technology-assisted review: Should training documents be selected at random, or should they be selected using one or more non-random methods, such as keyword search or active learning? On eight review tasks -- four derived from the TREC 2009 Legal Track and four derived from actual legal matters -- recall was measured as a function of human review effort. The results show that entirely non-random training methods, in which the initial training documents are selected using a simple keyword search and subsequent training documents are selected by active learning, require substantially and significantly less human review effort (P < 0.01) to achieve any given level of recall than passive learning, in which the machine-learning algorithm plays no role in the selection of training documents. Among passive-learning methods, significantly less human review effort (P < 0.01) is required when keywords, rather than random sampling, are used to select the initial training documents. Among active-learning methods, continuous active learning with relevance feedback yields generally superior results to simple active learning with uncertainty sampling, while avoiding the vexing issue of "stabilization" -- determining when training is adequate and therefore may stop.
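The two active-learning protocols contrasted in the abstract differ only in the rule used to choose the next batch of documents for human review: continuous active learning (CAL) with relevance feedback selects the documents the current model scores as most likely relevant, while simple active learning (SAL) with uncertainty sampling selects those the model is least certain about. A minimal sketch of these two selection rules, assuming a generic per-document relevance score in [0, 1]; the function and variable names are illustrative and not taken from the paper:

```python
def select_batch(scores, reviewed, batch_size, protocol):
    """Pick the next documents to send for human review.

    scores     : dict mapping doc_id -> estimated probability of relevance
    reviewed   : set of doc_ids already judged by the human reviewer
    batch_size : number of documents to select
    protocol   : "CAL" (relevance feedback) or "SAL" (uncertainty sampling)
    """
    candidates = [d for d in scores if d not in reviewed]
    if protocol == "CAL":
        # Relevance feedback: review the highest-scoring documents first.
        key = lambda d: -scores[d]
    elif protocol == "SAL":
        # Uncertainty sampling: review documents nearest the 0.5 boundary.
        key = lambda d: abs(scores[d] - 0.5)
    else:
        raise ValueError("unknown protocol: " + protocol)
    return sorted(candidates, key=key)[:batch_size]
```

In a full protocol this selection step would alternate with retraining the classifier on all judgments collected so far; under CAL that loop can simply continue until enough relevant documents are found, which is why it sidesteps the "stabilization" question that SAL raises.
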
References
- Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182, S.D.N.Y., 2012.
- Case Management Order: Protocol Relating to the Production of Electronically Stored Information ("ESI"), In Re: Actos (Pioglitazone) Products Liability Litigation, MDL No. 6:11-md-2299, W.D. La., July 27, 2012.
- M. Bagdouri, W. Webber, D. D. Lewis, and D. W. Oard. Towards minimizing the annotation cost of certified text classification. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, pages 989--998, 2013.
- P. Bailey, N. Craswell, I. Soboroff, P. Thomas, A. de Vries, and E. Yilmaz. Relevance assessment: are judges exchangeable and does it matter? In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 667--674, 2008.
- S. Büttcher, C. L. A. Clarke, and G. V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines. MIT Press, 2010.
- J. Cheng, A. Jones, C. Privault, and J.-M. Renders. Soft labeling for multi-pass document review. ICAIL 2013 DESI V Workshop, 2013.
- G. V. Cormack and M. Mojdeh. Machine learning for information retrieval: TREC 2009 Web, Relevance Feedback and Legal Tracks. The Eighteenth Text REtrieval Conference (TREC 2009), 2009.
- M. R. Grossman and G. V. Cormack. Technology-assisted review in e-discovery can be more effective and more efficient than exhaustive manual review. Richmond Journal of Law and Technology, 17(3):1--48, 2011.
- M. R. Grossman and G. V. Cormack. Inconsistent responsiveness determination in document review: Difference of opinion or human error? Pace Law Review, 32(2):267--288, 2012.
- M. R. Grossman and G. V. Cormack. The Grossman-Cormack glossary of technology-assisted review with foreword by John M. Facciola, U.S. Magistrate Judge. Federal Courts Law Review, 7(1):1--34, 2013.
- M. R. Grossman and G. V. Cormack. Comments on "The Implications of Rule 26(g) on the Use of Technology-Assisted Review." Federal Courts Law Review, 1, to appear 2014.
- B. Hedin, S. Tomlinson, J. R. Baron, and D. W. Oard. Overview of the TREC 2009 Legal Track. The Eighteenth Text REtrieval Conference (TREC 2009), 2009.
- C. Hogan, J. Reinhart, D. Brassil, M. Gerber, S. Rugani, and T. Jade. H5 at TREC 2008 Legal Interactive: User modeling, assessment & measurement. The Seventeenth Text REtrieval Conference (TREC 2008), 2008.
- D. G. Horvitz and D. J. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663--685, 1952.
- D. D. Lewis and W. A. Gale. A sequential algorithm for training text classifiers. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3--12, 1994.
- D. W. Oard and W. Webber. Information retrieval for e-discovery. Information Retrieval, 6(1):1--140, 2012.
- Y. Ravid. System for Enhancing Expert-Based Computerized Analysis of a Set of Digital Documents and Methods Useful in Conjunction Therewith. United States Patent 8527523, 2013.
- H. L. Roitblat, A. Kershaw, and P. Oot. Document categorization in legal electronic discovery: Computer classification vs. manual review. Journal of the American Society for Information Science and Technology, 61(1):70--80, 2010.
- K. Schieneman and T. Gricks. The implications of Rule 26(g) on the use of technology-assisted review. Federal Courts Law Review, 7(1):239--274, 2013.
- J. C. Scholtes, T. van Cann, and M. Mack. The impact of incorrect training sets and rolling collections on technology-assisted review. ICAIL 2013 DESI V Workshop, 2013.
- F. Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1--47, 2002.
- B. Settles. Active learning literature survey. University of Wisconsin, Madison, 2010.
- M. D. Smucker and C. P. Jethani. Human performance and retrieval precision revisited. In Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 595--602, 2010.
- S. Tomlinson. Learning Task experiments in the TREC 2010 Legal Track. The Nineteenth Text REtrieval Conference (TREC 2010), 2010.
- E. M. Voorhees. Variations in relevance judgments and the measurement of retrieval effectiveness. Information Processing & Management, 36(5):697--716, 2000.
- W. Webber, D. W. Oard, F. Scholer, and B. Hedin. Assessor error in stratified evaluation. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pages 623--632, 2010.
- W. Webber and J. Pickens. Assessor disagreement and text classifier accuracy. In Proceedings of the 36th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 929--932, 2013.
- C. Yablon and N. Landsman-Roos. Predictive coding: Emerging questions and concerns. South Carolina Law Review, 64(3):633--765, 2013.
Index Terms
Evaluation of machine-learning protocols for technology-assisted review in electronic discovery