Abstract
We present techniques for gathering data that expose errors of automatic predictive models. In certain common settings, traditional methods for evaluating predictive models tend to miss rare but important errors—most importantly, cases for which the model is confident of its prediction (but wrong). In this article, we present a system that, in a game-like setting, asks humans to identify cases that will cause the predictive model-based system to fail. Such techniques are valuable in discovering problematic cases that may not reveal themselves during the normal operation of the system and may include cases that are rare but catastrophic. We describe the design of the system, including design iterations that did not quite work. In particular, the system incentivizes humans to provide examples that are difficult for the model to handle by providing a reward proportional to the magnitude of the predictive model's error. The humans are asked to “Beat the Machine” and find cases where the automatic model (“the Machine”) is wrong. Experiments show that the humans using Beat the Machine identify more errors than do traditional techniques for discovering errors in predictive models, and, indeed, they identify many more errors where the machine is (wrongly) confident it is correct. Furthermore, those cases the humans identify seem to be not simply outliers, but coherent areas missed completely by the model. Beat the Machine identifies the “unknown unknowns.” Beat the Machine has been deployed at an industrial scale by several companies. The main impact has been that firms are changing their perspective on and practice of evaluating predictive models.
“There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know.”
--Donald Rumsfeld
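
The abstract describes the system's core incentive: workers are paid in proportion to the magnitude of the predictive model's error on the cases they submit, so confident-but-wrong predictions earn the most. The following is a minimal sketch of how such a payout rule could be computed; the function name, parameter names, and payment amounts are illustrative assumptions, not the deployed system's implementation.

# Sketch of a "Beat the Machine"-style payout rule, assuming the model emits
# a probability for the positive class and the true label of each submitted
# case is verified independently (e.g., by multiple labelers).

def btm_reward(model_prob_positive: float,
               true_label: int,
               base_reward: float = 0.05,
               max_bonus: float = 1.00) -> float:
    """Pay a small base amount plus a bonus proportional to the model's error.

    model_prob_positive: the model's predicted probability of the positive class.
    true_label: verified ground truth, 1 for positive, 0 for negative.
    """
    # Error magnitude is largest when the model is confident (near 0 or 1)
    # and wrong -- exactly the "unknown unknowns" the system seeks.
    error = abs(true_label - model_prob_positive)
    return base_reward + max_bonus * error


# Example: the model assigns only 5% probability to the positive class,
# but verification shows the case is positive; the worker earns nearly
# the full bonus for exposing a confident error.
print(btm_reward(model_prob_positive=0.05, true_label=1))  # ~1.00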