Abstract
Studies have shown that search engine queries are indicative of future diagnosis of several types of cancer. These studies were based on self-identification of illness and were limited in that diagnostic information could not be shared with screened individuals. Here I report on two studies that overcome these limitations.
Advertisements were displayed through the Bing and Google advertising systems to people who sought to self-diagnose one of three types of cancer. People who clicked on these ads were given clinically verified questionnaires and were shown the outcomes of those questionnaires.
A classifier trained on past Bing queries to predict suspected cancer, as inferred from questionnaire responses, reached an area under the curve (AUC) of 0.64. People who were informed that their symptoms were consistent with suspected cancer increased their searches related to healthcare utilization.
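For readers unfamiliar with the metric, the area under the (ROC) curve can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. The following is a minimal sketch of that computation, not the study's code: the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the studies.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied scores count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = questionnaire indicated suspected cancer.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so an AUC of 0.64 indicates a modest but above-chance signal. The pairwise loop is quadratic in the number of cases; it is written this way for clarity rather than speed.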
In a second study, questionnaire responses were provided to the conversion optimization mechanism of the Google advertising system, which enabled it to learn to identify people who were likely to have suspected cancer. Following a training period of approximately 10 days, 11% of the people whom the system selected to be shown targeted campaign ads were found to have suspected cancer.
These results demonstrate the utility of modern advertising systems for identifying people who are likely to be suffering from serious medical conditions.
Screening for Cancer Using a Learning Internet Advertising System