Abstract
We seek to understand the evolving needs of people faced with a life-changing medical diagnosis by analyzing queries extracted from an anonymized search query log. Focusing on breast cancer, we manually tag a set of Web searchers whose patterns of search behavior are consistent with someone grappling with the screening, diagnosis, and treatment of breast cancer. We build and apply probabilistic classifiers that detect these searchers across multiple sessions and identify the timing of diagnosis using temporal and statistical features. By aligning multiple searchers on the estimated time of diagnosis, we explore how information seeking changes before and after an inferred diagnosis of breast cancer. We employ the classifier to automatically identify 1,700 candidate searchers with an estimated 90% precision, and we predict the day of diagnosis within 15 days with 88% accuracy. We show that the geographic and demographic attributes of searchers identified with high probability are strongly correlated with ground-truth reported incidence rates. We then analyze the content of queries over time for inferred cancer patients, using a detailed ontology of cancer-related search terms. The analysis reveals the rich temporal structure of the evolving queries of people likely diagnosed with breast cancer. Finally, we focus on subtypes of illness based on inferred stages of cancer and show clinically relevant dynamics of information seeking based on the dominant stage expressed by searchers.
Index Terms
Search and Breast Cancer: On Episodic Shifts of Attention over Life Histories of an Illness