DOI: 10.1145/3292500.3330766 (KDD conference proceedings)
research-article

Seasonal-adjustment Based Feature Selection Method for Predicting Epidemic with Large-scale Search Engine Logs

ABSTRACT

Search engine logs have great potential for tracking and predicting outbreaks of infectious diseases. More precisely, the search volume of certain search terms can be used to predict the infection rate of an infectious disease in near real time. However, accurate and stable outbreak prediction from search engine logs is challenging because search logs are unstable in two ways. First, the search volume of a term may change irregularly in the short term, for example due to environmental factors such as the amount of media or news coverage. Second, the search volume may also change in the long term as the demographics of the search engine's users shift. Consequently, a model trained on such logs without accounting for these characteristics will produce serious mispredictions when these changes occur. In this work, we propose a novel feature selection method to overcome this instability problem. In particular, we employ a seasonal-adjustment method that decomposes each time series into three components (seasonal, trend, and irregular) and build a prediction model for each component individually. We also carefully design a feature selection method that selects appropriate search terms for predicting each component. We conducted comprehensive experiments on ten different infectious diseases. The experimental results show that the proposed method outperforms all comparative methods in prediction accuracy for seven of the ten diseases, in both the nowcasting and forecasting settings. The proposed method is also more successful at selecting search terms that are semantically related to the target diseases.
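To make the three-component decomposition concrete, the following is a minimal sketch of a classical additive decomposition (centered moving average for the trend, per-position averages for the seasonal part, the remainder as the irregular part). The abstract does not say which seasonal-adjustment procedure the authors use, so this stands in as a generic illustration only. The function name `decompose_additive`, the `period` parameter, and the toy series are all assumptions made for this example.

```python
from statistics import mean


def decompose_additive(series, period):
    """Split a series into trend + seasonal + irregular (classical additive form).

    Illustrative stand-in only; not the procedure from the paper.
    Assumes len(series) >= 2 * period so every seasonal position is observed.
    """
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2

    # Trend: centered moving average; undefined (None) near both ends.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: half weight on the two end points (2 x period MA).
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
            trend[t] = total / period
        else:
            trend[t] = mean(window)

    # Seasonal: average detrended value at each position in the cycle,
    # shifted so the seasonal effects sum to zero over one period.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [mean(b) for b in buckets]
    offset = mean(raw)
    seasonal = [raw[t % period] - offset for t in range(n)]

    # Irregular: whatever the trend and seasonal parts do not explain.
    irregular = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, irregular


# Toy example: linear growth plus a repeating 4-step pattern plus one spike.
pattern = [3.0, -1.0, -2.0, 0.0]
toy = [0.5 * t + pattern[t % 4] for t in range(16)]
toy[9] += 5.0  # a short-lived jump, e.g. a burst of news coverage
trend, seasonal, irregular = decompose_additive(toy, period=4)
```

In this sketch the spike at index 9 ends up mostly in the irregular component, although the moving average also smears part of it into the neighbouring trend and seasonal estimates. That separation is what allows a separate model, with its own set of search terms, to be fitted to each component.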
