Abstract
Healthcare big data remains under-utilized due to incompatibilities between the domains of data analytics and healthcare, among them the lack of generalizable methods for iterative feature acquisition under a budget and of machine learning models that support reasoning about their own uncertainty. Meanwhile, the volume of available data is growing rapidly with the spread of Internet of Things applications and personalized healthcare. For the healthcare domain to adopt models that take advantage of this big data, machine learning models should be coupled with more informative, relevant feature acquisition methods, adding robustness to the model's results. We introduce a feature selection approach based on Bayesian learning that reports the model's level of uncertainty alongside false-positive and false-negative rates. In addition, measuring target-specific uncertainty lifts the restriction that feature selection be target agnostic, allowing features to be acquired for a target of focus. We show that acquiring features for a specific target is at least as good as deep learning feature selection methods and common linear feature selection approaches on small, non-sparse datasets, and surpasses them on real-world data that is larger in scale and sparser.
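The idea of uncertainty-guided, target-focused feature acquisition can be illustrated with a minimal sketch. This is not the paper's implementation: a bootstrap ensemble of logistic models stands in for Bayesian posterior sampling, and the mean predictive entropy of the ensemble stands in for the paper's uncertainty measure. Features are then bought greedily, picking whichever one most reduces the ensemble's uncertainty about the target.

```python
# Minimal sketch (hypothetical, not the paper's method): greedy feature
# acquisition driven by the predictive uncertainty of a bootstrap ensemble.
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(X, y, epochs=200, lr=0.5):
    """Plain gradient-descent logistic regression; returns the weight vector."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def ensemble_entropy(X, y, cols, n_models=10):
    """Mean predictive entropy of a bootstrap ensemble restricted to `cols`."""
    Xs = X[:, cols]
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y), len(y))          # bootstrap resample
        w = fit_logistic(Xs[idx], y[idx])
        preds.append(1.0 / (1.0 + np.exp(-Xs @ w)))
    p = np.clip(np.mean(preds, axis=0), 1e-9, 1 - 1e-9)
    return float(np.mean(-p * np.log(p) - (1 - p) * np.log(1 - p)))

def acquire_features(X, y, budget):
    """Greedily acquire the feature that most reduces target uncertainty."""
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(budget):
        scores = {j: ensemble_entropy(X, y, chosen + [j]) for j in remaining}
        best = min(scores, key=scores.get)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy data: feature 0 determines the label; features 1-3 are pure noise,
# so an uncertainty-driven acquirer should buy feature 0 first.
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(float)
selected = acquire_features(X, y, budget=2)
print(selected)
```

Because the informative feature drives the ensemble's predictions toward confident 0/1 probabilities, its inclusion collapses the predictive entropy, while noise features leave predictions near 0.5; the greedy loop therefore acquires the informative feature before spending any remaining budget.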
Target-Focused Feature Selection Using Uncertainty Measurements in Healthcare Data