Abstract
Social media analytics can contribute considerably to understanding health conditions beyond clinical practice by capturing patients’ discussions and feelings about their quality of life in relation to disease treatments. In this article, we propose a methodology to support a detailed analysis of the therapeutic experience of patients affected by a specific disease, as it emerges from health forums. As a use case to test the proposed methodology, we analyze the experience of patients affected by hypothyroidism and their reactions to standard therapies. Our approach is based on a data extraction and filtering pipeline, a novel topic detection model named Generative Text Compression with Agglomerative Clustering Summarization (GTCACS), and an in-depth data analytic process. We advance the state of the art in automated detection of adverse drug reactions (ADRs): rather than simply detecting and classifying positive or negative reactions to a therapy, we provide a fine-grained characterization of patients along different dimensions, such as co-morbidities, symptoms, and emotional states.
Index Terms
Supporting Personalized Health Care With Social Media Analytics: An Application to Hypothyroidism