
Introduction to the Special Issue on Computational Methods for Biomedical NLP

Published: 12 January 2022

It is now well established that biomedical text requires methods tailored to the domain. Developments in deep learning and a series of successful shared challenges have contributed to steady progress in natural language processing (NLP) techniques for biomedical text. This special issue, which focuses on computational methods, was created to contribute to that ongoing progress by encouraging research on novel approaches for analyzing biomedical text. The six papers selected for the issue offer a diverse set of novel methods that leverage biomedical text for research and clinical uses.

A well-established practice in pretraining deep learning models for biomedical applications has been to take a promising model already pretrained on a general-domain natural language corpus and then “add” further pretraining on biomedical corpora. In “Domain-specific language model pretraining for biomedical natural language processing”, Gu et al. successfully challenge this approach. Using multiple standard benchmarks, the authors compared a model pretrained from scratch exclusively on biomedical text with models pretrained using the “add-on” approach. The results showed an impressive improvement in favor of pretraining on biomedical text alone. The study provides an excellent data point in support of focused, domain-specific pretraining over the accumulation of pretraining on successive corpora.

Tariq et al. likewise find domain-aware tokenization and embeddings to be more effective in their paper “Bridging the Gap Between Structured and Free-form Radiology Reporting: A Case-study on Coronary CT Angiography”. They compare a variety of models built to predict the severity of cardiovascular disease from the language of free-text radiology reports. Models that used medical-domain-aware tokenization and word embeddings of the reports were consistently more effective than those based on raw words. The better-performing models accurately predict disease severity under real-world conditions, including diverse terminology across radiologists and imbalanced class sizes.
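
To make “domain-aware tokenization” concrete, the sketch below implements WordPiece-style greedy longest-match-first subword splitting and runs it with two tiny vocabularies. The vocabularies and the function name are hypothetical, chosen for illustration; they are not the tokenizers used by Tariq et al. The point is only that a vocabulary containing a domain term keeps it intact, while a general vocabulary fragments it into pieces that carry little clinical meaning.

```python
from typing import List, Set


def wordpiece_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first subword split, in the style of WordPiece.

    Non-initial pieces carry the "##" continuation prefix. If some position
    cannot be matched by any vocabulary entry, the whole word maps to `unk`.
    """
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabularies, for illustration only.
GENERAL_VOCAB = {"ste", "##no", "##sis", "mild"}
DOMAIN_VOCAB = GENERAL_VOCAB | {"stenosis"}

for name, vocab in [("general", GENERAL_VOCAB), ("domain", DOMAIN_VOCAB)]:
    print(name, wordpiece_tokenize("stenosis", vocab))
```

With the general vocabulary, “stenosis” splits into `ste`, `##no`, `##sis`; with the domain vocabulary it stays a single token whose embedding can be learned directly.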

Two papers address the problem of preserving the privacy of clinical documents, though from widely different perspectives. De-identification is the most widely used approach for removing PHI (Protected Health Information) from clinical documents before the data are made available to NLP researchers. In “A Context-enhanced De-identification System”, Lee et al. describe an improved de-identification technique for clinical records. Their context-enhanced de-identification system, CEDI, uses attention mechanisms in a long short-term memory (LSTM) network to capture relevant context. This context allows the system to detect dependencies that cross sentence boundaries, an important capability because clinical reports often contain such dependencies. Nonetheless, accurate, broad-coverage de-identification of unstructured data remains challenging, and a lack of trust in the de-identification process can seriously limit data release.
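
The general idea of attending over context that crosses a sentence boundary can be sketched with scaled dot-product attention: a query token's representation is compared against hidden states from a window covering both the previous and the current sentence, and the resulting weights form a context vector. The vectors below are made-up stand-ins for LSTM hidden states, and this is not CEDI's actual architecture, which the editorial describes only at a high level.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, states: List[Vector]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention of one query over a list of hidden states.

    Returns the attention weights and the weighted-sum context vector.
    """
    d = len(query)
    scores = [sum(q * s for q, s in zip(query, st)) / math.sqrt(d) for st in states]
    weights = softmax(scores)
    context = [sum(w * st[i] for w, st in zip(weights, states)) for i in range(d)]
    return weights, context


# Hypothetical hidden states: two tokens from the previous sentence,
# followed by two tokens from the current sentence.
window = [
    [0.9, 0.1, 0.0],  # previous sentence, e.g. a name mentioned earlier
    [0.1, 0.2, 0.1],  # previous sentence
    [0.0, 0.1, 0.3],  # current sentence
    [0.8, 0.0, 0.1],  # current token, whose PHI status depends on earlier context
]
weights, context = attend(window[3], window)
print([round(w, 3) for w in weights])
```

Here the current token places its largest weight on a token from the previous sentence, which is the kind of cross-sentence dependency a sentence-bounded model would miss.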

In “Differentially Private Medical Texts Generation using Generative Neural Networks”, Aziz et al. take a different approach to protecting the privacy of clinical documents: they propose generating accurate synthetic clinical documents as a practical alternative. By combining self-attention-based neural networks with differential privacy (a formal guarantee, tunable through a privacy budget, that limits how much any single original document can influence the model's output), they demonstrate that modern generative approaches can be used effectively for this purpose. The authors measure the quality of their approach with novel metrics based on token-level distributions, document-level similarity with respect to an outcome, and corpus-level adversarial classification. The results suggest a viable alternative to de-identification.
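
The editorial does not say how Aziz et al. enforce differential privacy during training. One widely used technique for training neural networks with a differential privacy guarantee is DP-SGD (differentially private stochastic gradient descent), whose core gradient step is sketched below: clip each example's gradient to a maximum L2 norm, sum the clipped gradients, add Gaussian noise scaled to the clipping bound, and average. The parameter names `clip_norm` and `noise_multiplier` are our own, and the privacy accountant that converts these settings into a privacy budget ε is omitted.

```python
import math
import random
from typing import List

Vector = List[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def dp_sgd_gradient(per_example_grads: List[Vector], clip_norm: float,
                    noise_multiplier: float, rng: random.Random) -> Vector:
    """One DP-SGD gradient estimate from a batch of per-example gradients.

    1. Clip each example's gradient to L2 norm <= clip_norm, which bounds
       any single example's influence on the update.
    2. Sum the clipped gradients.
    3. Add Gaussian noise with std = noise_multiplier * clip_norm per coordinate.
    4. Divide by the batch size.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        scale = min(1.0, clip_norm / (l2_norm(g) + 1e-12))
        for i in range(dim):
            total[i] += g[i] * scale
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [x / len(per_example_grads) for x in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
update = dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(update)
```

Clipping is what makes the noise meaningful: because no single example can contribute more than `clip_norm` to the sum, noise of that scale is enough to mask any one example's presence.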

Increasingly, social media is complementing Electronic Health Records as a valuable source of information on patients’ disease status and responses to treatment. The exponential growth of social media and relaxed patient-privacy concerns in this channel mean that larger and less constrained data are available for analysis. In “Supporting Personalized Health Care with Social Media Analytics”, Grani et al. develop novel methods for characterizing adverse drug events reported by patients in web forums. The study analyzed social media posts by patients receiving treatment for hypothyroidism. A particularly novel aspect of the work is the use of two adversarial networks (as in a generative adversarial network, or GAN) to generate compressed latent vectors for the posts, which are then clustered to identify important discussion topics related to treatment responses and adverse drug reactions (ADRs). One of the networks (the classic “discriminator”) is an autoencoder regularized by the adversarial network (the “generator”), which learns to produce realistic synthetic posts. Through a detailed analysis of the patient-response clusters, informed by topic modeling, the paper establishes a novel methodology for analyzing the exponentially growing volume of web-forum posts.
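
The editorial does not name the algorithm used to cluster the latent vectors. As an assumption, the sketch below uses plain k-means (Lloyd's algorithm) over a handful of made-up two-dimensional “latent vectors”; a real pipeline would cluster the encoder's outputs and would typically rely on a library implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def kmeans(points: List[Vector], k: int, iters: int = 20) -> Tuple[List[Vector], List[int]]:
    """Lloyd's k-means. Centroids are initialised from the first k points,
    a deterministic simplification (k-means++ is the usual choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, labels


# Hypothetical latent vectors for six posts (e.g. two discussion topics).
latent = [[0.1, 0.2], [5.0, 5.1], [0.0, 0.1], [0.2, 0.0], [5.2, 4.9], [4.9, 5.0]]
centroids, labels = kmeans(latent, k=2)
print(labels)
```

Each resulting cluster groups posts whose compressed representations are close, and those groups can then be inspected, for example with topic modeling, to name the discussion theme they share.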

The final paper, “GeCoAgent: A Conversational Agent for Empowering Genomic Data Extraction and Analysis” by Crovari et al., uses natural language processing not to analyze texts but to support a conversational interface between a genomics researcher and a system for managing genomics experiments. The goal of the GeCoAgent system is to enable biologists with limited computing skills to independently use the available computational tools to manage and interpret data from genomics experiments. Its language processing lets biologists interact with the system through dialogue, improving their experience and extending their capabilities.

We thank the reviewers of the papers submitted to this special issue for their diligent work, often under time pressure. Without their volunteer efforts, this special issue would not have been possible. We leaned on several of them multiple times through personal requests, and they always came through. All of us at ACM, and the journal's readers, are truly indebted to them for their contributions. We also acknowledge the scientific and professional contributions of the authors of all submitted papers, and their immense patience while we conducted the review process during a once-in-a-century worldwide pandemic. Lastly, we are grateful to the Editors-in-Chief of ACM Health, John A. Stankovic and Insup Lee, for trusting us with this special issue, and to the editorial staff, especially Victoria White, for their support.

Murthy V. Devarakonda
Ellen M. Voorhees
Guest Editors


  • Published in

    ACM Transactions on Computing for Healthcare  Volume 3, Issue 1
    January 2022
    255 pages
    ISSN: 2691-1957
    EISSN: 2637-8051
    DOI: 10.1145/3485154

    Copyright © 2022 Copyright held by the owner/author(s).

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Qualifiers

    • editorial
    • Refereed
  • Article Metrics

    • Downloads (last 12 months): 128
    • Downloads (last 6 weeks): 12
