Artificial Intelligence in Medicine
Volume 46, Issue 2, Pages 97-109, June 2009

Using WordNet synonym substitution to enhance UMLS source integration

  • Kuo-Chuan Huang

      Affiliations

    • Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102-1982, USA
    • Corresponding author. Tel.: +1 973 596 3392; fax: +1 973 596 5777.
  • James Geller

      Affiliations

    • Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102-1982, USA
  • Michael Halper

      Affiliations

    • Department of Computer Science, Kean University, Union, NJ 07083-0411, USA
  • Yehoshua Perl

      Affiliations

    • Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102-1982, USA
  • Junchuan Xu

      Affiliations

    • Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102-1982, USA

Received 27 May 2008; received in revised form 15 August 2008; accepted 9 November 2008.

Summary 

Objective

Synonym-substitution algorithms have been developed for the purpose of matching source vocabulary terms with existing Unified Medical Language System (UMLS) terms during the integration process. A drawback is the possible explosion in the number of newly generated (potential) synonyms, which can tax computational and expert review resources. Experiments are run using a synonym-substitution approach based on WordNet to see how constraining two methodological parameters, namely, “maximum number of substitutions per term” and “maximum term length,” affects performance. Our hypothesis is that these values can be constrained rather tightly—thus greatly speeding up the methodology—without a marked decline in the additional matches produced. Furthermore, we investigate whether a limitation on only the first of the two parameters is sufficient to achieve the same results.

Methods

A four-stage synonym-substitution methodology using WordNet is presented. A group of experiments is carried out in which the two methodological parameters “maximum number of substitutions per term” and “maximum term length” are varied. The purpose is to examine their effect on the growth in the number of potential synonyms generated and the associated loss of results. The experiments are based on the re-integration of the “Minimal Standard Terminology” (MST) into the UMLS. Synonym-substitution matches that are inconsistent with the current content of the UMLS, and are thus deemed incorrect, are then manually scrutinized as an audit of the original integration of the MST.
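How the two parameters bound the growth of potential synonyms can be illustrated with a minimal Python sketch. It is not the authors' four-stage methodology, whose stages are not detailed in this summary. The toy synonym table stands in for WordNet lookups, the names `TOY_SYNONYMS` and `generate_candidates` are hypothetical, and measuring term length in words is an assumption.

```python
from itertools import combinations, product

# Hypothetical toy synonym table standing in for WordNet lookups.
TOY_SYNONYMS = {
    "tumor": ["neoplasm", "tumour"],
    "stomach": ["gastric", "belly"],
    "bleeding": ["hemorrhage"],
}


def generate_candidates(term, synonyms, max_substitutions=2, max_term_length=None):
    """Return potential synonyms of `term` made by swapping in word synonyms.

    max_substitutions caps how many words may be replaced at once.
    max_term_length (assumed here to be counted in words) skips long terms.
    """
    words = term.lower().split()
    if max_term_length is not None and len(words) > max_term_length:
        return set()

    # Positions of words for which the table offers at least one synonym.
    positions = [i for i, word in enumerate(words) if word in synonyms]

    candidates = set()
    for k in range(1, min(max_substitutions, len(positions)) + 1):
        # Choose which k words to replace, then every synonym combination.
        for chosen in combinations(positions, k):
            options = [synonyms[words[i]] for i in chosen]
            for replacement in product(*options):
                new_words = list(words)
                for i, new_word in zip(chosen, replacement):
                    new_words[i] = new_word
                candidates.add(" ".join(new_words))
    return candidates


if __name__ == "__main__":
    term = "stomach tumor bleeding"
    for limit in (1, 2, 3):
        found = generate_candidates(term, TOY_SYNONYMS, max_substitutions=limit)
        print(limit, len(found))
```

With this toy table, the term "stomach tumor bleeding" yields 5 candidates when at most one substitution is allowed, 13 with two, and 17 with three. Each candidate would then be compared against existing UMLS terms, so a tight cap on substitutions directly reduces the matching and review workload.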

Results

An increase of 11% in the number of “MST term/UMLS term” matches was achieved using the synonym-substitution methodology. Importantly, this result prevailed when tight threshold values (such as a maximum of two synonym substitutions per term) were imposed on the parameters. Furthermore, it was found that limiting only the “maximum number of substitutions per term” parameter was sufficient to obtain the performance enhancement. During the additional audit phase, a number of the reported mismatches were actually seen to be correct, representing an additional 10% increase in the number of matches obtained.

Conclusion

A synonym-substitution methodology that utilizes WordNet is a useful automated aid in UMLS source integration. Experiments showed that there was a significant speed-up but no degradation in match results when the methodology's “maximum number of substitutions per term” parameter was relatively tightly constrained. The methodology also helped to discover errors in the MST's original integration and to improve the quality of the UMLS's conceptual content.

Keywords: UMLS, Source integration, WordNet, Synonym substitution, Synonym generation, Synonymy, Integration audit

PII: S0933-3657(08)00182-6

doi:10.1016/j.artmed.2008.11.008
