Artificial Intelligence in Medicine
Volume 37, Issue 1, Pages 7-18, May 2006

Learning from imbalanced data in surveillance of nosocomial infection

  • Gilles Cohen (corresponding author; Tel.: +41 22 372 7550; fax: +41 22 320 2927)
    Medical Informatics Service, University Hospital of Geneva, Geneva, Switzerland
  • Mélanie Hilario
    Artificial Intelligence Laboratory, University of Geneva, Geneva, Switzerland
  • Hugo Sax
    Department of Internal Medicine, University Hospital of Geneva, Geneva, Switzerland
  • Stéphane Hugonnet
    Department of Internal Medicine, University Hospital of Geneva, Geneva, Switzerland
  • Antoine Geissbuhler
    Medical Informatics Service, University Hospital of Geneva, Geneva, Switzerland

Received 27 July 2004; received in revised form 8 March 2005; accepted 10 March 2005.

Summary 

Objective

An important problem in hospitals is the monitoring and detection of nosocomial, or hospital-acquired, infections (NIs). This paper describes a retrospective analysis of a prevalence survey of NIs conducted at the Geneva University Hospital. Our goal is to identify patients with one or more NIs on the basis of clinical and other data collected during the survey.

Methods and material

Standard surveillance strategies are time-consuming and cannot be applied hospital-wide; alternative methods are required. When NI detection is viewed as a classification task, the main difficulty lies in the marked imbalance between positive, or infected, cases (11%) and negative cases (89%). To remedy class imbalance, we explore two distinct avenues: (1) a new resampling approach in which both oversampling of the rare positives and undersampling of the noninfected majority rely on synthetic cases (prototypes) generated via class-specific subclustering, and (2) a support vector algorithm in which asymmetrical margins are tuned to improve recognition of rare positive cases.
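The first avenue, prototype-based resampling, can be illustrated with a minimal sketch. The summary does not specify the clustering algorithm or how the prototypes are combined with the original cases, so everything below is an assumption made for illustration: plain k-means stands in for the subclustering step, centroids of the positive class are added to the original positives (oversampling), and centroids of the negative class replace the original negatives (undersampling). The function names `kmeans` and `prototype_resample`, the parameters `k_pos` and `k_neg`, and the toy data are all hypothetical.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the subclustering step).
    Returns k centroids, each a list of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def prototype_resample(positives, negatives, k_pos, k_neg):
    """Hypothetical resampling scheme: add k_pos positive prototypes to the
    original positives and replace the negatives by k_neg prototypes."""
    pos = [list(p) for p in positives] + kmeans(positives, k_pos)
    neg = kmeans(negatives, k_neg)
    return pos, neg

# Toy data (invented): 4 positives and 32 negatives, roughly an 11% / 89% split.
rng = random.Random(1)
positives = [(rng.gauss(2, 0.5), rng.gauss(2, 0.5)) for _ in range(4)]
negatives = [(rng.gauss(0, 1.0), rng.gauss(0, 1.0)) for _ in range(32)]

balanced_pos, balanced_neg = prototype_resample(positives, negatives, k_pos=2, k_neg=6)
print(len(balanced_pos), len(balanced_neg))  # prints "6 6"
```

Because each class is clustered separately, the synthetic cases stay within the region occupied by their own class, which is the intuition behind class-specific subclustering as described above.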

Results and conclusion

Experiments have shown both approaches to be effective for the NI detection problem. Our novel resampling strategies perform markedly better than classical random resampling. However, they are outperformed by asymmetrical soft margin support vector machines, which attained a sensitivity of 92%, significantly better than the highest sensitivity (87%) obtained via prototype-based resampling.
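Results are reported as sensitivity, the fraction of truly infected patients that the classifier flags, rather than as overall accuracy. The short sketch below shows why this matters at the 11% / 89% class split mentioned above. The labels are invented for illustration and are not the survey data; the helper names `sensitivity` and `accuracy` are hypothetical.

```python
def sensitivity(y_true, y_pred, positive=1):
    """Sensitivity (recall on the positive class): TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive cases in y_true")
    return tp / (tp + fn)

def accuracy(y_true, y_pred):
    """Fraction of cases classified correctly, regardless of class."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Invented example: 100 patients, 11 infected (label 1), 89 not infected (label 0).
y_true = [1] * 11 + [0] * 89

# A trivial classifier that never predicts infection.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))     # prints 0.89
print(sensitivity(y_true, y_pred))  # prints 0.0
```

A classifier that never flags an infection reaches 89% accuracy on such data while detecting no infected patient at all, which is why sensitivity is the more informative measure here.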

Keywords: Nosocomial infection, Machine learning, Support vector machines, Data imbalance

PII: S0933-3657(05)00085-0

doi:10.1016/j.artmed.2005.03.002
