Artificial Intelligence in Medicine
Volume 51, Issue 1, Pages 17-25, January 2011

Exploiting the systematic review protocol for classification of medical abstracts

  • Oana Frunza (corresponding author; Tel.: +1 613 562 5800x2140; fax: +1 613 562 5175), School of Information Technology and Engineering, University of Ottawa, 800 King Edward, Ottawa, Ontario, Canada K1N 6N5
  • Diana Inkpen, School of Information Technology and Engineering, University of Ottawa, 800 King Edward, Ottawa, Ontario, Canada K1N 6N5
  • Stan Matwin, School of Information Technology and Engineering, University of Ottawa, 800 King Edward, Ottawa, Ontario, Canada K1N 6N5
  • William Klement, School of Information Technology and Engineering, University of Ottawa, 800 King Edward, Ottawa, Ontario, Canada K1N 6N5
  • Peter O’Blenis, Evidence Partners Corporation, 9 Wick Crescent, Ottawa, Ontario, Canada K1J 7H1

Received 18 January 2008; received in revised form 22 September 2010; accepted 14 October 2010.

Abstract 

Objective

To determine whether the automatic classification of documents can be useful in systematic reviews on medical topics, and specifically if the performance of the automatic classification can be enhanced by using the particular protocol of questions employed by the human reviewers to create multiple classifiers.

Methods and materials

The test collection is the data used in a large-scale systematic review on the topic of the dissemination strategy of health care services for elderly people. From a group of 47,274 abstracts marked by human reviewers to be included in or excluded from further screening, we randomly selected 20,000 as a training set, with the remaining 27,274 becoming a separate test set. As a machine learning algorithm, we used complement naïve Bayes. We tested both a global classification method, where a single classifier is trained on instances of abstracts and their classification (i.e., included or excluded), and a novel per-question classification method that trains multiple classifiers for each abstract, exploiting the specific protocol (questions) of the systematic review. For the per-question method, we tested four ways of combining the results of the classifiers trained for the individual questions. As evaluation measures, we calculated precision and recall for several settings of the two methods. It is most important not to exclude any relevant documents (i.e., to attain high recall for the class of interest), but it is also desirable to exclude most of the non-relevant documents (i.e., to attain high precision on the class of interest) in order to reduce human workload.
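The trade-off between recall and precision when combining per-question classifiers can be illustrated with a small sketch. The abstract does not say which four combination rules were tested, so the voting rule below (`combine_min_votes`), the question names, and the toy predictions are all hypothetical and serve only to show how a lenient rule favours recall while a strict rule favours precision on the included class.

```python
from typing import Dict, List, Tuple


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Precision and recall for the class of interest (True = included)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def combine_min_votes(votes: Dict[str, List[bool]], k: int) -> List[bool]:
    """Hypothetical rule: include an abstract when at least k of the
    per-question classifiers vote to include it."""
    n = len(next(iter(votes.values())))
    return [sum(v[i] for v in votes.values()) >= k for i in range(n)]


# Toy output of three per-question classifiers on five abstracts (made-up data).
votes = {
    "q1": [True, True, False, False, True],
    "q2": [True, False, True, False, False],
    "q3": [True, False, False, False, True],
}
actual = [True, True, True, False, False]  # human include/exclude labels

lenient = combine_min_votes(votes, k=1)  # one "include" vote is enough
strict = combine_min_votes(votes, k=3)   # all classifiers must agree

print("lenient:", precision_recall(lenient, actual))
print("strict: ", precision_recall(strict, actual))
```

On this toy data the lenient rule keeps every relevant abstract at the cost of one false inclusion, while the strict rule makes no false inclusions but misses two relevant abstracts.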

Results

For the global method, the highest recall was 67.8% and the highest precision was 37.9%. For the per-question method, the highest recall was 99.2%, and the highest precision was 63%. The human–machine workflow proposed in this paper achieved a recall value of 99.6%, and a precision value of 17.8%.

Conclusion

The per-question method that combines classifiers following the specific protocol of the review leads to better results than the global method in terms of recall. Because neither method is efficient enough to classify abstracts reliably by itself, the technology should be applied in a semi-automatic way, with a human expert still involved. When the workflow includes one human expert and the trained automatic classifier, recall improves to an acceptable level, showing that automatic classification techniques can reduce the human workload in the process of building a systematic review.
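One way to see why adding a human reviewer can raise recall is the sketch below. The abstract does not describe how the human and classifier decisions are merged; the rule used here (an abstract is kept if either the human or the classifier includes it) is an assumption, and the labels are made-up toy data. Under that rule the set of kept abstracts is a superset of what either party keeps alone, so recall cannot drop below that of the better of the two.

```python
from typing import List


def either_includes(human: List[bool], machine: List[bool]) -> List[bool]:
    """Assumed merge rule: keep an abstract if the human OR the classifier includes it."""
    return [h or m for h, m in zip(human, machine)]


def recall(predicted: List[bool], actual: List[bool]) -> float:
    """Fraction of truly relevant abstracts that were kept."""
    relevant = sum(actual)
    kept = sum(p and a for p, a in zip(predicted, actual))
    return kept / relevant if relevant else 0.0


# Toy labels for six abstracts (hypothetical).
actual = [True, True, True, True, False, False]
human = [True, True, False, True, False, False]   # misses the third abstract
machine = [False, True, True, True, True, False]  # misses the first abstract

combined = either_includes(human, machine)

print("human recall:   ", recall(human, actual))
print("machine recall: ", recall(machine, actual))
print("combined recall:", recall(combined, actual))
```

The combined decision also keeps the classifier's false inclusion, which is consistent with the pattern of very high recall and low precision reported for the workflow.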

Keywords: Automatic text classification, Text representation, Medical concepts, Ensemble of classifiers, Systematic reviews for the medical domain

 

PII: S0933-3657(10)00124-7

doi:10.1016/j.artmed.2010.10.005
