Automatic structuring of breast cancer radiology reports for quality assurance

Shreyasi Pathak, Jorit van Rossen, Onno Vijlbrief, Jeroen Geerdink, Christin Seifert, Maurice van Keulen

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

Abstract

Hospitals often set protocols based on well-defined standards to maintain the quality of patient reports. To ensure that clinicians conform to these protocols, quality assurance of the reports is needed. Patient reports are currently written in free-text format, which complicates the task of quality assurance. In this paper, we present a machine-learning-based natural language processing system for automatic quality assurance of radiology reports on breast cancer. This is achieved in three steps: we i) identify the top-level structure of the report, ii) check whether the information under each section corresponds to the section heading, and iii) convert the free-text detailed findings in the report to a semi-structured format. The top-level structure and the content of the report were predicted with F1 scores of 0.97 and 0.94, respectively, using a Support Vector Machine (SVM). For automatic structuring, our proposed hierarchical Conditional Random Field (CRF) outperformed the baseline CRF with an F1 score of 0.78 versus 0.71. The third step generates a semi-structured XML format of the free-text report, which makes it easy to visualize the conformance of the findings to the protocols. This format also allows easy extraction of specific information for other purposes such as search, evaluation and research.
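The third step described above can be illustrated with a minimal sketch. It assumes that a sequence labeller (such as the hierarchical CRF) has already assigned each text span a slash-separated label path. The function name `spans_to_xml`, the `report` root element, the label names and the example spans are all hypothetical and chosen for illustration only; they are not the label set, schema or data used in the paper.

```python
import xml.etree.ElementTree as ET


def spans_to_xml(labelled_spans):
    """Build a semi-structured XML tree from (label_path, text) pairs.

    label_path is a slash-separated hierarchy such as "finding/mass/size".
    All label names here are hypothetical, not the paper's label set.
    """
    root = ET.Element("report")
    for label_path, text in labelled_spans:
        *parents, leaf = label_path.split("/")
        node = root
        for tag in parents:
            # Reuse the most recent child when it carries the same tag, so
            # consecutive spans that share a prefix are grouped together.
            last = node[-1] if len(node) else None
            if last is None or last.tag != tag:
                last = ET.SubElement(node, tag)
            node = last
        # The innermost label always becomes a new element holding the text.
        ET.SubElement(node, leaf).text = text
    return root


# Illustrative input only: invented spans, not taken from any real report.
example_spans = [
    ("finding/mass/location", "upper outer quadrant"),
    ("finding/mass/size", "12 mm"),
    ("finding/calcification/distribution", "clustered"),
]
example_root = spans_to_xml(example_spans)
print(ET.tostring(example_root, encoding="unicode"))
```

Once the report is in this form, a specific item can be pulled out with a path query such as `example_root.find("finding/mass/size").text`, which is the kind of targeted extraction the abstract mentions for search and evaluation.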

Original language: English
Title of host publication: Proceedings of the Workshop on Data Mining in Biomedical Informatics and Healthcare (DMBIH 2018)
Editors: Jeffrey Yu, Zhenhui Li, Hanghang Tong, Feida Zhu
Publisher: IEEE Computer Society
Pages: 732-739
Number of pages: 8
Volume: 2018-November
ISBN (Electronic): 9781538692882
DOIs: https://doi.org/10.1109/ICDMW.2018.00111
Publication status: Published - 17 Nov 2018
Event: 6th Workshop on Data Mining in Biomedical Informatics and Healthcare 2018 - Singapore, Singapore
Duration: 17 Nov 2018 - 17 Nov 2018
Conference number: 6
http://facweb.cs.depaul.edu/research/vc/ICDM18/index.html

Workshop

Workshop: 6th Workshop on Data Mining in Biomedical Informatics and Healthcare 2018
Abbreviated title: DMBIH 2018
Country: Singapore
City: Singapore
Period: 17/11/18 - 17/11/18
Internet address

Fingerprint

Radiology
Quality assurance
Network protocols
Natural language processing systems
XML
Support vector machines
Learning systems

Keywords

  • Quality Assurance
  • Automatic Structuring
  • Radiology
  • Natural Language Processing
  • Conditional Random Fields

Cite this

Pathak, S., van Rossen, J., Vijlbrief, O., Geerdink, J., Seifert, C., & van Keulen, M. (2018). Automatic structuring of breast cancer radiology reports for quality assurance. In J. Yu, Z. Li, H. Tong, & F. Zhu (Eds.), Proceedings of the Workshop on Data Mining in Biomedical Informatics and Healthcare (DMBIH 2018) (Vol. 2018-November, pp. 732-739). [8637387] IEEE Computer Society. https://doi.org/10.1109/ICDMW.2018.00111
Pathak, Shreyasi; van Rossen, Jorit; Vijlbrief, Onno; Geerdink, Jeroen; Seifert, Christin; van Keulen, Maurice. / Automatic structuring of breast cancer radiology reports for quality assurance. Proceedings of the Workshop on Data Mining in Biomedical Informatics and Healthcare (DMBIH 2018). editor / Jeffrey Yu; Zhenhui Li; Hanghang Tong; Feida Zhu. Vol. 2018-November IEEE Computer Society, 2018. pp. 732-739
@inproceedings{404eb865ca9848018ef94b5b846f7281,
title = "Automatic structuring of breast cancer radiology reports for quality assurance",
abstract = "Hospitals often set protocols based on well defined standards to maintain quality of patient reports. To ensure that the clinicians conform to the protocols, quality assurance of these reports is needed. Patient reports are currently written in free-text format, which complicates the task of quality assurance. In this paper, we present a machine learning based natural language processing system for automatic quality assurance of radiology reports on breast cancer. This is achieved in three steps: We i) identify the top level structure of the report, ii) check whether the information under each section corresponds to the section heading, iii) convert the free-text detailed findings in the report to a semi-structured format. Top level structure and content of report were predicted with an F1 score of 0.97 and 0.94 respectively using Support Vector Machine (SVM). For automatic structuring, our proposed hierarchical Conditional Random Field (CRF) outperformed the baseline CRF with an F1 score of 0.78 vs 0.71. The third step generates a semi-structured XML format of the free-text report, which helps to easily visualize the conformance of the findings to the protocols. This format also allows easy extraction of specific information for other purposes such as search, evaluation and research.",
keywords = "Quality Assurance, Automatic Structuring, Radiology, Natural Language Processing, Conditional Random Fields",
author = "Shreyasi Pathak and {van Rossen}, Jorit and Onno Vijlbrief and Jeroen Geerdink and Christin Seifert and {van Keulen}, Maurice",
year = "2018",
month = "11",
day = "17",
doi = "10.1109/ICDMW.2018.00111",
language = "English",
volume = "2018-November",
pages = "732--739",
editor = "Jeffrey Yu and Zhenhui Li and Hanghang Tong and Feida Zhu",
booktitle = "Proceedings of the Workshop on Data Mining in Biomedical Informatics and Healthcare (DMBIH 2018)",
publisher = "IEEE Computer Society",
address = "United States",
}

Pathak, S, van Rossen, J, Vijlbrief, O, Geerdink, J, Seifert, C & van Keulen, M 2018, Automatic structuring of breast cancer radiology reports for quality assurance. in J Yu, Z Li, H Tong & F Zhu (eds), Proceedings of the Workshop on Data Mining in Biomedical Informatics and Healthcare (DMBIH 2018). vol. 2018-November, 8637387, IEEE Computer Society, pp. 732-739, 6th Workshop on Data Mining in Biomedical Informatics and Healthcare 2018, Singapore, Singapore, 17/11/18. https://doi.org/10.1109/ICDMW.2018.00111

Automatic structuring of breast cancer radiology reports for quality assurance. / Pathak, Shreyasi; van Rossen, Jorit; Vijlbrief, Onno; Geerdink, Jeroen; Seifert, Christin; van Keulen, Maurice.

Proceedings of the Workshop on Data Mining in Biomedical Informatics and Healthcare (DMBIH 2018). ed. / Jeffrey Yu; Zhenhui Li; Hanghang Tong; Feida Zhu. Vol. 2018-November IEEE Computer Society, 2018. p. 732-739 8637387.

TY - GEN

T1 - Automatic structuring of breast cancer radiology reports for quality assurance

AU - Pathak, Shreyasi

AU - van Rossen, Jorit

AU - Vijlbrief, Onno

AU - Geerdink, Jeroen

AU - Seifert, Christin

AU - van Keulen, Maurice

PY - 2018/11/17

Y1 - 2018/11/17

N2 - Hospitals often set protocols based on well defined standards to maintain quality of patient reports. To ensure that the clinicians conform to the protocols, quality assurance of these reports is needed. Patient reports are currently written in free-text format, which complicates the task of quality assurance. In this paper, we present a machine learning based natural language processing system for automatic quality assurance of radiology reports on breast cancer. This is achieved in three steps: We i) identify the top level structure of the report, ii) check whether the information under each section corresponds to the section heading, iii) convert the free-text detailed findings in the report to a semi-structured format. Top level structure and content of report were predicted with an F1 score of 0.97 and 0.94 respectively using Support Vector Machine (SVM). For automatic structuring, our proposed hierarchical Conditional Random Field (CRF) outperformed the baseline CRF with an F1 score of 0.78 vs 0.71. The third step generates a semi-structured XML format of the free-text report, which helps to easily visualize the conformance of the findings to the protocols. This format also allows easy extraction of specific information for other purposes such as search, evaluation and research.

AB - Hospitals often set protocols based on well defined standards to maintain quality of patient reports. To ensure that the clinicians conform to the protocols, quality assurance of these reports is needed. Patient reports are currently written in free-text format, which complicates the task of quality assurance. In this paper, we present a machine learning based natural language processing system for automatic quality assurance of radiology reports on breast cancer. This is achieved in three steps: We i) identify the top level structure of the report, ii) check whether the information under each section corresponds to the section heading, iii) convert the free-text detailed findings in the report to a semi-structured format. Top level structure and content of report were predicted with an F1 score of 0.97 and 0.94 respectively using Support Vector Machine (SVM). For automatic structuring, our proposed hierarchical Conditional Random Field (CRF) outperformed the baseline CRF with an F1 score of 0.78 vs 0.71. The third step generates a semi-structured XML format of the free-text report, which helps to easily visualize the conformance of the findings to the protocols. This format also allows easy extraction of specific information for other purposes such as search, evaluation and research.

KW - Quality Assurance

KW - Automatic Structuring

KW - Radiology

KW - Natural Language Processing

KW - Conditional Random Fields

UR - http://www.scopus.com/inward/record.url?scp=85062876065&partnerID=8YFLogxK

U2 - 10.1109/ICDMW.2018.00111

DO - 10.1109/ICDMW.2018.00111

M3 - Conference contribution

VL - 2018-November

SP - 732

EP - 739

BT - Proceedings of the Workshop on Data Mining in Biomedical Informatics and Healthcare (DMBIH 2018)

A2 - Yu, Jeffrey

A2 - Li, Zhenhui

A2 - Tong, Hanghang

A2 - Zhu, Feida

PB - IEEE Computer Society

ER -

Pathak S, van Rossen J, Vijlbrief O, Geerdink J, Seifert C, van Keulen M. Automatic structuring of breast cancer radiology reports for quality assurance. In Yu J, Li Z, Tong H, Zhu F, editors, Proceedings of the Workshop on Data Mining in Biomedical Informatics and Healthcare (DMBIH 2018). Vol. 2018-November. IEEE Computer Society. 2018. p. 732-739. 8637387 https://doi.org/10.1109/ICDMW.2018.00111