Post-Structuring Radiology Reports of Breast Cancer Patients for Clinical Quality Assurance

Shreyasi Pathak, Jorit van Rossen, Onno Vijlbrief, Jeroen Geerdink, Christin Seifert, Maurice van Keulen

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Hospitals often set protocols based on well-defined standards to maintain the quality of patient reports. To ensure that clinicians conform to the protocols, quality assurance of these reports is needed. Patient reports are currently written in free-text format, which complicates the task of quality assurance. In this paper, we present a machine-learning-based natural language processing system for automatic quality assurance of radiology reports on breast cancer. This is achieved in three steps: we i) identify the top-level structure (headings) of the report, ii) classify the report content into the top-level headings, and iii) convert the free-text detailed findings in the report to a semi-structured format (post-structuring). The top-level structure and the content of the reports were predicted with F1 scores of 0.97 and 0.94, respectively, using Support Vector Machine (SVM) classifiers. For automatic structuring, our proposed hierarchical Conditional Random Field (CRF) outperformed the baseline CRF, with an F1 score of 0.78 vs. 0.71. The determined structure is represented as a semi-structured XML version of the free-text report, which makes it easy to visualize whether the findings conform to the protocols. This format also allows easy extraction of specific information for other purposes such as search, evaluation and research.
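
To make the final step more concrete, the following is a minimal sketch of how labeled text spans might be nested into a semi-structured XML report and then checked against a protocol. It is not the authors' implementation: the tag names (`findings`, `mass`, `location`, `size`, `shape`, `conclusion`), the sample text, and the helper functions `build_report_xml` and `missing_fields` are all hypothetical, and the labels are supplied by hand here rather than predicted by an SVM or CRF.

```python
import xml.etree.ElementTree as ET


def build_report_xml(labeled_spans):
    """Nest (path, text) pairs into an XML tree.

    `path` is a tuple of tag names from the top-level heading down to the
    leaf label. Tag names are hypothetical, not the paper's label set.
    """
    root = ET.Element("report")
    for path, text in labeled_spans:
        node = root
        # Reuse existing intermediate elements so siblings share a parent.
        for tag in path[:-1]:
            child = node.find(tag)
            if child is None:
                child = ET.SubElement(node, tag)
            node = child
        leaf = ET.SubElement(node, path[-1])
        leaf.text = text
    return root


def missing_fields(root, required_paths):
    """Return the required element paths that are absent from the report."""
    return [p for p in required_paths if root.find(p) is None]


# Made-up example spans; in the described system these labels would come
# from the trained classifiers, not from hand annotation.
spans = [
    (("findings", "mass", "location"), "upper outer quadrant"),
    (("findings", "mass", "size"), "12 mm"),
    (("conclusion",), "BI-RADS 4"),
]
report = build_report_xml(spans)
print(ET.tostring(report, encoding="unicode"))
print(missing_fields(report, ["findings/mass/size", "findings/mass/shape"]))
```

With the structure in XML, a conformance check reduces to a path lookup: here the assumed protocol asks for a mass shape, which the sample report does not state, so that path is reported as missing.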
Original language: English
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
DOI: 10.1109/TCBB.2019.2914678
Publication status: E-pub ahead of print/First online - 3 May 2019

Fingerprint

Radiology
Quality Assurance
Breast Cancer
Conditional Random Fields
Natural Language Processing
Breast Neoplasms
Network protocols
Information Storage and Retrieval
Natural language processing systems
XML
Natural Language
Convert
Support vector machines
Learning systems
Well-defined
Baseline
Support Vector Machine
Machine Learning
Classifiers

Cite this

@article{c01100c917b9455d9e8dd7c02724c2c5,
title = "Post-Structuring Radiology Reports of Breast Cancer Patients for Clinical Quality Assurance",
abstract = "Hospitals often set protocols based on well defined standards to maintain the quality of patient reports. To ensure that the clinicians conform to the protocols, quality assurance of these reports is needed. Patient reports are currently written in free-text format, which complicates the task of quality assurance. In this paper, we present a machine learning based natural language processing system for automatic quality assurance of radiology reports on breast cancer. This is achieved in three steps: we i) identify the top-level structure (headings) of the report, ii) classify the report content into the top-level headings, iii) convert the free-text detailed findings in the report to a semi-structured format (post-structuring). Top level structure and content of report were predicted with an F1 score of 0.97 and 0.94, respectively using Support Vector Machine (SVM) classifiers. For automatic structuring, our proposed hierarchical Conditional Random Field (CRF) outperformed the baseline CRF with an F1 score of 0.78 vs 0.71. The determined structure of the report is represented in semi-structured XML format of the free-text report, which helps to easily visualize the conformance of the findings to the protocols. This format also allows easy extraction of specific information for other purposes such as search, evaluation and research.",
author = "Pathak, Shreyasi and {van Rossen}, Jorit and Vijlbrief, Onno and Geerdink, Jeroen and Seifert, Christin and {van Keulen}, Maurice",
year = "2019",
month = "5",
day = "3",
doi = "10.1109/TCBB.2019.2914678",
language = "English",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "IEEE",
}

Post-Structuring Radiology Reports of Breast Cancer Patients for Clinical Quality Assurance. / Pathak, Shreyasi; van Rossen, Jorit; Vijlbrief, Onno; Geerdink, Jeroen; Seifert, Christin; van Keulen, Maurice.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, 03.05.2019.

TY - JOUR

T1 - Post-Structuring Radiology Reports of Breast Cancer Patients for Clinical Quality Assurance

AU - Pathak, Shreyasi

AU - van Rossen, Jorit

AU - Vijlbrief, Onno

AU - Geerdink, Jeroen

AU - Seifert, Christin

AU - van Keulen, Maurice

PY - 2019/5/3

Y1 - 2019/5/3

AB - Hospitals often set protocols based on well defined standards to maintain the quality of patient reports. To ensure that the clinicians conform to the protocols, quality assurance of these reports is needed. Patient reports are currently written in free-text format, which complicates the task of quality assurance. In this paper, we present a machine learning based natural language processing system for automatic quality assurance of radiology reports on breast cancer. This is achieved in three steps: we i) identify the top-level structure (headings) of the report, ii) classify the report content into the top-level headings, iii) convert the free-text detailed findings in the report to a semi-structured format (post-structuring). Top level structure and content of report were predicted with an F1 score of 0.97 and 0.94, respectively using Support Vector Machine (SVM) classifiers. For automatic structuring, our proposed hierarchical Conditional Random Field (CRF) outperformed the baseline CRF with an F1 score of 0.78 vs 0.71. The determined structure of the report is represented in semi-structured XML format of the free-text report, which helps to easily visualize the conformance of the findings to the protocols. This format also allows easy extraction of specific information for other purposes such as search, evaluation and research.

U2 - 10.1109/TCBB.2019.2914678

DO - 10.1109/TCBB.2019.2914678

M3 - Article

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

ER -