Biomedical text mining: State-of-the-art, open problems and future challenges

Andreas Holzinger, Johannes Schantl, Miriam Schroettner, Christin Seifert, Karin Verspoor

    Research output: Chapter in Book/Report/Conference proceeding · Chapter · Academic · peer-review

    40 Citations (Scopus)

    Abstract

    Text is a very important type of data within the biomedical domain. For example, patient records contain large amounts of text entered in a non-standardized format, which poses many challenges for processing such data. For the clinical doctor, the written text in the medical findings, rather than images or multimedia data, is still the basis for decision making. However, the steadily increasing volumes of unstructured information call for machine learning approaches to data mining, i.e. text mining. This paper provides a concise overview of selected text mining methods, focusing on statistical methods, namely Latent Semantic Analysis, Probabilistic Latent Semantic Analysis, Latent Dirichlet Allocation, Hierarchical Latent Dirichlet Allocation, Principal Component Analysis, and Support Vector Machines, along with some examples from the biomedical domain. Finally, we present some open problems and future challenges, particularly from the clinical domain, that we expect to stimulate future research.
    Original language: English
    Title of host publication: Interactive Knowledge Discovery and Data Mining in Biomedical Informatics
    Subtitle of host publication: State-of-the-Art and Future Challenges
    Editors: Andreas Holzinger, Igor Jurisica
    Publisher: Springer
    Pages: 271-300
    Number of pages: 30
    ISBN (Electronic): 978-3-662-43968-5
    ISBN (Print): 978-3-662-43967-8
    DOI: https://doi.org/10.1007/978-3-662-43968-5_16
    Publication status: Published - 2014

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 8401

    Fingerprint

    Semantics
    Principal component analysis
    Support vector machines
    Data mining
    Learning systems
    Statistical methods
    Decision making
    Processing

    Keywords

    • Big data
    • Knowledge discovery
    • LDA
    • LSA
    • Natural language processing
    • PCA
    • PLSA
    • SVM
    • Statistical models
    • Text classification
    • Text mining
    • Unstructured information
    • hLDA

    Cite this

    Holzinger, A., Schantl, J., Schroettner, M., Seifert, C., & Verspoor, K. (2014). Biomedical text mining: State-of-the-art, open problems and future challenges. In A. Holzinger & I. Jurisica (Eds.), Interactive Knowledge Discovery and Data Mining in Biomedical Informatics: State-of-the-Art and Future Challenges (pp. 271-300). Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 8401. Springer. https://doi.org/10.1007/978-3-662-43968-5_16