Biomedical text mining: State-of-the-art, open problems and future challenges

Andreas Holzinger, Johannes Schantl, Miriam Schroettner, Christin Seifert, Karin Verspoor

    Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic › peer-review

    54 Citations (Scopus)


    Text is a very important type of data within the biomedical domain. For example, patient records contain large amounts of text that has been entered in a non-standardized format, which poses many challenges for processing such data. For the clinical doctor, the written text in the medical findings, rather than images or multimedia data, is still the basis for decision making. However, the steadily increasing volumes of unstructured information call for machine learning approaches to data mining, i.e. text mining. This paper provides a concise overview of selected text mining methods, focusing on statistical methods, namely Latent Semantic Analysis, Probabilistic Latent Semantic Analysis, Latent Dirichlet Allocation, Hierarchical Latent Dirichlet Allocation, Principal Component Analysis, and Support Vector Machines, along with some examples from the biomedical domain. Finally, we present some open problems and future challenges, particularly from the clinical domain, that we expect to stimulate future research.
    Original language: English
    Title of host publication: Interactive Knowledge Discovery and Data Mining in Biomedical Informatics
    Subtitle of host publication: State-of-the-Art and Future Challenges
    Editors: Andreas Holzinger, Igor Jurisica
    Number of pages: 30
    ISBN (Electronic): 978-3-662-43968-5
    ISBN (Print): 978-3-662-43967-8
    Publication status: Published - 2014

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)


    • Big data
    • Knowledge discovery
    • LDA
    • LSA
    • Natural language processing
    • PCA
    • PLSA
    • SVM
    • Statistical models
    • Text classification
    • Text mining
    • Unstructured information
    • hLDA

