Online Clustering for Novelty Detection and Concept Drift in Data Streams

Kemilly Dearo Garcia*, Mannes Poel, Joost N. Kok, André C.P.L.F. de Carvalho

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


    Abstract

    Data streams consist of large amounts of data that arrive continuously and whose probability distribution may change over time. Depending on how the distribution changes, different phenomena can occur: new classes may appear, or concept drift may occur in existing classes. Machine learning algorithms are often used to model such data. New classes are patterns that were not seen during training of the current classification model but appear later in the stream. Concept drift occurs when the concepts associated with a dataset change as new data arrive. This paper proposes a new kNN-based algorithm that uses micro-clusters as prototypes, incrementally updating existing micro-clusters or creating new ones when novelties are detected. In the online phase, each instance close to a micro-cluster is treated as an extension of that micro-cluster and is used to adapt the model to concept drift. The proposed algorithm is experimentally compared with a state-of-the-art data stream classifier and a baseline. According to the experimental results, the proposed algorithm improves its predictive performance over time by incrementally learning changes in the data distribution.
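    The abstract describes the method only at a high level. The Python sketch below illustrates the general idea of kNN classification over micro-cluster prototypes with novelty detection and drift adaptation. Every design detail is an assumption of this sketch, not taken from the paper: each micro-cluster is summarized by a BIRCH/CluStream-style feature (count, linear sum, squared sum); an instance counts as "close to" a micro-cluster if it falls within that cluster's radius, floored by a hypothetical `min_radius`; and an instance outside its nearest prototype immediately seeds a new micro-cluster with a placeholder label such as `novel-1` (a simplification, since practical detectors usually collect several unknown instances before declaring a novelty). The names `MicroCluster`, `MicroClusterKNN`, and `predict_and_update` are hypothetical; this is not the authors' implementation.

```python
import math
from collections import Counter


class MicroCluster:
    """Summary of nearby points: count n, linear sum ls, squared sum ss.
    (BIRCH/CluStream-style feature; an assumption of this sketch.)"""

    def __init__(self, x, label):
        self.label = label
        self.n = 1
        self.ls = list(x)
        self.ss = [v * v for v in x]

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root-mean-square distance of absorbed points from the centroid.
        c = self.centroid()
        var = sum(s / self.n - ci * ci for s, ci in zip(self.ss, c))
        return math.sqrt(max(var, 0.0))

    def absorb(self, x):
        # Incremental O(d) update: the prototype moves toward new data (drift).
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss = [a + b * b for a, b in zip(self.ss, x)]


class MicroClusterKNN:
    """Hypothetical kNN-over-micro-clusters classifier (illustration only)."""

    def __init__(self, k=1, min_radius=1.0):
        self.k = k
        self.min_radius = min_radius  # hypothetical floor so singletons can grow
        self.clusters = []
        self.n_novel = 0

    def _ranked(self, x):
        # Micro-clusters sorted by distance from x to their centroids.
        return sorted(self.clusters, key=lambda mc: math.dist(x, mc.centroid()))

    def _boundary(self, mc):
        return max(mc.radius(), self.min_radius)

    def fit(self, X, y):
        # Offline phase: absorb each labeled point into the nearest same-class
        # micro-cluster if it lies inside that cluster's boundary, else start one.
        for x, label in zip(X, y):
            same = [mc for mc in self._ranked(x) if mc.label == label]
            if same and math.dist(x, same[0].centroid()) <= self._boundary(same[0]):
                same[0].absorb(x)
            else:
                self.clusters.append(MicroCluster(x, label))

    def predict_and_update(self, x):
        ranked = self._ranked(x)
        nearest = ranked[0]
        if math.dist(x, nearest.centroid()) > self._boundary(nearest):
            # Outside the nearest prototype: flag a novelty, seed a new cluster.
            self.n_novel += 1
            self.clusters.append(MicroCluster(x, f"novel-{self.n_novel}"))
            return "novelty"
        # kNN vote over the k nearest micro-cluster prototypes.
        label = Counter(mc.label for mc in ranked[: self.k]).most_common(1)[0][0]
        # The instance extends the closest micro-cluster carrying that label.
        next(mc for mc in ranked if mc.label == label).absorb(x)
        return label


if __name__ == "__main__":
    model = MicroClusterKNN(k=1, min_radius=1.0)
    model.fit([(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5)],
              ["a", "a", "a", "b", "b", "b"])
    print(model.predict_and_update((0.2, 0.3)))  # a
    print(model.predict_and_update((5, -5)))     # novelty
    print(model.predict_and_update((5.3, -5)))   # novel-1
```

Folding an instance into a prototype only updates three running sums, so centroids and radii can follow gradual drift on an unbounded stream without storing raw points.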

    Original language: English
    Title of host publication: Progress in Artificial Intelligence
    Subtitle of host publication: 19th EPIA Conference on Artificial Intelligence, EPIA 2019, Vila Real, Portugal, September 3–6, 2019, Proceedings
    Editors: Paulo Moura Oliveira, Paulo Novais, Luís Paulo Reis
    Place of Publication: Cham
    Publisher: Springer
    Pages: 448-459
    Number of pages: 12
    ISBN (Electronic): 978-3-030-30244-3
    ISBN (Print): 978-3-030-30243-6
    Publication status: Published - 1 Jan 2019
    Event: 19th EPIA Conference on Artificial Intelligence, EPIA 2019 - Vila Real, Portugal
    Duration: 3 Sept 2019 to 6 Sept 2019
    Conference number: 19

    Publication series

    Name: Lecture Notes in Computer Science
    Publisher: Springer
    Volume: 11805
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349
    Name: Lecture Notes in Artificial Intelligence
    Publisher: Springer
    Name: Lecture Notes in Bioinformatics
    Publisher: Springer

    Conference

    Conference: 19th EPIA Conference on Artificial Intelligence, EPIA 2019
    Abbreviated title: EPIA
    Country/Territory: Portugal
    City: Vila Real
    Period: 3/09/19 to 6/09/19

    Keywords

    • Concept drift
    • Data stream
    • Novelty detection
    • Online learning
