Online Clustering for Novelty Detection and Concept Drift in Data Streams

Kemilly Dearo Garcia*, Mannes Poel, Joost N. Kok, André C.P.L.F. de Carvalho

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    Abstract

    Data streams are large volumes of data that arrive continuously, with a probability distribution that may change over time. Depending on how the data distribution changes, different phenomena can occur: new classes can appear, or concept drift can occur in existing classes. Machine Learning algorithms have often been used to model such data. New classes are patterns that were not seen during the training of the current classification model but appear after some time. Concept drift occurs when the concepts associated with a dataset change as new data arrive. This paper proposes a new kNN-based algorithm that uses micro-clusters as prototypes and incrementally updates them, or creates new micro-clusters when novelties are detected. In the online phase, each instance close to a micro-cluster is treated as an extension of that micro-cluster and is used to adapt the model to concept drift. The proposed algorithm is experimentally compared with a state-of-the-art classifier from the data stream literature and with one baseline. According to the experimental results, the proposed algorithm increases its predictive performance over time by incrementally learning changes in the data distribution.
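
The idea of micro-cluster prototypes that are either extended or newly created can be illustrated with a minimal sketch. This is not the authors' algorithm: the class names (`MicroCluster`, `MicroClusterStream`), the fixed `radius` threshold, the use of a single nearest centroid (1-NN), and the `novelty-N` labels are all assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Compact summary of nearby instances (hypothetical structure)."""
    label: str
    n: int            # number of instances absorbed so far
    linear_sum: list  # per-feature sum of absorbed instances

    @property
    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, x):
        # Incremental update: the centroid shifts toward recent data,
        # which is how this sketch follows gradual concept drift.
        self.n += 1
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, x)]


class MicroClusterStream:
    """Nearest-prototype classifier over micro-clusters (illustrative only)."""

    def __init__(self, radius):
        self.radius = radius  # assumed fixed distance threshold
        self.clusters = []
        self._novel_count = 0

    def fit_initial(self, X, y):
        # Offline phase, simplified to one micro-cluster per known class.
        by_label = {}
        for x, label in zip(X, y):
            if label in by_label:
                by_label[label].absorb(x)
            else:
                by_label[label] = MicroCluster(label, 1, list(x))
        self.clusters = list(by_label.values())

    def process(self, x):
        # Online phase: find the closest prototype (1-NN over centroids).
        nearest = min(
            self.clusters,
            key=lambda c: math.dist(c.centroid, x),
            default=None,
        )
        if nearest is not None and math.dist(nearest.centroid, x) <= self.radius:
            nearest.absorb(x)  # instance extends the existing micro-cluster
            return nearest.label
        # Too far from every prototype: treat as a novelty and open a new cluster.
        self._novel_count += 1
        label = f"novelty-{self._novel_count}"
        self.clusters.append(MicroCluster(label, 1, list(x)))
        return label
```

In this sketch, calling `process` on an instance within `radius` of an existing centroid returns that cluster's label and moves its centroid, while an instance far from every centroid starts a new micro-cluster. A practical system would also need per-cluster radii, a buffer to separate novelties from outliers, and a forgetting mechanism, none of which are shown here.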

    Original language: English
    Title of host publication: Progress in Artificial Intelligence
    Subtitle of host publication: 19th EPIA Conference on Artificial Intelligence, EPIA 2019, Vila Real, Portugal, September 3–6, 2019, Proceedings
    Editors: Paulo Moura Oliveira, Paulo Novais, Luís Paulo Reis
    Place of Publication: Cham
    Publisher: Springer
    Pages: 448-459
    Number of pages: 12
    ISBN (Electronic): 978-3-030-30244-3
    ISBN (Print): 978-3-030-30243-6
    DOI: https://doi.org/10.1007/978-3-030-30244-3_37
    Publication status: Published - 1 Jan 2019
    Event: 19th EPIA Conference on Artificial Intelligence, EPIA 2019 - Vila Real, Portugal
    Duration: 3 Sep 2019 to 6 Sep 2019
    Conference number: 19

    Publication series

    Name: Lecture Notes in Computer Science
    Publisher: Springer
    Volume: 11805
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349
    Name: Lecture Notes in Artificial Intelligence
    Publisher: Springer
    Name: Lecture Notes in Bioinformatics
    Publisher: Springer

    Conference

    Conference: 19th EPIA Conference on Artificial Intelligence, EPIA 2019
    Abbreviated title: EPIA
    Country: Portugal
    City: Vila Real
    Period: 3/09/19 to 6/09/19

    Fingerprint

    Novelty Detection
    Concept Drift
    Data Streams
    Clustering
    Data Distribution
    Probability distributions
    Learning algorithms
    Learning systems
    Classifiers
    Learning Algorithm
    Baseline
    Machine Learning
    Probability Distribution
    Update
    Classifier
    Model
    Prototype
    Experimental Results
    Class

    Keywords

    • Concept drift
    • Data stream
    • Novelty detection
    • Online learning

    Cite this

    Garcia, K. D., Poel, M., Kok, J. N., & de Carvalho, A. C. P. L. F. (2019). Online Clustering for Novelty Detection and Concept Drift in Data Streams. In P. Moura Oliveira, P. Novais, & L. P. Reis (Eds.), Progress in Artificial Intelligence: 19th EPIA Conference on Artificial Intelligence, EPIA 2019, Vila Real, Portugal, September 3–6, 2019, Proceedings (pp. 448-459). (Lecture Notes in Computer Science; Vol. 11805), (Lecture Notes in Artificial Intelligence), (Lecture Notes in Bioinformatics). Cham: Springer. https://doi.org/10.1007/978-3-030-30244-3_37
    Garcia, Kemilly Dearo ; Poel, Mannes ; Kok, Joost N. ; de Carvalho, André C.P.L.F. / Online Clustering for Novelty Detection and Concept Drift in Data Streams. Progress in Artificial Intelligence: 19th EPIA Conference on Artificial Intelligence, EPIA 2019, Vila Real, Portugal, September 3–6, 2019, Proceedings. editor / Paulo Moura Oliveira ; Paulo Novais ; Luís Paulo Reis. Cham : Springer, 2019. pp. 448-459 (Lecture Notes in Computer Science). (Lecture Notes in Artificial Intelligence). (Lecture Notes in Bioinformatics).
    @inproceedings{6b3342c533514aeea0e33b81806d8aa6,
    title = "Online Clustering for Novelty Detection and Concept Drift in Data Streams",
    keywords = "Concept drift, Data stream, Novelty detection, Online learning",
    author = "Garcia, {Kemilly Dearo} and Mannes Poel and Kok, {Joost N.} and {de Carvalho}, {Andr{\'e} C.P.L.F.}",
    year = "2019",
    month = "1",
    day = "1",
    doi = "10.1007/978-3-030-30244-3_37",
    language = "English",
    isbn = "978-3-030-30243-6",
    series = "Lecture Notes in Computer Science",
    publisher = "Springer",
    pages = "448--459",
    editor = "{Moura Oliveira}, Paulo and Paulo Novais and Reis, {Lu{\'i}s Paulo}",
    booktitle = "Progress in Artificial Intelligence",

    }

    Garcia, KD, Poel, M, Kok, JN & de Carvalho, ACPLF 2019, Online Clustering for Novelty Detection and Concept Drift in Data Streams. in P Moura Oliveira, P Novais & LP Reis (eds), Progress in Artificial Intelligence: 19th EPIA Conference on Artificial Intelligence, EPIA 2019, Vila Real, Portugal, September 3–6, 2019, Proceedings. Lecture Notes in Computer Science, vol. 11805, Lecture Notes in Artificial Intelligence, Lecture Notes in Bioinformatics, Springer, Cham, pp. 448-459, 19th EPIA Conference on Artificial Intelligence, EPIA 2019, Vila Real, Portugal, 3/09/19. https://doi.org/10.1007/978-3-030-30244-3_37

    Online Clustering for Novelty Detection and Concept Drift in Data Streams. / Garcia, Kemilly Dearo; Poel, Mannes; Kok, Joost N.; de Carvalho, André C.P.L.F.

    Progress in Artificial Intelligence: 19th EPIA Conference on Artificial Intelligence, EPIA 2019, Vila Real, Portugal, September 3–6, 2019, Proceedings. ed. / Paulo Moura Oliveira; Paulo Novais; Luís Paulo Reis. Cham : Springer, 2019. p. 448-459 (Lecture Notes in Computer Science; Vol. 11805), (Lecture Notes in Artificial Intelligence), (Lecture Notes in Bioinformatics).

    TY - GEN

    T1 - Online Clustering for Novelty Detection and Concept Drift in Data Streams

    AU - Garcia, Kemilly Dearo

    AU - Poel, Mannes

    AU - Kok, Joost N.

    AU - de Carvalho, André C.P.L.F.

    PY - 2019/1/1

    Y1 - 2019/1/1

    KW - Concept drift

    KW - Data stream

    KW - Novelty detection

    KW - Online learning

    UR - http://www.scopus.com/inward/record.url?scp=85072863596&partnerID=8YFLogxK

    U2 - 10.1007/978-3-030-30244-3_37

    DO - 10.1007/978-3-030-30244-3_37

    M3 - Conference contribution

    AN - SCOPUS:85072863596

    SN - 978-3-030-30243-6

    T3 - Lecture Notes in Computer Science

    SP - 448

    EP - 459

    BT - Progress in Artificial Intelligence

    A2 - Moura Oliveira, Paulo

    A2 - Novais, Paulo

    A2 - Reis, Luís Paulo

    PB - Springer

    CY - Cham

    ER -

    Garcia KD, Poel M, Kok JN, de Carvalho ACPLF. Online Clustering for Novelty Detection and Concept Drift in Data Streams. In Moura Oliveira P, Novais P, Reis LP, editors, Progress in Artificial Intelligence: 19th EPIA Conference on Artificial Intelligence, EPIA 2019, Vila Real, Portugal, September 3–6, 2019, Proceedings. Cham: Springer. 2019. p. 448-459. (Lecture Notes in Computer Science). (Lecture Notes in Artificial Intelligence). (Lecture Notes in Bioinformatics). https://doi.org/10.1007/978-3-030-30244-3_37