Entropy-based discretization methods for ranking data

Cláudio Rebelo De Sá, Carlos Soares, Arno Knobbe

    Research output: Contribution to journal › Article › Academic › peer-review

    19 Citations (Scopus)

    Abstract

    Label Ranking (LR) problems are becoming increasingly important in Machine Learning. While there has been a significant amount of work on the development of learning algorithms for LR in recent years, there are not many pre-processing methods for LR. Some methods, like Naive Bayes for LR and APRIORI-LR, cannot handle real-valued data directly. Conventional discretization methods used in classification are not suitable for LR problems, due to the different target variable. In this work, we make an extensive analysis of the existing methods using simple approaches. We also propose a new method called EDiRa (Entropy-based Discretization for Ranking) for the discretization of ranking data. We illustrate the advantages of the method using synthetic data and also on several benchmark datasets. The results clearly indicate that the discretization is performing as expected and also improves the results and efficiency of the learning algorithms.
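    The abstract contrasts EDiRa with simpler adaptations of conventional discretization. One such baseline (presumably among the "simple approaches" analysed in the paper) is to treat each distinct ranking as an atomic class label and apply a standard Fayyad-Irani entropy/MDL splitter to each numeric attribute. The sketch below illustrates that baseline only, not the EDiRa algorithm itself; the function names and toy data are illustrative assumptions.

# Minimal sketch, NOT the paper's EDiRa implementation: conventional
# Fayyad-Irani entropy/MDL discretization applied to label-ranking data by
# treating each distinct ranking (a tuple of ranks) as an atomic class.
# All names and the toy data below are illustrative assumptions.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of hashable labels (here: ranking tuples)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def mdl_accepts(parent, left, right):
    """Fayyad-Irani MDL criterion: accept a binary split only if the
    information gain pays for the cost of encoding the extra cut point."""
    n = len(parent)
    gain = entropy(parent) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
    k, k1, k2 = len(set(parent)), len(set(left)), len(set(right))
    delta = log2(3 ** k - 2) - (k * entropy(parent) - k1 * entropy(left) - k2 * entropy(right))
    return gain > (log2(n - 1) + delta) / n

def discretize(values, rankings):
    """Return cut points for one numeric attribute, supervised by rankings."""
    pairs = sorted(zip(values, rankings))
    xs = [v for v, _ in pairs]
    ys = [r for _, r in pairs]

    def split(lo, hi):
        best, best_ent = None, float("inf")
        for i in range(lo + 1, hi):
            if xs[i] == xs[i - 1]:
                continue  # cuts are only allowed between distinct values
            left, right = ys[lo:i], ys[i:hi]
            ent = (len(left) * entropy(left) + len(right) * entropy(right)) / (hi - lo)
            if ent < best_ent:
                best, best_ent = i, ent
        if best is None or not mdl_accepts(ys[lo:hi], ys[lo:best], ys[best:hi]):
            return []  # MDL says the split is not worth its description cost
        cut = (xs[best - 1] + xs[best]) / 2
        return split(lo, best) + [cut] + split(best, hi)

    return split(0, len(xs))

# Toy example: one attribute, three labels ranked per instance.
values = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
rankings = [(1, 2, 3), (1, 2, 3), (1, 2, 3), (3, 1, 2), (3, 1, 2), (3, 1, 2)]
print(discretize(values, rankings))  # -> [0.55]

    EDiRa itself, as described in the article, adapts the entropy measure to ranking targets rather than treating each ranking as an unrelated class; see the paper for the exact formulation.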

    Original language: English
    Pages (from-to): 921-936
    Number of pages: 16
    Journal: Information sciences
    Volume: 329
    DOI: 10.1016/j.ins.2015.04.022
    Link to publication in Scopus: http://www.scopus.com/inward/record.url?scp=84949731124&partnerID=8YFLogxK
    Publication status: Published - 1 Feb 2016

    Keywords

    • Association Rule Mining
    • Discretization
    • Label ranking
    • Minimum description length

    Cite this

    De Sá, Cláudio Rebelo; Soares, Carlos; Knobbe, Arno. Entropy-based discretization methods for ranking data. In: Information sciences. 2016; Vol. 329, pp. 921-936.
    @article{8ca0655153344d70bd03e05c19359b98,
    title = "Entropy-based discretization methods for ranking data",
    abstract = "Label Ranking (LR) problems are becoming increasingly important in Machine Learning. While there has been a significant amount of work on the development of learning algorithms for LR in recent years, there are not many pre-processing methods for LR. Some methods, like Naive Bayes for LR and APRIORI-LR, cannot handle real-valued data directly. Conventional discretization methods used in classification are not suitable for LR problems, due to the different target variable. In this work, we make an extensive analysis of the existing methods using simple approaches. We also propose a new method called EDiRa (Entropy-based Discretization for Ranking) for the discretization of ranking data. We illustrate the advantages of the method using synthetic data and also on several benchmark datasets. The results clearly indicate that the discretization is performing as expected and also improves the results and efficiency of the learning algorithms.",
    keywords = "Association Rule Mining, Discretization, Label ranking, Minimum description length",
    author = "{De S{\'a}}, {Cl{\'a}udio Rebelo} and Carlos Soares and Arno Knobbe",
    year = "2016",
    month = "2",
    day = "1",
    doi = "10.1016/j.ins.2015.04.022",
    language = "English",
    volume = "329",
    pages = "921--936",
    journal = "Information sciences",
    issn = "0020-0255",
    publisher = "Elsevier",

    }
