Hierarchical Topic Detection in Large Digital News Archives: Exploring a Sample Based Approach

Dolf Trieschnigg, Wessel Kraaij

    Research output: Contribution to journal › Article › Academic › peer-review



    Hierarchical topic detection (HTD) is a new task in the TDT 2004 evaluation program. It aims to organize a collection of unstructured news data in a directed acyclic graph (DAG) structure that reflects the topics discussed in the collection, ranging from rather coarse, category-like nodes to fine, singular events. The HTD task poses interesting challenges because its evaluation metric is composed of a travel cost component, reflecting the time needed to find the node of interest starting from the top node, and a quality cost component, determined by the quality of the selected node. We present a scalable architecture for HTD and compare several alternative choices for agglomerative clustering and DAG optimization in order to minimize the HTD cost metric. The alternatives are evaluated on the TDT3 and TDT5 test collections.
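The two-part cost described in the abstract can be illustrated with a toy sketch. This is not the official TDT 2004 formula: the `Node` class, the per-level charge (one unit per child inspected plus one unit per title read), the miss and false-alarm weights, `quality_weight`, and `travel_norm` are all hypothetical choices made only to show how a travel term and a quality term might be traded off.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster in the hierarchy: the documents it covers and its sub-clusters."""
    docs: frozenset
    children: list = field(default_factory=list)


def travel_cost(path, branch_cost=1.0, title_cost=1.0):
    """Cost of walking from the root (path[0]) to the selected node (path[-1]).

    Assumption: at every node we descend from, we pay for inspecting each
    child once plus a fixed charge for reading a title.
    """
    cost = 0.0
    for parent in path[:-1]:
        cost += branch_cost * len(parent.children) + title_cost
    return cost


def quality_cost(node_docs, topic_docs, n_docs, miss_weight=1.0, fa_weight=1.0):
    """Weighted sum of miss rate and false-alarm rate for the selected node."""
    missed = len(topic_docs - node_docs)
    false_alarms = len(node_docs - topic_docs)
    p_miss = missed / len(topic_docs)
    non_topic = n_docs - len(topic_docs)
    p_fa = false_alarms / non_topic if non_topic else 0.0
    return miss_weight * p_miss + fa_weight * p_fa


def combined_cost(path, topic_docs, n_docs, quality_weight=0.5, travel_norm=1.0):
    """Blend quality and normalized travel cost (weights are illustrative)."""
    quality = quality_cost(path[-1].docs, topic_docs, n_docs)
    travel = travel_cost(path) / travel_norm
    return quality_weight * quality + (1.0 - quality_weight) * travel
```

Under these assumptions, a deep node that matches a topic exactly has zero quality cost but a higher travel cost, while a shallow node is cheap to reach but contains off-topic documents. A clustering or DAG optimization step would try to balance the two.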
    Original language: English
    Pages (from-to): 21-27
    Number of pages: 7
    Journal: Journal of digital information management
    Issue number: 1
    Publication status: Published - 2005
    Event: 5th Dutch-Belgian Information Retrieval Workshop, DIR 2005 - Utrecht University, Utrecht, Netherlands
    Duration: 10 Jan 2005 → 11 Jan 2005
    Conference number: 5


    • METIS-227310
    • EWI-1803
    • Information retrieval
    • Hierarchical topic detection
    • TDT

