Grouping by association: using associative networks for document categorization

Niels Bloom

Research output: Thesis › PhD Thesis - Research UT, graduation UT

Abstract

In this thesis we describe a method of using associative networks for automatic document grouping. Associative networks are networks of ideas or concepts in which each concept is linked to concepts that are semantically similar to it. By activating concepts in the network based on the text of a document and spreading this activation to related concepts, we can determine which concepts are related to the document, even if the document itself does not contain words linked directly to those concepts. Based on this information, we can group documents by the concepts they refer to. In the first part of the thesis we describe the method itself, as well as the details of various algorithms used in the implementation. We additionally discuss the theory upon which the method is based and compare it to various related methods. In the second part of the thesis we evaluate techniques to create associative networks from easily accessible knowledge sources, as well as different methods for the training of the associative network. Additionally, we evaluate techniques to improve the extraction of concepts from documents, we compare methods of spreading activation from concept to concept, and we present a novel technique by which the extracted concepts can be used to categorize documents. We also extend the method of associative networks to enable application to multilingual document libraries and compare the method to other state-of-the-art methods for document grouping. Finally, we present a practical application of associative networks, as implemented in a corporate environment in the form of the Pagelink Knowledge Centre. We demonstrate the practical usability of our work, and discuss the various advantages and disadvantages that the method of associative networks offers.
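The abstract describes the pipeline (activate concepts from a document's text, spread that activation to related concepts, group documents by the resulting concepts) only at a high level. As a rough illustration, here is a minimal Python sketch built on assumptions that are not taken from the thesis: a hypothetical toy network `NETWORK` with hand-picked weights, seed activation from simple word counts (`document_seeds`), a pulse-style spreading rule with an assumed decay factor and firing threshold (`spread_activation`), and, in place of the thesis's own categorization technique, a simple leader-style grouping on the cosine similarity of activation profiles (`group_documents`). All names and parameter values are illustrative.

```python
import math
from collections import defaultdict

# Hypothetical toy associative network (not from the thesis):
# concept -> {related concept: association weight in [0, 1]}.
NETWORK = {
    "money": {"finance": 0.8},
    "finance": {"money": 0.8, "investment": 0.8},
    "investment": {"finance": 0.8},
    "river": {"water": 0.9},
    "water": {"river": 0.9},
}


def document_seeds(text, network):
    """Initial activation: relative frequency of each concept word in the text."""
    counts = defaultdict(float)
    for token in text.lower().split():
        token = token.strip(".,;:!?\"'()")
        if token in network:
            counts[token] += 1.0
    total = sum(counts.values())
    return {concept: n / total for concept, n in counts.items()} if total else {}


def spread_activation(network, seeds, decay=0.5, threshold=0.05, max_steps=3):
    """Pulse-based spreading activation with an assumed decay and firing threshold.

    In each step, every concept whose newly received activation reaches
    `threshold` passes value * weight * decay to each neighbour; the received
    amounts are added to the neighbours' total activation.
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        incoming = defaultdict(float)
        for concept, value in frontier.items():
            if value < threshold:
                continue  # too weak to fire
            for neighbour, weight in network.get(concept, {}).items():
                incoming[neighbour] += value * weight * decay
        if not incoming:
            break
        for concept, value in incoming.items():
            activation[concept] += value
        frontier = incoming
    return dict(activation)


def cosine(a, b):
    """Cosine similarity of two sparse concept-activation profiles."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_documents(profiles, threshold=0.2):
    """Leader-style grouping: join the first group whose leader is similar enough."""
    groups = []  # lists of document indices; group[0] is the group's leader
    for i, profile in enumerate(profiles):
        for group in groups:
            if cosine(profile, profiles[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


DOCS = [
    "Money moved between accounts.",
    "Investment returns beat expectations.",
    "The river water was cold.",
]
profiles = [spread_activation(NETWORK, document_seeds(d, NETWORK)) for d in DOCS]
print(group_documents(profiles))  # [[0, 1], [2]]
```

Here the first two documents share no concept words, yet both activate "finance" through the network and end up in the same group, while the third document stays separate. The decay, firing threshold and grouping threshold are arbitrary values that would need tuning on real data.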
Language: Undefined
Awarding Institution
  • University of Twente
Supervisors/Advisors
  • de Jong, Franciska M.G., Supervisor
  • Theune, Mariet, Advisor
Award date: 10 Jun 2015
Place of Publication: Enschede
Publisher: Centre for Telematics and Information Technology (CTIT)
Print ISBN: 978-90-365-3878-7
DOI: 10.3990/1.9789036538787
State: Published - 10 Jun 2015

Keywords

  • EWI-26945
  • IR-96192
  • METIS-310797

Cite this

Bloom, N. (2015). Grouping by association: using associative networks for document categorization. Enschede: Centre for Telematics and Information Technology (CTIT). DOI: 10.3990/1.9789036538787
Bloom, Niels. / Grouping by association: using associative networks for document categorization. Enschede : Centre for Telematics and Information Technology (CTIT), 2015. 199 p.
@phdthesis{4902f0948ea742009b9ec1a6689a8272,
title = "Grouping by association: using associative networks for document categorization",
keywords = "EWI-26945, IR-96192, METIS-310797",
author = "Niels Bloom",
year = "2015",
month = "6",
day = "10",
doi = "10.3990/1.9789036538787",
language = "Undefined",
isbn = "978-90-365-3878-7",
publisher = "Centre for Telematics and Information Technology (CTIT)",
address = "Netherlands",
school = "University of Twente",

}

Grouping by association: using associative networks for document categorization. / Bloom, Niels.

Enschede : Centre for Telematics and Information Technology (CTIT), 2015. 199 p.

TY - THES

T1 - Grouping by association: using associative networks for document categorization

AU - Bloom, Niels

PY - 2015/6/10

Y1 - 2015/6/10

KW - EWI-26945

KW - IR-96192

KW - METIS-310797

U2 - 10.3990/1.9789036538787

DO - 10.3990/1.9789036538787

M3 - PhD Thesis - Research UT, graduation UT

SN - 978-90-365-3878-7

PB - Centre for Telematics and Information Technology (CTIT)

CY - Enschede

ER -

Bloom N. Grouping by association: using associative networks for document categorization. Enschede: Centre for Telematics and Information Technology (CTIT), 2015. 199 p. Available from, DOI: 10.3990/1.9789036538787