Abstract
Using Hadoop, we construct a database of associated concepts from Common Crawl's 25 TB data set of web pages. The database can be queried through a web application with two query interfaces. A textual interface allows searching for similarities and differences between multiple concepts using a query language similar to set notation, and a graphical interface allows users to visualize similarity relationships between concepts in a force-directed graph.
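The abstract does not say how associations are computed or how the query language is evaluated. As a rough illustration only, the sketch below assumes that two concepts are associated when they co-occur on the same page, counts co-occurrences in a map/reduce style, and answers similarity and difference queries with plain set operations. It is a small in-memory stand-in for a Hadoop job, not the authors' implementation; all function names (`map_page`, `reduce_counts`, `build_associations`, `similarities`, `differences`) and the toy pages are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def map_page(concepts):
    """Map step (assumed): emit every unordered pair of distinct concepts on one page."""
    for a, b in combinations(sorted(set(concepts)), 2):
        yield (a, b), 1


def reduce_counts(pairs):
    """Reduce step (assumed): sum the emitted counts per concept pair."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def build_associations(pages, min_count=1):
    """Build a symmetric concept -> associated-concepts index from page-level pairs."""
    counts = reduce_counts(pair for page in pages for pair in map_page(page))
    assoc = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_count:
            assoc[a].add(b)
            assoc[b].add(a)
    return assoc


def similarities(assoc, *concepts):
    """Concepts associated with all of the given concepts (set intersection)."""
    sets = [assoc.get(c, set()) for c in concepts]
    return set.intersection(*sets) if sets else set()


def differences(assoc, concept, *others):
    """Concepts associated with `concept` but with none of `others` (set difference)."""
    result = set(assoc.get(concept, set()))
    for other in others:
        result -= assoc.get(other, set())
    return result


# Toy input: each inner list stands for the concepts extracted from one web page.
pages = [
    ["coffee", "caffeine", "cup"],
    ["tea", "caffeine", "cup"],
    ["coffee", "espresso"],
    ["tea", "teapot"],
]
assoc = build_associations(pages)
shared = similarities(assoc, "coffee", "tea")
only_coffee = differences(assoc, "coffee", "tea")
```

In this toy example, `shared` holds the concepts linked to both "coffee" and "tea", and `only_coffee` holds those linked to "coffee" but not to "tea". A real deployment would run the map and reduce steps as distributed jobs and would likely weight or filter associations rather than use raw co-occurrence.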
Original language | Undefined |
---|---|
Title of host publication | Proceedings of the 13th Dutch-Belgian Workshop on Information Retrieval, DIR 2013 |
Place of Publication | Aachen, Germany |
Publisher | CEUR |
Pages | 56-57 |
Number of pages | 2 |
Publication status | Published - Apr 2013 |
Event | 13th Dutch-Belgian Information Retrieval Workshop, DIR 2013, Delft, Netherlands. Duration: 26 Apr 2013 → 26 Apr 2013. Conference number: 13 |
Publication series
Name | CEUR Workshop Proceedings |
---|---|
Publisher | CEUR |
Volume | 986 |
ISSN (Print) | 1613-0073 |
Workshop
Workshop | 13th Dutch-Belgian Information Retrieval Workshop, DIR 2013 |
---|---|
Abbreviated title | DIR |
Country/Territory | Netherlands |
City | Delft |
Period | 26/04/13 → 26/04/13 |
Keywords
- EWI-23832
- CR-H.3.1
- CR-H.3.3
- METIS-300084
- Question Answering
- IR-88328
- MapReduce
- Information Extraction