Abstract
Keywords: Deep learning · Digital heritage · Natural history · Biodiversity heritage
Language | English |
---|---|
Title of host publication | Digital Cultural Heritage |
Editors | Marinos Ioannides |
Publisher | Springer |
Pages | 155-166 |
Number of pages | 12 |
ISBN (Electronic) | 978-3-319-75826-8 |
ISBN (Print) | 978-3-319-75825-1 |
DOIs | https://doi.org/10.1007/978-3-319-75826-8_13 |
Publication status | Published - Mar 2018 |
Publication series
Name | Lecture Notes in Computer Science LNCS |
---|---|
Publisher | Springer |
Volume | 10605 |
Keywords
- deep learning
- Digital Heritage
- Natural History
- Biodiversity heritage
- Digital Humanities
Cite this
Towards a Digital Infrastructure for Illustrated Handwritten Archives. / Weber, Andreas ; Ameryan, Mahya; Wolstencroft, Katherine; Stork, Lise; Heerlien, Maarten ; Schomaker, Lambert.
Digital Cultural Heritage. ed. / Marinos Ioannides. Springer, 2018. p. 155-166 (Lecture Notes in Computer Science LNCS; Vol. 10605).
Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic › peer-review
TY - CHAP
T1 - Towards a Digital Infrastructure for Illustrated Handwritten Archives
AU - Weber, Andreas
AU - Ameryan, Mahya
AU - Wolstencroft, Katherine
AU - Stork, Lise
AU - Heerlien, Maarten
AU - Schomaker, Lambert
PY - 2018/3
Y1 - 2018/3
AB - Large and important parts of cultural heritage are stored in archives that are difficult to access, even after digitization. Documents and notes are written in hard-to-read historical handwriting and are often interspersed with illustrations. Such collections are weakly structured and largely inaccessible to scholars and the wider public. Traditionally, humanities researchers treat text and images separately. This separation extends to traditional handwriting recognition systems. Many of them use a segmentation-free OCR approach, which can only resolve manuscripts that are homogeneous in layout, style and linguistic content. In contrast, our infrastructure aims to resolve heterogeneous handwritten manuscript pages in which different scripts and images are closely intertwined. The authors in our use case, a 17,000-page account of the exploration of the Indonesian Archipelago between 1820 and 1850 (“Natuurkundige Commissie voor Nederlands-Indië”), tried to record their knowledge and observations in a semantically structured way; however, this discipline is absent from the handwritten script. The use of different languages, such as German, Latin, Dutch, Malay, Greek, and French, makes interpretation more challenging. Our infrastructure takes the state-of-the-art word retrieval system MONK as its starting point. Owing to its visual approach, MONK can handle the diversity of material we encounter in our use case and in many other historical collections: text, drawings and images. By combining text and image recognition, we go significantly beyond the state of the art and provide meaningful additions to integrated manuscript recognition. This paper describes the infrastructure and presents early results.
KW - deep learning
KW - Digital Heritage
KW - Natural History
KW - Biodiversity heritage
KW - Digital Humanities
UR - https://www.springer.com/gb/book/9783319758251#aboutBook
U2 - 10.1007/978-3-319-75826-8_13
DO - 10.1007/978-3-319-75826-8_13
M3 - Chapter
SN - 978-3-319-75825-1
T3 - Lecture Notes in Computer Science LNCS
SP - 155
EP - 166
BT - Digital Cultural Heritage
A2 - Ioannides, Marinos
PB - Springer
ER -