Data integration is a key issue in the domain of bioin- formatics, which deals with huge amounts of heteroge- neous biological data that grows and changes rapidly. This paper serves as an introduction in the field of bioinformatics and the biological concepts it deals with, and an exploration of the integration problems a bioinformatics scientist faces. We examine ProGMap, an integrated protein homology system used by bioin- formatics scientists at Wageningen University, and several use cases related to protein homology. A key issue we identify is the huge manual effort required to unify source databases into a single resource. Un- certain databases are able to contain several possi- ble worlds, and it has been proposed that they can be used to significantly reduce initial integration efforts. We propose several directions for future work where uncertain databases can be applied to bioinformatics, with the goal of furthering the cause of bioinformatics integration.
|Place of Publication||Enschede|
|Publisher||Centre for Telematics and Information Technology (CTIT)|
|Number of pages||17|
|Publication status||Published - Aug 2011|
|Name||CTIT Technical Report Series|
|Publisher||University of Twente, Centre for Telematics and Information Technology|
Boerman, T. W., van Keulen, M., van der Vet, P. E., & Severing, E. I. (2011). Integration of Biological Sources: Exploring the Case of Protein Homology. (CTIT Technical Report Series; No. TR-CTIT-11-18). Enschede: Centre for Telematics and Information Technology (CTIT).