
Institut für Informatik, Humboldt-Universität zu Berlin

Advanced Database Technologies in Bioinformatics

Ever since the publication of the first draft of the human genome sequence in 2001 [IHGSC01], [Venter et al. 01], the word genomics has resounded throughout the land. Technical advances in sequence determination and biological analysis in recent years have led to explosive growth in the amount of data produced by biological experiments. In 1999, Anthony Kerlavage of Celera stated that an experimental laboratory can easily produce over 100 gigabytes of data a day. The next step is to analyse the experimental data to gain biological knowledge and to discover the basic rules underlying the relationships between the objects and their individual properties, and thus finally to explain the statistical distribution of the recorded facts.


To manage and analyse this massive amount of data, biologists have been using methods from computer science for years. A multitude of tools and methods have been developed in this context to support data storage and analysis. Today, genomics without computers is almost impossible to imagine. Out of this application grew the field of bioinformatics, sometimes called computational biology. It is an interdisciplinary field that includes both theoretical and practical contributions from computer science, mathematics and biology. There is still no commonly agreed definition of these terms. In our opinion, BIOINFORMATICS describes the application of methods and techniques from computer science to solve biological problems. COMPUTATIONAL BIOLOGY, in contrast, is limited to the algorithmic solution of biological problems. Among others, bioinformatics uses methods from artificial intelligence, database and information systems, computational linguistics, pattern discovery, dynamic programming, statistical analysis and computer simulation.


Our research interest focuses on how current research results in the area of databases and information systems can be put to intelligent use in bioinformatics. While much information is held in flat files, some institutions already use relational or even object-oriented database management systems. Still, there is a lot of room for improvement. Currently, specialised sequence analysis algorithms such as BLAST (Basic Local Alignment Search Tool) or FASTA run outside the database server, so performance degrades as the number of interactions with the database increases. Implementing them as database functions or stored procedures can reduce execution time. Using the parallel features of database management systems together with new storage and index structures can further speed up data retrieval. In addition, techniques from data warehousing and data mining can support data integration, help to improve data quality, and help to gain knowledge from the raw data. All these points are part of our ongoing research in the area of advanced database technologies in bioinformatics.
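The idea of running sequence comparison inside the database rather than outside it can be illustrated with a minimal sketch. It is not BLAST or FASTA: a plain Smith-Waterman local alignment score stands in for them, and the embedded SQLite engine from Python's standard library stands in for a database server. The table name sequences, the column residues and the SQL function name sw_score are assumptions made only for this example.

```python
import sqlite3


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score (score only, linear gap penalty)."""
    previous = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for ch_a in a:
        current = [0]
        for j, ch_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if ch_a == ch_b else mismatch)
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


conn = sqlite3.connect(":memory:")
# Register the Python function as a two-argument SQL function.
# The name "sw_score" is hypothetical and chosen for this sketch.
conn.create_function("sw_score", 2, local_alignment_score)

# Hypothetical schema and toy data.
conn.execute("CREATE TABLE sequences (id TEXT PRIMARY KEY, residues TEXT)")
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?)",
    [("seq1", "ACGTACGT"), ("seq2", "TTTTTTTT"), ("seq3", "GGACGTAC")],
)

# Scoring, ranking and truncation all happen in one SQL statement,
# so only the two best hits are handed back to the application.
query = "ACGTAC"
hits = conn.execute(
    "SELECT id, sw_score(residues, ?) AS score FROM sequences "
    "ORDER BY score DESC, id LIMIT 2",
    (query,),
).fetchall()
print(hits)
```

Because the function is evaluated by the query engine, the application issues a single statement instead of fetching every sequence and scoring it on the client side. In a server-based system the same pattern would be expressed as a stored procedure or user-defined function in whatever extension language that system supports.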


  1. Alternative Splice Form Detection through Exon Skipping
  2. CABS - Comprehensive Analysis of Biological Sequences
  3. db.bio - The Genome Data Warehouse
  4. Gene Viator - Graph Visualisation

DFG Research Unit Stratosphere

DFG Graduate School SOAMED

DFG Graduate School METRIK

Link Traversal Based Query Execution

Web of Trusted Data

Query Optimization in RDF Databases

DBnovo - Database-Supported Online Sequencing

Contact persons

+49 30 2093-3020

+49 30 2093-3025

+49 30 2093-3022

+49 30 2093-3018