
Institut für Informatik, Humboldt-Universität zu Berlin

Link Traversal Based Query Execution - A Novel Approach to Query the Web of Linked Data

The World Wide Web is currently evolving into a Web of Data in which content providers publish and interlink data in much the same way they have published Web documents over the last 20 years. While executing structured, SQL-like queries over this emerging dataspace opens possibilities not conceivable before, querying the Web of Data poses novel challenges. Because the Web is open, it is impossible to know in advance all data sources that might contribute to the answer of a query. Traditional query execution paradigms are therefore insufficient to tap the full potential of the Web, because they assume a fixed set of potentially relevant data sources that is known beforehand.


We work on a novel query execution paradigm that allows the execution engine to discover potentially relevant data during the evaluation of a query. Our approach to answering queries makes use of the characteristics of the Web of Data, in particular the existence of data links between data items of different sources. The general idea of our approach, which we call link traversal based query execution, is to intertwine the construction of query results with the traversal of those data links that correspond to intermediate solutions in the construction process.
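The general idea can be illustrated with a minimal sketch. It is not the project's implementation: the `WEB` dictionary is a hypothetical stand-in for dereferencing URIs over HTTP, the example URIs and the names `dereference`, `match`, and `execute` are invented for illustration, and the query is assumed to be a simple list of triple patterns evaluated in the given order. The point it shows is that data is fetched only when a URI appears in the query or in an intermediate solution.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical stand-in for the Web of Data: looking up a URI
# yields the triples published in the document at that URI.
WEB: Dict[str, Set[Triple]] = {
    "http://ex.org/alice": {
        ("http://ex.org/alice", "knows", "http://ex.org/bob"),
    },
    "http://ex.org/bob": {
        ("http://ex.org/bob", "name", "Bob"),
    },
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def dereference(uri: str, local: Set[Triple], seen: Set[str]) -> None:
    """Traverse a data link: add the data behind `uri` to the local dataset."""
    if uri.startswith("http://") and uri not in seen:
        seen.add(uri)
        local |= WEB.get(uri, set())


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def execute(patterns: List[Triple]) -> List[Binding]:
    """Evaluate the triple patterns in order, intertwined with link traversal."""
    local: Set[Triple] = set()   # data discovered so far
    seen: Set[str] = set()       # URIs already looked up
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for binding in solutions:
            # Substitute the variables bound by the intermediate solution.
            bound = tuple(binding.get(term, term) for term in pattern)
            # Follow the links that this intermediate solution exposes.
            for term in bound:
                if not is_var(term):
                    dereference(term, local, seen)
            # Match against everything discovered up to this point.
            for triple in local:
                candidate = match(bound, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


# Example query: the names of everyone Alice knows.
QUERY: List[Triple] = [
    ("http://ex.org/alice", "knows", "?friend"),
    ("?friend", "name", "?name"),
]
print(execute(QUERY))
```

In this toy run the document for `http://ex.org/bob` is never named in the query. It is retrieved only because the first pattern binds `?friend` to that URI, which is what lets the second pattern find the name.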


The integration of link traversal into the query execution process and the ability to discover data from unknown sources present novel challenges for the development of execution engines and for the application of query planning and optimization. In our work, we address the following questions: How do we implement the general idea of link traversal based query execution in a query system? What are the trade-offs of different implementation approaches? How do we generate query execution plans without any statistics about, or knowledge of the distribution of, the data that will be discovered? How do we reduce the impact of network access times on query execution times? How do we benefit from reusing data discovered and retrieved during the execution of previous queries as seed data for the current query? How do we integrate an assessment of the quality and trustworthiness of discovered data in order to guarantee certain quality criteria for the query results?



Contact persons