About Stratosphere
The Humboldt Universität Berlin, Technische Universität Berlin, and the Hasso-Plattner-Institut in Potsdam are jointly researching "Information Management on the Cloud" through the "Stratosphere" Collaborative Research Unit funded by the Deutsche Forschungsgemeinschaft (DFG). Stratosphere aims to considerably advance the state of the art in data processing on parallel, adaptive architectures. Stratosphere (named after the layer of the atmosphere above the clouds) explores the power of massively parallel computing for complex information management applications. Building on the expertise of the participating researchers, we aim to develop a novel, database-inspired approach to analyzing, aggregating, and querying very large collections of either textual or (semi-)structured data on a virtualized, massively parallel cluster architecture.
Vision
Stratosphere will conduct research in the areas of massively parallel data processing engines, a programming model for parallel data programming, robust optimization of declarative data flow programs, continuous re-optimization and adaptation of the execution, data cleansing, and text mining. The unit will validate its work through a benchmark of the overall system performance and by demonstrators in the areas of climate research, the biosciences and linked open data.
Goals
The goal of Stratosphere is to jointly research and build a large-scale data processor based on concepts of robust and adaptive execution. We will be researching a programming model that extends a functional map/reduce programming model with additional second-order functions. As the execution platform we use a Dryad-like massively parallel data flow engine that will also be researched and developed in the project. We will be examining real-world use cases in the areas of climate research, information extraction and integration of unstructured data in the life sciences, as well as linked open data and social network graph data.
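The programming model is described here only at a high level. As a rough illustration of what a second-order function is in this setting, the following Python sketch treats map and reduce as functions that take a user-defined first-order function and apply it to key/value records, and adds one further second-order function that works on two inputs. The names map_so, reduce_so, and match_so, the key/value record layout, and the choice of a key-matching function over two inputs are assumptions made for this example; they are not taken from the project description.

```python
from collections import defaultdict

def map_so(udf, records):
    """Second-order map: call udf once for every (key, value) record."""
    for key, value in records:
        yield from udf(key, value)

def reduce_so(udf, records):
    """Second-order reduce: call udf once per key with all values of that key."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    for key, values in groups.items():
        yield from udf(key, values)

def match_so(udf, left, right):
    """Hypothetical additional second-order function (an assumption):
    call udf for every pair of records from two inputs that share a key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    for key, left_value in left:
        for right_value in index.get(key, []):
            yield from udf(key, left_value, right_value)

# Usage: word count with map and reduce, then a two-input match.
lines = [(0, "a b a")]
words = list(map_so(lambda _, line: ((w, 1) for w in line.split()), lines))
counts = dict(reduce_so(lambda k, vs: [(k, sum(vs))], words))

left = [(1, "x"), (2, "y")]
right = [(1, "p"), (1, "q"), (3, "r")]
pairs = list(match_so(lambda k, l, r: [(k, (l, r))], left, right))
```

In each case the user supplies only the first-order function; how records are grouped, paired, and distributed is left to the second-order function, which is the part a parallel data flow engine could execute across many machines.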
The group will provide the opportunity to perform high-quality, cutting-edge research in an international context and in close cooperation. Suitable postdoc candidates will have the chance to perform research on the topics of the group, which qualifies them for an academic career and allows them to establish themselves as independent researchers in the research community.
Partners
The project will be carried out jointly by Prof. Johann-Christoph Freytag (HU Berlin, Database and Information Systems Group), Prof. Ulf Leser (HU Berlin, Knowledge Management in Bioinformatics), Prof. Volker Markl (TU Berlin, Database Systems and Information Management Group, Speaker), Prof. Odej Kao (TU Berlin, Distributed Systems Group), and Prof. Felix Naumann (HPI Potsdam, Database and Information Systems Group).