- Prof. Wolfgang Lehner, TU Dresden - SMIX - A Self-Managing Index Infrastructure for Dynamic Workloads, February 17, 2012, 11 am s.t., Rudower Chaussee 25, Haus 3, Raum 3.113 (First Floor), 12489 Berlin (Abstract)
- Prof. Bernhard Seeger, Uni Marburg - XXL2020 - Towards a flexible and extensible infrastructure for generating tailor-made data processing systems, January 23, 2012, 4pm s.t., Humboldt-Kabinett im Institut für Informatik, Rudower Chaussee 25, 12489 Berlin (Abstract)
- Prof. Tamer Ozsu, University of Waterloo - Some Results in Graph Data Management, December 12, 2012, 4pm s.t., Humboldt-Kabinett im Institut für Informatik, Rudower Chaussee 25, 12489 Berlin (Abstract)
- Prof. Alfons Kemper, TU München - HyPer-sonic Combined Transaction AND Query Processing, 3pm s.t., Humboldt-Kabinett im Institut für Informatik, Rudower Chaussee 25, 12489 Berlin (Abstract)
Title: SMIX - A Self-Managing Index Infrastructure for Dynamic Workloads
Abstract: As databases accumulate growing amounts of data at an increasing rate, adaptive indexing becomes more and more relevant. At the same time, modern analytical and transactional applications show agile and flexible access behavior, resulting in less steady and less predictable workload characteristics. Being inert and coarse-grained, state-of-the-art index tuning techniques become less useful in such environments. In particular, the full-column indexing paradigm potentially results in indexed but never queried data, causing prohibitively high memory and maintenance costs. Self-Managing Indexes (SMIX) for point queries form a novel, adaptive, fine-grained, and autonomous indexing infrastructure. At its core, the SMIX approach builds on a novel access path that automatically collects useful index information, discards useless index information, and competes with its kind for the resources to host its index information. Compared to existing technologies for adaptive indexing, SMIX can dynamically grow and shrink its indexes instead of incrementally enhancing the index granularity. The talk will outline the basic concepts of the SMIX approach and show results of experiments based on a prototypical implementation inside the PostgreSQL system.
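The grow-and-shrink idea for point queries can be illustrated with a toy sketch (my own illustration under stated assumptions, not the actual SMIX implementation): keys are added to a partial index on first access and the coldest entries are evicted again once a fixed memory budget is exceeded, so the indexed portion of the column follows the workload.

```python
# Toy sketch of an adaptive partial index for point queries: entries are
# indexed on first access and evicted when a memory budget is exceeded,
# so the index grows and shrinks with the workload. (Illustrative only,
# not the SMIX algorithm itself.)
from collections import OrderedDict

class AdaptivePointIndex:
    def __init__(self, table, budget=2):
        self.table = table          # list of (key, payload) rows
        self.index = OrderedDict()  # key -> row position, kept in LRU order
        self.budget = budget        # max number of indexed keys

    def lookup(self, key):
        if key in self.index:                # indexed: fast access path
            self.index.move_to_end(key)
            return self.table[self.index[key]][1]
        for pos, (k, payload) in enumerate(self.table):  # fall back to a scan
            if k == key:
                self.index[key] = pos        # collect useful index information
                if len(self.index) > self.budget:
                    self.index.popitem(last=False)  # discard the coldest entry
                return payload
        return None
```

A repeated lookup of the same key hits the in-memory index; a shift in the workload to other keys gradually replaces the indexed entries instead of indexing the full column.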
Bio: Prof. Dr.-Ing. Wolfgang Lehner (born 1969) is full professor and head of the database technology group at Dresden University of Technology (Technische Universität Dresden), Germany. He received his Master's degree in Computer Science in 1995 from the University of Erlangen-Nuremberg. He continued there as a research assistant in the database systems group until 1998, when he earned his Ph.D. degree (Dr.-Ing.) with a dissertation on the optimization of aggregate processing in multidimensional database systems. Thereafter (11/1998) he joined the Business Intelligence (BI) group at the IBM Almaden Research Center in San Jose (CA), where he was involved in projects on adding materialized view support and multiple query optimization techniques to the core engine of the IBM DB2/UDB database system. After his return to Germany, he rejoined the database systems group at the University of Erlangen-Nuremberg as a senior researcher. Together with his group, he initiated comprehensive research on exploiting database technology to support complex message-based notification systems. From 10/2000 to 2/2002, Wolfgang Lehner was on a temporary assignment at the University of Halle-Wittenberg, holding the professorship for database systems. In 7/2001 he finished his habilitation with a thesis on subscription systems and was awarded the venia legendi. Since 10/2002, Prof. Lehner has been conducting his research, teaching his students, and working on multiple industrial projects at TU Dresden. Since then, he has also been a temporary visiting scientist at Microsoft Research in Redmond (WA), at GfK Nuremberg, at SAP Walldorf, and at UBS Zurich.
Title: XXL2020 - Towards a flexible and extensible infrastructure for generating tailor-made data processing systems
Abstract: In order to meet new user requirements and to exploit new hardware developments, many tailor-made data processing systems have been developed over the last 15 years. One fundamental research question is, therefore, how to design a flexible and extensible data processing architecture supporting the fast generation of tailor-made data processing systems. The XXL2020 infrastructure developed at the University of Marburg is one approach that has been shown to be successful, not only in academia but also in industry. In this talk, the basic concepts of XXL2020 are introduced and examples of systems are outlined that have been derived from XXL. One such example is PIPES, a platform for complex event processing that has been commercialized and is now owned by Software AG. A second example is a raster data management system currently used in real scientific applications to support analytical queries on massive raster data sets. More recent work is dedicated to facilitating the usage of XXL, whose primary interfaces are still too complex for non-experts. XXLinq is a pure Java approach that allows fast and easy coding of complex data-intensive applications on top of XXL2020. The ongoing research challenges of XXLinq are to optimize the declarative user code so that it runs fast on modern virtualized hardware and to take the burden of physical database design off the end user.
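XXL's central building block is the cursor: query operators are iterators that can be freely composed into processing pipelines. A minimal analogue of such composable operators (the real library is written in Java; the operator names below are illustrative, not XXL's API) can be sketched with Python generators:

```python
# Illustrative sketch of cursor-style composable query operators,
# loosely in the spirit of a cursor library (not XXL's actual interfaces).

def selection(cursor, predicate):
    # Filter operator: pass through tuples satisfying the predicate.
    return (t for t in cursor if predicate(t))

def projection(cursor, mapper):
    # Map operator: transform each tuple.
    return (mapper(t) for t in cursor)

def nested_loop_join(left, right, predicate):
    # Join operator: materialize the inner input once, then probe it.
    inner = list(right)
    return ((l, r) for l in left for r in inner if predicate(l, r))
```

Because every operator consumes and produces an iterator, pipelines such as `projection(nested_loop_join(...), ...)` can be assembled freely, which is the extensibility property the infrastructure builds on.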
Bio: Bernhard Seeger has been a full professor for data processing infrastructures at the University of Marburg since 1995. Prior to his current position, he obtained his habilitation from LMU München and his doctoral degree from the University of Bremen. He has worked on multidimensional and temporal indexing, physical database design, and analytical and I/O-efficient algorithms. For the last decade his research focus has been on complex event processing and new architectures for flexible query processing frameworks. He has been a chief architect of XXL, an open-source library for query processing, and a co-founder of RTM Realtime Monitoring GmbH, which is now owned by Software AG. He has published over 100 papers with more than 4500 citations.
Title: Some Results in Graph Data Management
Abstract: Graphs have always been important data types for database researchers. With the recent growth of social networks, Wikipedia, Linked Data, RDF, and other networks, the interest in managing very large graphs has again gained momentum. I will summarize some of our recent work in this area, focusing on two results: anonymization of published (social) network data, and answering SPARQL queries over RDF graphs. The first problem deals with the release of (social) network data for analysis without releasing confidential identity information. In this research, we propose k-automorphism to protect against multiple structural attacks and develop an algorithm that ensures k-automorphism. The second problem focuses on evaluating SPARQL queries with wildcards over an RDF graph that sees frequent updates. We propose an approach that maps both the RDF data and the SPARQL query into graphs and converts the query evaluation problem to one of subgraph matching. In order to speed up query processing, we propose an indexing mechanism and pruning rules to reduce the search space.
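The mapping from SPARQL evaluation to subgraph matching can be illustrated with a naive backtracking matcher: triple patterns containing variables are bound against the data triples one by one. This is a toy sketch of the general reduction only; the indexing mechanism and pruning rules that make it fast are the actual contribution of the work and are not shown.

```python
# Naive illustration of SPARQL-style evaluation as subgraph matching:
# variables (strings starting with "?") are bound pattern by pattern
# against the data triples, backtracking on conflicts.

def match(patterns, triples, binding=None):
    binding = binding or {}
    if not patterns:
        yield dict(binding)          # all patterns matched: emit a solution
        return
    pattern, rest = patterns[0], patterns[1:]
    for triple in triples:
        trial = dict(binding)
        ok = True
        for p, t in zip(pattern, triple):
            if p.startswith("?"):    # variable: bind it, or check the binding
                if trial.setdefault(p, t) != t:
                    ok = False
                    break
            elif p != t:             # constant: must match exactly
                ok = False
                break
        if ok:
            yield from match(rest, triples, trial)
```

For example, the two-pattern query `[("?x", "knows", "?y"), ("?y", "knows", "?z")]` over a small triple set enumerates all length-two "knows" paths, which is exactly a subgraph-matching problem on the data graph.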
Bio: Tamer Ozsu is Professor of Computer Science at the David R. Cheriton School of Computer Science of the University of Waterloo. He was the Director of the Cheriton School of Computer Science from January 2007 to June 2010. His PhD is from the Ohio State University.
His research is in data management focusing on large-scale data distribution and management of non-traditional data. His publications include the book Principles of Distributed Database Systems (co-authored with Patrick Valduriez), which is now in its third edition. He has also edited, with Ling Liu, the Encyclopedia of Database Systems. He serves as the Series Editor of Synthesis Lectures on Data Management (Morgan & Claypool) and on the editorial boards of three journals and two book series. He has served as the Program Chair and General Chair of a number of international conferences.
He is a Fellow of the Association for Computing Machinery (ACM), and of the Institute of Electrical and Electronics Engineers (IEEE), and a member of Sigma Xi. He has held a University Research Chair (2004-2011) and a Faculty Research Fellowship (2000-2003) at the University of Waterloo, and a McCalla Research Professorship (1993-1994) at the University of Alberta. He was awarded the ACM SIGMOD Contributions Award in 2006, and The Ohio State University College of Engineering Distinguished Alumnus Award in 2008.
Title: HyPer-sonic Combined Transaction AND Query Processing
Abstract: The HyPer prototype demonstrates that it is indeed possible to build a main-memory database system that achieves world-record transaction processing throughput and best-of-breed OLAP query response times in one system in parallel on the same database state. The two workloads of online transaction processing (OLTP) and online analytical processing (OLAP) present different challenges for database architectures.
Currently, users with high rates of mission-critical transactions split their data into two separate systems: one database for OLTP and one so-called data warehouse for OLAP. While allowing for decent transaction rates, this separation has many disadvantages, including data freshness issues, due to the delay caused by only periodically initiating the extract-transform-load (ETL) data staging, and excessive resource consumption due to maintaining two separate information systems. We present an efficient hybrid system, called HyPer, that can handle both OLTP and OLAP simultaneously by using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data. HyPer is a main-memory database system that guarantees the full ACID properties for OLTP transactions and executes OLAP query sessions (multiple queries) on arbitrarily current and consistent snapshots. The utilization of the processor-inherent support for virtual memory management (address translation, caching, copy-on-write) yields both at the same time: unprecedented transaction rates of up to 100,000 per second and very fast OLAP query response times on a single system executing both workloads in parallel. The performance analysis is based on a combined TPC-C and TPC-H benchmark.
This is joint work with Thomas Neumann, Florian Funke, and Henrik Muehe.
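The copy-on-write snapshotting idea can be mimicked in a greatly simplified, POSIX-only form (a hypothetical toy example, not HyPer's actual code): `fork()` gives the child process a copy-on-write snapshot of the parent's address space, so analytical reads in the child are isolated from subsequent transactional updates in the parent.

```python
# Toy demonstration of fork()-based copy-on-write snapshots (POSIX only):
# the child sees the database state as of fork time, while the parent
# keeps applying updates that the child never observes.
import os

database = {"balance": 100}          # stand-in for the transactional data

read_end, write_end = os.pipe()
pid = os.fork()
if pid == 0:                         # child: "OLAP" query on the frozen snapshot
    os.close(read_end)
    os.write(write_end, str(database["balance"]).encode())
    os._exit(0)
else:                                # parent: keep processing "OLTP" updates
    os.close(write_end)
    database["balance"] += 50        # update is only visible in the parent
    snapshot_value = int(os.read(read_end, 64))
    os.waitpid(pid, 0)
    print(snapshot_value, database["balance"])   # prints: 100 150
```

The child's read returns the pre-update value because its pages are a copy-on-write snapshot from the moment of the fork; HyPer exploits exactly this processor/OS mechanism, but at far finer granularity and with full transactional guarantees.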
Bio: Prof. Alfons Kemper studied Computer Science at the University of Dortmund from 1977 to 1980 and thereafter at the University of Southern California, Los Angeles, USA, from 1980 to 1984. He completed his M.Sc. in 1981 and his Ph.D. in 1984, both at USC. From 1984 until 1991 he was an Assistant Professor at the University of Karlsruhe, Germany. In 1991 he became Associate Professor at RWTH Aachen University, Germany. From 1993 until 2004 he was a Full Professor for Database Systems at the University of Passau, Germany. Since 2004 he has held the Chair for Computer Science with a focus on Database Systems at the Technische Universität München (TUM), Germany. From 2006 to 2010 he was the Dean of the CS Department of TUM, and since then he has been its Vice Dean.