BY Svilen R Mihaylov, University of Pennsylvania
We are witnessing a dramatic increase in the amount of data: environmental readings, web pages, social networks, medical records, genome sequences, etc. Special acquisition strategies and complex analysis are needed to detect underlying structure in data, make predictions, learn generalizations, or test hypotheses. From a database perspective, declarative computations are the main mechanism for enabling such operations in a scalable, efficient, and easy-to-use way.
The main challenge today is to build widely applicable distributed database systems that, while leveraging existing database query and network-level optimization techniques, integrate novel approaches to better adapt to and exploit the properties of their particular computing environment. The work in this thesis seeks to expand the limits of what can be computed in environments ranging from low-powered sensor devices to cloud server machines. It broadly focuses on making database systems better suited to new computation environments and emerging problems through judicious integration of decades of research with novel techniques. This thesis proposes a unified programming model for parallel database systems in both sensor and cloud computing, allowing for easy data integration and cross-optimizations, as exemplified by the ASPEN system.
In the sensor network domain, a novel routing substrate is developed, aimed at efficient, minimal-impact exploration of the underlying network graph. A framework for join cost modeling is also developed, taking into account the physical topology and the relative sizes and item frequencies of the streams being aggregated or joined. In the cloud domain, a novel delta-based approach for recursive computations proves to be particularly efficient at tackling a variety of distributed machine learning algorithms. We incorporate this new model into a parallel database system with an efficient incremental failure handling capability and a sophisticated query optimizer to demonstrate substantial performance gains compared to traditional approaches.
LINK to .PDF: http://search.proquest.com/docview/1289069324
Mihaylov, Svilen R, “A scalable approach to complex computations” (2012). Dissertations available from ProQuest. AAI3551718.