BENCHMARKING INTERACTIVE SOCIAL NETWORKING ACTIONS
SUMITA BARAHMAND, DEPARTMENT OF COMPUTER SCIENCE, USC
ADVISER: PROFESSOR SHAHRAM GHANDEHARIZADEH
Dissertation, Spring 2014
Copyright by Sumita Barahmand, 2014. All rights reserved.
Abstract
Social networking sites such as Google+, Facebook, Twitter, and LinkedIn are cloud service providers for person-to-person communications. There are different approaches to building these sites, ranging from SQL to NoSQL and NewSQL, Cache Augmented SQL, graph databases, and others. Some provide a tabular representation of data, while others offer alternative models that scale out. Some may sacrifice strict ACID (Atomicity, Consistency, Isolation, Durability) properties and opt for BASE (Basically Available, Soft state, Eventual consistency) to enhance performance. Independent of a qualitative discussion of these approaches and their merits, a key question is: how do these systems compare with one another quantitatively?
This dissertation investigates the viability of a benchmark to address this question.
Our primary contribution is the design and implementation of a novel benchmark for interactive social networking actions named BG (http://bgbenchmark.org). BG's design decisions are as follows. First, it rates the performance of a system for processing interactive social networking actions by computing two values, Socialites and Social Action Rating (SoAR), using a pre-specified Service Level Agreement (SLA). An example SLA may require 95% of issued requests to observe a response time faster than 100 milliseconds. Second, BG elevates the amount of unpredictable data produced by a solution to a first-class metric, including it as a key component of the SLA (similar to the average response time) and quantifying it as part of the benchmarking process. It also computes the freshness confidence to characterize the behavior of a weak consistency technique. Third, BG's generated workload is characterized by reads and writes of a very small amount of data from big data. Fourth, BG is a modular, extensible framework that is agnostic to its underlying data store. Fifth, BG employs a logical partitioning of data to scale both vertically and horizontally to thousands of nodes, which is essential for evaluating scalable installations of that size. Finally, BG includes a visualization tool that empowers an evaluator to monitor an in-progress benchmark and identify bottlenecks.
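To make the SLA-based rating concrete, the following Python sketch checks one benchmark run against the example SLA (95% of requests faster than 100 milliseconds) extended with a cap on unpredictable data, and then derives SoAR-style and Socialites-style ratings from a set of runs. It is a minimal illustration under stated assumptions, not BG's implementation: the `Run` record, the 1% cap on unpredictable reads, and the functions `meets_sla`, `soar`, and `socialites` are all hypothetical. The sketch assumes SoAR is the highest throughput, and Socialites the highest number of simultaneous threads, among runs that satisfy the SLA. BG's actual rating procedure, including how it searches for these values, may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    """Hypothetical summary of one benchmark run at a fixed load."""
    threads: int              # simultaneous threads issuing requests
    actions_per_sec: float    # observed throughput of the run
    response_ms: List[float]  # per-request response times, in milliseconds
    unpredictable_reads: int  # reads that observed unpredictable data
    total_reads: int          # all reads issued during the run


def meets_sla(run: Run, pct: float = 0.95, max_ms: float = 100.0,
              max_unpredictable: float = 0.01) -> bool:
    """True if at least `pct` of requests finished faster than `max_ms`
    and at most `max_unpredictable` of reads saw unpredictable data.
    The 1% default for unpredictable data is an assumed threshold."""
    if not run.response_ms:
        return False
    fast = sum(1 for t in run.response_ms if t < max_ms)
    if fast / len(run.response_ms) < pct:
        return False
    stale = run.unpredictable_reads / run.total_reads if run.total_reads else 0.0
    return stale <= max_unpredictable


def soar(runs: List[Run]) -> Optional[float]:
    """Assumed SoAR: highest throughput among SLA-compliant runs."""
    ok = [r.actions_per_sec for r in runs if meets_sla(r)]
    return max(ok) if ok else None


def socialites(runs: List[Run]) -> Optional[int]:
    """Assumed Socialites: highest thread count among SLA-compliant runs."""
    ok = [r.threads for r in runs if meets_sla(r)]
    return max(ok) if ok else None
```

For example, given three runs at 10, 50, and 100 threads where 100%, 96%, and 90% of requests respectively finish in under 100 milliseconds, only the first two meet the SLA, so the ratings come from the 50-thread run. Folding unpredictable reads into `meets_sla` reflects the design choice described above: a data store that answers quickly but returns too much unpredictable data does not earn a rating at that load.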
BG's possible use cases are diverse. One may use BG to compare and contrast various data stores with one another, to characterize tradeoffs associated with alternative physical representations of data, or to quantify the behavior of a data store in the presence of various failures (either CP or AP of the CAP theorem), among others. This dissertation demonstrates the use of BG in two contexts. First, it rates an industrial-strength relational database management system and a document store, quantifying their performance tradeoffs. This analysis includes the use of a middle-tier cache (memcached) and its impact on the performance of each system. Second, it provides insight into alternative design decisions for implementing a social action by characterizing their behavior with different social graphs and system loads.
BG's proposed framework is novel and opens several new research directions that benefit the systems research community.