Polyglot approach to storing data. Interview with John Allison
“We were looking for solutions which provided the data integrity guarantees we needed, provided clustering tools to ease operational complexity, and were able to handle our data size and the read/write throughput we required.”–John Allison
I have interviewed John Allison, CTO and founder of Customer.io, a startup company in Portland, Oregon.
Q1. What is the business of Customer.io?
John Allison: We help our customers send timely, targeted messages based on user activity on their website or mobile app. We achieve this by collecting analytical data, providing real-time segmentation, and allowing our customers to define rules to trigger messages at different points in their interactions with a user.
Q2. How large are the data sets you analyze?
John Allison: We’ve collected 6 terabytes of analytical event data for over 55 million unique users across our platform. Due to its nature, this data keeps growing, and it grows faster as we collect data for more and more users.
Q3. What are the main business and technical challenges you are currently facing?
John Allison: As we continue to grow our business, we need to ensure the technical side of our service can easily scale out to support new customers who want to use our product.
Q4. Why did you replace the existing underlying database architecture supporting your “MVP” product? What were the main technical problems you encountered?
John Allison: As our data set grew in size to the point where we couldn’t realistically manage it all on a small number of servers, we began looking for alternatives which would allow us to continue providing our service in a larger, more distributed way.
Q5. How did you evaluate the alternatives?
John Allison: We evaluated many options and found that most didn’t live up to the availability or consistency guarantees they promised when run over a cluster of servers. We were looking for solutions which provided the data integrity guarantees we needed, provided clustering tools to ease operational complexity, and were able to handle our data size and the read/write throughput we required.
Q6. What does the new solution look like?
John Allison: We’ve taken more of a polyglot approach to storing our data. We are consolidating on three main clustered databases:
1) FoundationDB – Data where distributed transactions and consistency guarantees are most important.
2) Riak – Large amounts of immutable data where availability is more important.
3) ElasticSearch – Indexing data for ad-hoc querying.
All three have built-in tools for expanding and administering a cluster, provide fault tolerance and increased reliability in the face of server faults, and each gives us a unique way to access our data.
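As an illustration only, the three-way split described above can be sketched as a small routing rule. This is not Customer.io's code: the `DataProfile` fields, the `stores_for` function, and the rule that a data set gets one primary store plus an optional search index are all assumptions made for the example, and no database client is called.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Store(Enum):
    FOUNDATIONDB = "FoundationDB"    # distributed transactions, consistency
    RIAK = "Riak"                    # large immutable data, availability
    ELASTICSEARCH = "ElasticSearch"  # indexing for ad-hoc querying


@dataclass(frozen=True)
class DataProfile:
    # Hypothetical description of a data set; field names are illustrative.
    name: str
    needs_transactions: bool = False
    immutable: bool = False
    ad_hoc_queries: bool = False


def stores_for(profile: DataProfile) -> List[Store]:
    """Return the stores a data set would live in under the assumed rules."""
    stores: List[Store] = []
    # Assumption: transactional needs take priority over immutability.
    if profile.needs_transactions:
        stores.append(Store.FOUNDATIONDB)
    elif profile.immutable:
        stores.append(Store.RIAK)
    # Assumption: the search index is kept in addition to a primary store.
    if profile.ad_hoc_queries:
        stores.append(Store.ELASTICSEARCH)
    if not stores:
        raise ValueError(f"no store rule matches {profile.name!r}")
    return stores
```

Under these assumed rules, an immutable event stream that also needs ad-hoc querying, such as `DataProfile("events", immutable=True, ad_hoc_queries=True)`, would be routed to both Riak and ElasticSearch.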
Q7. What has your experience with this new database architecture been so far? Do you have any measurable results you can share with us?
John Allison: Embracing a distributed architecture and storing data in the right database for a given use-case has led to less time worrying about operations, increased reliability of our service as a whole, and the ability to scale out all parts of our infrastructure to increase our platform’s capacity.
Q8. Moving forward, what are your plans for the next implementation of your product?
John Allison: Continuing to improve our product in order to provide the most value we can for our customers.
John Allison is the CTO and founder of Customer.io, a startup focused on making it easy to build, manage, and measure automatic customer retention emails. Prior to that he was the head of engineering at Challengepost.com. He is a world traveler, golfer, and an Arkansas Razorback fan.
We have published several new expert articles on Big Data and Analytics in ODBMS.org.
– On Mobile Data Management. Interview with Bob Wiederhold. ODBMS Industry Watch, 2014-11-18.
– Big Data Management at American Express. Interview with Sastry Durvasula and Kevin Murray. ODBMS Industry Watch, 2014-10-12.