On Cloud Database Management Systems. Interview with Jordan Tigani
“If a company starts as an on-premise business and decides to become relevant in the cloud space it requires dedicated energy devoted to changing the culture.
...If there was a lesson to share it would be to not under-invest or think it will be easy. You also need to hire people with cloud experience. There are a number of areas where you can hire smart people and they will pick up what they need, but if you’re trying to make a transition there is no substitute for actual hands-on experience with what happens when you try to scale.” — Jordan Tigani.
Q1. You have been Chief Product Officer at SingleStore since June 30, 2020. What are the main projects you have been working on at SingleStore?
Jordan Tigani: The number one thing that I’ve been focused on is helping SingleStore transform into a cloud company. This means more than having a product that runs in the cloud; you need to reimagine how you build software, how you monitor it, how you support it, and what features you need. We’ve got a great team that has taken these ideas and run with them, but to some extent, this is a cultural change, and that takes a lot of time and directed energy.
I’ve also been working on refining the mission and completing the technology so that it solves all use cases. For the last year, we’ve been focused on data-intensive applications, which are, broadly, applications that hit bottlenecks in data. This is a growing subset of the database market, as richer applications tend to want to do more interesting things with their data.
Q2. You co-created Google BigQuery as a founding engineer and went on to be Director of Engineering and also of Product Management. How much has your work at Google influenced your current work at SingleStore?
Jordan Tigani: My two biggest learnings from Google were how to build a cloud product that scales (when I left BigQuery it was using about 3 million CPU cores) and a deep customer empathy for the cloud analytics market.
I also saw a lot of things that customers wanted to do, but we had a hard time making the technology work to solve their problems. One of our tech leads had a great saying: “It’s just code.” This meant that given enough time, you could make any feature work. However, if you didn’t have the right architecture you would hit limitations, and all the clever coding in the world would not be able to help you.
Some of the things that BigQuery customers were pushing us to do—like being able to do rapid updates, or serve low latency queries—were things that were incredibly difficult to do with the architecture. Many of these same things were problems that SingleStore had already solved, and by virtue of their architecture, there was a technological moat that would be hard for competitors to cross.
Q3. The tag line of SingleStore is “The Single Database for All Data-Intensive Applications for operational analytics and cloud-native applications”. To demonstrate how fast SingleStore is on both transactional and analytical workloads you did comparative benchmarks against leading cloud vendors for both TPC-H (analytics) and TPC-C (transactions). What were the main results?
Jordan Tigani: The main takeaway was that SingleStore is as fast as or faster than cloud data warehouses on analytics benchmarks, and as fast as or faster than cloud operational databases on transactional benchmarks. This means that in one database, with one storage type, you can get stellar performance on both transactions and analytics, which many people think is impossible. This brings us closer to having a “general purpose” database, where you don’t necessarily have to plan what you’re going to do with it before you start using it.
Q4. Why did you compare your TPC-H performance against data warehouse vendors, but your TPC-C performance against only one operational database vendor? What did you learn?
Jordan Tigani: On the analytics side, the main cloud data warehouse vendors have been engaging in public benchmark wars and focusing on performance. We didn’t want to escalate the amount of noise being thrown around, but we did want to call attention to the fact that we can put up some pretty stellar numbers ourselves.
We only measured one operational database vendor because TPC-C is harder to set up, and because of the way it is defined, it doesn’t provide rich information like TPC-H. We’re working with a third-party vendor to release a more detailed report, which will include additional operational database vendors.
Q5. How did you perform the test?
Jordan Tigani: We had ignored benchmarks for a long time since they often do not correlate with real-world performance. But in recent months, data warehouse vendors have been poking each other about TPC results. So, we put a couple of engineers on the problem and had them run some tests against our database and competitors.
When you run competitive benchmarks yourself you often get accused of selective reporting or cheating (Databricks and Snowflake had a recent public spat about this). We’ve hired a third-party vendor to reproduce the results and the report should be out in another month or so. When they do publish their report, they’ll also reveal the companies they are comparing us against.
Q6. You mentioned in the article that your benchmark runs used the schema, data and queries as defined by the TPC. However, they do not meet all the criteria for official TPC benchmark runs, and are not official TPC results. Isn’t this a limitation to the acceptance of such benchmarks?
Jordan Tigani: It is very expensive and difficult to do an official TPC submission, and at the end of the day, it doesn’t tell you much. For TPC-H, for example, we did a “power run,” which means running the queries sequentially. This shows off the ability to perform well in several different query shapes that are indicative of a data warehousing workload. It is a lot harder to run a full TPC-H benchmark as it involves multiple concurrent queries and changing data.
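As a rough illustration of the “power run” idea described above (queries executed one after another, each timed on its own), here is a minimal Python sketch. It uses an in-memory SQLite database and two toy queries purely as stand-ins. The `orders` table, the queries, and the `power_run` helper are assumptions made for illustration; they are not the TPC-H schema or queries, and not the harness SingleStore used.

```python
import sqlite3
import time


def power_run(conn, queries):
    """Run each query sequentially and return {name: elapsed_seconds}."""
    timings = {}
    for name, sql in queries.items():
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch all rows so the query fully executes
        timings[name] = time.perf_counter() - start
    return timings


# Hypothetical toy data set standing in for a real benchmark schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north" if i % 2 else "south", float(i)) for i in range(10_000)],
)

# Hypothetical queries with different "shapes": an aggregate and a filter.
queries = {
    "q_aggregate": "SELECT region, SUM(amount) FROM orders GROUP BY region",
    "q_filter": "SELECT COUNT(*) FROM orders WHERE amount > 5000",
}

timings = power_run(conn, queries)
total = sum(timings.values())
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f}s")
print(f"total: {total:.6f}s")
```

Because only one query runs at a time, a loop like this says nothing about behavior under concurrent queries or changing data, which is the part of a full benchmark that the answer above describes as much harder to run.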
Q7. Not every workload needs transactions and analytics. What are the typical applications that need some flavor of both?
Jordan Tigani: There are two types of applications that need both analytics and transactions. The first is applications that are doing analytics. That is, they’re showing custom dashboards and slicing and dicing data. They tend to need up-to-the-moment data and low latency because they’re serving requests to end-users. They also need high concurrency because they are being used by analysts and are part of the end product being served. Data warehouses aren’t a great option in this use case, because they can’t scale to high concurrent user counts and are generally designed for throughput rather than low latency. SingleStore has a lot of customers in the financial services industry who back a lot of their portfolio analytics tools behind SingleStore databases.
The other type of application that needs analytics and transactions is one that wants to make use of data to enrich the experience. Maybe they want to do a product search and faceted drill-down. Maybe they want to show a leaderboard at a game. Traditional databases aren’t always great at these use cases once you get beyond a certain scale, and then performance can fall off a cliff. People end up stitching multiple databases together—maybe adding a cache on top of it because it is slow—and then have to deal with complexity to keep a consistent model and all the data in sync. Have you ever seen an application that showed a notification or unread message count, and then when you clicked on it there weren’t any notifications or unread messages? This is one of the ways this pattern shows up to the detriment of users; if they had used SingleStore they could keep those values in sync.
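The stale notification badge mentioned above can be sketched in a few lines of Python. This is a generic illustration of the “database plus separate cache” pattern, not SingleStore-specific code: the `messages` table, the `badge_cache` dictionary, and the `unread_from_store` helper are all hypothetical, and SQLite stands in for the primary database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, user TEXT, unread INTEGER)"
)
conn.executemany(
    "INSERT INTO messages (user, unread) VALUES (?, ?)",
    [("ana", 1), ("ana", 1)],
)

# Hypothetical separate cache that stores the badge count for each user.
badge_cache = {"ana": 2}


def unread_from_store(user):
    """Derive the unread count directly from the primary store."""
    row = conn.execute(
        "SELECT COUNT(*) FROM messages WHERE user = ? AND unread = 1", (user,)
    ).fetchone()
    return row[0]


# The user reads everything, so the primary store is updated...
conn.execute("UPDATE messages SET unread = 0 WHERE user = ?", ("ana",))
# ...but the matching cache update is missed or delayed.

stale_badge = badge_cache["ana"]
fresh_badge = unread_from_store("ana")
print(f"cached badge: {stale_badge}, derived badge: {fresh_badge}")
```

The cached badge still claims two unread messages while the store holds none, which is the mismatch a user sees when the count and the underlying data live in systems that are synchronized separately.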
Q8. You are quoted saying that “Making the jump to being a cloud-native rather than just a company who runs their product on the cloud requires deep changes throughout the organization”. What are the key lessons learned you wish to share?
Jordan Tigani: If a company starts as an on-premise business and decides to become relevant in the cloud space it requires dedicated energy devoted to changing the culture. We drew up a 24-point score card last year and graded where SingleStore was on every axis of cloud readiness. The score card had everything from Elasticity to Auth to Scalability. We created a plan to get everything to “green” – it takes a long time and a lot of sustained energy, but it was worthwhile to do so.
It paid off, considering we were one of the 20 databases recognized by Gartner in the 2021 Magic Quadrant for Cloud Database Management Systems. We believe that is something that could not have happened if we hadn’t dedicated significant energy to making sure we were thoroughly cloud.
If there was a lesson to share it would be to not under-invest or think it will be easy. You also need to hire people with cloud experience. There are a number of areas where you can hire smart people and they will pick up what they need, but if you’re trying to make a transition there is no substitute for actual hands-on experience with what happens when you try to scale.
Q9. How is the pandemic changing the market for enterprise infrastructures?
Jordan Tigani: The pandemic is changing the market for enterprise infrastructure in two ways. First, it is accelerating the transition to the cloud. If you’ve got a physical server somewhere you have to have staff that physically maintains those machines, which goes in the opposite direction of a workforce that is becoming more distributed and remote in the pandemic.
Secondly, the pandemic is accelerating the need for fast, accurate data. If you’re in the office, you can often tell how things are going by the “buzz.” But if your only connection to your team and your customers is through Zoom, there is a lot of key information that is missing. The only way to get some semblance of that information back is through data and being able to mine what customers are doing, how sales are going, and how much attrition you’re seeing in the workforce.
Big data analytics tools were, to some extent, developed to handle cases where you had so many customers that you couldn’t meet them all and could only get a pulse by looking at data. Google and Amazon are two companies that relied heavily on data because they had to. These techniques are being applied successfully when you may not have billions of customers, but have a difficult time reading their pulse.
Q10. Can you tell us a bit about how you helped True Digital in Thailand develop heat maps around geographies with large COVID-19 infection rates? What lessons did you learn?
Jordan Tigani: True Digital is a telecom provider in Thailand that was able to use cellular data to help track the spread of the pandemic. In the early days of Covid-19, there was a huge focus on getting answers quickly and they were able to build out and ship an application on top of SingleStore in a matter of weeks. One lesson we learned was that if you need to build something in a hurry that needs to scale quickly, making sure you have the right tools when you start is important. SingleStore was ideally suited for True Digital’s needs, and we helped them get something out faster than they would have otherwise. You can read more about our work with True Digital here.
Q11. You are quoted saying “I like the idea of using AI to augment and go beyond what you can do currently. There’s really intelligence, which is a step beyond analytics, which is driving real insight from the data and automatic insight from the data.” Can you please elaborate on this?
Jordan Tigani: There is a hierarchy of analytical needs, and at the base level is collecting data. If you don’t have the data, then you’re blind to what is happening in the data.
The next step is understanding the data sources, which requires a feedback loop with a human to understand what the data is telling you. Too often people try to skip this step and jump right to making decisions based on the data, and they end up making the wrong decisions because the data isn’t actually telling them what they thought it was telling them. A great example I’ve seen of this is when people were looking at counts of customers, but every customer that wasn’t logged in got the same customer ID, so the averages got completely skewed.
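The shared-ID pitfall in that example is easy to reproduce. The following Python sketch uses made-up session data and assumes a hypothetical sentinel ID of `0` for every logged-out visitor; the `avg_views_per_customer` helper is likewise an illustration, not something taken from the interview.

```python
from collections import Counter

ANONYMOUS_ID = 0  # hypothetical sentinel shared by every logged-out visitor

# (customer_id, page_views) per session; values are made up for illustration.
sessions = [
    (101, 4), (102, 6), (103, 5),
    (ANONYMOUS_ID, 3), (ANONYMOUS_ID, 2), (ANONYMOUS_ID, 7),
    (ANONYMOUS_ID, 1), (ANONYMOUS_ID, 5), (ANONYMOUS_ID, 6),
]


def avg_views_per_customer(rows):
    """Average total page views per distinct customer ID."""
    totals = Counter()
    for customer_id, views in rows:
        totals[customer_id] += views
    return sum(totals.values()) / len(totals)


# Naive: the sentinel is counted as if it were one very active customer.
naive = avg_views_per_customer(sessions)

# Cleaned: only sessions tied to a real, logged-in customer ID.
known_only = avg_views_per_customer(
    [row for row in sessions if row[0] != ANONYMOUS_ID]
)

print(f"naive average: {naive}")
print(f"known customers only: {known_only}")
```

With this data the naive average is 9.75 views per customer, while the average over known customers is 5.0. Six anonymous sessions collapse into one apparent “customer”, which is the kind of distortion a human looking at the data source would be expected to catch.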
Once you have data that is cleaned and reputable, you can start understanding what the data shows. This is where BI and dashboards come in. Insight tends to come from questions that someone asks, like “why were my sales down in the southern region?”
Where it starts to get interesting is when you take the next step: making data-driven decisions. You have data that you understand and rely on, and you have been able to drill down and ask questions. AI and machine learning can help you all along the way, from figuring out what data to capture, to the structure of your data, to answering questions. As the last step, you need absolute trust in the lower levels of the system, or else you risk making a lot of bad decisions that you can’t diagnose.
Q12. You are Board Member of Atlas Corps, whose mission is to address critical social issues.
Jordan Tigani: There are generally two types of organizations that address social issues: those that address the issues directly, and those that seek to address the roots of the problems. For example, an organization in the former category would help distribute food during a famine while the latter would help teach sustainable farming.
As an engineer and someone who appreciates the building of the right systems and architectures, organizations that help improve systems are most interesting to me. Atlas Corps generally goes one step further than just trying to address the root of problems; they seek to help train people who are themselves addressing the problem. Who are we to come in and tell people how to farm, for example? Why not help boost the people in those locations who already have the context, and help teach them how to build stronger and more scalable organizations?
Q13. What are the current projects?
Jordan Tigani: The pandemic has been hard on Atlas Corps since their model involved bringing social sector leaders to the United States for training and service in social change organizations. If you can’t bring people into the country, or those organizations are working remotely, it’s difficult to make those programs work. Atlas Corps has been working on building out their model to handle remote work, at least partly, which has made it work and scale better during the pandemic. Their tagline is “talent is universal, opportunity is not,” which is a lesson I try to apply everywhere.
Jordan Tigani, Chief Product Officer, SingleStore.
Jordan is the Chief Product Officer at SingleStore, where he oversees the engineering, product and design teams. He was one of the creators of Google BigQuery, wrote two books on the subject, and led first the engineering and then the product teams. He is a veteran of several star-crossed startups, and spent several years at Microsoft working on bit-twiddling.