On Data Infrastructure Complexity. Q&A with Rick Negrin
“Modern applications are using data more intensely than ever before because their users are demanding it. But the legacy data infrastructure most companies are using cannot keep up because it wasn’t designed for this kind of scale, availability and performance.”
Q1. Please tell the readers what you do at SingleStore.
I joined SingleStore in 2016 and ran the product management team (as well as training and documentation). I defined the product strategy and oversaw the development of the product that would allow us to execute on that strategy. I recently transitioned to being the field CTO. The field CTO role is a new one for me, and I have been working on defining it. One of our main challenges as a company is that we have built something unique that solves a new kind of problem in the industry. It solves a problem that customers have but don’t yet have the language to describe. That makes it hard for the field team at SingleStore. So, my primary job is to bring that articulation of what we do to the market and to our customers. I did this to some degree as the product leader, but I was restricted in how much time I could spend on that since there was a lot of execution work to do. As field CTO I can make this task my top priority. I do this both directly by presenting to customers and by defining the talk track and testing it in the field. I do other things as well (path-finding with partners, educating the sales engineering team and identifying new solution patterns, to name a few), and I am building a team to execute on all these goals.
Q2. What are you currently working on?
Talking to Customers – I spend a large chunk of my time talking to prospects and customers. I work to understand the problems they have and how SingleStore can help them. I use those conversations to refine the articulation of our strategy and our vision and pass that feedback to the teams (marketing, product, engineering and sales).
Field Tooling and Education – I also work on tooling and education for the field. One example of this is sizing. Because we are a data infrastructure product, our pricing model is based on the amount of resources used. That means sizing is a key part of the discussions with customers, as that directly affects how much they have to spend. But because our solution is so unique and the workloads we solve for are so new, there is no standard way to calculate the size of resources that are needed. This causes a lot of angst with our customers! I am working with several folks in the company to build a sizing calculator to make this capacity planning for our customers (and our field) much easier.
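A sizing calculator of this kind would, in rough outline, combine workload dimensions into a resource estimate. The sketch below is purely illustrative: the dimension names, the per-unit coefficients and the minimum footprint are assumptions for the sake of the example, not SingleStore’s actual sizing model.

```python
# Illustrative capacity-planning sketch. The workload dimensions and the
# per-unit coefficients below are hypothetical assumptions, not a real
# vendor sizing model.
import math
from dataclasses import dataclass

@dataclass
class Workload:
    data_size_gb: float          # total data under management
    ingest_mb_per_sec: float     # sustained ingest rate
    queries_per_sec: float       # peak query rate
    avg_query_complexity: float  # 1.0 = simple lookup, >1 = heavier joins/aggregates

def estimate_vcpus(w: Workload) -> int:
    """Rough vCPU estimate: ingest, queries and background work each
    consume CPU; round up and keep a minimum footprint of 4 vCPUs."""
    ingest_cpu = w.ingest_mb_per_sec / 50                          # assume ~50 MB/s ingest per vCPU
    query_cpu = w.queries_per_sec * w.avg_query_complexity / 100   # assume ~100 simple qps per vCPU
    storage_cpu = w.data_size_gb / 500                             # background work scales with data size
    return max(4, math.ceil(ingest_cpu + query_cpu + storage_cpu))
```

The point of a tool like this is less the exact numbers than forcing the conversation to cover every dimension of the workload, not just data size.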
Path-finding with Partners – We recently signed several partner agreements (SAS, IBM, HP and Dell). I have been working with them to define our “better together” story. These large infrastructure and analytics vendors recognize that we bring a unique and powerful solution that complements their technology. I am working with the partner teams on both sides to refine how our technologies and our teams work together to solve our customers’ problems. I am particularly excited about our partnership with IBM. You can hear more about it in our upcoming webinar that I am doing with Edward Calvesbert, IBM director of PM for data & AI.
Q3. Data infrastructure complexity is rampant in our industry. What are the most common data infrastructure problems companies encounter?
The biggest problem we see is solutions that cannot hit their SLAs. Users expect to have the insights they need to make key decisions at their fingertips and at a moment’s notice. They expect everything from being notified of fraud on their credit card, to being alerted that weather conditions require rescheduling a trip, to responding to a surge in traffic on their company website. They expect these systems to be available 24/7 and for everything to be updated in real time. Modern applications are using data more intensely than ever before because their users are demanding it. But the legacy data infrastructure most companies are using cannot keep up because it wasn’t designed for this kind of scale, availability and performance.
Q4. There are a plethora of special-purpose options available for that. What are your considerations on how to effectively manage data infrastructure sprawl?
The best way to manage data infrastructure sprawl is not to have it at all! Companies don’t want sprawl. They don’t make their plans for the year saying, “This year we have only 30 different databases in our infrastructure; by the end of the year I want to make it 50!” Managing so many different systems is taxing. It requires complexity in moving data between the systems. It is hard to build, manage, troubleshoot and optimize. The reason companies end up with so many data technologies in a single solution is that they can’t find a database that can meet the requirements of the application. So they are forced to stitch together multiple technologies to meet those needs. Sometimes they can get it working, but it comes at a high cost and is often the root issue blocking their ability to meet the SLAs of the application.
Q5. It seems that companies often aren’t aware they’re on a collision course with database sprawl. How do companies realize when they are getting caught in database sprawl?
Customers don’t intend to end up with sprawl. They generally start out with a general-purpose database because the data intensity of most applications starts out small and is easily handled by most of the operational databases out there. Then new requirements get added to support more search capabilities, additional data sets or other things. These new capabilities start to strain the chosen database. Also, as the system starts to scale in multiple dimensions (total data size, more data ingested, query complexity and query concurrency) things start to get slower. Queries take longer to run, and data ingestion slows down. So they add specialty databases to handle the additional functionality, and caches to relieve the pressure on the database that is causing the bottlenecks. Before you know it, you have database sprawl. Sometimes it is only a few databases. But, in some cases, solutions have more than a dozen. It is never pretty.
Q6. How do they get out, to lower costs and accelerate use cases?
The answer is simple. Look across all the key dimensions of how you are using data (data size, ingestion rate, query complexity, query concurrency and query latency). Then look at the growth rate you expect in all these dimensions. From there you can determine which data infrastructure will meet your needs. If you have an application with a high level of data intensity (i.e. high requirements in two or more of the dimensions listed above) then you should start out with the right data infrastructure. Keep in mind the kind of growth you expect because even if you pick a database that can handle your workload today, it may not scale as your usage grows.
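The “two or more dimensions” rule of thumb above can be sketched as a small check. The thresholds in this Python sketch are illustrative assumptions, not published guidance; the dimension names mirror the list in the answer.

```python
# Hypothetical rule of thumb: a workload is "data-intensive" when two or
# more dimensions exceed a high-water mark. All thresholds below are
# illustrative assumptions, not published guidance.

THRESHOLDS = {
    "data_size_gb": 1_000,          # ~1 TB of managed data
    "ingest_rows_per_sec": 10_000,  # sustained ingestion rate
    "query_complexity": 5,          # e.g. joins/aggregations per query
    "concurrent_queries": 100,      # peak concurrency
    "max_latency_ms": 100,          # an SLA of 100 ms or tighter counts as demanding
}

def is_data_intensive(workload: dict) -> bool:
    """True when two or more dimensions are 'high'. Latency is inverted:
    a *lower* SLA target is the more demanding requirement."""
    high = 0
    for dim, limit in THRESHOLDS.items():
        value = workload.get(dim)
        if value is None:
            continue
        if dim == "max_latency_ms":
            high += value <= limit
        else:
            high += value >= limit
    return high >= 2
```

Applying the same check to projected growth numbers, not just today’s figures, captures the second half of the advice: a database that passes today may fail the test a year out.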
Q7. Do you have some industry best practices and lessons learned on curbing the sprawl you can share with us?
Comprehensive testing and capacity planning. Customers often only factor in one or two dimensions (data size and query latency). They test the key queries on a small data set in isolation and think that everything will be fine. But when the system runs in production other things are going on. There are multiple users doing different things at the same time (concurrency). Data is often being ingested all the time. Maintenance operations need to be run. What worked when you had a single query running in isolation doesn’t work as well when the system is loaded up with all the things that happen in the real world. Then when things start failing in production, and you are under the gun, your choices are limited in how to fix them. This is how sprawl happens.
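The gap between testing a single query in isolation and real production load can be made concrete with a small harness that runs queries and ingestion concurrently and reports query-latency percentiles. In this Python sketch, `run_query` and `ingest_batch` are hypothetical stand-ins for whatever client calls your database exposes, simulated here with sleeps.

```python
# Minimal load-test harness sketch: measure query latency while an ingest
# workload runs in parallel. `run_query` and `ingest_batch` are
# hypothetical stand-ins for real database client calls.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def run_query() -> None:
    time.sleep(0.005)  # stand-in for executing a key query (~5 ms)

def ingest_batch() -> None:
    time.sleep(0.002)  # stand-in for inserting a batch of rows (~2 ms)

def timed(fn) -> float:
    """Run fn and return its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000

def load_test(n_queries: int = 50, n_batches: int = 50) -> dict:
    """Run queries and ingestion concurrently; report latency percentiles."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        ingest = [pool.submit(timed, ingest_batch) for _ in range(n_batches)]
        queries = [pool.submit(timed, run_query) for _ in range(n_queries)]
        latencies = sorted(f.result() for f in queries)
        for f in ingest:
            f.result()
    return {
        "p50_ms": statistics.median(latencies),
        "p99_ms": latencies[int(len(latencies) * 0.99) - 1],
    }
```

Comparing these numbers against the same queries run alone, with no concurrent ingest, is exactly the kind of test that exposes problems before production does.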
Q8. How is SingleStore positioned in the market for Cloud Database Management Systems?
We built SingleStore to be the best database for data-intensive applications. It is multi-model, so it can support relational and non-relational data models. It is cloud native, so it scales to meet your needs and has high availability and resiliency built in. It was architected with data-intensive applications in mind, so it supports high levels of concurrent queries at consistently low latency. All the operations are online and the data operations are lock-free, so you can maintain your SLAs even while ingesting data or doing maintenance operations.
Rick Negrin, VP, Product Management, SingleStore.
Rick Negrin is field CTO and a vice president of product management. In 2022, Negrin took on the field CTO role. Previously, Negrin oversaw product strategy and management at SingleStore for 6 years. Before joining SingleStore, he spent 12 years at Microsoft leading teams for SQL Server and Azure SQL Database. Negrin holds a B.S. in computer engineering from the University of Washington.
Sponsored by SingleStore.