On big data analytics. Interview with Ajay Anand
“Traditional OLAP tools run into problems when trying to deal with massive data sets and high cardinality.”–Ajay Anand
I have interviewed Ajay Anand, VP Product Management and Marketing, Kyvos Insights. The main topic of the interview is big data analytics.
Q1. In your opinion, what are the current main challenges in obtaining relevant insights from corporate data, both structured and unstructured, regardless of size and granularity?
Ajay Anand: We focus on making big data accessible to the business user, so he/she can explore it and decide what’s relevant. One of the big inhibitors to the adoption of Hadoop is that it is a complex environment and daunting for a business user to work with. Our customers are looking for self-service analytics on data, regardless of the size or granularity. A business user should be able to explore the data without having to write code, look at different aspects of the data, and follow a train of thought to answer a business question, with instant, interactive response times.
Q2. What is your opinion about using SQL on Hadoop?
Ajay Anand: SQL is not the most efficient or intuitive way to explore your data on Hadoop. While Hive, Impala and others have made SQL queries more efficient, it can still take tens of minutes to get a response when you are combining multiple data sets and dealing with billions of rows.
Q3. Kyvos Insights emerged from stealth mode a couple of months ago. What is your mission?
Ajay Anand: Our mission is to make big data analytics simple, interactive, enjoyable, massively scalable and affordable. It should not be just the domain of the data scientist. A business user should be able to tap into the wealth of information and use it to make better business decisions, without having to wait for reports to be generated.
Q4. There are many diverse tools for big data analytics available today. How do you position your new company in the already quite full market for big data analytics?
Ajay Anand: While there are a number of big data analytics solutions available in the market, most customers we have talked to still had significant pain points. For example, a number of them are Tableau and Excel users. But when they try to connect these tools to large data sets on Hadoop, there is a significant performance impact. We eliminate that performance bottleneck, so that users can continue to use their visualization tool of choice, but now with response time in seconds.
Q5. You offer “cubes on Hadoop.” Could you please explain what such cubes are and what they are useful for?
Ajay Anand: OLAP cubes are not a new concept. In most enterprises, OLAP tools are the preferred way to do fast, interactive analytics.
However, traditional OLAP tools run into problems when trying to deal with massive data sets and high cardinality.
That is where Kyvos comes in. With our “cubes on Hadoop” technology, we can build linearly scalable, multi-dimensional OLAP cubes and store them in a distributed manner on multiple servers in the Hadoop cluster. We have built cubes with hundreds of billions of rows, including dimensions with over 300 million cardinality. Think of a cube where you can include every person in the U.S., and drill down to the granularity of an individual. Once the cube is built, now you can query it with instant response time, either from our front end or from traditional tools such as Excel, Tableau and others.
Q6. How do you convert raw data into insights?
Ajay Anand: We can deal with all kinds of data that has been loaded on Hadoop. Users can browse this data, look at different data sets, combine them and process them with a simple drag and drop interface, with no coding required. They can specify the dimensions and measures they are interested in exploring, and we create Hadoop jobs to process the data and build cubes. Now they can interactively explore the data and get the business insights they are looking for.
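To make the idea of choosing dimensions and measures and then building a cube more concrete, here is a minimal sketch in plain Python. It is an illustration only, not Kyvos code: the rows, the dimension names (`region`, `age_group`, `channel`), the measure (`minutes`) and the functions `build_cube` and `query` are all hypothetical, and everything runs in memory on one machine rather than as distributed Hadoop jobs. What it shows is the general OLAP principle of pre-aggregating a measure for every combination of dimensions, so that a later query becomes a lookup instead of a scan.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: three dimensions and one measure per row.
ROWS = [
    {"region": "West", "age_group": "18-34", "channel": "sports", "minutes": 120},
    {"region": "West", "age_group": "35-54", "channel": "news", "minutes": 45},
    {"region": "East", "age_group": "18-34", "channel": "sports", "minutes": 60},
    {"region": "East", "age_group": "18-34", "channel": "news", "minutes": 30},
]
DIMENSIONS = ("region", "age_group", "channel")
MEASURE = "minutes"


def build_cube(rows, dimensions, measure):
    """Pre-aggregate the measure for every subset of the dimensions."""
    cube = defaultdict(lambda: defaultdict(int))
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                key = tuple(row[d] for d in dims)
                cube[dims][key] += row[measure]
    return cube


def query(cube, **filters):
    """Return the pre-computed total for the given dimension=value filters."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    key = tuple(filters[d] for d in dims)
    return cube[dims].get(key, 0)


cube = build_cube(ROWS, DIMENSIONS, MEASURE)
print(query(cube))                                   # grand total
print(query(cube, region="West"))                    # slice on one dimension
print(query(cube, region="East", channel="sports"))  # drill down on two
```

The build step does all the heavy work once; each call to `query` afterwards is a dictionary lookup. The number of pre-aggregated combinations grows quickly with the number of dimensions and their cardinality, which is one reason very large cubes need to be built and stored in a distributed way.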
Q7. A good analytical process can result in poor results if the data is bad. How do you ensure the quality of data?
Ajay Anand: We provide a simple interface to view your data on Hadoop, decide the rules for dropping bad data, set filters to process the data, combine it with lookup tables and do ETL processing to ensure that the data fits within your parameters of quality. All of this is done without having to write code or SQL queries on Hadoop.
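The kind of rules described here (dropping bad records, filtering, and joining against a lookup table) can be sketched in a few lines of Python. This is only an illustration of the logic, under assumptions: the record layout, the `ZIP_TO_REGION` lookup table, the specific rules and the `clean` function are all hypothetical, and the product described in the interview exposes such steps through a graphical interface rather than code.

```python
# Hypothetical raw records, as they might arrive before any cleaning.
RAW = [
    {"subscriber_id": "a1", "zip": "94105", "minutes": "30"},
    {"subscriber_id": "", "zip": "94105", "minutes": "12"},    # missing id
    {"subscriber_id": "b2", "zip": "10001", "minutes": "-5"},  # negative value
    {"subscriber_id": "c3", "zip": "99999", "minutes": "48"},  # zip not in lookup
    {"subscriber_id": "d4", "zip": "10001", "minutes": "n/a"}, # not a number
]

# Hypothetical lookup table used to enrich each record with a region.
ZIP_TO_REGION = {"94105": "West", "10001": "East"}


def clean(records, lookup):
    """Apply simple quality rules; return (kept, dropped_with_reason)."""
    kept, dropped = [], []
    for rec in records:
        try:
            minutes = int(rec["minutes"])
        except ValueError:
            dropped.append((rec, "minutes is not a number"))
            continue
        if not rec["subscriber_id"]:
            dropped.append((rec, "missing subscriber_id"))
        elif minutes < 0:
            dropped.append((rec, "negative minutes"))
        elif rec["zip"] not in lookup:
            dropped.append((rec, "unknown zip"))
        else:
            # Enrich the record with the region from the lookup table.
            kept.append({**rec, "minutes": minutes, "region": lookup[rec["zip"]]})
    return kept, dropped


kept, dropped = clean(RAW, ZIP_TO_REGION)
for rec, reason in dropped:
    print("dropped:", rec["subscriber_id"] or "<blank>", "-", reason)
print("kept:", kept)
```

Keeping the dropped records together with the reason they were rejected makes it easier to review whether the rules themselves are too strict or too lenient.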
Q8. How do you ensure that the insights you obtained with your tool are relevant?
Ajay Anand: The relevance of the insights really depends on your use case. Hadoop is a flexible and cost-effective environment, so you are not bound by the constraints of an expensive data warehouse where any change is strictly controlled. Here you have the flexibility to change your view, bring in different dimensions and measures and build cubes as you see fit to get the insights you need.
Q9. Why do technical and/or business users want to develop multi-dimensional data models from big data, work with those models interactively in Hadoop, and use slice-and-dice methods? Could you give us some concrete examples?
Ajay Anand: An example of a customer that is using us in production to get insights on customer behavior for marketing campaigns is a media and entertainment company addressing the Latino market. Before using big data, they used to rely on surveys and customer diaries to track viewing behavior. Now they can analyze empirical viewing data from more than 20 million customers, combine it with demographic information, transactional information, geographic information and many other dimensions. Once all of this data has been built into the cube, they can look at different aspects of their customer base with instant response times, and their advertisers can use this to focus marketing campaigns in a much more efficient and targeted manner, and measure the ROI.
Q10. Could you share with us some performance numbers for Kyvos Insights?
Ajay Anand: We are constantly testing our product with increasing data volumes (over 50 TB in one use case) and high cardinality. One telecommunications customer is testing with subscriber information that is expected to grow to several trillion rows of data. We are also testing with industry standard benchmarks such as TPC-DS and the Star Schema Benchmark. We find that we are getting response times of under two seconds for queries where Impala and Hive take multiple minutes.
Q11. Anything else you wish to add?
Ajay Anand: As big data adoption enters the mainstream, we are finding that customers are demanding that analytics in this environment be simple, responsive and interactive. It must be usable by a business person who is looking for insights to aid his/her decisions without having to wait for hours for a report to run, or be dependent on an expert who can write MapReduce jobs or Hive queries. We are moving to a truly democratized environment for big data analytics, and that’s where we have focused our efforts with Kyvos.
Ajay Anand is vice president of products and marketing at Kyvos Insights, delivering multi-dimensional OLAP solutions that run natively on Hadoop. Ajay has more than 20 years of experience in marketing, product management and development in the areas of big data analytics, storage and high availability clustered systems.
Prior to Kyvos Insights, he was founder and vice president of products at Datameer, delivering the first commercial analytics product on Hadoop. Before that he was director of product management at Yahoo, driving adoption of the Hadoop based data analytics infrastructure across all Yahoo properties. Previously, Ajay was director of product management and marketing for SGI’s Storage Division. Ajay has also held a number of marketing and product management roles at Sun, managing teams and products in the areas of high availability clustered systems, systems management and middleware.
Ajay earned an M.B.A. and an M.S. in computer engineering from the University of Texas at Austin, and a BSEE from the Indian Institute of Technology.