On Machine Learning in SQL. Q&A with Jeff Fried
Q1. Does it make sense to offer Machine Learning (ML) features in SQL?
Absolutely! SQL is powerful and is in use in millions of data-intensive applications. Because data is so vital, SQL is also among the most in-demand programming languages. Since developers are already using SQL in their applications, it is natural to give them the ability to tap into ML directly from SQL.
Q2. You recently announced QuickML™. What is it? What is it useful for?
QuickML is a new capability allowing application developers to easily add automation and predictions within their existing applications directly from SQL. It simplifies the process of building, testing, and deploying ML models, and speeds the process of integrating them into production applications.
There are two parts to QuickML. The first is the SQL syntax and facilities that an application developer sees. The second is an automated ML (AutoML) engine that is called behind the scenes. We’re including one AutoML engine with InterSystems IRIS, so that developers can get started easily, build predictive models straight off the data in their SQL table or materialized view, and call them from their SQL-based applications, without requiring any specific knowledge of feature engineering or ML algorithms. We’ll also be providing integrations with other AutoML engines and making QuickML pluggable, so that data scientists can easily add more precision and power to these applications as they grow.
QuickML ultimately allows all developers using the InterSystems IRIS Data Platform™ to embed ML capabilities in their applications in a simple and scalable format.
Q3. Are you expecting that people will use QuickML with existing applications, or with new ones?
Yes (to both).
We are certainly focused on the scenario of adding ML to existing applications. InterSystems IRIS is used for a very wide range of applications, and most of them have areas that could take advantage of ML. So application first, ML second. Many of these are actually “boring” from an ML perspective; relatively simple ML with standard algorithms can provide a lot of benefit. And you have the benefit of having production data in place – often a lot of it, already relatively clean and in a well-understood format.
Will people use QuickML on net new applications? Absolutely. If you have it from the start, you can build powerful applications easily. In any case, you still want to focus first and foremost on the problem you’re solving and your access and understanding of data rather than on ML for its own sake.
Q4. How do you mitigate the risk that predictions made with QuickML™, without ML expert engineers, go wrong and therefore cause harm?
At one level this is a risk that’s part of all ML, and even a risk that’s part of all software – though perhaps greater with ML than traditional algorithms because ML is a different approach that is less directly controllable. We mitigate this by pre-packaging an AutoML engine that is focused on simplicity and has some ‘guard rails’. We advise customers to start simple, for example by adding predictions that aid with human judgement.
It’s also important to note that we are not aiming to eliminate ML expert engineers. Nor would we expect professional application developers to, for example, attempt to build a driverless car without any ML specialists – using QuickML or any other ML facility. We’re focused on helping application developers add simple ML very easily. As applications built with QuickML become more sophisticated or more mission-critical, ML engineers will be in the picture – but they will be more productive and spend more time on the actual model optimization rather than the data wrangling or application integration.
Q5. How do you help developers measure how “accurate” the predictions they make using QuickML are?
SQL is a big help here. In this context, you already have the data and you’ve picked a column which you want to predict. Where that column is already populated, it provides both a training set and a test set. So you can TRAIN MODEL on, for example, the top N non-Null items, then PREDICT on the rest of the non-Null items, and compare the predicted and actual values. We’ll have examples that help developers, as well as things like exposing confidence values from the algorithms that provide them.
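As a rough illustration of that train-then-compare idea, here is a minimal Python sketch using SQLite. It is not QuickML syntax and does not call any InterSystems API. The `customers` table, its `plan` and `churned` columns, the sample rows, and the toy "majority label per plan" model are all hypothetical stand-ins chosen for this example.

```python
import sqlite3

# Hypothetical table: "churned" is the column to predict; NULL means unknown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, plan TEXT, churned INTEGER)"
)
rows = [
    (1, "basic", 1), (2, "basic", 1), (3, "pro", 0), (4, "pro", 0),
    (5, "basic", 1), (6, "pro", 0), (7, "basic", 0), (8, "pro", 0),
    (9, "basic", None), (10, "pro", None),  # unlabeled rows, not used here
]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

# Only rows where the target column is populated can be used for evaluation.
labeled = conn.execute(
    "SELECT plan, churned FROM customers "
    "WHERE churned IS NOT NULL ORDER BY id"
).fetchall()

# Train on the first N labeled rows, hold out the remaining labeled rows.
n_train = 5
train, test = labeled[:n_train], labeled[n_train:]


def fit_majority_by_plan(examples):
    """Toy stand-in for a model: the most common label seen for each plan."""
    counts = {}
    for plan, label in examples:
        per_plan = counts.setdefault(plan, {})
        per_plan[label] = per_plan.get(label, 0) + 1
    return {plan: max(c, key=c.get) for plan, c in counts.items()}


model = fit_majority_by_plan(train)

# Predict on the held-out labeled rows and compare with the actual values.
# Plans never seen in training fall back to label 0 (an arbitrary choice).
predictions = [model.get(plan, 0) for plan, _ in test]
correct = sum(1 for pred, (_, actual) in zip(predictions, test) if pred == actual)
accuracy = correct / len(test)

print(f"held-out rows: {len(test)}, correct: {correct}, accuracy: {accuracy:.2f}")
```

The point is only the shape of the evaluation: the populated part of the target column supplies both the training rows and the held-out rows, so predicted and actual values can be compared before the model is trusted on rows where the column is still empty.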
Q6. You offer, besides QuickML, a Spark Connector, and a Predictive Model Markup Language (PMML) runtime engine. How do they all fit together?
These are three independent features that can each be used separately, or can be used together in combination. A simple way to look at it is that we provide facilities both for experts and non-experts.
The Spark connector and PMML runtime engine are often used together in situations where there are expert data scientists doing exploration and then developing a model, around which an application is built. We give the data scientists secure, governed access to data at high performance, and they can work in any tool of their choice, ultimately producing a model that is imported for runtime use. QuickML is meant for application developers, not expert data scientists, and is a different facility.
In addition to ML, there’s good old HL – human learning – at play here. We are learning continually from our customers. After fielding the PMML engine, we learned that customers wanted to do more complex data science directly on the data – which resulted in the Spark connector. That has been successful, but we learned that most of our customers didn’t have the staff and data science focus to take advantage of it – they needed a non-expert solution, hence QuickML. We have ideas about how to use these in combination, but I expect that we’ll build a better solution by learning how people combine these in practice.
Q7. When will QuickML be available in InterSystems IRIS as a native ML capability?
QuickML will be available in early 2020.
Qx. Anything else you wish to add?
I’m very excited at how much interest and positive feedback there is in QuickML so far. Can’t wait to see what cool things people build with it!
Jeff Fried, director of product management for InterSystems, is a long-standing data management nerd, and particularly passionate about helping people create powerful data-driven applications. Prior to joining InterSystems, Jeff served as CTO of BA Insight, Empirix and Teloquent, and ran product management for FAST Search and Transfer and for Microsoft. He has extensive experience in data management, text analytics, enterprise search and interoperability. Jeff is a frequent speaker and writer in the industry, holds 15 patents, and has authored more than 50 technical papers and co-authored three technical books.
Sponsored by InterSystems.