On Large Language Models (LLMs) and Databases. Q&A with Madeleine Corneli
Q1. You believe in collaboration, working closely with engineering, data science, marketing, and customers. How does it work in practice? What are the challenges you face in bringing different stakeholders together?
Collaboration across technical disciplines at Exasol involves regular sprint planning, architecture reviews, and technical workshops. We align engineering, data science, and product teams through clear API specifications, detailed architectural diagrams, and shared JIRA backlogs. With customers, we facilitate iterative feedback loops via dedicated pilot deployments and beta programs. A major technical challenge is managing dependencies and ensuring consistency across development environments. We mitigate this by implementing containerized development (Docker, Kubernetes) and strict CI/CD pipelines for reproducible builds and deployments, minimizing integration friction between teams.
Q2. What are the main benefits of integrating Large Language Models (LLMs) into an enterprise’s analytics?
Integrating LLMs into enterprise analytics enables advanced capabilities such as natural-language querying, semantic search, automatic SQL generation, and text classification on large, complex datasets. LLM integrations such as natural-language-to-SQL conversion lower the barrier to entry for analytics by converting plain-language questions directly into optimized SQL statements. LLMs can also intelligently and efficiently extract insights from unstructured data at scale, a common enterprise data challenge. LLM and transformer frameworks have numerous other enterprise analytics applications, all of which can boost user efficiency, deepen insights, and create more analytic value.
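To make the unstructured-data point concrete, here is a minimal sketch (an illustrative pattern, not Exasol’s implementation) using the open-source Hugging Face transformers library to put structure on free-text records; the model choice and labels are assumptions for the example:

```python
from transformers import pipeline

# Zero-shot classification of free-text support tickets: one way transformer
# models extract structure from unstructured enterprise data at scale.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

ticket = "The nightly ETL job failed again and the sales dashboard shows stale numbers."
labels = ["billing", "data pipeline", "user access", "performance"]

result = classifier(ticket, candidate_labels=labels)
print(result["labels"][0], round(result["scores"][0], 3))  # best label + score
```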
Q3. And what are the challenges?
The three main challenges are technical complexity, infrastructure requirements, and security/risk management. Gen AI systems require subject-matter expertise; while this knowledge is growing, many organizations are still early in developing and nurturing that talent. Many Gen AI computations are highly resource-intensive due to the parameter scale of modern LLMs (often billions of parameters), so running Gen AI solutions can require significantly more CPU or dedicated GPU resources, both of which are costly. Finally, risk and security management is a critical challenge: mitigating model hallucination and maintaining data accuracy through rigorous validation methods, reinforcement learning from human feedback (RLHF) fine-tuning, and systematic human-in-the-loop supervision. Additionally, data privacy and compliance with regulations (GDPR, HIPAA) mandate sophisticated masking and tokenization strategies and secure model-serving environments.
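On the masking point, a minimal sketch of pre-prompt masking: sensitive values are swapped for placeholder tokens before text ever reaches a model endpoint, and the mapping is kept locally so results can be restored. The regex patterns here are deliberately simplistic stand-ins; production systems would use dedicated PII-detection tooling:

```python
import re

# Illustrative pre-prompt masking: replace emails and long numeric IDs with
# placeholder tokens before the text is sent to an external LLM endpoint.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ID": re.compile(r"\b\d{6,}\b"),
}

def mask(text: str) -> tuple[str, dict[str, str]]:
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(text)):
            token = f"<{label}_{i}>"
            mapping[token] = match          # kept locally to unmask later
            text = text.replace(match, token)
    return text, mapping

masked, mapping = mask("Contact jane.doe@example.com about order 12345678.")
print(masked)  # Contact <EMAIL_0> about order <ID_0>.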
Q4. It is now possible to translate human language into SQL statements based on a Large Language Model (LLM). What do you think about it? Is this a useful feature? For whom?
Natural language to SQL translation via LLMs is highly beneficial, especially for domain experts without deep SQL knowledge. It allows users to express queries conversationally (e.g., “List the top-selling products from last quarter in EMEA excluding Germany”), automatically generating syntactically correct and performant SQL statements, even when involving complex JOIN operations, nested subqueries, or window functions. This feature is particularly valuable to analysts, business intelligence teams, and operational users needing rapid, ad-hoc queries without extensive SQL training. Now, it’s important to consider the guardrails necessary when expanding access to analytical tools. Since less technical users may leverage Text2SQL solutions, we need to ensure that their questions and the answers they receive are accurate, to avoid spreading incorrect information.
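One concrete guardrail, sketched below with the open-source sqlglot parser (an illustrative choice, not necessarily what any vendor ships): before executing model-generated SQL, parse it and reject anything that is not a single, read-only SELECT statement.

```python
import sqlglot
from sqlglot import exp

def guard_sql(generated_sql: str) -> str:
    """Accept only a single, read-only SELECT statement from the model."""
    statements = sqlglot.parse(generated_sql)
    if len(statements) != 1 or not isinstance(statements[0], exp.Select):
        raise ValueError("generated SQL rejected: not a single SELECT")
    return generated_sql

# A benign analytical query passes; anything destructive is blocked
# before it ever reaches the database.
print(guard_sql("SELECT product, SUM(revenue) FROM sales GROUP BY product"))
guard_sql("DROP TABLE sales")  # raises ValueError
```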
Q5. Did you try with the Exasol Analytics Engine?
Yes, we implemented natural-language-to-SQL translation by downloading LLMs (from Hugging Face) onto Exasol’s local file store and then accessing them in a custom function that offers an agent-like experience. Because this solution leverages Exasol’s column-oriented, MPP (Massively Parallel Processing) architecture and automatic query optimizer, we observed consistently efficient query execution, even for complex, auto-generated queries involving multiple table joins, aggregations, and analytical window functions. Performance benchmarking demonstrated minimal overhead and rapid response times due to Exasol’s parallelized query execution and optimized memory caching. The entire solution can be deployed within Exasol; maintaining such a closed system increases security and governability.
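The shape of such a custom function, reduced to a sketch: Exasol’s Python UDFs expose a run(ctx) callback, and models staged in the database’s file store (BucketFS) appear under a local path. The path, model, and column name below are illustrative assumptions, not our production code:

```python
# Sketch of the Python logic inside an Exasol UDF (SET ... EMITS variant).
# The BucketFS path and model are placeholders; loading at script scope
# lets the model be reused across rows instead of reloaded per call.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/buckets/bfsdefault/models/text2sql"  # model staged in BucketFS

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)

def run(ctx):
    # ctx.question is the UDF's input column; ctx.emit returns a result row.
    inputs = tokenizer(ctx.question, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=128)
    ctx.emit(tokenizer.decode(output[0], skip_special_tokens=True))
```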
Q6. Please tell us about Exasol’s AI-Lab. What is it? And what is it useful for?
Exasol’s AI-Lab is a sandbox environment built on our database platform, providing data scientists and ML engineers with a fully integrated framework for developing, validating, and deploying machine learning models directly alongside their analytical data. AI-Lab supports popular ML frameworks (TensorFlow, PyTorch, scikit-learn), and includes native connectors to enable real-time querying and model inference directly against in-memory datasets. It enables experimentation with feature engineering and hyperparameter tuning, supports model version control and reproducibility, and facilitates an efficient transition from prototyping to scalable deployment.
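The train-next-to-your-data loop the AI-Lab is built around looks roughly like this, sketched with the open-source pyexasol driver and scikit-learn; connection details and table names are placeholders:

```python
import pyexasol
from sklearn.ensemble import RandomForestClassifier

# Connection details and schema are placeholders for this sketch.
conn = pyexasol.connect(dsn="exasol-host:8563", user="sys", password="***")

# Bulk-export a training set straight into pandas; Exasol returns
# column names in uppercase by default.
df = conn.export_to_pandas("SELECT feature_a, feature_b, label FROM ml.training_data")

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(df[["FEATURE_A", "FEATURE_B"]], df["LABEL"])
print(f"training accuracy: {model.score(df[['FEATURE_A', 'FEATURE_B']], df['LABEL']):.3f}")
```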
Q7. Is it possible to test Exasol’s AI-Lab?
Yes. Technical teams and data science units can download and install Exasol’s AI-Lab for free within their own ecosystem. The AI-Lab includes comprehensive documentation, Jupyter notebook integration, and pre-loaded reference implementations for common ML workflows (classification, regression, NLP, forecasting). Users can evaluate performance and ease of integration, and validate their models’ accuracy and scalability prior to production deployment.
Q8. Utilizing an LLM has certain advantages and disadvantages. What is your take on this?
LLMs offer superior capabilities for NLP tasks, including intent extraction, query generation, and text classification, thanks to their contextual embeddings and semantic understanding. However, they exhibit limitations such as hallucinations (generating plausible yet incorrect answers), necessitating strict model validation, fact-checking procedures, and rigorous fine-tuning. Computational complexity and associated costs are substantial due to model size and GPU/TPU requirements. Enterprises must therefore deploy strategies such as model quantization, knowledge distillation, efficient inference engines, and controlled retraining cycles to balance performance and cost efficiency.
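As a small illustration of the quantization lever, PyTorch’s post-training dynamic quantization shrinks a transformer’s Linear-layer weights to int8 with one call; the model below is a small placeholder standing in for a larger network:

```python
import os
import torch
from transformers import AutoModelForSequenceClassification

# A compact classifier stands in for a larger model; the same call applies
# to any network whose Linear layers dominate its memory footprint.
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# Post-training dynamic quantization: Linear weights become int8, while
# activations stay float and are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def size_mb(m: torch.nn.Module) -> float:
    torch.save(m.state_dict(), "/tmp/_m.pt")
    return os.path.getsize("/tmp/_m.pt") / 1e6

print(f"fp32: {size_mb(model):.0f} MB -> int8: {size_mb(quantized):.0f} MB")
```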
Q9. Let’s talk about integrations and ecosystems. What are the main challenges in performing a seamless integration of various AI tools and databases?
Seamless integration faces technical challenges including schema compatibility, managing varying data serialization standards (Avro, Parquet, JSON), latency minimization, and ensuring secure data transfer. Orchestrating hybrid or multi-cloud environments demands sophisticated solutions like containerization (Docker/Kubernetes), distributed computing platforms (Apache Spark), and standardized APIs (RESTful services, ODBC/JDBC). Exasol addresses these challenges with Virtual Schemas, extensive connector support, a fully parallelized query engine, and API-driven integration, ensuring efficient data throughput and low latency even across complex, heterogeneous environments.
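One such hop, sketched with pyarrow and pyexasol (file path, DSN, and table names are placeholders): a Parquet extract from a data lake is read via Apache Arrow and bulk-loaded into Exasol.

```python
import pyarrow.parquet as pq
import pyexasol

# Read a Parquet extract from the lake into pandas via Apache Arrow.
df = pq.read_table("exports/events.parquet").to_pandas()

# Bulk-load into Exasol; pyexasol streams the frame over a parallel
# HTTP transport instead of issuing row-by-row inserts.
conn = pyexasol.connect(dsn="exasol-host:8563", user="sys", password="***")
conn.import_from_pandas(df, ("STAGE", "EVENTS"))
```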
Q10. DataRobot is a platform that supports data science by automating the end-to-end process of building, deploying, and maintaining machine learning (ML) and artificial intelligence (AI) at scale. Is it possible to automate machine learning with DataRobot and Exasol? If yes, how?
Yes, automating machine learning workflows at enterprise scale is feasible by integrating DataRobot’s AutoML capabilities with Exasol’s high-performance Analytics Engine. Exasol provides the engine behind DataRobot, offering “SQL pushdown” capabilities to speed up model development and iteration. The integration leverages optimized ODBC/JDBC connectors or REST APIs for data extraction directly into DataRobot’s platform, where automated model training (including feature selection, hyperparameter optimization, and model validation) is executed. Post-training, optimized models can be deployed back into Exasol for high-performance inference that leverages Exasol’s parallelized computation and in-memory caching. This integrated approach allows enterprises to streamline the ML lifecycle while benefiting from high query performance, scalability, and real-time analytics execution in Exasol.
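A minimal sketch of that handoff using the public DataRobot Python client together with pyexasol; the endpoint, token, target column, and names are illustrative assumptions, not a documented reference integration:

```python
import datarobot as dr
import pyexasol

# Pull training data out of Exasol (DSN, credentials, and table are placeholders).
conn = pyexasol.connect(dsn="exasol-host:8563", user="sys", password="***")
df = conn.export_to_pandas("SELECT * FROM ml.churn_features")

# Hand the frame to DataRobot and let Autopilot run AutoML end to end.
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")
project = dr.Project.create(sourcedata=df, project_name="churn-autopilot")
project.set_target(target="CHURNED", mode=dr.AUTOPILOT_MODE.QUICK)
project.wait_for_autopilot()  # blocks until automated training finishes

print(project.get_models()[0])  # top model on the leaderboard
```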
……………………………………………………………..

Madeleine Corneli
Madeleine Corneli leads product development for AI and ML at Exasol. She is focused on unlocking critical AI and ML use cases for customers and expanding Exasol’s suite of capabilities, working on classic machine learning and generative AI applications across all industries. Madeleine has considerable experience in the analytics space and a deep understanding of how data can help empower people and solve problems. You can follow Madeleine on LinkedIn.
Sponsored by Exasol.