On Database Performance. Q&A with Dirk Beerbohm

Q1.  Database query performance is an important metric. In one of your recent blogs, you argued that this is not enough, “resulting in complicated implementations and high operational efforts.” What do you mean by this? 

Focusing solely on raw query execution speed neglects broader requirements. Upscaling of a database does not only reflect improving query performance or scaling up user numbers. One metric which is often forgotten is the ability to scale up the database regarding new use cases, and especially future requirements. Implementing a major database platform is nothing you do frequently. You need to ensure that your database will most likely be able to fulfill future functional requirements. Exasol was technically able to deploy and utilize Large Language Models when those were not already invented. The same situation accounts for future new trends and use-cases. If a database can prove that it works with widely used programming languages, you should be on the safe side. 

Q2. What is the difference between query performance and operation costs? 

Query performance strictly measures execution time, typically latency or throughput for database operations. Operational costs include broader metrics: infrastructure costs (CPU, memory, storage usage), personnel efforts (DBA workload), complexity of scaling, ongoing optimization, and total time spent maintaining the database system. 

Q3. What are the challenges when a customer wishes to add new functionality to an existing database? 

Introducing new functionality into existing databases typically faces several technical hurdles: 

  • Integration complexity: Compatibility with existing query optimizers, execution plans, and storage engines. 
  • Performance overhead: Ensuring new functionality does not negatively impact existing query performance. 
  • Maintenance burden: Increased complexity in managing database dependencies, upgrading, and ensuring backward compatibility. 
  • Security and stability risks: Ensuring new functionalities adhere strictly to security standards and don’t introduce vulnerabilities or instability. 

Q4. Are User Defined Function (UDF) the solution for this? 

UDFs address this by enabling customized, encapsulated logic execution within the database engine, significantly simplifying integration of new functionality. They offer: 

  • Controlled extension points for database capabilities. 
  • Simplified encapsulation of business logic or custom computations directly within SQL queries. 

However, UDFs must be carefully managed regarding security, performance isolation, and scalability. They must be integrated robustly to ensure consistent resource management and minimize potential performance degradation or security issues. 

Q5. At Exasol, you have natively integrated the Lua (https://www.lua.org) script language into the database. What is Lua? And why did you choose it? Why not use Python instead? 

Lua is a lightweight, high-performance, embeddable scripting language explicitly designed for integration into host applications. Lua offers: 

  • Minimal memory footprint and extremely fast runtime performance, ideal for database environments. 
  • Simple, well-defined C API for efficient embedding and secure sandboxing. 
  • Low overhead, facilitating isolated execution contexts. 

We selected Lua due to its optimal blend of performance, embeddability, minimal execution overhead, and security, and therefore integrated it as a native language into the database. It is the best option regarding performance to create database scripts or
User-Defined-Functions. 

Q6.  What about containerization? Is a script language container the solution? 

Containerization offers isolation, portability, and repeatability for script execution environments, effectively solving package dependency management and enhancing security isolation. A containerized script runtime environment can isolate user-defined functions from the core database process, preventing dependency conflicts and system-level interference. 

However, containerization may introduce additional overhead—start-up latency and resource usage—potentially impacting performance. At Exasol, we employ lightweight, sandboxed script execution environments to balance isolation, minimal overhead, and performance rather than full OS-level containerization for each script execution. Moreover, an activated script container does not consume system resources in standby mode unless actively used. You have the option to activate a script language container for the entire database or per session, which makes it ideal to separate containers for different use cases or for organizational reasons. While Lua is natively integrated into the Exasol database and therefore static in its capabilities, the container approach introduces Python, Java and R as language alternatives that can easily extended, making it a good choice for implementing user-defined-functions beyond the capabilities of Lua. 

Q7. How do you execute user-defined functions in parallel using your MPP architecture? 

In Exasol’s MPP (Massively Parallel Processing) architecture, customer data is distributed over all active nodes of an Exasol database cluster, having each node storing a unique portion of the overall data; the Script Language Containers reside on all active nodes. UDFs execute in parallel on all active nodes, working on the distributed data: 

  • Each node independently executes the UDF on its local data partition concurrently. 
  • Results from each node are then aggregated and combined at query execution finalization. 

This parallelized, distributed execution leverages Exasol’s in-memory processing engine, ensuring minimal latency and maximal throughput, enabling linear scaling of UDF processing with additional nodes. 

Q8. What are the benefits? Are there any problems with guaranteeing data consistency? 

Benefits of parallel UDF execution include: 

  • Linear scalability: Performance scales predictably as cluster resources expand. 
  • Improved throughput: Concurrent execution of compute-intensive operations on partitioned datasets. 
  • Reduced latency: Leveraging parallel in-memory execution shortens time-to-result significantly. 

Regarding data consistency, challenges typically arise from concurrent write operations or side effects. In Exasol, UDF execution contexts are isolated per node, ensuring read-only data consistency is inherently preserved. For cases requiring write access or stateful operations, additional synchronization or transaction controls must be explicitly managed at the application or database execution layer. 

Q9. What happens if a script runtime environment on the host is compromised? 

Exasol strictly sandboxes each script runtime environment. If compromised: 

  • The attacker’s reach is strictly confined to the isolated runtime sandbox environment, significantly limiting potential damage. 
  • Sandboxing mechanisms include strict file-system access restrictions, resource constraints (CPU, memory limits), and absence of elevated OS privileges. 
  • Compromised runtimes are isolated from the core database engine, ensuring operational stability and security containment. 

Additionally, Exasol incorporates comprehensive security audits, runtime monitoring, and execution policies to detect anomalies rapidly and isolate compromised environments effectively. 

Q10. Is there a risk that packages for the user-defined functions interfere with the script language environment of the underlying OS, risking potential misbehavior of the OS? 

No, because Exasol’s embedded scripting environments are strictly sandboxed and isolated from the underlying OS-level script environments. We use virtualized runtime context, which prevents UDF packages from affecting system-wide script installations or OS-level scripts. Dependency conflicts or runtime modifications remain isolated strictly within the database scripting sandbox, preventing any interference or misbehavior at the OS level. 

Q11. Anything else you wish to add? 

I’d emphasize that extensibility through embedded scripting and UDFs significantly enhances the flexibility and computational power of analytical databases. However, embedding must balance security, performance, and ease of operation carefully. At Exasol, extensive engineering efforts ensure that extending database functionality via parallel execution does not compromise operational simplicity, stability, or security, maintaining high performance and reliability for enterprise analytical workloads. 

…………………………………………

Dirk Beerbohm, Global Partner Solution Architect at Exasol.

Dirk Beerbohm, a Global Partner Solution Architect at Exasol, supports global partners with over 30 years of expertise in data analytics. Specializing in structured, unstructured, and hybrid data, he focuses on machine learning and AI, particularly large language models. As Lead Sales Engineer for ML and AI, he bridges R&D and partner/customer needs. You can connect with Dirk on LinkedIn.  

Sponsored by Exasol

You may also like...