Looking beyond the DBMS: Towards Holistic Performance Optimization for Enterprise Architectures
By Dr. Alexander Boehm, database architect working on SAP's HANA in-memory database management system.
Performance optimization and the creation of highly efficient systems have always been a major focus of the database community. In academia, this is reflected by a vast number of publications that describe sophisticated techniques for all major components and aspects of a DBMS. Recently, the DBMS community has broadened its scope towards hardware and operating systems. This means no longer “just” looking at how to optimize the DBMS itself, but going further through close integration with the OS, or even hardware/software co-design, e.g. by exploiting vector instructions for efficient scans in in-memory databases such as HyPer or SAP HANA.
Did we miss the point so far?
In the overall context of today’s enterprise architectures, there is a component that has been overlooked to a large extent: the application server. Running the business logic (e.g. written in Java, ABAP, or another high-level language), application servers play an important role and are often the elements where most of the processing time is spent. In various analyses, SAP’s performance and scalability teams came to the conclusion that up to 70% of the overall processing effort of enterprise applications is spent on the application server (see Figure 1).
Figure 1: Load distribution in an Enterprise Application (© SAP SE, Performance / Scalability Team)
Consequently, the interplay between application server and DBMS appears to be another fruitful area for optimizing enterprise architectures: if the processing time in the application server can be reduced through tighter coupling and closer collaboration with the database management system, both the overall processing effort and the application speed observed by the end user will improve.
Potential Optimization Areas for Application Server/DBMS Co-Design
Over the last few years, we have investigated several areas where application server/DBMS co-design can bring a significant benefit. In the following sections, we give several examples and roughly sketch the improvements that were achieved in prototypical implementations.
Type Alignment
In most cases, the type systems of the application server and the DBMS do not match. (A prominent example is the NULL value, which gives rise to three-valued logic in the DBMS, but is usually not supported by application servers and programming languages.) By aligning the type systems, data conversion steps in the application server's DBMS client (e.g. ODBC or JDBC) can be omitted, thus reducing the processing effort of every interaction with the database.
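To make the NULL mismatch concrete, here is a small sketch in Python with SQLite standing in for the DBMS (neither is part of the work described here; the helper name sql_eval is made up for illustration). It shows that a comparison involving NULL yields "unknown" in SQL, whereas the closest host-language value, None, compares as plain true or false.

```python
import sqlite3

# SQLite is only a stand-in DBMS for illustrating three-valued logic.
conn = sqlite3.connect(":memory:")

def sql_eval(expr):
    """Evaluate a scalar SQL expression and return its single result value."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]

# In SQL, comparing NULL with NULL is "unknown", surfaced to Python as None.
null_equals_null = sql_eval("NULL = NULL")

# Three-valued logic: unknown OR true is true, unknown AND false is false,
# but unknown AND true stays unknown.
null_or_true = sql_eval("NULL OR 1")
null_and_false = sql_eval("NULL AND 0")
null_and_true = sql_eval("NULL AND 1")

# A WHERE clause keeps a row only if the predicate is true, not unknown.
filtered_row = conn.execute("SELECT 1 WHERE NULL = NULL").fetchone()

# The host language has only two truth values: None == None is simply True.
python_none_equals_none = (None == None)
```

A client library has to bridge this gap on every value it hands over, which is one of the conversion steps that aligned type systems would make unnecessary.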
Efficient Data Transfer
A next step after type alignment is to further improve the efficiency of data transfer between the two systems.
In a prototype, we made the DBMS aware of the in-memory data layout of the application server. When creating result sets, the DBMS directly materializes to this layout, and data transfer to the application server basically becomes a remote memcpy operation (that can even be further accelerated by e.g. remote direct memory access (RDMA)). In our prototype, this gives a speedup of about 20% in transferring data chunks from the DBMS to the application server, and back.
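The idea of materializing directly into the consumer's layout can be sketched as follows. This is an illustration under stated assumptions, not the prototype's implementation: the row layout (a 64-bit id, a 64-bit float amount, an 8-byte currency code) and the function names are hypothetical, and a plain byte copy stands in for the network transfer or RDMA.

```python
import struct

# Hypothetical fixed-width row layout agreed on by both sides:
# little-endian int64 id, float64 amount, 8-byte padded currency code.
ROW = struct.Struct("<qd8s")

def materialize(rows):
    """'DBMS side': write the result set straight into the shared layout."""
    buf = bytearray(ROW.size * len(rows))
    for i, (row_id, amount, currency) in enumerate(rows):
        ROW.pack_into(buf, i * ROW.size, row_id, amount, currency)
    return buf

def transfer(buf):
    """Stand-in for the wire: one bulk copy, no per-value re-encoding."""
    return bytes(buf)

def read_row(buf, index):
    """'Application server side': address row `index` by offset."""
    return ROW.unpack_from(buf, index * ROW.size)

result_set = [(1, 10.0, b"EUR"), (2, 20.5, b"USD"), (3, 0.25, b"JPY")]
received = transfer(materialize(result_set))
second_row = read_row(received, 1)
```

Because both sides agree on the layout, the receiver can locate any row by offset arithmetic alone. In Python the final unpack still creates objects; in a language with direct memory access the buffer could be used in place.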
Transporting Semantics
Even higher gains can be achieved by making the DBMS aware of the semantics of an application construct. An example is the FOR ALL ENTRIES construct in the ABAP language. This construct is used to retrieve a set of records (identified by a list of primary keys) from the database. Typically, this intent is expressed by a SQL statement with many conjunctions (one per key attribute) and disjunctions (one per key whose tuple should be materialized).
Example:
SELECT * FROM TABLE
WHERE (key_attribute_1=? and key_attribute_2=? and key_attribute_3=? and …) OR
(key_attribute_1=? and key_attribute_2=? and key_attribute_3=? and …) OR
(key_attribute_1=? and key_attribute_2=? and key_attribute_3=? and …) OR
…
The intent of this statement is basically to do a semi-join between a database table and a “table” in the application server that contains a list of keys. However, it is not trivial for the DBMS to reverse-engineer this from the (huge) SQL statement depicted above. In a research prototype, we made the DBMS aware of the semantics of the FOR ALL ENTRIES construct, so that it directly invokes a semi-join instead of processing a complex generated SQL statement. Depending on the number of keys retrieved and the number of key attributes, this resulted in significant performance gains of up to a factor of 10 and more.
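The two formulations can be compared in a small, self-contained sketch. It uses Python and SQLite purely as stand-ins; the orders table, its columns, and both function names are hypothetical, and shipping the keys into a temporary table is only an approximation of a DBMS that understands the construct natively, as in the prototype.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (client INTEGER, order_no INTEGER, item INTEGER, "
    "amount REAL, PRIMARY KEY (client, order_no, item))"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, o, i, float(o * 10 + i)) for o in range(1, 6) for i in range(1, 4)],
)

# Key list held by the application; the last key matches no row.
keys = [(1, 2, 1), (1, 4, 3), (1, 9, 9)]

def fetch_with_or_chain(conn, keys):
    """Generated statement: one conjunction per key, joined by OR."""
    clause = " OR ".join(["(client=? AND order_no=? AND item=?)"] * len(keys))
    params = [value for key in keys for value in key]
    sql = f"SELECT * FROM orders WHERE {clause} ORDER BY client, order_no, item"
    return conn.execute(sql, params).fetchall()

def fetch_with_semi_join(conn, keys):
    """Ship the keys as a table and state the semi-join explicitly."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS wanted "
        "(client INTEGER, order_no INTEGER, item INTEGER)"
    )
    conn.execute("DELETE FROM wanted")
    conn.executemany("INSERT INTO wanted VALUES (?, ?, ?)", keys)
    sql = (
        "SELECT o.* FROM orders o WHERE EXISTS ("
        "SELECT 1 FROM wanted w WHERE w.client = o.client "
        "AND w.order_no = o.order_no AND w.item = o.item) "
        "ORDER BY o.client, o.order_no, o.item"
    )
    return conn.execute(sql).fetchall()

or_chain_rows = fetch_with_or_chain(conn, keys)
semi_join_rows = fetch_with_semi_join(conn, keys)
```

Both functions return the same rows, but the second statement has a fixed size and states the intent directly, whereas the first grows with the number of keys and key attributes.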
Batching / Group Processing
In the context of online transactional processing (OLTP) applications, the DBMS is often hit by a high volume of rather simple and short-running statements, such as primary-key based SELECTs or highly selective key/foreign-key joins. In an enterprise context, performance requirements in the range of tens of thousands of requests per second, or even more, are not uncommon.
One idea for improving the overall efficiency of the enterprise system is to extend DBMS techniques, in particular group commit, to the application server: by delaying and grouping requests on the application server, the number of roundtrips to the DBMS can be reduced, trading latency for throughput. This idea also enables subsequent optimizations on the DBMS side, e.g. executing common subqueries only once (provided that transactional visibility matches).
In another prototype, we observed significant throughput gains from this idea (improvements of up to a factor of 5 for simple, read-only scenarios).
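A minimal sketch of the grouping idea for primary-key lookups is shown below, again with Python and SQLite as stand-ins. The LookupBatcher class, the customers table, and the max_batch threshold are hypothetical and do not describe the prototype; a real implementation would also flush on a timer and hand results back to the waiting requests.

```python
import sqlite3

class LookupBatcher:
    """Collects primary-key lookups and answers them with one statement."""

    def __init__(self, conn, max_batch=4):
        self.conn = conn
        self.max_batch = max_batch
        self.pending = []     # keys requested since the last flush
        self.results = {}     # id -> name, filled by flush()
        self.roundtrips = 0   # number of statements sent to the DBMS

    def request(self, key):
        """Queue a lookup; flush once the batch is full."""
        self.pending.append(key)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self):
        """Send all pending lookups in a single roundtrip."""
        if not self.pending:
            return
        unique = sorted(set(self.pending))  # identical lookups run once
        marks = ",".join("?" * len(unique))
        rows = self.conn.execute(
            f"SELECT id, name FROM customers WHERE id IN ({marks})", unique
        )
        self.results.update(dict(rows))
        self.roundtrips += 1
        self.pending.clear()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"customer-{i}") for i in range(1, 11)],
)

batcher = LookupBatcher(conn, max_batch=4)
for key in [3, 7, 3, 5, 1, 7]:
    batcher.request(key)
batcher.flush()  # drain the incomplete last batch
```

Six lookups are served with two roundtrips here, at the price of the early requests waiting for their batch to fill. Deduplicating the keys within a batch is a simple analogue of executing common subqueries only once.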
Conclusion
We believe that application server/DBMS co-design can help to significantly improve the performance and efficiency of today’s enterprise architectures. The examples given above are only a first stepping stone in this direction and hopefully will fuel subsequent research both in academia and industry.