Using NoSQL at BMW. Interview with Jutta Bremm and Peter Palm.
“We need high performance databases for a wide range of challenges and analyses that arise from a variety of different systems and processes.”–Jutta Bremm, BMW
Q1. What is your role, and which IT projects are you responsible for at BMW?
Jutta Bremm: I am an IT Project Leader at BMW, responsible for IT projects with a volume of more than 10 million euros per year.
Q2. What are the main technical challenges you face at BMW?
Jutta Bremm: We need high performance databases for a wide range of challenges and analyses that arise from a variety of different systems and processes.
These include not only recursive, parameterized explosions of bills of materials, but also the provision of standardized tools to the business departments. That way, they can run their own queries more often and are less dependent on IT to do it for them.
Q3. You describe CortexDB as a schema-less multi-model database. What does that mean in practice, and what kinds of applications is it useful for?
Peter Palm: In CortexDB, datasets are stored as independent entities (cf. objects). To achieve this, the system transforms all content into a new type of index structure. This ensures that every item of content and every field “knows” the context in which it is being used. As a result, queries do not have to search through the datasets themselves. Instead, they run on information that is already known, and the results are combined using simple set operations.
This is why there’s no predefined schema for the datasets – only for the index of all fields and the content.
This is what differentiates CortexDB from all other databases, which require the configuration of at least one index even though the datasets themselves are stored in schema-less mode.
The innovative index structure means that no administrative adaptation or optimization of the index is necessary.
Nor is an index required for specific applications, which enables users to query all the content whenever they want and to combine queries with each other. That gives them the flexibility to query any field and to make any necessary development changes to in-house applications easily.
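As a rough illustration of answering queries from an index instead of scanning datasets, the following Python sketch builds a generic inverted index that maps each (field, value) pair to the IDs of the datasets containing it, then combines partial result sets with intersection, union and difference. It is a toy under stated assumptions, not CortexDB's index structure or API, which the interview does not describe; `FieldIndex` and the sample fields are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Set, Tuple


class FieldIndex:
    """Hypothetical toy index: (field, value) -> IDs of the datasets containing it."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)

    def add(self, dataset_id: str, record: Dict[str, Any]) -> None:
        # Every field/value pair is indexed as it is written; no schema is declared.
        for field, value in record.items():
            self._postings[(field, value)].add(dataset_id)

    def ids(self, field: str, value: Any) -> Set[str]:
        # Return a copy so callers can combine sets without altering the index.
        return set(self._postings.get((field, value), ()))


idx = FieldIndex()
idx.add("d1", {"model": "X5", "engine": "3.0"})
idx.add("d2", {"model": "X5", "engine": "2.0", "nav": True})
idx.add("d3", {"model": "i3", "nav": True})

# Queries are answered by set operations on the index alone.
x5_with_nav = idx.ids("model", "X5") & idx.ids("nav", True)      # AND     -> {"d2"}
engine_or_nav = idx.ids("engine", "3.0") | idx.ids("nav", True)  # OR      -> {"d1", "d2", "d3"}
x5_without_nav = idx.ids("model", "X5") - idx.ids("nav", True)   # AND NOT -> {"d1"}
```

Dataset `d2` introduces a field (`nav`) that `d1` lacks without any schema change; the index simply gains new entries.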
From the server’s perspective, the fields and content, as well as the interpretation of dataset structure and utilization, are not that important. The application working with the data creates a data structure that can be changed at any time (this is known as schema-less). For CortexDB, all that’s relevant is the content-based structure, which can be used in a generalized way and modified any time. This design gives customers a significant advantage when working with recursive data structures.
This is why CortexDB is particularly well suited to tasks whose definitive structure cannot be fixed at the beginning of the project, as well as for systems that change dynamically. The content-based architecture and the innovative index also deliver significant benefits for BI systems, as ad hoc analyses can be run and adapted whenever required.
In addition, users can add a validity period (“valid from…”) to any item of content. This enables them to view the evolution of particular data over time (known as historization). This evolutionary information is ideal for storing data that change frequently, such as smart metering and insurance information. For each field in a dataset, users see not only the information that was valid at the time of the transaction, but also the date from which the information was/is/will be valid. This is what we call a temporal database.
These benefits are complemented by the fact that individual fields can be used alone or in combination with others and repeated within a dataset. This – together with the use of validity dates – is what we call a “multi-value” database.
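The interview does not show how CortexDB stores validity dates internally. As a generic illustration of the two time axes mentioned here (transaction time and validity time) applied to a field that holds several values, here is a minimal Python sketch. The `Entry` type, the `value_as_of` function and the insurance-tariff values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    value: str
    valid_from: date   # validity time: when the value takes effect
    recorded_at: date  # transaction time: when the value was stored


def value_as_of(entries: List[Entry], valid_on: date, known_on: date) -> Optional[str]:
    """Value valid on `valid_on`, as far as the database knew on `known_on`."""
    candidates = [
        e for e in entries if e.valid_from <= valid_on and e.recorded_at <= known_on
    ]
    if not candidates:
        return None
    # The latest validity start wins; ties go to the most recently recorded entry.
    return max(candidates, key=lambda e: (e.valid_from, e.recorded_at)).value


# One repeated field holding several entries, each with its own validity date.
tariff = [
    Entry("basic", valid_from=date(2020, 1, 1), recorded_at=date(2019, 12, 1)),
    Entry("premium", valid_from=date(2021, 1, 1), recorded_at=date(2020, 6, 1)),
    Entry("premium-plus", valid_from=date(2021, 1, 1), recorded_at=date(2020, 11, 1)),
]

print(value_as_of(tariff, date(2020, 7, 1), date(2021, 1, 1)))  # basic
print(value_as_of(tariff, date(2021, 3, 1), date(2020, 9, 1)))  # premium
print(value_as_of(tariff, date(2021, 3, 1), date(2021, 3, 1)))  # premium-plus
```

The "premium" entry was recorded in June 2020 but only becomes valid in 2021, which is the "will be valid" case: a query can ask both what is valid on a date and what was known at a given moment.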
The terms “multi-model”, “multi-value” and “schema-less” also reflect the fact that the database functions mentioned above are available in other NoSQL databases too, but in CortexDB users can extend them with new functions. In principle, any other database type can be seen as a subset of CortexDB:
Database type: Key/Value Store
Function: One dataset = one key with one value (a value or value list) => a single, large index of keys
How it works in CortexDB: Every value and every field is indexed automatically and can be freely combined with others by using an occurrence list
Database type: Document Store
Function: One dataset combines several fields using a common ID (often JSON objects)
How it works in CortexDB: One ID combines fields that belong together in a dataset. Datasets can be output as JSON objects via an API.
Database type: GraphDB
Function: Links to other datasets are saved as meta information and can be used via proprietary graph queries.
How it works in CortexDB: Links are stored as actual data in a dataset and can be edited using additional fields. Fields can be repeated as often as required.
Database type: Big Table
Function: Multi-dimensional tables that use timestamps to define the validity of information. Its datasets can have a variety of attributes.
How it works in CortexDB: The use of a validity date in addition to a transaction date delivers a temporal database. Content can also be added beyond what the dataset description defines.
Database type: Object oriented
Function: A class model defines the objects that need to be monitored persistently.
How it works in CortexDB: With the Cortex UniPlex application, users can define dataset types. Compared with classes, these define the maximum attributes of a dataset. Nevertheless, users can add more fields at any time, even if they have not been defined in UniPlex.
Q4. Can you please describe the use cases where you use CortexDB at BMW?
Jutta Bremm: The current use case for which we’re working with CortexDB is the explosion of bills of material for the configuration of test vehicles.
The construction of test vehicles must be planned and timed just as carefully as with mass production. To make the process smoother, we conduct reviews before starting construction to ensure that the bills of materials include the right parts and are therefore complete and free of any errors and conflicts.
One thing I’d like to point out here is that every vehicle comprises 15,000 parts, so there are between 10 to the power of 30 and 10 to the power of 60 configuration possibilities! It’s easy to see why this is no simple task. This high variance is due to the number of different models, engine types, displacements, optional extras, interior fittings and colors. As a result, a development BOM can only be stored in a highly compressed format.
To obtain an individual car from all this, the BOM must be “exploded” recursively. Multiple parameters affect this, including validities (deadlines for parts, products, optional extras, markets etc.), construction stipulations (“this part can only be installed together with a navigation device and a 3-liter engine”) and structures (“this part consists of several smaller parts”).
Unlike conventional solutions, for which an explosion function is complex and expensive, the interpretation of the compressed BOM is very easy for CortexDB due to its bidirectional linking technology.
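The recursive, parameterized explosion described above can be sketched in a few lines of Python. This is a deliberately simplified, generic illustration under stated assumptions: the `BOM` table, part names, option codes (`NAV`, `3.0L`) and the `explode` function are hypothetical, and the sketch does not reproduce CortexDB's bidirectional-link approach or BMW's actual rules. Each BOM line carries a validity window and a set of required options, mirroring the deadline and construction-stipulation parameters mentioned in the answer.

```python
from dataclasses import dataclass
from datetime import date
from typing import AbstractSet, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class BomLine:
    child: str
    quantity: int
    valid_from: date
    valid_to: date
    requires: FrozenSet[str] = frozenset()  # option codes that must all be ordered


FAR = date(9999, 12, 31)  # open-ended validity

# Hypothetical compressed BOM: each parent lists every child it *may* contain.
BOM: Dict[str, List[BomLine]] = {
    "car": [
        BomLine("body", 1, date(2020, 1, 1), FAR),
        BomLine("nav_bracket", 1, date(2020, 1, 1), FAR, frozenset({"NAV", "3.0L"})),
        BomLine("wheel", 4, date(2020, 1, 1), FAR),
    ],
    "wheel": [
        BomLine("rim", 1, date(2020, 1, 1), FAR),
        BomLine("bolt", 5, date(2020, 1, 1), date(2022, 6, 30)),
        BomLine("bolt_v2", 5, date(2022, 7, 1), FAR),
    ],
}


def explode(part: str, options: AbstractSet[str], on: date, qty: int = 1,
            out: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Recursively resolve `part` into leaf parts for one configuration and date."""
    if out is None:
        out = {}
    lines = BOM.get(part)
    if not lines:  # no sub-structure: treat as a leaf part
        out[part] = out.get(part, 0) + qty
        return out
    for line in lines:
        if not (line.valid_from <= on <= line.valid_to):
            continue  # validity filter (deadlines)
        if not line.requires <= options:
            continue  # construction stipulation not satisfied
        explode(line.child, options, on, qty * line.quantity, out)
    return out


print(explode("car", {"NAV", "3.0L"}, date(2021, 5, 1)))
# {'body': 1, 'nav_bracket': 1, 'rim': 4, 'bolt': 20}
print(explode("car", {"NAV"}, date(2023, 1, 1)))
# {'body': 1, 'rim': 4, 'bolt_v2': 20}
```

Because every level of the recursion filters lines by date and option rules, the same compressed BOM yields different part lists for different configurations and dates.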
Q5. Why did you select CortexDB and not a classical relational database system? Did you compare CortexDB with other database management systems?
Jutta Bremm: We were looking for a product that would be easy to use, as well as simple and flexible to configure, for our users in product data management. We also wanted the highest possible level of functionality included as standard.
We looked at 4 products that appeared to be suitable for use by the departments for analysis and evaluation. The essential functions for product data management – explosion and the documentation of components used – were only available as standard with CORTEX. For all other products, we were looking at customer-specific extensions that would have cost several hundred thousand euros.
Q6. How do you store complex data structures, such as graphs, in CortexDB?
Peter Palm: CortexDB sees graphs as a derivative of certain database functions.
First, it uses the “internal reference” field type (link). This is a data field in which the UUID of a target dataset is stored. That alone enables the use of simple links.
Second, users can choose to define fields as “repeating fields”. That means that the same field can be used more than once within a dataset. This is useful when a contact has more than one email address or phone number, and for links to individual parts in a BOM.
Repeating fields defined in this way can be grouped together to produce “repeating field groups”. Content items that belong together are thus stored as an information block. An example of this is bank account details that comprise the bank’s name, the sort code and the account number.
The use of repeating field groups, in which validity values are added to linked fields, enables complex data structures within a single dataset.
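To make the three building blocks above (link fields holding a UUID, repeating fields, repeating field groups) more tangible, here is a minimal Python sketch that models a dataset as a dictionary. The representation is an assumption for illustration only: the `_id` key, the in-memory `store`, the field names and the helpers `new_dataset` and `follow_links` are hypothetical, not CortexDB's storage format or API.

```python
import uuid
from typing import Any, Dict, List

Dataset = Dict[str, Any]
store: Dict[str, Dataset] = {}  # hypothetical in-memory store keyed by UUID


def new_dataset(**fields: Any) -> Dataset:
    """Create a dataset with its own UUID and register it in the toy store."""
    ds = {"_id": str(uuid.uuid4()), **fields}
    store[ds["_id"]] = ds
    return ds


contact = new_dataset(
    name="Example Ltd",
    # Repeating field: the same field holds several values.
    email=["info@example.com", "sales@example.com"],
    # Repeating field group: values that belong together form one block.
    bank_account=[{"bank": "Example Bank", "sort_code": "00-00-00", "account": "00000000"}],
)

rim = new_dataset(name="rim")
bolt = new_dataset(name="bolt")

# Link fields store the target's UUID, grouped with quantity and validity values.
wheel = new_dataset(
    name="wheel",
    component=[
        {"part": rim["_id"], "qty": 1, "valid_from": "2020-01-01"},
        {"part": bolt["_id"], "qty": 5, "valid_from": "2020-01-01"},
    ],
)


def follow_links(ds: Dataset, field: str, key: str) -> List[str]:
    """Resolve the UUIDs in a repeating link group to the names of the targets."""
    return [store[group[key]]["name"] for group in ds.get(field, [])]


print(follow_links(wheel, "component", "part"))  # ['rim', 'bolt']
```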
In addition, every dataset “knows” which other datasets are pointing to it. Because a single link yields this bidirectional information, the link only has to be maintained in one dataset. It only needs to be maintained in both datasets if there are two conflicting points of view on a graph (e.g. “my friend considers me an enemy”).
In addition, result sets can be combined with partial sets resulting from links when running queries and making selections. This limits the results to those that include certain details about their link structures.
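The interview does not explain how CortexDB keeps the reverse side of a link. One common way to get the behavior described (a link edited in one place, with "who points to me" available for free) is to maintain a reverse index on every write. The Python sketch below assumes that approach; `LinkIndex` and its methods are hypothetical. It also shows a query result being narrowed by a partial set derived from links.

```python
from collections import defaultdict
from typing import Dict, Set


class LinkIndex:
    """Hypothetical link store: each link is written once, the reverse side is derived."""

    def __init__(self) -> None:
        self.outgoing: Dict[str, Set[str]] = defaultdict(set)
        self.incoming: Dict[str, Set[str]] = defaultdict(set)

    def link(self, source: str, target: str) -> None:
        # Only the source is edited by the user; the reverse entry is kept in step here.
        self.outgoing[source].add(target)
        self.incoming[target].add(source)

    def unlink(self, source: str, target: str) -> None:
        self.outgoing[source].discard(target)
        self.incoming[target].discard(source)

    def pointing_to(self, target: str) -> Set[str]:
        """All datasets that link to `target` (a where-used lookup)."""
        return set(self.incoming.get(target, ()))


links = LinkIndex()
links.link("wheel", "bolt")
links.link("hub", "bolt")
links.link("wheel", "rim")

used_in = links.pointing_to("bolt")         # {"wheel", "hub"}
query_result = {"wheel", "seat", "mirror"}  # e.g. output of a separate field query
print(query_result & used_in)               # {'wheel'}
```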
Q7. How do you perform data analytics with CortexDB?
Peter Palm: The content in every field “knows” the field context it is being used in and how often (“occurrence list” or “field index”). By combining partial sets (as in set theory), result sets are determined extremely fast, eliminating the need for read access to individual datasets.
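A minimal Python sketch of analytics on an occurrence list, assuming a simple mapping from (field, value) to the set of dataset IDs: value frequencies and filtered subsets are computed from the index alone, without reading any dataset. `OccurrenceIndex`, `histogram` and the vehicle fields are hypothetical and do not represent CortexDB's internal field index.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Set, Tuple


class OccurrenceIndex:
    """Hypothetical occurrence list: (field, value) -> IDs of datasets containing it."""

    def __init__(self) -> None:
        self.postings: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)

    def add(self, ds_id: str, record: Dict[str, Any]) -> None:
        for field, value in record.items():
            # Repeating fields contribute one index entry per value.
            for item in (value if isinstance(value, list) else [value]):
                self.postings[(field, item)].add(ds_id)

    def histogram(self, field: str) -> Counter:
        """Number of datasets per value of `field`, computed from the index only."""
        return Counter({v: len(ids) for (f, v), ids in self.postings.items() if f == field})


occ = OccurrenceIndex()
occ.add("v1", {"market": "DE", "option": ["NAV", "HUD"]})
occ.add("v2", {"market": "DE", "option": ["NAV"]})
occ.add("v3", {"market": "US"})

print(occ.histogram("market"))  # Counter({'DE': 2, 'US': 1})
de_with_hud = occ.postings[("market", "DE")] & occ.postings[("option", "HUD")]
print(de_with_hud)              # {'v1'}
```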
CortexDB comes with an application that lets users freely configure queries, reports and graphical output. There is also an application API (data service) that enables these elements to be used within in-house applications or interfaces.
The solution also identifies correlations itself using algorithms, even if they are connected via graphs. Unlike data warehouse systems, this lets users do more than just test estimates or ideas – it determines a result on its own and delivers it to the user for further analysis or for modification of the algorithm.
Q8. Do you have any performance metrics for the analysis of recursively structured BOMs (bills of materials) for your vehicles?
Jutta Bremm: Internal tests on BOM explosion with conventional relational databases showed that it took up to 120 seconds. Compare that with CortexDB, which delivers the result of the same explosion in 50 milliseconds.
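For scale, the two reported timings can be expressed as a single ratio. Since the relational figure is given as "up to" 120 seconds, this is an upper-bound comparison rather than a general speed-up factor:

$$\frac{120\ \mathrm{s}}{50\ \mathrm{ms}} = \frac{120\ \mathrm{s}}{0.05\ \mathrm{s}} = 2400$$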
Q9. How do you handle data quality control?
Jutta Bremm: We require 100% data quality (consistency at all times) and CortexDB delivers that.
Q10. What are the main business benefits of using CortexDB for these use cases?
Jutta Bremm: The agile modeling, the flexible adaptation options and the level of functionality delivered as standard shorten the duration of a project and reduce the costs compared to the other products we tested (see Q5).
Q11. Anything else you wish to add?
Jutta Bremm, Peter Palm: By using the temporal capabilities (time of transaction and time of validity), users can easily see which individual value in a dataset was/is/will be valid and from when.
Jutta Bremm, IT Project Manager, BMW.
Jutta has been an IT Project Leader in product data management at BMW since 1987.
She has been involved in IT projects since 1978, including projects at Siemens, Wacker Chemie and Sparkassenverband.
Peter Palm, Chief Visionary Officer (CVO) at Cortex.
Started CortexDB development in 1997.
Holds a master's degree in electronic engineering.
Area of expertise: Computer hardware development, Chip design, Independent Design Center for Chip Design (Std-Cell, Gate Array), Operating system development, CRM development since 1986.
Follow ODBMS.org on Twitter: @odbmsorg