NoSQL meets Industrial Mass Data
by Timo Klingenmeier, inmation GmbH & Co. KG
October 31, 2014
I am writing this whitepaper near the end of a two-year software product development cycle. It is not about the joy (and sometimes pain) my team and I had throughout this endeavour. Nor is it about how great our new product is; we leave that for our customers to decide.
It is about NoSQL, used consistently as the general database technology underneath our real-time architecture. system:inmation collects data from a huge variety of purely industrial systems – devices, control systems, HMI, SCADA, you name it. It also collects data from generic ‘not-so-real-time’ systems, such as classical relational databases or structured files. Finally, it consolidates the different data, turns everything into a large, multi-server OPC Unified Architecture model, historizes all kinds of structures and exposes this consolidated namespace in a secure fashion to third-party applications at the enterprise IT level.
We have chosen MongoDB as the general-purpose database product to persist everything our Core Service processes – time-series data at acquisition rates up to 10 Hz, statistical aggregates, namespace objects, properties and attributes, alarms, events, logs, audit trails, the process image. Quite a few different data structures, document sizes, nesting depths and specific storage and retrieval requirements – and the ultimate need to… scale! Out, not up.
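As an illustration of what a time-series document might look like (the field names and bucketing scheme here are hypothetical, not system:inmation's actual schema), one common pattern is to bundle the ten samples of a 10 Hz data point into a single per-second document, so one insert covers one second of data:

```python
from datetime import datetime, timezone

# Hypothetical one-second "bucket" document for a 10 Hz data point.
# Field names are illustrative only, not the actual system:inmation schema.
def make_bucket(item_id, start, values, qualities):
    """Bundle ten 10 Hz samples into one document: a single insert
    then covers one second of data for one item."""
    assert len(values) == len(qualities) == 10
    return {
        "item": item_id,                     # namespace path of the data point
        "t0": start,                         # timestamp of the first sample
        "samples": [
            {"ms": i * 100, "v": v, "q": q}  # ms offset, value, OPC quality
            for i, (v, q) in enumerate(zip(values, qualities))
        ],
    }

bucket = make_bucket(
    "/Refinery/Unit1/TIC-101/PV",
    datetime(2014, 10, 31, 12, 0, 0, tzinfo=timezone.utc),
    values=[101.3 + 0.1 * i for i in range(10)],
    qualities=[192] * 10,                    # 192 (0xC0) = OPC quality "good"
)
```

Nesting the samples inside one document trades a little read-side unpacking for a tenfold reduction in insert operations – one reason sustained high-rate ingestion becomes feasible.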
Industrial data usually comes in large numbers. Think about owning a thousand wind turbines, each operating on approximately 2,000 operational data points, or running an oil refinery with 20,000 control loops. Thousands, hundreds of thousands, millions of qualified, name-designated information items to handle. Major trends such as the Internet of Things and Industry 4.0 are growing the data scope by the day. In the past, mainly scale-up architectures were offered, mostly using proprietary storage formats that locked the customer into a single vendor's API. The result: data silos and islands, hard to manage and costly to integrate.
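The back-of-the-envelope arithmetic behind the wind-farm example makes the scale concrete (the 1 Hz rate below is a deliberately conservative assumption):

```python
# Scale estimate for the wind-farm example: 1,000 turbines,
# ~2,000 operational data points each.
turbines = 1000
points_per_turbine = 2000
streams = turbines * points_per_turbine        # 2,000,000 data streams

# Even at a conservative 1 Hz per point, the daily sample volume is huge:
samples_per_day = streams * 60 * 60 * 24
print(f"{streams:,} streams -> {samples_per_day:,} samples/day")
```

Two million concurrent streams producing on the order of 10^11 samples per day is why scale-out, rather than scale-up, is the only sustainable answer.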
Today, informed workers (and their BI systems) require access to fine-grained data for nearly every decision-support process, and this no longer affects only the classical heavy industries. How can hundreds of data scientists access millions of data streams and years of fine-grained history in a fast, secure and uniform manner? Hint: deploy a true, horizontally scaled-out and open system.
To us, the decision to use NoSQL/MongoDB for all historization and data persistence has already paid off. We have the flexibility, simplicity and performance we envisioned at the start – plus, we were able to adopt BSON for all of our internal communications. How great is that: a general-purpose database product which comes with object-ready transport, in one open-source box?
Which does up to 100,000 sustained document inserts per second in a single instance? Which delivers the data back from terabyte collections in no time for trending? Which also stores complex structures, to virtually any nesting level, in a single document?
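To make the "object-ready transport" point concrete: BSON is a simple length-prefixed binary format, which is what makes it usable both on the TCP wire and in the database. A minimal encoder sketch, covering only string and int32 values (real driver implementations handle many more types – doubles, nested documents, arrays, dates, binary):

```python
import struct

# Minimal BSON encoder sketch: UTF-8 strings (type 0x02) and
# 32-bit integers (type 0x10) only. Real drivers cover all BSON types.
def bson_encode(doc):
    body = b""
    for name, value in doc.items():
        key = name.encode("utf-8") + b"\x00"  # cstring field name
        if isinstance(value, str):
            s = value.encode("utf-8") + b"\x00"
            body += b"\x02" + key + struct.pack("<i", len(s)) + s
        elif isinstance(value, int):
            body += b"\x10" + key + struct.pack("<i", value)
        else:
            raise TypeError("sketch supports only str and int32")
    # total length prefix includes itself and the trailing NUL terminator
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

# The canonical example from the BSON specification:
assert bson_encode({"hello": "world"}) == (
    b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"
)
```

Because every document and every string carries its length up front, a receiver can skip fields it does not need without parsing them – one reason the format works well for both transport and storage.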
I am a fan. A true NoSQL believer. After 25 years of (successful but painful) squeezing objects into and out of SQL and proprietary structures, our time has come. We implement new functionality instead of zero-value mapping layers, we have reduced the number of database entities by 90%, we code-generate object descriptions – including their serialization mechanics for the TCP wire and the database – in one go, and thus we have satisfied developers who can focus on the real thing.
NoSQL databases were invented because the Googles, Facebooks and Amazons of this world could not run their businesses efficiently on our dads' database technology. Now it is time to introduce the new kid back to the world where Big Data was invented long before the term even existed: the highly automated industries, with all their data born on the production floor.