The Ludic Fallacy, and what it means for your data
By David Rolfe
Some of you will be aware of the concept of a “Black Swan Event”. The term was coined by Nassim Nicholas Taleb, and refers to a rare event with significant consequences: something so unlikely or statistically implausible that people simply refuse to accept that it could ever happen, until it does, at which point they decide that, with hindsight, it was obvious.
But Taleb also coined another idea, which is highly relevant to anyone who works with large quantities of data for a living.
The Ludic Fallacy is “the misuse of games to model real-life situations”. While Taleb sees it as applying to statistical models that fail in the real world because they didn’t account for one or more unlikely events, I’d argue it’s highly relevant to data professionals. We’ve been dealing with the Ludic Fallacy for about fifty years, but have never called it out.
Unless you have a very high level of trust in LLMs, processing data in a computer means putting it into a structured format: fields of defined data types, collected into records. There are rules about not having duplicate records (primary keys), about how one record refers to another (foreign keys), and so on.
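As a sketch of what that structure looks like in practice, here is a toy relational schema built with Python’s built-in sqlite3 module (all table and column names are invented for illustration):

```python
import sqlite3

# Toy in-memory database; every table and column name here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Fields of defined data types, collected into records (rows).
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- no duplicate records
        name        TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        -- a record referring to a record in another table
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    )""")

conn.execute("INSERT INTO customers VALUES (1, 'Acme Ltd')")
conn.execute("INSERT INTO orders VALUES (100, 1)")

# The model's rules are enforced: an order for a customer that
# doesn't exist is rejected outright.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Those constraints are exactly the ‘rules of the game’: anything the schema can’t express simply isn’t allowed to exist.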
Think about it! When you create a database schema you are gamifying reality. You are creating a model of a subset of the real world, and that model has built-in structural rules and baked-in assumptions. We’re in Ludic Fallacy territory.
Before the invention of the RDBMS we used structured data formats, but limitations in the available technology had almost as much influence over the design as the real world system we were trying to model. One of the great selling points of the RDBMS was that for the first time you could make an entity model that accurately reflected reality. If only!
The problem is that reality is fractal in nature: if you see complexity and look closer to try and understand it, all you see is more complexity. As a result, entity modelling and database design went down a road of ever-increasing complexity. There was no problem that could not be solved by splitting one entity into three new ones, and we ended up with databases with hundreds of tables. As a general rule the only person who knew how to use the system was its creator, and sometimes not even then!

I’ve worked with, and sadly helped create, such systems. Every primary key is five or more columns long, or a system-generated ID. Some tables consist of nothing but a series of columns, each containing a system-generated ID. It’s ‘GUIDs all the way down’. Such systems are painful to work with. Instead of basic CRUD operations, you find yourself inserting three rows, in case one time in fifty thousand you need to insert four. In addition, most of the tables now refer to abstract concepts, things that don’t exist in real life.
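To make the ‘GUIDs all the way down’ shape concrete, here is a deliberately caricatured sketch (the party/role table names are invented, not taken from any real system):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")

# Every table is just IDs pointing at other IDs; names are hypothetical.
conn.executescript("""
    CREATE TABLE party      (party_id TEXT PRIMARY KEY);
    CREATE TABLE role_type  (role_type_id TEXT PRIMARY KEY);
    CREATE TABLE party_role (party_role_id TEXT PRIMARY KEY,
                             party_id      TEXT REFERENCES party(party_id),
                             role_type_id  TEXT REFERENCES role_type(role_type_id));
""")

# Recording one simple real-world fact ("Acme is a customer") now takes
# three inserts of abstract IDs -- and the word "customer" appears nowhere.
party_id, role_id, link_id = (str(uuid.uuid4()) for _ in range(3))
conn.execute("INSERT INTO party VALUES (?)", (party_id,))
conn.execute("INSERT INTO role_type VALUES (?)", (role_id,))
conn.execute("INSERT INTO party_role VALUES (?, ?, ?)",
             (link_id, party_id, role_id))
```

Every query against such a model becomes a chain of joins through tables that contain nothing a human can read.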
To summarise:
Reality is so complicated that models which try to cover everything are usually impractical.
The alternative is to ‘make do’ with a simpler model. Let me give you an example. Consider a table called SHIPMENTS. In theory it stores every physical part we shipped to a customer. The trouble is that some of the part numbers are ‘special’: they have a contract number tacked on the end. It’s the same part, but in the context of a specific government contract. Instead of adding a ‘contract_id’ field and including it in the primary key, they just messed with the part number. We can’t fix this, because another division sold the part and created the record. Oh, and while we’re talking about other divisions, some of our shipments are distinctly strange: they don’t weigh anything and have no warranty period. It turns out they are internal financial transfers used by the parent company to keep score. No, we can’t get rid of them, as the parent company expects to see them. This is a real-world situation I have directly dealt with.
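A sketch of the client-side interpretation this kind of data forces on you (the field layout and the contract-suffix convention are simplified inventions, not the real system’s format):

```python
# Simplified shipment records; field names and the '-C' contract-suffix
# convention are illustrative only.
shipments = [
    {"part_no": "PN1234",       "weight_kg": 2.5, "warranty_months": 12},
    {"part_no": "PN1234-C0042", "weight_kg": 2.5, "warranty_months": 12},
    {"part_no": "XFER-77",      "weight_kg": 0.0, "warranty_months": 0},
]

def interpret(row):
    """Client-side interpretation the schema itself can't express."""
    part, sep, contract = row["part_no"].partition("-C")
    return {
        "part_no": part,
        # The 'special' parts: same part, tied to a specific contract.
        "contract_id": contract if sep else None,
        # Zero weight and zero warranty marks an internal financial transfer.
        "is_internal_transfer": row["weight_kg"] == 0
                                and row["warranty_months"] == 0,
    }

cleaned = [interpret(r) for r in shipments]
```

None of these rules live in the schema; they live in the heads of the people who know the data.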
This just goes to prove the point:
The concept of “Ludic Fallacy” implies that no corporate data model perfectly reflects reality.
Can I solve this problem with a document database?
The same phenomenon rears its head in more or less every data-driven application. Document databases are better at handling it, but are not immune. Developers love the concept, as they no longer have to argue with pesky DBAs over how to store things. But those DBAs were doing their job, which was to define and enforce a single corporate standard for how a given real-world object is represented inside the computer system. So a developer tweaks the definition so it can represent something that isn’t in the ‘official’, DBA-sanctioned model. In doing so they solve one problem and create another: we now have a model which sometimes contains fields that many developers have never heard of, and might not even understand. And being a document database, those fields may only occur in a few hundred records.
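A sketch of the situation, using plain Python dicts to stand in for documents (the collection shape and field names are hypothetical):

```python
# Two documents from the same hypothetical 'parts' collection. A developer
# has quietly extended the second shape; nothing in the database forbids it.
official = {"part_no": "PN1234", "description": "Widget"}
tweaked  = {"part_no": "PN1234", "description": "Widget",
            "contract_override": {"contract_id": "0042", "priority": True}}

# Code written against the 'official' shape silently ignores -- or worse,
# mishandles -- the extra field. Defensive reads become mandatory:
def contract_id(doc):
    return doc.get("contract_override", {}).get("contract_id")

assert contract_id(official) is None
assert contract_id(tweaked) == "0042"
```

The schema flexibility that made the developer’s life easier is exactly what lets the two shapes drift apart unnoticed.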
This means we’ve a choice:
- Models which don’t fully represent reality, but are comprehensible.
- Models which try to cover everything, but are incomprehensible.
The vast majority of current models are the first kind.
Conclusion
If you work with data for a living I doubt if you can ‘unsee’ the concept of “Ludic Fallacy” once you have seen it. The implications are significant:
- There will always be a need for client-side interpretation of data.
- Internal “Tribal Knowledge” about how to interpret data matters, and will continue to do so.
- No-Code, Low-Code tools and LLMs will always struggle, because the model they work against is imperfect.