In the January 2015 BIS BCBS239 Adoption progress report it was stated that “compliance with Principle 2 (data architecture/IT infrastructure) was rated lowest.” BCBS 239 is the first time that the Enterprise IT Architecture profession has been subject to regulatory scrutiny like its construction and transportation industry forbears.
This has become necessary because Distributed Enterprise IT Architectures have now hit the same engineering maturity inflection point, with many partial subsystem issues and, in some cases, public catastrophic supply chain failures.
Enterprise Architects must shift focus from solely functional analysis models to address the sustained volume, velocity and variety of data that they now aggregate and process to produce meaningful measures of Financial Sector business risks.
Let’s remind ourselves what the banks are being asked to do when it comes to data architecture/IT infrastructure:
“A bank should design, build and maintain data architecture and IT infrastructure which fully supports its risk data aggregation capabilities and risk reporting practices not only in normal times but also during times of stress or crisis, while still meeting the other Principles.”
Perhaps the reason for the lack of compliance is that
- There are no concrete definitions of what a Data Architecture for a large complex Enterprise should look like – let alone any internationally agreed standards. Many of the first generation Chief Data Officers now being appointed by banks have no track record of “Design,” “Build” or “Maintain” in any engineering discipline let alone IT.
- BIS itself hasn’t set any concrete expectations either as to what artefacts should be produced.
What Comprises a Risk Data Architecture?
I didn’t find much help from a quick visit to Wikipedia, which talks about how the risk data architecture should comprise 3 “Pillars”: conceptual, logical and physical.
So I drew on my three decades of CTO experience, as well as the thoughts of regulatory experts, to offer some clarity by having you focus on the following questions:
- What are the core risk data entities that have to be measured, reported and mitigated?
- What are the component pieces of data that make up these risks?
- What are the linkages between the key entities and the component pieces of data – i.e. synthesis, aggregation, transformations?
- Similarly, what are the IT Applications that perform the composition of the risk entities and their subsequent analysis and distribution?
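One way to make these four questions concrete is to hold the answers as data rather than as diagrams. The following is a minimal sketch, not a prescribed model: the class names, fields and sample values (`RiskEntity`, `DataComponent`, `trade_store` and so on) are all invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataComponent:
    """A component piece of data that feeds a risk entity (hypothetical)."""
    name: str
    source_application: str


@dataclass
class RiskEntity:
    """A core risk data entity that is measured, reported and mitigated."""
    name: str
    # Each linkage records the component, the operation applied to it
    # (e.g. "aggregate", "transform") and the application performing it.
    linkages: list[tuple[DataComponent, str, str]] = field(default_factory=list)

    def link(self, component: DataComponent, operation: str, application: str) -> None:
        self.linkages.append((component, operation, application))

    def applications(self) -> set[str]:
        """Every application that sources or composes this entity."""
        apps = set()
        for component, _operation, application in self.linkages:
            apps.add(component.source_application)
            apps.add(application)
        return apps


# Illustrative data only: all names are invented for the example.
positions = DataComponent("eod_positions", "trade_store")
prices = DataComponent("eod_prices", "market_data_hub")

var = RiskEntity("value_at_risk")
var.link(positions, "aggregate", "risk_engine")
var.link(prices, "transform", "risk_engine")
```

Asking `var.applications()` then answers the fourth question for that entity directly from the recorded linkages, instead of from somebody's memory.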
What You Really Have to Do
BIS offer a few clues as follows:-
“A bank’s risk data aggregation capabilities and risk reporting practices should be:
- Fully documented and subject to high standards of validation.
- Considered as part of any new initiatives, including acquisitions and/or divestitures, new product development, as well as broader process and IT change initiatives.
Banks do not necessarily need to have one data model; rather, there should be robust automated reconciliation procedures where multiple models are in use.”
Given the scale of most Financial Institutions’ IT estates, which comprise hundreds of applications deployed over thousands of physical servers, this can only be addressed by automation, i.e.
- Consistent Discovery and Monitoring of Data Flows – both messaging and block file transfers at both Routing and ETL endpoints
- Automated Validation Processes to ensure naming and format standards are being conformed to – this can be done on a sampling basis as per typical factory quality control processes
At the moment banks only have very crude Application Portfolio systems, which use an arbitrary, in-house classification of the business functions that applications perform, occasionally supplemented by manual audit/regulatory firedrill processes, to provide some form of information connectivity catalogue.
Ironically, this lack of data analysis rigor leads to banks being repeatedly charged for unauthorised use of fee-liable data during audit cycles, and those charges often run to many millions of £/$.
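The sampling-based validation mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout and the `counterparty_id` field are hypothetical, and the pattern checks only the 20-character shape of an ISO 17442 LEI (18 alphanumerics plus 2 digits), not its checksum. Substitute whatever naming standard the institution has actually agreed.

```python
import random
import re

# Assumed naming standard for this example: the shape of an ISO 17442 LEI.
# The check digits are NOT verified here; only the format is tested.
IDENTIFIER_PATTERN = re.compile(r"^[A-Z0-9]{18}[0-9]{2}$")


def sample_conformance(records, sample_size, seed=0):
    """Validate a random sample and return (failure_rate, failing_records)."""
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    sample = rng.sample(records, min(sample_size, len(records)))
    failures = [
        r for r in sample
        if not IDENTIFIER_PATTERN.match(r["counterparty_id"])
    ]
    rate = len(failures) / len(sample) if sample else 0.0
    return rate, failures


# Invented records: two well-formed identifiers and one that is not.
records = [
    {"counterparty_id": "TESTBANK000000000100"},
    {"counterparty_id": "not-an-lei"},
    {"counterparty_id": "ABCDEFGHIJKLMNOPQR00"},
]
rate, failures = sample_conformance(records, sample_size=3)
```

As with factory quality control, the failure rate from the sample is the number to track over time; the failing records themselves go to whoever owns the source application.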
When Are the Core Risk Data Entities Assembled/Distributed?
Timeliness, Repeatability and Frequency of Risk Calculations are also key factors in the BIS Principles – let’s now apply the same macro Data Architecture pillars to this section of their requirements.
- Conceptual: What are the key business and information system events in the trading day, and when do they occur? Are there multiple “operational” daily cycles running across the business entities? Are these catalogued or have they become part of an “Organisational Folklore?”
- Logical: The logical “When” model is fundamentally a classic Gantt chart critical path analysis problem of resources and dependencies. Although IT Application development tends to favour a range of Agile/Sprint iterative delivery models, once things are in production the basic “laws of physics” must be applied to ensure correct and timely delivery of the core risk data entities. During the 2008 crisis many banks’ daily risk batches ran very late because the job steps had been added piecemeal and no review processes/tooling existed to derive the critical path and highlight bottlenecks.
- Physical: Both Time Synchronization and Job Scheduling Capabilities need to be standardised across all the computing and ancillary equipment of the IT estate to ensure that consistent production of the key risk data entities can be maintained and that the necessary capacity/performance headroom required in times of market stress is either available or can be enabled on demand.
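To show what deriving the critical path of a batch means in practice, here is a small sketch using the standard library's `graphlib`. The job names, durations and dependencies are invented; a real implementation would read them from the scheduler rather than from a hard-coded dictionary.

```python
from graphlib import TopologicalSorter

# Hypothetical batch: job -> (duration in minutes, jobs it depends on).
JOBS = {
    "load_trades": (30, []),
    "load_prices": (20, []),
    "revalue": (45, ["load_trades", "load_prices"]),
    "aggregate_var": (25, ["revalue"]),
    "publish": (10, ["aggregate_var"]),
}


def critical_path(jobs):
    """Return (total_minutes, job names) for the longest dependency chain."""
    graph = {job: deps for job, (_duration, deps) in jobs.items()}
    finish = {}    # earliest possible finish time of each job
    previous = {}  # predecessor on the longest chain into each job
    for job in TopologicalSorter(graph).static_order():
        duration, deps = jobs[job]
        start = max((finish[d] for d in deps), default=0)
        previous[job] = max(deps, key=lambda d: finish[d], default=None)
        finish[job] = start + duration
    # Walk back from the job that finishes last.
    last = max(finish, key=finish.get)
    path = []
    while last is not None:
        path.append(last)
        last = previous[last]
    return finish[path[0]], path[::-1]


total, path = critical_path(JOBS)
```

In this made-up batch `load_prices` has ten minutes of slack, so speeding it up gains nothing; only the jobs on the returned path are worth tuning.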
What You Really Have to Do
Most major Financial Institutions have good implementations of time synchronisation infrastructure across their estates – for BCBS 239 compliance this does not need to be at the same degree of precision required by Low Latency/Algorithmic platforms.
Conversely, the same institutions have largely failed to maintain a single or consistently federated scheduling toolset across their Application portfolio – this is due to a combination of under-investment and weak technical leadership, coupled with disinterest from the Enterprise Architecture functions, who have failed to document the core dimension of time in their taxonomies.
The well-publicised failure of RBS’s core mainframe batch processing platform and its knock-on effects across the UK banking system for several weeks should have been a wake-up call to the industry to invest in understanding and strategically optimising this key enabling asset.
Why Is the Risk Aggregation and Production Process Implemented in a Particular Way?
Sir Christopher Wren’s tomb in St Paul’s Cathedral carries the inscription “SI MONUMENTUM REQUIRIS CIRCUMSPICE”, i.e. “If you seek his monument, look around you.” In other words, all architectures should have a purpose that is self-evident. BIS hints at this too with the statement “Risk data should be reconciled with bank’s sources, including accounting data where appropriate, to ensure that the risk data is accurate.”
Again we can apply the 3 Pillars to clarify these requirements as follows:
- Conceptual: The “why” or rationale for how the Risk Aggregation process is conducted for a particular institution lies at the heart of its Core Business Operating Model and Business Risk Appetite. If this cannot be documented and is not reviewed in line with quarterly financial statements, significant regulatory scrutiny can be expected.
- Logical: Clear, automatically derived Process metrics lie at the heart of ensuring that the Risk Aggregation Process is being operated in line with an institution’s Core Business Operating Model and that it is helping to alert and protect the institution in times of stress.
- Physical: KPIs must be immutable and irrefutable i.e. directly derived from the underlying data flows and operations to have any value.
What You Really Have to Do
Currently the reporting of KPIs is often massaged by a set of human aggregation processes into monthly “Status Decks.” Cultural change needs to occur to ensure that the real world dashboard data is automatically embedded into the reports to avoid ambiguity/political emphasis.
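As a sketch of what “directly derived” could look like, the KPI below is computed straight from a delivery log with no human step in between. The log format, field names and report names are assumptions made for the example; the point is only that the number in the report is the output of a function over operational data.

```python
from datetime import datetime

# Hypothetical delivery log as a scheduler or monitoring tool might emit it.
DELIVERIES = [
    {"report": "var_daily", "due": "2015-01-05T07:00:00", "delivered": "2015-01-05T06:40:00"},
    {"report": "var_daily", "due": "2015-01-06T07:00:00", "delivered": "2015-01-06T09:15:00"},
    {"report": "var_daily", "due": "2015-01-07T07:00:00", "delivered": "2015-01-07T07:00:00"},
]


def on_time_rate(deliveries):
    """Fraction of deliveries that arrived at or before their due time."""
    on_time = sum(
        datetime.fromisoformat(d["delivered"]) <= datetime.fromisoformat(d["due"])
        for d in deliveries
    )
    return on_time / len(deliveries)


kpi = on_time_rate(DELIVERIES)
```

Because the figure is recomputed from the log every time, anyone who disputes it can be pointed at the underlying records rather than at a slide.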
Who Is Responsible for the Governance of the Risk Data Entity Production and Delivery Processes?
As with the other pieces of this jigsaw BIS are giving few tangible clues i.e. “The owners (business and IT functions), in partnership with risk managers, should ensure there are adequate controls throughout the lifecycle of the data and for all aspects of the technology infrastructure”
Let’s apply our 3 tiers approach again to try and decode this sentence and determine the “Whos.”
- Conceptual: The key controls in this process are a combination of Job Roles + Operating Committees + Exception/Escalation Chains – these need to be maintained in a sustainable consistent archive and regularly reviewed. Deputization/Succession planning for attendees also needs to be addressed.
- Logical: The committee agenda/minutes need to be bound to both the metrics and any Exception/Escalation activities that have occurred during each reporting period.
- Physical: Where possible direct binding to KPI data and Exception/Escalation workflow data should be at the heart of the Operating Committees minutes/actions rather than manual aggregation/interpretation of the data.
What You Really Have to Do
You have to systematically blend the document-based approach of operating committees and scorecards with the live operational dashboards of process/technical monitoring technologies – which admittedly is very difficult to do currently. This gives rise to the commonly used “interpreted” RAG scorecard that is often manually assembled and “massaged” to manage “bad news” and/or accentuate positive results. With the advent of document based databases and semantics this area is ripe for automation and simplification.
Where Are the Core Risk Data Entities Created/Stored/Distributed?
The notions of Geography and “City Planning” are much more comfortable spaces for architects to describe and operate in – and indeed some of the concepts are very mature in large corporations so applying the 3 pillar approach would appear to be straightforward.
- Conceptual: There are multiple geographic concepts that need to be captured to answer this question, i.e.
- “Real World” Geopolitical entities and the judicial/regulatory constraints they impose on the content of the data
- Organizational Geography (i.e. Business units/Divisions/Legal Entities): these typically act as segments and groupings of the core risk data entities and in many cases will be aligned to “Real World” constraints/boundaries
- Application Portfolio geography: as noted in earlier sections, there needs to be a clear linkage between risk data entities and the applications involved in the production and distribution processes
- Logical: “Real World” and Organizational Geographies are generally well understood concepts – the notion of “What is an Application”, however, needs to be clearly defined and agreed by both IT and its client business units. It is notable that ITIL failed to define a standard for what an application comprises and often confuses it with the notion of “Service”, which can become a very overloaded term.
- Physical: Geopolitical Entities and Organization Hierarchies/Accounts are typically core components of an Enterprise Reference Data Platform – they are largely well understood concepts with concrete taxonomies, curated data and standardised coordinate systems + semantic relationships. Application portfolios are typically owned by the Enterprise Architecture/CTO function in Financial Institutions and in many cases are little more than a collection of manually maintained spreadsheets supported by PowerPoint or Visio function/swimlane diagrams aka “McKinsey Charts.” Enterprise IT Architects are often cynically viewed by their departmental peers as “Ivory Tower Thinkers” because of their inability to make application definitions concrete, sustainable data entities. A key weakness that permeates most EA departments is that they almost always focus on refining the functional classifications of what applications do, and often have to play catch-up as corporations reorganise or refocus their strategies.
What You Really Have to Do
The application portfolio of a Financial Institution should be a first class entity within its Enterprise Reference Data platform – the CTO/Enterprise Architecture function has responsibility for its maintenance, which must be largely automated with validation/conformance checking processes against the physical infrastructure.
NOTE: An application will often span multiple logical production/dev/test environments, and can now be instantiated dynamically on premise or external to an institution, so the data model and maintenance functions must be able to handle these short-lived instances.
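A conformance check of the portfolio against the physical estate can be as simple as a set comparison between what the reference data declares and what discovery tooling actually finds. The sketch below assumes a hypothetical `ApplicationInstance` record keyed by application, environment and host; the identifiers and host names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplicationInstance:
    """One deployed instance of an application (hypothetical record shape)."""
    application_id: str
    environment: str  # e.g. "prod", "dev", "test"
    host: str


def conformance_gaps(declared, discovered):
    """Compare the reference data record with what discovery found."""
    declared, discovered = set(declared), set(discovered)
    return {
        # Running on the estate but missing from the portfolio.
        "undocumented": discovered - declared,
        # In the portfolio but not found on the estate.
        "stale": declared - discovered,
    }


declared = [
    ApplicationInstance("risk_engine", "prod", "srv-001"),
    ApplicationInstance("risk_engine", "test", "srv-050"),
]
discovered = [
    ApplicationInstance("risk_engine", "prod", "srv-001"),
    ApplicationInstance("risk_engine", "prod", "cloud-tmp-7"),
]
gaps = conformance_gaps(declared, discovered)
```

Short-lived instances such as the invented `cloud-tmp-7` show up as “undocumented” on the next run, which is exactly the signal a manually maintained spreadsheet never gives.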
How Is the Risk Aggregation Process Built and Operated?
Finally we get to the most detailed set of architectural artefacts specified by BIS: “A bank should establish integrated data taxonomies and architecture across the banking group, which includes information on the characteristics of the data (metadata), as well as use of single identifiers and/or unified naming conventions for data including legal entities, counterparties, customers and accounts”.
It is interesting to note that BIS focuses on the notion of standardized identifiers and naming conventions, which are quite basic hygiene concepts. In reality there are some much more important components of the architecture that need to be defined first.
- Conceptual: The key functional elements of the aggregation system are the corollary of the “What” section earlier in this document – i.e. the operations performed on the data and the event “stimuli” along the critical path of the production process discussed in the “When” section. The other key entity that needs to be described to answer the “How” question is the notion of process state – i.e. How to introspect what is occurring at any point in time rather than at process or time boundaries?
- Logical: The logical model of state needs to be able to answer the questions: Do we know “What” is in flight, is it on time (“When”), is it sufficiently complete/correct (“Why”) and, if not, “Who” is working to remediate things?
- Physical: The “When” Section discussed the need for temporal consistency and a coherent task scheduling capability – in this section all of the connections, data flows, storage and control points, and the physical mechanisms that deliver them, need to be described. As with any complex command and control system some level of redundancy/duplication should be provided, but the range and fidelity of integration and storage techniques needs to be actively managed.
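The state questions in the Logical tier above map naturally onto a small record per data flow. This is a hedged sketch only: `FlowState`, its fields and the sample flows are invented, and completeness is reduced to a simple record count where a real system would apply richer correctness checks.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FlowState:
    """Point-in-time state of one data flow (all field names are hypothetical)."""
    entity: str                              # "What" is in flight
    expected_by: datetime                    # "When" it is due
    records_expected: int
    records_received: int                    # completeness, the "Why" check
    remediation_owner: Optional[str] = None  # "Who" is working on it

    def is_complete(self) -> bool:
        return self.records_received >= self.records_expected

    def is_late(self, now: datetime) -> bool:
        return now > self.expected_by and not self.is_complete()


def unowned_breaches(states, now):
    """Late, incomplete flows with nobody assigned to remediate them."""
    return [
        s.entity for s in states
        if s.is_late(now) and s.remediation_owner is None
    ]


now = datetime(2015, 1, 5, 7, 30)
states = [
    FlowState("var_daily", datetime(2015, 1, 5, 7, 0), 1000, 1000),
    FlowState("credit_exposure", datetime(2015, 1, 5, 7, 0), 500, 420, "ops_desk"),
    FlowState("liquidity_ladder", datetime(2015, 1, 5, 7, 0), 200, 150),
]
breaches = unowned_breaches(states, now)
```

The query can be run at any moment, not just at the end of the batch, which is the difference between introspecting state and inspecting it only at process or time boundaries.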
What You Really Have to Do
As noted above, in many large Financial Institutions the Integration, Storage, Reporting, Delivery and Command+Control systems have become fragmented and continue to proliferate, so to achieve effective Risk Data Aggregation and Reporting compliance a single, integrated toolset needs to be applied along the supply chain.
So What Have We Learned?
Being compliant with the principles stated by BIS in BCBS239 requires real IT work with tangible “Design”+“Build” assets and sustainable “Maintenance” processes linking multiple transaction and reference data systems with real artefacts owned by both Line and Enterprise Architecture Teams.
The Enterprise Application Portfolio and supporting functional taxonomy must become concrete reference data entities.
Linkage of dataflows to each of the Application/Environment instances must be achieved through significant mechanisation and automated physical discovery toolsets – manual firedrill collections of user opinions are not an option.
NB Excel, PowerPoint and Word artefacts are only ancillary to the core solution.
And finally… The data produced by this exercise should be used for the strategic optimisation of the organisation – not just for appeasing a set of faceless regulatory bureaucrats. “You can only manage what you measure” is a very old business maxim.