“Taming the Data Lake with a Scalable Metrics Model”
BY RAMKUMAR RAVICHANDRAN, Director, Insights at Visa, Inc.
Big data is no longer a fad that geeks, major enterprises and start-ups alike are in love with. It is a reality driven by the dynamic and diverse nature of channels, business lines, innovative products and customer behavior. All four Vs of data (Volume, Velocity, Variety and Veracity) are real, and we analysts, data scientists, data professionals, strategists and business leaders have to live with them. Investments are being made in Technology, Infrastructure and Talent, but as a wise man once said, “all problems in the world cannot be solved by throwing money at them”.
It is not as simple as creating a data lake where everything can be dumped and Data Scientists and Analysts can feed off it. Adoption should not be treated purely as an Investment question (cost of data storage, preparation, management and retrieval), even though that is what decisions are predominantly based on. It is also a Returns question (Reports, Business Analytics, Advanced Analytics, Data Products, Decision Engines, etc.), which is usually ignored when the decision is made. Investment-only decisions usually create a sub-optimal experience for end users: the lake may be efficient for Reporting but slow and inefficient when an Analyst has to use it, or vice versa. Adoption and Engagement need a Strategic framework built around key corporate needs, a Tactical, Outcome-Focused delivery approach and an iterative, learning-based execution model.
Scalable Metrics Model:
Actual Slides that will be used in the talk.
RDBMS remains one of the most “go-to” frameworks for Enterprise Data Warehouses, and has been for decades. Its reliability, stability, speed and ease of understanding make it optimal for many core services. Its downsides are limited flexibility and extensibility, the cost of modifications and the rigidity of the structure, which is what the Hadoop Distributed File System (HDFS) framework tries to address. But the lack of structure brings its own problems of performance, reliability, error correction, etc., and simply forcing a structure via Metadata or Aggregates may not be sufficient for a wide variety of users.
We need a hybrid framework that combines the strengths of RDBMS with the merits of HDFS, whose key objective is to serve the diverse needs of users, and which is malleable enough to change efficiently and effectively as those needs evolve. It has to be modular, so that each module primarily addresses one bucket of needs (e.g., Reports/Decision Engines by function), while also having connections that help connect the dots across modules (e.g., Deep Dives into drivers). The Scalable Metrics Model is one such option, and Bharathiraja Chandrasekharan & I will be discussing it at the Global Big Data Conference in Santa Clara on Sep 2nd.
Participation in this summit is purely in a personal capacity and does not represent VISA in any form or manner. The talk is based on learnings from work across industries and firms. Care has been taken to ensure that no proprietary or work-related information of any firm is used in any material.
We would love to hear your feedback and thoughts on the concept, the approach and the presentation.