Big Data Strategy 2015 – from lab to market
By Christopher Blatchford, Director, Platform Technology, Thomson Reuters
What I’m really looking forward to in 2015 is the emergence of a ‘winning’ big data vendor architecture and an associated set of standards. There is still a plethora of start-ups and corporates alike who have gambled on approaches that are not necessarily fit for purpose – I’ve seen businesses try to create and deploy triple-store graphs on non-optimized relational databases, start-ups trip up with Hadoop deployments because of the small file problem (with a subsequent deployment of HBase or similar as a band-aid), and companies simply group their existing data sources into a single ‘lake’ or ‘pool’ and rebrand it as a ‘big data’ solution. Of course, it could be argued that similar problems still occur in the RDBMS world, but the difference is that there is significantly more talent and skill available in those technologies – this makes big data deployments risky, and potentially expensive, despite the open source nature of much of the technology.
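To see why the small file problem bites, consider a back-of-the-envelope sketch. HDFS keeps every file and block as an object in NameNode heap memory; the ~150-bytes-per-object figure below is a common rule of thumb, not an exact constant, and the numbers are purely illustrative:

```python
# Rough illustration of the HDFS "small file problem": each file and each
# of its blocks occupies a NameNode memory object. The ~150-byte figure is
# a widely quoted approximation, not a guaranteed constant.

NAMENODE_OBJECT_BYTES = 150  # assumed per-object heap overhead

def namenode_memory_mb(num_files, blocks_per_file=1):
    """Approximate NameNode heap (MB) consumed by file/block metadata."""
    objects = num_files * (1 + blocks_per_file)  # one inode + its blocks
    return objects * NAMENODE_OBJECT_BYTES / (1024 * 1024)

# Ten million 1 KB files hold the same ~10 GB of data as eighty 128 MB
# files, but cost wildly different amounts of NameNode heap:
many_small = namenode_memory_mb(10_000_000)  # roughly 2.8 GB of heap
few_large = namenode_memory_mb(80)           # a fraction of 1 MB
```

The data volume is identical in both cases; only the file count changes, which is why teams end up reaching for HBase or file consolidation as a band-aid.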
Having said that, there are several vendors out there that are making all the right noises; in the Hadoop space, Cloudera, MapR and Hortonworks are making waves, and Platfora and Tableau seem to have a strong foothold in the analytics & visualisation space (I’m a big fan of Platfora). Then there is Sir Tim Berners-Lee’s semantic web vision – RDF, SPARQL and OWL are the emerging standards (amongst others) in this space, and assuming mass adoption, they could realise his ambitions for a truly open, linked data web.
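The core idea behind those standards is simple: facts become (subject, predicate, object) triples, and queries become pattern matches over them. Here is a minimal, self-contained sketch of that model in plain Python – the names and `ex:` identifiers are invented for illustration, and real systems would use a proper triple store and SPARQL engine:

```python
# Toy illustration of the RDF triple model: facts are stored as
# (subject, predicate, object) triples and queried by pattern matching.
# All identifiers below are hypothetical examples, not real vocabulary.

triples = {
    ("ex:TR",       "ex:publishes", "ex:NewsFeed"),
    ("ex:NewsFeed", "ex:format",    "ex:RDF"),
    ("ex:TR",       "rdf:type",     "ex:Company"),
}

def match(pattern):
    """Return triples matching a pattern; None acts as a wildcard,
    analogous to a variable in a SPARQL basic graph pattern."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "What does ex:TR publish?" -- ?x in SPARQL becomes None here
results = match(("ex:TR", "ex:publishes", None))
```

A SPARQL engine generalises this to joins across many such patterns, but the wildcard-matching intuition carries over directly.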
When you connect all of these elements, you can start to conceptualise a top-level big data architecture – for example, an underlying Hadoop data lake serving as the source of all data and providing a true chain of custody for data provenance purposes, something like Stardog serving as the RDF triple-store graph (to satisfy the semantic web component), and analytical tools such as Platfora sitting at the business end of the stack, giving data scientists and product owners the tools they need to discover new trends or investigate new opportunities. Consultancies and contractors are now becoming proficient in these various technologies, allowing them to offer coherent big data strategy deployments to businesses.
On the other hand, there are very large, established vendors which are offering ‘complete’ solutions in this space – HP have recently announced their Haven Big Data Platform, which packages up various tools such as their Vertica columnar database and IDOL On Demand platform (I recently organized a Hackathon between Thomson Reuters and HP using IDOL, more information here), and we’re all aware of the Amazon Web Services story.
So this begs the question: where will businesses invest their cash in 2015? Gartner recently reported that 73% of organizations have invested or are planning to invest in big data in the next two years, stating that organisations “are starting to get off the fence about their big data investment”. This is largely focused on the ‘volume’ component of the three V’s of big data, but even so, it signals a step forward in solution design and deployment.
I think this evolution is probably being driven by a few key factors –
- Firstly, the benefits of big data are now starting to perceptibly reveal themselves, as implementations mature and transition out of R&D lab environments; whether it’s in the form of cost saving (supply chain for example) or new revenue generation (from new products & services or data insights), the commercial benefits are becoming more obvious.
- Secondly, technology leaders are beginning to think about (and in some cases execute upon) legacy system transition and shutdown – this continues to be a big cost barrier for many businesses implementing new, innovative solutions.
- Thirdly, technology leaders are beginning to move beyond strategic planning, experimentation and proofs of concept, and to execute on actual use cases that the business has developed – being able to demonstrate commercial value to the executive board is a vital hurdle to clear in any big data proposal.
- Fourthly, we are seeing an explosion in skills and talent around the big data space; for example, the ‘data scientist’ role – an individual as comfortable with statistics, mathematics and programming as they are commercially savvy and able to communicate findings to both business and technologists.
- Fifthly, it has historically been viewed as much ‘safer’ for a corporation to place its bets on large, established technology vendors than on a small start-up with limited resources and capability. This may well still be the case in more conservative industries, such as law; however, even within the legal space we are seeing bold changes in technology strategy direction.
- Finally (and some would argue most importantly), it takes strong leadership and vision to implement big (often disruptive) changes, and those individuals and businesses that bet on and contribute to the right technology will likely surface as the industry leaders, leading the way for the second movers (which is not a bad place to be, depending on your business…).
All this points to another hugely exciting year ahead, as vendors jostle for position, the semantic web further establishes itself and big data implementations suffer the slings and arrows of technology evangelists.