Scaling to Olympic Proportions

The Olympics are a dream and a destination: the culmination of years’ worth of planning, hard work, and dedication, played out on a world stage. As London hosted the 2012 Summer Olympic Games, the British Broadcasting Corporation (BBC) was the UK’s Olympic broadcaster, and it wanted its online experience to be as dramatic as Danny Boyle’s opening ceremony. “Our aspiration was that just as the Coronation did for TV in 1953, the Olympics would do for digital in 2012,” explained Phil Fearnley, General Manager, News & Knowledge, BBC Future Media.

The mission was lofty. Coverage involved over 10,000 athletes, a sophisticated audience that relied on social media, the need for continuous live coverage, and a demand for multichannel delivery. The steadfast process of Static Publishing, which had been relied on for over 15 years, had to be retired in favor of a Dynamic Publishing infrastructure.

Dynamic Publishing is not a new concept, but the term has evolved to mean much more than dynamically serving content from a database onto a page. Dynamic Publishing 2.0 means creating a collection of related data elements and serving them dynamically as audiences demand. According to Matt Turner, Media CTO for MarkLogic, “In the case of the Olympic coverage, the flow of content was enormous, non-stop, real-time, and included:”

  • Data streamed from the International Olympic Committee
  • Videos from venues all around the UK
  • Feeds from the Press Association and other news organizations
  • Stories from BBC journalists
  • Endless social media updates from around the world

One example of dynamic updates was the athlete page for über-swimmer Michael Phelps (one of the 10,000 athletes). It included Phelps’ performance updates, where he was swimming next, a journalist’s story about him, and dynamically generated information that updated in real time, including a box titled “Phelps Against the World” (his medal count compared with those of entire countries).
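
One way to picture such a page is as a query run at request time rather than a file edited by hand. The Python sketch below assumes every incoming item has already been tagged with the entities it mentions; the `ContentItem` fields, tag IDs such as `athlete:A`, section names, and the `assemble_athlete_page` helper are all hypothetical illustrations, not the BBC’s code.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Set


@dataclass
class ContentItem:
    source: str        # hypothetical feed name, e.g. "results-feed", "journalist", "social"
    kind: str          # hypothetical page section, e.g. "result", "story", "update"
    published: datetime
    body: str
    tags: Set[str] = field(default_factory=set)  # entity IDs the item is about


def assemble_athlete_page(athlete_id: str, items: List[ContentItem],
                          per_section: int = 3) -> Dict[str, List[str]]:
    """Build the page on request: keep items tagged with the athlete,
    newest first, grouped by section and capped at per_section each."""
    page: Dict[str, List[str]] = {}
    relevant = [item for item in items if athlete_id in item.tags]
    for item in sorted(relevant, key=lambda i: i.published, reverse=True):
        section = page.setdefault(item.kind, [])
        if len(section) < per_section:
            section.append(item.body)
    return page


items = [
    ContentItem("results-feed", "result", datetime(2012, 7, 31, 20, 0), "Heat 2: first place", {"athlete:A"}),
    ContentItem("journalist", "story", datetime(2012, 7, 31, 21, 0), "Profile story", {"athlete:A"}),
    ContentItem("social", "update", datetime(2012, 7, 31, 21, 5), "Unrelated post", {"athlete:B"}),
]
page = assemble_athlete_page("athlete:A", items)
print(page)  # {'story': ['Profile story'], 'result': ['Heat 2: first place']}
```

Because nothing is stored per page, a new result or story tagged with the athlete appears the next time the page is requested.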

It would have been impossible for the team of journalists and editors to maintain this and the other athlete pages by hand, and just as impossible for the relational database that had been the workhorse since the 1990s to keep up. Dynamic Publishing required a new, flexible architecture that would allow the various data feeds to:

  • Make products more relevant with content and data from multiple sources delivered in disparate formats and in real time
  • Make all content available with asset search and discovery
  • Assemble custom content and collections
  • Deliver content in multiple formats through multiple channels

So what changed? In a November 2012 webcast, BBC Lead Architect Jem Rayfield explained that two components were critical to the transformation: a “triple store” and an enterprise NoSQL content store. The combination allowed an unparalleled level of automation and dynamic delivery.

The triple store uses linked-data technology to automate the aggregation, publishing, and repurposing of interrelated content objects, all driven by an ontological, domain-modeled information architecture. It is an organizational system that aggregates knowledge: for example, that Michael Phelps (a member of the 2012 Olympic team, the U.S. swim team, the men’s swim team, and the 4×200-Meter Freestyle Relay) competed in events and heats and won a variety of “awards.” With each medal won, a tote board comparing Phelps’ total medal count with that of every country updated automatically, in real time.
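
To make the idea of a triple store concrete, here is a toy in-memory version in Python: facts are (subject, predicate, object) tuples, queries are patterns with wildcards, and a transitive `memberOf` rule stands in for ontology-driven inference. This is only an illustration of the technique, not OWLIM; a production store would typically declare such a property as an OWL `owl:TransitiveProperty` and be queried with SPARQL. The identifiers, the `memberOf` and `won` predicates, and the medal facts are hypothetical.

```python
from collections import Counter
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical subject-predicate-object facts; IDs and medal counts are illustrative only.
TRIPLES: Set[Triple] = {
    ("athlete:A", "memberOf", "team:relay"),
    ("team:relay", "memberOf", "team:swimming"),
    ("team:swimming", "memberOf", "team:national"),
    ("athlete:A", "won", "medal:1"),
    ("athlete:A", "won", "medal:2"),
    ("country:X", "won", "medal:3"),
}


def match(triples: Set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)]


def memberships(triples: Set[Triple], entity: str) -> Set[str]:
    """Follow memberOf transitively, a toy version of an ontology inference rule."""
    seen: Set[str] = set()
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for _, _, group in match(triples, s=current, p="memberOf"):
            if group not in seen:
                seen.add(group)
                frontier.append(group)
    return seen


def medal_tally(triples: Set[Triple]) -> Counter:
    """Count 'won' facts per subject; re-running after each new fact updates the board."""
    return Counter(s for s, _, _ in match(triples, p="won"))


print(sorted(memberships(TRIPLES, "athlete:A")))  # ['team:national', 'team:relay', 'team:swimming']
TRIPLES.add(("athlete:A", "won", "medal:4"))      # a new result arrives
print(medal_tally(TRIPLES)["athlete:A"])          # 3
```

The tote board in the article behaves like `medal_tally`: nobody edits it directly; it is recomputed from whatever facts the store currently holds.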

The OWLIM triple store alone, however, could not process the massive amount of changing data. To handle the volume and ensure the ability to scale, the BBC added an XML content store for all content assets, including statistics, tweets, videos, images, and articles. Video metadata included transcriptions aligned to time-codes so that specific segments of video could be served. MarkLogic’s enterprise NoSQL server is a document-centric database that uses XML as its data model, and it allows data to be loaded easily, regardless of schema. Unlike a relational model, MarkLogic’s native XML repository allows horizontal and vertical scaling without sacrificing redundancy and failover. The triple store made XQuery calls to MarkLogic to handle the real-time updates.
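
Time-coded transcripts are what make segment-level serving possible: a text search returns time ranges rather than whole videos. In MarkLogic such a lookup would be written in XQuery; the sketch below uses Python’s standard-library ElementTree instead, and the `<video>`/`<segment start end>` layout and the sample transcript are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical transcript document: each segment carries start/end offsets in seconds.
TRANSCRIPT = """
<video id="clip-001">
  <segment start="0.0" end="12.5">Welcome to the Aquatics Centre</segment>
  <segment start="12.5" end="30.0">The relay team moves into the lead</segment>
  <segment start="30.0" end="41.0">Into the final length of the race</segment>
</video>
"""


def find_segments(xml_text: str, phrase: str) -> List[Tuple[float, float]]:
    """Return the (start, end) time ranges whose transcript text contains phrase."""
    root = ET.fromstring(xml_text.strip())
    needle = phrase.lower()
    return [
        (float(seg.get("start")), float(seg.get("end")))
        for seg in root.iter("segment")
        if needle in (seg.text or "").lower()
    ]


print(find_segments(TRANSCRIPT, "relay"))  # [(12.5, 30.0)]
```

A player could then seek straight to 12.5 seconds instead of serving the full clip.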

Knowing that viewers were often using a “second screen” (a mobile device or tablet from which they could actively engage with social media), the BBC served content adapted to each display across four channels: interactive television, computer, mobile, and tablet. By encouraging second-screen behavior, the BBC made social media an integral part of the coverage. Users could customize the feed by choosing which content elements would appear on the BBC’s iPlayer, and one feature of the BBC application let viewers restart a live feed.
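
Serving one piece of content to four channels is often handled by storing it once and projecting it onto a per-channel profile at delivery time. The following Python sketch assumes that approach; the `Story` fields, the channel profiles, and their length limits are hypothetical, since the article does not describe the BBC’s actual rules.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Story:
    headline: str
    summary: str
    body: str
    video_id: Optional[str] = None


# Hypothetical channel profiles: which fields each display receives and a text-length cap.
CHANNEL_PROFILES = {
    "tv":       {"fields": ("headline", "video_id"), "max_chars": 60},
    "computer": {"fields": ("headline", "body", "video_id"), "max_chars": None},
    "tablet":   {"fields": ("headline", "summary", "video_id"), "max_chars": None},
    "mobile":   {"fields": ("headline", "summary"), "max_chars": 140},
}


def render(story: Story, channel: str) -> Dict[str, str]:
    """Project one stored story onto the fields and length limit of a channel."""
    profile = CHANNEL_PROFILES[channel]
    limit = profile["max_chars"]
    view: Dict[str, str] = {}
    for name in profile["fields"]:
        value = getattr(story, name)
        if value is None:
            continue  # e.g. no video is attached to this story
        if limit is not None and name != "video_id" and len(value) > limit:
            value = value[:limit]  # crude cut; a real system would respect word boundaries
        view[name] = value
    return view


story = Story("Relay gold", "A short summary.", "The full article text.", video_id="clip-001")
print(render(story, "mobile"))  # {'headline': 'Relay gold', 'summary': 'A short summary.'}
print(render(story, "tv"))      # {'headline': 'Relay gold', 'video_id': 'clip-001'}
```

Keeping the channel rules in data rather than in separate templates means a fifth channel would need only a new profile entry.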

The pas de deux of the OWLIM triple store and MarkLogic Server served up some astounding numbers:

  • 106 million requests for BBC Olympic video content
  • 55 million global browsers across the games
  • 2.8 petabytes of data on the busiest day
  • A daily record of 7.1 million UK browsers

Breaking records for viewership, downloads, and overall user experience, the BBC has set the gold standard for digital coverage. With the Olympics behind it, the BBC is now looking to revamp its news operations as well. Other broadcasters are keeping close tabs: the 2016 Olympics will be held in Rio de Janeiro, Brazil, and according to Rayfield, engineers from Globo, the leading Brazilian television network, have reached out to the BBC’s technical team for pointers. With Brazil’s population expected to exceed 200 million by 2016 (roughly three times the UK population), the records set by the BBC may well be surpassed.
