Analyzing digital content and the viewing behavior of customers in real time is essential to many companies. Communication operators and content providers offer many services that let customers consume video content over fixed and mobile technologies and on different devices, whether at home or on the move. For these operators and providers, it is vital to analyze consumer behavior both in real time and over longer periods, not only to maximize revenue and minimize costs, but also to deliver the best possible customer experience.
The era of big data drastically changed the requirements for extracting meaning from business data. Despite the challenges, big data and data visualization tools let organizations analyze extremely large datasets and get answers almost instantly, which was not possible with traditional data warehousing and business intelligence tools. Content providers can now analyze data and get intelligent insights for immediate decisions. Analyzing consumer data and behavior lets service providers create relevant recommendations for content consumers. Communication Service Providers (CSPs) are very active in this area. Typical activities include:
- Providing IPTV services bundled with fixed voice and broadband Internet
- Providing Mobile TV services over mobile broadband network
- Providing Satellite TV services
- Acquiring smaller cable TV providers with existing digital cable TV networks
- Selling subscriptions for OTT services with their broadband Internet and IPTV packages
- Packaging services together, e.g. a customer who uses IPTV can also use Mobile TV for free on the same device
This situation can generate quite complex business scenarios, where one customer consumes content through different services, different delivery channels and different devices. For instance, members of one household can watch different content on different types of devices at the same time. Each action of a consumer who navigates a set-top box with a remote control, or navigates an application on a mobile device, is logged in the CSP's applications. By applying advanced analytics to this detailed data, CSPs can provide the best possible service and understand consumer behavior: which content is consumed, through which channel, at what time and on what device. They can analyze the performance of offered service packages and package options, segment customers based on their behavior, and approach them with appropriate offers and recommendations. Rating information can also be used to negotiate content prices with content providers.
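As a rough illustration of the questions above (which content, through which channel, at what time, on what device), the following minimal sketch groups logged viewing events by arbitrary dimensions. The event fields and sample values are hypothetical and are not taken from the solution described in this article.

```python
from collections import Counter
from datetime import datetime

# Hypothetical viewing events; field names and values are illustrative only.
EVENTS = [
    {"household": "H1", "device": "set-top box", "channel": "IPTV",
     "content": "News at 7", "ts": "2019-03-01T19:02:00"},
    {"household": "H1", "device": "phone", "channel": "Mobile TV",
     "content": "Cartoons", "ts": "2019-03-01T19:05:00"},
    {"household": "H2", "device": "set-top box", "channel": "IPTV",
     "content": "News at 7", "ts": "2019-03-01T19:10:00"},
    {"household": "H2", "device": "tablet", "channel": "OTT",
     "content": "Drama S01E01", "ts": "2019-03-01T21:30:00"},
]


def views_by(events, *keys):
    """Count events grouped by the given keys; 'hour' is derived from 'ts'."""
    counts = Counter()
    for event in events:
        row = dict(event, hour=datetime.fromisoformat(event["ts"]).hour)
        counts[tuple(row[key] for key in keys)] += 1
    return counts


# Views per delivery channel and hour of day, and views per content item.
by_channel_hour = views_by(EVENTS, "channel", "hour")
by_content = views_by(EVENTS, "content")
```

In a production system this kind of grouping would normally be expressed as SQL against the analytics database rather than in application code; the sketch only shows the shape of the aggregation.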
Recommender systems are one of the typical uses of machine learning, since they apply prediction algorithms to the collected data. Running these algorithms during low-load periods allows the system both to consume data faster and to optimize resource usage. The results give valuable information to CSPs and guide users to the content that suits them best. Every new stream of data enhances and optimizes the results for every entity for which recommendations are calculated.
The data flow in the system is shown in the diagram in Figure 1.
The data integration layer provides real-time streaming data pipelines between the various source systems and the data storage layer. It is built on Apache Kafka, a distributed streaming platform with real-time capabilities for storing and processing streams. Because it can hold stream data, Apache Kafka also serves as a data staging area and as a single source of data for loading the data storage layer.
The data ingested through Kafka varies by source. For example, the data sent by an IPTV set-top box is an unstructured log, which can either be parsed into a structured record while it is streamed or stored raw in the data storage layer and transformed before it is used. JSON data, which is popular in many modern real-time systems of digital broadcasting and OTT providers, is handled in a similar manner.
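To make the parsing step more concrete, here is a minimal sketch that turns a raw set-top box log line and an OTT JSON event into the same flat record. Both input formats, the field names and the function names `parse_stb_log` and `parse_ott_json` are assumptions made for illustration; real set-top box logs and OTT payloads will differ, and in a real pipeline this logic would run inside a stream consumer or as a transformation in the storage layer.

```python
import json
import re

# Hypothetical set-top box log format, for illustration only:
#   "<ISO timestamp> <device id> <ACTION> key=value key=value ..."
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+)\s+(?P<device_id>\S+)\s+(?P<action>[A-Z_]+)\s*(?P<rest>.*)$"
)


def parse_stb_log(line):
    """Parse one raw log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = {
        "ts": match["ts"],
        "device_id": match["device_id"],
        "action": match["action"],
        "source": "iptv",
    }
    # Remaining tokens are treated as key=value attributes; others are ignored.
    for token in match["rest"].split():
        key, sep, value = token.partition("=")
        if sep:
            record[key] = value
    return record


def parse_ott_json(payload):
    """Map a (hypothetical) OTT JSON event onto the same flat structure."""
    event = json.loads(payload)
    return {
        "ts": event["timestamp"],
        "device_id": event["device"]["id"],
        "action": event["event"].upper(),
        "source": "ott",
        **event.get("attributes", {}),
    }
```

Normalizing both sources to one record shape is a design choice that keeps downstream queries independent of where an event originated.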
The data storage layer is the Vertica analytics platform, a massively parallel processing (MPP) database chosen both for its performance and for its native integration with Apache Kafka and Apache Spark. Vertica lets the system scale performance and offers data warehouse-like SQL querying. When a few source systems constantly generate large volumes of data, Vertica has advantages over other Big Data solutions. Because Vertica is column-oriented, analytical queries retrieve data significantly faster, and data can be compressed more efficiently than in the row-oriented database systems typically used for data warehousing.
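The general idea behind the column-oriented advantage can be shown with a toy example. The sketch below pivots a few hypothetical rows into columns and applies run-length encoding, one common column-friendly encoding, to a low-cardinality column. It illustrates the principle only and is not a description of Vertica's storage engine.

```python
from itertools import groupby

# Hypothetical rows: (customer_id, device, channel), sorted by device.
ROWS = [
    (1, "set-top box", "IPTV"),
    (2, "set-top box", "IPTV"),
    (3, "set-top box", "IPTV"),
    (4, "phone", "Mobile TV"),
    (5, "phone", "Mobile TV"),
    (6, "tablet", "OTT"),
]


def to_columns(rows, names):
    """Pivot a list of row tuples into a dict of per-column lists."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


columns = to_columns(ROWS, ("customer_id", "device", "channel"))

# Six stored values shrink to three (value, count) pairs.
encoded_device = run_length_encode(columns["device"])

# "Count rows per device" reads only the encoded device column,
# never the customer_id or channel values.
rows_per_device = {}
for value, count in encoded_device:
    rows_per_device[value] = rows_per_device.get(value, 0) + count
```

A row-oriented layout would have to read every full row to answer the same question, which is why analytical queries that touch few columns tend to benefit most from columnar storage.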
The data analytics layer should be viewed as two separate parts, according to how it is used. One is the user interface, which consists of a custom web application and Tableau visualization software and allows real-time analysis of the recommendation data stored in Vertica. The other is the key component of the recommendation engine, based on Vertica's machine learning capabilities extended with Apache Spark. This component runs the recommendation algorithms to create a recommendation set for each user based on that user's data. Finally, the recommendation results are exposed through an API so that they can be integrated back into the content delivery system.
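The article does not say which recommendation algorithms the engine uses. Purely as an illustration of how a per-user recommendation set can be derived from viewing data, the sketch below uses simple item co-occurrence counting on a hypothetical viewing history; the data and the function names are assumptions, and a production engine would rely on the machine learning facilities of the platform instead.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical viewing history: user -> set of content items watched.
HISTORY = {
    "u1": {"News", "Football", "Drama"},
    "u2": {"News", "Football"},
    "u3": {"Football", "Drama"},
    "u4": {"News"},
}


def cooccurrence(history):
    """Count how often each pair of items was watched by the same user."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user, history, top_n=2):
    """Rank items the user has not seen by co-occurrence with seen items."""
    counts = cooccurrence(history)
    seen = history[user]
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


recommendations = {user: recommend(user, HISTORY) for user in HISTORY}
```

The resulting per-user lists are the kind of output that could be stored in the database and served through an API to the content delivery system.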
Thank you to Vertica partner Poslovna inteligencija d.o.o. for contributing this article! Read more about this content analytics solution on their website.