Big Data Meets Fine Art
by Thanigai Vellore, Enterprise Architect, Art.com
“Great things are done by a series of small things brought together”
– This famous quote by Vincent van Gogh reflects what’s happening at art.com, where we are at the tipping point of using big data technologies in the art space. The advent of big data technologies has opened up ways to solve many problems specific to the art domain. While “big data” (or data in general) may not be useful by itself, new capabilities and insights can be gained by applying domain-specific technologies and algorithms on top of it. Below are some of the areas where we apply big data technologies to solve art-related problems at scale.
What’s in a Work of Art?
“Art is not a handicraft, it is the transmission of feeling the artist has experienced” – Leo Tolstoy.
Understanding the visual characteristics of an artwork can unearth powerful insights.
At art.com, we have the largest collection of hand-selected art images in the world. We use distributed computing and computer vision technologies to process and store the visual attributes of each image. These attributes power a visual search technology that lets users search by a range of colors (or palettes) or by visual similarity. Storing the “Visual DNA” of an image (features such as color quantization, brightness, saturation, temperature, and many more) at the pixel level allows us to answer questions like “What colors did Picasso use during his Blue Period?” or to find a piece of art whose color palette harmonizes with a given décor. We also use image classification techniques (based on approaches like visual bag-of-words and SURF feature descriptors) to auto-categorize new incoming images and user-curated collections into specific categories. When these capabilities are combined with stream processing engines (like Spark Streaming), we can perform these aggregations in real time.
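To make the idea of a “Visual DNA” more concrete, here is a minimal single-machine sketch, not art.com’s actual pipeline. It assumes pixels are already decoded into (r, g, b) tuples, summarizes them as a 12-bin hue histogram (a simple form of color quantization) plus mean brightness and saturation, and uses a hypothetical “warmth” rule (share of reds through yellows) as a stand-in for color temperature. Palette search is shown as histogram intersection; the names `visual_dna`, `palette_similarity`, and `search_by_palette` are illustrative only.

```python
import colorsys
from dataclasses import dataclass

HUE_BINS = 12  # hypothetical choice: 30-degree hue buckets


@dataclass
class VisualDNA:
    palette: list      # normalized hue histogram over colored (non-gray) pixels
    brightness: float  # mean HSV value, 0..1
    saturation: float  # mean HSV saturation, 0..1
    warmth: float      # hypothetical: share of colored pixels with warm hues


def visual_dna(pixels, min_saturation=0.1):
    """Compute a toy 'visual DNA' from a non-empty list of (r, g, b) tuples in 0..255."""
    if not pixels:
        raise ValueError("need at least one pixel")
    hist = [0] * HUE_BINS
    total_s = total_v = 0.0
    warm = colored = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        total_s += s
        total_v += v
        if s < min_saturation:
            continue  # near-gray pixels carry no meaningful hue
        colored += 1
        hist[min(int(h * HUE_BINS), HUE_BINS - 1)] += 1
        # Hypothetical warmth rule: hue below 60 degrees or at/above 300 degrees
        if h < 1 / 6 or h >= 5 / 6:
            warm += 1
    n = len(pixels)
    palette = [c / colored for c in hist] if colored else [0.0] * HUE_BINS
    return VisualDNA(palette, total_v / n, total_s / n, warm / colored if colored else 0.0)


def palette_similarity(a, b):
    """Histogram intersection: 1.0 for identical palettes, 0.0 for disjoint ones."""
    return sum(min(x, y) for x, y in zip(a, b))


def search_by_palette(query_palette, catalog, top_k=3):
    """Rank catalog items (art_id -> VisualDNA) by palette similarity to the query."""
    ranked = sorted(
        catalog.items(),
        key=lambda kv: palette_similarity(query_palette, kv[1].palette),
        reverse=True,
    )
    return [art_id for art_id, _ in ranked[:top_k]]
```

In a décor-matching scenario, the query palette would come from a photo of the room, and the catalog would hold precomputed `VisualDNA` records; at art.com’s scale that precomputation is what the distributed processing would handle.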
I like the art when I see it!
Whoopi Goldberg once said, “Art and life are subjective. Not everybody’s gonna dig what I dig, but I reserve the right to dig it.” That sums it up: art is very personal and a strong medium of self-expression.
We use clickstream data from our websites to build machine learning models that help us understand our users’ tastes and preferences. We use clustering algorithms to create “segment groups” of users based on the art they interacted with. We also use classification models based on decision trees to identify the type of user on the website. In addition, we are building models (using well-proven algorithms like collaborative filtering) to power our personalization and recommendation engine.
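The article does not say which collaborative filtering variant powers the recommendation engine, so the following is only an illustrative sketch of one common form: item-based collaborative filtering on implicit (click/save/buy) data. It assumes interactions are available as a mapping from a user id to the set of artwork ids that user engaged with, scores item pairs by cosine similarity of their user sets, and recommends unseen items most similar to what the user already engaged with. Function names are hypothetical.

```python
import math
from collections import defaultdict


def item_similarities(interactions):
    """Cosine similarity between items from implicit (binary) user-item interactions.

    interactions: dict mapping user_id -> set of item_ids the user engaged with.
    """
    item_users = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            item_users[item].add(user)
    sims = defaultdict(dict)
    items = list(item_users)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(item_users[a] & item_users[b])
            if overlap:
                score = overlap / math.sqrt(len(item_users[a]) * len(item_users[b]))
                sims[a][b] = sims[b][a] = score
    return sims


def recommend(user, interactions, sims, top_k=3):
    """Score unseen items by summing their similarity to items the user already engaged with."""
    seen = interactions.get(user, set())
    scores = defaultdict(float)
    for item in seen:
        for other, score in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += score
    # Highest score first; ties broken alphabetically for deterministic output
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]
```

The pairwise loop is quadratic in the number of items, which is fine for a toy catalog but is exactly the kind of computation that would be distributed (or replaced by a matrix-factorization approach) at production scale.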
Architectural patterns such as the “Lambda architecture” help us implement these ML models so that they work on both offline (batch) data and incrementally enriched real-time data. For the models themselves, we use Apache Spark’s MLlib package, which is well suited to developing iterative machine learning models through configurable pipelines.
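The core idea of the Lambda architecture is that a batch layer periodically recomputes a view from the complete, immutable event log, a speed layer incrementally covers events the batch has not yet seen, and a serving layer merges the two at query time. The single-process sketch below illustrates that idea with per-artwork view counts; it is a simplification, not art.com’s implementation. It assumes the batch run happens synchronously, so resetting the speed view afterwards is safe; in a real deployment the batch job runs asynchronously (for example on Spark) and speed-layer state is expired only for the events the batch actually covered. The class name `LambdaCounter` is hypothetical.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda architecture for per-artwork view counts (illustrative only)."""

    def __init__(self):
        self.master_log = []         # immutable, append-only record of all events
        self.batch_view = Counter()  # recomputed from scratch by the batch layer
        self.speed_view = Counter()  # incremental counts for events since the last batch run

    def ingest(self, artwork_id):
        """New event: append it to the master log and update the real-time (speed) view."""
        self.master_log.append(artwork_id)
        self.speed_view[artwork_id] += 1

    def run_batch(self):
        """Batch layer: recompute the view from the full log, then drop covered speed state."""
        self.batch_view = Counter(self.master_log)
        # Safe only because this toy runs synchronously: every logged event is now batched.
        self.speed_view = Counter()

    def query(self, artwork_id):
        """Serving layer: merge the batch view with the speed view."""
        return self.batch_view[artwork_id] + self.speed_view[artwork_id]
```

The same split applies to model training: an expensive model is rebuilt from the full history in the batch layer, while lightweight incremental updates keep results fresh between rebuilds.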
We are now at the maturity and delivery phase of the big data life cycle, which makes it practical to solve domain-specific problems that were once confined to R&D labs.