Getting the best results out of a machine learning (ML) model requires that you truly understand your data. However, ML datasets can contain hundreds of millions of data points, each consisting of hundreds (or even thousands) of features, making it nearly impossible to understand an entire dataset in an intuitive fashion. Visualization can help unlock nuances and insights in large datasets. A picture may be worth a thousand words, but an interactive visualization can be worth even more.
Working with the PAIR initiative, we’ve released Facets, an open source visualization tool to aid in understanding and analyzing ML datasets. Facets consists of two visualizations that allow users to see a holistic picture of their data at different granularities. Get a sense of the shape of each feature of the data using Facets Overview, or explore a set of individual observations using Facets Dive. These visualizations allow you to debug your data which, in machine learning, is as important as debugging your model. They can easily be used inside of Jupyter notebooks or embedded into webpages. In addition to the open source code, we’ve also created a Facets demo website. This website allows anyone to visualize their own datasets directly in the browser without the need for any software installation or setup, without the data ever leaving your computer.