Three Properties of Topological Analysis
Coordinate invariance
This property says that topology studies only those properties of a shape that do not depend on the coordinates chosen to describe it. For example, the ellipses below are all considered to be topologically the same, even though they are positioned differently in the plane.
The value of this property in data analysis is that we often modify data by applying transformations to the entries of a data matrix, and these transformations amount to a change of coordinates. The simplest ones involve only translation and scaling, as in the conversion of temperatures from Celsius to Fahrenheit or Kelvin. More complex transformations, such as three-dimensional rotations, are often useful for clarifying underlying properties of a data set.
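As a small illustration of a change of coordinates, the following Python sketch rotates and translates points sampled from an ellipse and checks that all pairwise distances are unchanged. It also converts a few temperatures from Celsius to Fahrenheit and checks that every gap between readings is simply rescaled by 9/5. The sample points, the angle, the shift, and the helper names (`pairwise_distances`, `rotate_and_translate`, `celsius_to_fahrenheit`) are assumptions made for illustration and are not taken from any particular topological data analysis library.

```python
import math


def pairwise_distances(points):
    """Return the Euclidean distances between all unordered pairs of points."""
    return [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]


def rotate_and_translate(points, angle, shift):
    """Rotate 2-D points by `angle` radians about the origin, then add `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]


def celsius_to_fahrenheit(celsius):
    """Translation plus scaling: the same temperature in different coordinates."""
    return celsius * 9.0 / 5.0 + 32.0


# Twelve points sampled from an ellipse (hypothetical example data).
ellipse = [
    (2.0 * math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
    for k in range(12)
]
moved = rotate_and_translate(ellipse, angle=0.7, shift=(5.0, -3.0))

# Rotation and translation leave every pairwise distance unchanged.
distances_preserved = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(ellipse), pairwise_distances(moved))
)

# Celsius -> Fahrenheit rescales every gap between readings by the same factor.
temps_c = [-10.0, 0.0, 12.5, 37.0]
temps_f = [celsius_to_fahrenheit(t) for t in temps_c]
gaps_c = pairwise_distances([(t,) for t in temps_c])
gaps_f = pairwise_distances([(t,) for t in temps_f])
gaps_scaled = all(math.isclose(f, 1.8 * c) for c, f in zip(gaps_c, gaps_f))

print("distances preserved:", distances_preserved)
print("gaps uniformly rescaled:", gaps_scaled)
```

In both cases the relationships between the data points are the same before and after the change of coordinates, up to a common scale factor in the temperature example.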
Deformation invariance
This property says that stretching or deforming a geometric shape does not change its topological properties. To give an example, consider the problem of distinguishing between letters. The human visual system robustly recognizes the difference between a letter “A” and a letter “B”, independently of the font in which the letters are drawn, the angle from which they are viewed, and even any curvature of the surface on which they are drawn.
The different choices of font or viewing angle can be thought of as deformations of the underlying shape of the letters. This kind of robustness is also quite useful when complicated transformations, such as log-log transformations, are applied to a data set. For instance, the shape below is the result of applying a log-log transformation to a perfectly round circle. We can think of this as a deformation of the circle, and topological methods treat the round circle and its transformed version as the same shape.
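A minimal Python sketch of the log-log example follows, assuming a circle of radius 2 centred at (3, 3) so that both coordinates stay positive and the logarithm is defined. It checks two things: the transformed curve is no longer round, and it still winds exactly once around the image of the centre. The winding number is used here only as one simple stand-in for "still a single loop"; the circle parameters and the function names are assumptions chosen for illustration.

```python
import math


def winding_number(curve, center):
    """Count how many times a closed polygonal curve winds around `center`."""
    total = 0.0
    n = len(curve)
    for i in range(n):
        ax, ay = curve[i][0] - center[0], curve[i][1] - center[1]
        bx, by = curve[(i + 1) % n][0] - center[0], curve[(i + 1) % n][1] - center[1]
        # Signed angle swept when moving from one vertex to the next.
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    return round(total / (2 * math.pi))


def log_log(point):
    """Apply the natural logarithm to both coordinates (needs x > 0 and y > 0)."""
    return (math.log(point[0]), math.log(point[1]))


# A round circle in the positive quadrant (assumed centre and radius).
center = (3.0, 3.0)
radius = 2.0
circle = [
    (center[0] + radius * math.cos(2 * math.pi * k / 200),
     center[1] + radius * math.sin(2 * math.pi * k / 200))
    for k in range(200)
]

deformed = [log_log(p) for p in circle]
deformed_center = log_log(center)

# Geometry changes: the deformed curve is far from round.
radii = [math.dist(p, deformed_center) for p in deformed]
roundness_ratio = max(radii) / min(radii)

# Topology does not: both curves are single loops around their centre.
loops_before = winding_number(circle, center)
loops_after = winding_number(deformed, deformed_center)

print("max/min radius after log-log:", round(roundness_ratio, 2))
print("winding number before and after:", loops_before, loops_after)
```

The ratio of the largest to the smallest radius is 1 for the original circle and clearly larger after the transformation, while the winding number is the same for both curves.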
Compressed representations
The final key property of topological methods is that they produce compressed representations of shapes. For example, consider the circle, which consists of infinitely many points and infinitely many pairwise distances which characterize the shape. If we are willing to sacrifice a little bit of detail, such as the curvature of the arc, we can obtain a simple representation of the fundamental “loopy” property of the circle by using a hexagon.
This is extremely useful in understanding the features of large and complex data sets. Such a data set may consist of millions of points, together with a similarity relationship between them. The compressed representation encodes these relationships in a very simple form, a topological network or complex, like the hexagon.
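To make the hexagon picture concrete, here is a hedged Python sketch that compresses 1,000 noisy points near a circle into a six-node network. It is a deliberately simplified, Mapper-style construction and is not presented as Ayasdi's actual algorithm: points are grouped into six overlapping angular bins, each bin becomes a node, and two nodes are joined when they share at least one point. A fuller version would also cluster the points inside each bin; that step is skipped here because each arc is a single cluster. The bin count, the overlap, the noise level, and all names are assumptions.

```python
import math
import random

random.seed(0)  # fixed seed so the sample is reproducible

# 1,000 noisy points near the unit circle (synthetic example data).
points = []
for _ in range(1000):
    t = random.uniform(0.0, 2.0 * math.pi)
    r = 1.0 + random.uniform(-0.05, 0.05)
    points.append((r * math.cos(t), r * math.sin(t)))

NUM_BINS = 6    # one node per bin, so at most six nodes
OVERLAP = 0.25  # each arc is widened by 25% of its width on both sides


def bins_for(point):
    """Return the indices of the overlapping angular bins that contain `point`."""
    angle = math.atan2(point[1], point[0]) % (2.0 * math.pi)
    width = 2.0 * math.pi / NUM_BINS
    hits = []
    for b in range(NUM_BINS):
        bin_center = (b + 0.5) * width
        # Shortest angular distance between the point and the bin centre.
        gap = abs((angle - bin_center + math.pi) % (2.0 * math.pi) - math.pi)
        if gap <= width * (0.5 + OVERLAP):
            hits.append(b)
    return hits


def count_components(nodes, edges):
    """Count connected components with a small union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(n) for n in nodes})


# Build the network: a node per non-empty bin, an edge per pair of bins sharing a point.
nodes, edges = set(), set()
for p in points:
    hits = bins_for(p)
    nodes.update(hits)
    edges.update((i, j) for i in hits for j in hits if i < j)

components = count_components(nodes, edges)
loops = len(edges) - len(nodes) + components  # independent cycles in the graph

print("nodes:", len(nodes), "edges:", len(edges))
print("components:", components, "loops:", loops)
```

The thousand points and their roughly half a million pairwise distances are reduced to six nodes and six edges arranged in a ring, which keeps the single loop of the circle while discarding its curvature.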
These properties are all of fundamental importance in understanding data sets, and they account for the power of the methods.
Video: Professor Gunnar Carlsson, Introduction to Topological Data Analysis (YouTube)
For more on Topology and how Ayasdi is using it, visit their website at www.ayasdi.com