Cognitive Storage: Teaching Computers What to Learn and What to Forget
by Michael Zimmerman, IBM, and Evangelos Eleftheriou, IBM Fellow
April 4, 2016
Originally published at https://www.ibm.com/blogs/think/2016/04/04/cognitive-storage-ibm/
Close your eyes and think back to your last vacation.
The memories you are recalling were captured because your brain automatically puts a high value on significant experiences, such as a beautiful sunset or an amazing dinner.
Simultaneously, your brain also automatically assigns a low value to, or simply forgets, irrelevant things like waiting at a traffic light or checking in for your flight. With cognitive storage, computers can do the same.
Computers can be taught to learn the difference between high-value and low-value data (i.e., memories or information), and this differentiation can be used to determine what is stored, where it is stored and for how long.
With rising energy costs and the explosion of Big Data, particularly from the Internet of Things, this is a critical challenge: solving it could lead to huge savings in storage capacity, which in turn means lower media costs and less energy consumption.
How does cognitive storage work?
In a new paper appearing today in the IEEE journal Computer, IBM storage and data scientists Giovanni Cherubini, Jens Jelitto and Vinodh Venkatesan unveil the concept of cognitive storage. While it isn't available yet, it could be very soon.
The idea is based on a metric they call data value, which is analogous to the value of a piece of art: typically, the higher the demand and the rarer the piece, the higher its value and the tighter the security it requires.
For example, if 1,000 employees access the same files every day, the value of that data set should be very high, just like a priceless Van Gogh. A cognitive storage system would learn this and store those files on fast media like flash. The system would also automatically back up these files multiple times. Lastly, the files may need extra security so that they cannot be accessed without authorization.
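To make the data value idea concrete, here is a minimal Python sketch that turns demand and rarity into a score and maps that score to a placement. The scoring formula, the thresholds, the tier names and the `Placement`, `data_value` and `place` names are all hypothetical illustrations; the article does not specify how the IBM system computes value or sets policy.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    tier: str         # hypothetical tier name: "flash", "disk" or "tape"
    replicas: int     # number of copies to keep
    restricted: bool  # whether access requires explicit authorization


def data_value(daily_accessors: int, copies_elsewhere: int) -> float:
    """Toy score: grows with demand and shrinks as the data becomes less rare."""
    rarity = 1.0 / (1 + copies_elsewhere)
    return daily_accessors * rarity


def place(value: float) -> Placement:
    """Map a value score to a placement using hypothetical thresholds."""
    if value >= 100:
        return Placement(tier="flash", replicas=3, restricted=True)
    if value >= 1:
        return Placement(tier="disk", replicas=2, restricted=False)
    return Placement(tier="tape", replicas=1, restricted=False)
```

With these toy thresholds, `place(data_value(1000, 0))` lands on flash with three replicas and restricted access, mirroring the 1,000-employee example, while data nobody touches falls through to tape.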
Of course, there is also the opposite case. A rarely accessed data set, like PDF files of 20-year-old tax documents, should be stored on cold media like tape and made available only upon request. A cognitive storage system would also know that tax records need to be kept for at least 7 years and can be deleted after that period.
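The retention rule for the tax records can be sketched as a simple date check. This assumes a fixed seven-year period taken from the article's example (real retention periods vary by jurisdiction and record type), and the function names are hypothetical.

```python
from datetime import date

RETENTION_YEARS = 7  # the article's example retention period for tax records


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; Feb 29 maps to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def may_delete(created: date, today: date, retention_years: int = RETENTION_YEARS) -> bool:
    """True once the full retention period has elapsed since creation."""
    return today >= add_years(created, retention_years)


def tax_record_action(created: date, today: date) -> str:
    """Hypothetical policy for rarely accessed tax records on cold media."""
    if may_delete(created, today):
        return "eligible for deletion"
    return "keep on tape; retrieve on request"
```

A record created on 4 April 1996 is eligible for deletion on 4 April 2016, whereas one created in 2012 stays on tape. The sketch only reports eligibility; whether to actually delete is left to the caller.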
In many situations, data value also changes over time, and a cognitive storage system can adapt accordingly.
One way to determine data value is to track the access patterns of the data, or how frequently it is used. Individuals can also add metadata tags to the data to help train the system, depending on the context in which the data is used. For example, an astronomer may tag a data set coming from the Andromeda galaxy as highly important or as less important.
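Because value changes over time, one common way to track access patterns is to weight each access by its age, so that recent activity counts more than old activity. The sketch below uses exponential decay with a 30-day half-life and multiplies the result by user-supplied tag weights. The half-life, the tag names and the weights are hypothetical, and the article does not say that the IBM system uses decay at all.

```python
HALF_LIFE_DAYS = 30.0  # hypothetical: an access loses half its weight every 30 days

# Hypothetical multipliers for user-supplied metadata tags
TAG_WEIGHTS = {"highly-important": 2.0, "less-important": 0.5}


def decayed_access_score(access_ages_days, half_life=HALF_LIFE_DAYS):
    """Sum over accesses of 0.5 ** (age / half_life), so recent accesses weigh more."""
    return sum(0.5 ** (age / half_life) for age in access_ages_days)


def tagged_value(access_ages_days, tags=()):
    """Decayed access score scaled by each tag's weight (unknown tags count as 1.0)."""
    score = decayed_access_score(access_ages_days)
    for tag in tags:
        score *= TAG_WEIGHTS.get(tag, 1.0)
    return score
```

With a 30-day half-life, four accesses from 60 days ago score the same as one access today (4 × 0.25 = 1.0), and tagging a data set as `highly-important` doubles its score.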
As detailed in the paper, IBM scientists have tested cognitive storage using 1.77 million files across seven users. They used a simple ranking into classes 1, 2 and 3 based on metadata including user ID, group ID, file size, file permissions, date and time of creation, file extension, and directories in the path. They then split the server data by user, as each user could define different classes of files they deem important.
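The per-user classification step can be illustrated with a small classifier over file metadata. This sketch swaps in a categorical naive Bayes classifier with add-one smoothing; the article does not name the learning method the IBM scientists used. It also uses only a hypothetical subset of the listed metadata (file extension, top-level directory, group ID and a coarse size bucket) and trains one model per user, following the article's per-user split.

```python
import math
from collections import Counter, defaultdict


def features(path, group_id, size_bytes):
    """Hypothetical feature tuple built from a subset of the metadata in the article."""
    name = path.rsplit("/", 1)[-1]
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    top_dir = path.strip("/").split("/", 1)[0]
    size_bucket = "small" if size_bytes < 1 << 20 else "large"  # 1 MiB cut-off
    return (ext, top_dir, group_id, size_bucket)


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # feature_counts[label][feature_index][value] -> number of training rows
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        # Distinct values seen for each feature, used in the smoothing denominator
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.feature_counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        """Return the class with the highest unnormalized log posterior."""
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.total)  # log prior
            for i, value in enumerate(row):
                seen = self.feature_counts[label][i][value]
                # The extra +1 reserves probability mass for values never seen in training
                score += math.log((seen + 1) / (count + len(self.values[i]) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_per_user(records):
    """records: iterable of (user, feature_tuple, value_class); one model per user."""
    by_user = defaultdict(lambda: ([], []))
    for user, row, label in records:
        rows, labels = by_user[user]
        rows.append(row)
        labels.append(label)
    return {user: NaiveBayes().fit(rows, labels) for user, (rows, labels) in by_user.items()}


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class matches the label."""
    hits = sum(model.predict(row) == label for row, label in zip(rows, labels))
    return hits / len(labels)
```

Because each user gets a separate model, the same kind of file can land in different classes: with a few hypothetical labeled files per user, a `.log` file under `/tmp` could be predicted as class 3 for one user and class 1 for another who marked audit logs as important. Accuracy on toy data like this says nothing about the results reported in the paper.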
The result was a data value prediction accuracy of nearly 100% for the smaller class set.