{"id":4023,"date":"2015-11-16T12:19:11","date_gmt":"2015-11-16T12:19:11","guid":{"rendered":"http:\/\/www.odbms.org\/blog\/?p=4023"},"modified":"2015-11-16T12:29:47","modified_gmt":"2015-11-16T12:29:47","slug":"on-dark-data-interview-with-gideon-goldin","status":"publish","type":"post","link":"https:\/\/www.odbms.org\/blog\/2015\/11\/on-dark-data-interview-with-gideon-goldin\/","title":{"rendered":"On Dark Data. Interview with Gideon Goldin"},"content":{"rendered":"<blockquote><p><strong>&#8220;Top-down cataloging and master-data management tools typically require expensive data curators, and are not simple to use. This poses a significant threat to cataloging efforts since so much knowledge about your organization\u2019s data is inevitably clustered across the minds of the people who need to question it and the applications they use to answer those questions.&#8221;&#8211;Gideon Goldin<\/strong><\/p><\/blockquote>\n<p>I have interviewed <strong>Gideon Goldin<\/strong>, UX Architect, Product Manager at <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.tamr.com');\"  href=\"http:\/\/www.tamr.com\" target=\"_blank\">Tamr<\/a>.<\/p>\n<p>RVZ<\/p>\n<p><strong>Q1. 
What is \u201cdark data\u201d?<\/strong><\/p>\n<p><strong>Gideon Goldin:<\/strong> <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.gartner.com\/it-glossary\/dark-data');\"  href=\"http:\/\/www.gartner.com\/it-glossary\/dark-data\" target=\"_blank\">Gartner refers to dark data<\/a> as \u201c<em>the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).<\/em>\u201d For most organizations, dark data comprises the majority of available data, and it is often the result of the constantly changing and unpredictable nature of enterprise data, something that is likely to be exacerbated by corporate restructuring, M&amp;A activity, and a number of external factors.<\/p>\n<p>By shedding light on this data, organizations are better suited to make more data-driven, accurate business decisions.<br \/>\nTamr Catalog, which is available as a <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.tamr.com\/tamr-catalog-2\/');\"  href=\"http:\/\/www.tamr.com\/tamr-catalog-2\/\" target=\"_blank\">free downloadable app<\/a>, aims to do this, providing users with a view of their entire data landscape so they can quickly understand what was in the dark and why.<\/p>\n<p><strong>Q2. What are the main drawbacks of traditional top-down methods of cataloging or \u201cmaster data management\u201d?<\/strong><\/p>\n<p><strong>Gideon Goldin:<\/strong> The main drawbacks are scalability and simplicity. When Yahoo, for example, started to catalog the web they employed some top-down approaches, hiring specialists to curate structured directories of information. As the web grew, however, their solution became less relevant and significantly more costly. 
Google, on the other hand, mined the web to understand references that exist between pages, allowing the relevance of sites to emerge from the bottom up. As a result, Google\u2019s search engine was more accurate, easier to scale, and simpler.<\/p>\n<p>Top-down cataloging and <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/en.wikipedia.org\/wiki\/Master_data_management');\"  href=\"https:\/\/en.wikipedia.org\/wiki\/Master_data_management\" target=\"_blank\">master-data management<\/a> tools typically require expensive data curators, and are not simple to use. This poses a significant threat to cataloging efforts since so much knowledge about your organization\u2019s data is inevitably clustered across the minds of the people who need to question it and the applications they use to answer those questions. Tamr Catalog aims to deliver an innovative and vastly simplified method for cataloging your organization\u2019s data.<\/p>\n<p><strong>Q3. Tamr recently opened a public Beta program, Tamr Catalog, for an enterprise metadata catalog. What is it?<\/strong><\/p>\n<p><strong>Gideon Goldin:<\/strong> The <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.tamr.com\/tamr-opens-public-beta-program-for-its-free-enterprise-metadata-catalog-tool\/');\"  href=\"http:\/\/www.tamr.com\/tamr-opens-public-beta-program-for-its-free-enterprise-metadata-catalog-tool\/\" target=\"_blank\">Tamr Catalog Beta Program<\/a> is an open invitation to test-drive our free cataloging software. We have yet to find an organization that is content with their current cataloging approaches, and we found that the biggest barrier to reform is often knowing where to start. Catalog can help: the goal of the Catalog Beta Program is to better understand how people want and need to collaborate around their data sources. 
We believe that an early partnership with the community will ensure that we develop useful functionality and thoughtful design.<\/p>\n<p><strong>Q4. What is the core functionality of Tamr Catalog?<\/strong><\/p>\n<p><strong>Gideon Goldin:<\/strong> Tamr Catalog enables users to easily register, discover and organize their data assets.<\/p>\n<p><strong>Q5. How does it help simplify access to high-quality data sets for analytics?<\/strong><\/p>\n<p><strong>Gideon Goldin:<\/strong> Not surprisingly, people are biased to use the data sets closest to them. With Catalog, scientists and analysts can easily discover unfamiliar data sets: data sets, for example, that may belong to other departments or analysts. Catalog profiles and collects pointers to your sources, providing multifaceted and visual browsing of all data and trivializing the search for any given set of data.<\/p>\n<p><strong>Q6. How does Tamr Catalog relate to the Tamr Data Unification Platform?<\/strong><\/p>\n<p><strong>Gideon Goldin:<\/strong> Before organizations can unify their data, preparing it for improved analysis or management, they need to know what they have. Organizations often lack a good approach for this first (and repeating) step in data unification. We realized this quickly when helping large organizations begin their unification projects, and we even realized we lacked a satisfactory tool to understand our own data. Thus, we built Catalog as a part of the <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.tamr.com\/tamr-connect\/');\"  href=\"http:\/\/www.tamr.com\/tamr-connect\/\" target=\"_blank\">Tamr Data Unification Platform<\/a> to illuminate your data landscape, such that people can be confident that their unification efforts are as comprehensive as possible.<\/p>\n<p><strong>Q7. 
What are the main challenges (technical and non-technical) in achieving broad adoption of vendor- and platform-neutral metadata cataloging?<\/strong><\/p>\n<p><strong>Gideon Goldin:<\/strong> Often the challenge isn\u2019t about volume, it&#8217;s about variety. While a vendor-neutral Catalog intends to solve exactly this, there remains a technical challenge in providing a flexible and elegant interface for cataloging dozens or hundreds of different types of data sets and the structures they comprise.<\/p>\n<p>However, we find that some of the biggest (and most interesting) challenges revolve around organizational processes and culture. Some organizations have developed sophisticated but unsustainable approaches to managing their data, while others have become paralyzed by the inherently disorganized nature of their data. It can be difficult to appreciate the value of investing in these problems. Figuring out where to start, however, shouldn\u2019t be difficult. This is why we chose to release a lightweight application free of charge.<\/p>\n<p><strong>Q8. Chief Data Officers (CDOs), data architects and business analysts have different requirements and different modes of collaborating on (shared) data sets. How do you address this in your catalog?<\/strong><\/p>\n<p><strong>Gideon Goldin:<\/strong> The goal of cataloging isn\u2019t cataloging, it\u2019s helping CDOs identify business opportunities, empowering architects to improve infrastructures, enabling analysts to enrich their studies, and more. Catalog allows anyone to register and organize sources, encouraging open communication along the way.<\/p>\n<p><strong>Q9. How do you handle issues such as data protection, ownership, provenance and licensing in the Tamr catalog?<\/strong><\/p>\n<p><strong>Gideon Goldin:<\/strong> Catalog allows users to indicate who owns what. 
Over the course of our Beta program, we have been fortunate enough to have over 800 early users of Catalog and have collected feedback about how our users would like to see data protection and provenance implemented in their own environments. We are eager to release new functionality to address these needs in the near future.<\/p>\n<p><strong>Q10. Do you plan to use the Tamr Catalog also for collecting data sets that can be used for data projects for the Common Good?<\/strong><\/p>\n<p><strong>Gideon Goldin:<\/strong> We do know of a few instances of Catalog being used for such purposes, including projects that will build on the documenting of city and health data. In addition to our Catalog Beta Program, we are introducing a <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.tamr.com\/catalog-developer-community\/');\"  href=\"http:\/\/www.tamr.com\/catalog-developer-community\/\" target=\"_blank\">Community Developer Program<\/a>, where we are eager to see how the community links Tamr Catalogs to new sources (including those in other catalogs), new analytics and visualizations, and ultimately insights. We believe in the power of <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/de.wikipedia.org\/wiki\/Open_Data');\"  href=\"https:\/\/de.wikipedia.org\/wiki\/Open_Data\" target=\"_blank\">open data<\/a> at Tamr, and we\u2019re excited to learn how we can help the Common Good.<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;<br \/>\n<strong>Gideon Goldin<\/strong>, <em>UX Architect, Product Manager at Tamr.<\/em><\/p>\n<p><em>Prior to Tamr, Gideon Goldin worked as a data visualization\/UX consultant and university lecturer. He holds a Masters in HCI and a PhD in cognitive science from Brown University, and is interested in designing novel human-machine experiences. 
You can reach Gideon on Twitter at @gideongoldin or email him at Gideon.Goldin at tamr.com.<\/em><\/p>\n<p><strong>Resources<\/strong><\/p>\n<p>&#8211; <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.tamr.com\/tamr-catalog-2\/');\"  href=\"http:\/\/www.tamr.com\/tamr-catalog-2\/\" target=\"_blank\">Download Free Tamr Catalog app.<\/a><\/p>\n<p>&#8211; <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.tamr.com\/catalog-developer-community\/');\"  href=\"http:\/\/www.tamr.com\/catalog-developer-community\/\" target=\"_blank\">Tamr Catalog Developer Community<\/a><br \/>\n<em>Online community where Tamr Catalog users can comment, interact directly with the development team, and learn more about the software; and where developers can explore extending the tool by creating new data connectors.<\/em><\/p>\n<p>&#8211; <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.gartner.com\/it-glossary\/dark-data');\"  href=\"http:\/\/www.gartner.com\/it-glossary\/dark-data\" target=\"_blank\">Gartner IT Glossary: Dark data<\/a><\/p>\n<p><strong>Related Posts<\/strong><\/p>\n<p>&#8211; <strong><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/blog\/2015\/06\/data-for-the-common-good-interview-with-andrea-powell\/');\"  href=\"http:\/\/www.odbms.org\/blog\/2015\/06\/data-for-the-common-good-interview-with-andrea-powell\/\" target=\"_blank\">Data for the Common Good. Interview with Andrea Powell. ODBMS Industry Watch, June 9, 2015<\/a><\/strong><\/p>\n<p>&#8211; <strong><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/2015\/09\/doubt-and-verify-data-science-power-tools\/');\"  href=\"http:\/\/www.odbms.org\/2015\/09\/doubt-and-verify-data-science-power-tools\/\" target=\"_blank\">Doubt and Verify: Data Science Power Tools By Michael L. 
Brodie, CSAIL, MIT<\/a><\/strong><\/p>\n<p>&#8211; <strong><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.odbms.org\/2015\/04\/data-wisdom-for-data-science\/');\"  href=\"http:\/\/www.odbms.org\/2015\/04\/data-wisdom-for-data-science\/\" target=\"_blank\">Data Wisdom for Data Science Bin Yu, Departments of Statistics and EECS, University of California at Berkeley<\/a><\/strong><\/p>\n<p><strong>Follow ODBMS.org on Twitter: <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/twitter.com\/odbmsorg');\"  href=\"https:\/\/twitter.com\/odbmsorg\" target=\"_blank\">@odbmsorg<\/a><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;Top-down cataloging and master-data management tools typically require expensive data curators, and are not simple to use. This poses a significant threat to cataloging efforts since so much knowledge about your organization\u2019s data is inevitably clustered across the minds of the people who need to question it and the applications they use to answer those [&hellip;]<!-- AddThis Advanced Settings generic via filter on get_the_excerpt --><!-- AddThis Share Buttons generic via filter on get_the_excerpt 
--><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[35,66,869,866,870,210,867,224,798,871,747,868,634],"_links":{"self":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/4023"}],"collection":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/comments?post=4023"}],"version-history":[{"count":12,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/4023\/revisions"}],"predecessor-version":[{"id":4036,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/posts\/4023\/revisions\/4036"}],"wp:attachment":[{"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/media?parent=4023"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/categories?post=4023"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.odbms.org\/blog\/wp-json\/wp\/v2\/tags?post=4023"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}