ODBMS Industry Watch (http://www.odbms.org/blog) » Trends and Information on Big Data, New Data Management Technologies, Data Science and Innovation.

On Artificial Intelligence and Analytics. Interview with Narendra Mulani
http://www.odbms.org/blog/2017/12/on-artificial-intelligence-and-analytics-interview-with-narendra-mulani/
Fri, 08 Dec 2017 08:50:46 +0000

“You can’t get good insights from bad data, and AI is playing an instrumental role in the data preparation renaissance.”–Narendra Mulani

I have interviewed Narendra Mulani, chief analytics officer, Accenture Analytics.

RVZ

Q1. What is the role of Artificial Intelligence in analytics?

Narendra Mulani: Artificial Intelligence will be the single greatest change driver of our age. Combined with analytics, it’s redefining what’s possible by unlocking new value from data, changing the way we interact with each other and technology, and improving the way we make decisions. It’s giving us wider control and extending our capabilities as businesses and as people.

AI is also the connector and culmination of many elements of our analytics strategy including data, analytics techniques, platforms and differentiated industry skills.

You can’t get good insights from bad data, and AI is playing an instrumental role in the data preparation renaissance.
AI-powered analytics essentially frees talent to focus on insights rather than data preparation, a task that has grown more daunting with the sheer volume of data available. It helps organizations tap into new unstructured, contextual data sources like social, video and chat, giving clients a more complete view of their customers. Very recently we acquired Search Technologies, which possesses a unique set of technologies that give ‘context to content’ – whatever its format – and make it quickly accessible to our clients.
As a result, we gain more precise insights on the “why” behind transactions for our clients and can deliver better customer experiences that drive better business outcomes.

Overall, AI-powered analytics will go a long way in allowing the enterprise to find the trapped value that exists in data, discover new opportunities and operate with new agility.

Q2. How can enterprises become ‘data native’ and digital at the core to help them grow and succeed?

Narendra Mulani: It starts with embracing a new culture which we call ‘data native’. You can’t be digital to the core if you don’t embed data at the core. Getting there is no mean feat. The rate of change in technology and data science is exponential, while the rate at which humans can adapt to this change is finite. In order to close the gap, businesses need to democratize data and get new intelligence to the point where it is easily understood and adopted across the organization.
With the help of design-led analytics and app-based delivery, analytics becomes a universal language in the organization, helping employees make data-driven decisions, collaborate across teams and collectively focus efforts on driving improved outcomes for the business.

Enterprises today are only using a small fraction of the data available to them as we have moved from the era of big data to the era of all data. The comprehensive, real-time view businesses can gain of their operations from connected devices is staggering.

But businesses have to get a few things right to ensure they go on this journey.

Understanding and embracing the convergence of analytics and artificial intelligence is one of them. You can hardly overstate the impact AI will have on mobilizing and augmenting the value in data, in 2018 and beyond. AI will be the single greatest change driver and will have a lasting effect on how business is conducted.

Enterprises also need to be ready to seize new opportunities – and that means using new data science to help shape hypotheses, test and optimize proofs-of-concept and scale quickly. This will help you reimagine your core business and uncover additional revenue streams and expansion opportunities.

All this requires a new level of agility. To help our clients act and respond fast, we support them with our platforms, our people and our partners. Backed by deep analytics expertise, new cloud-based systems and a curated and powerful alliance and delivery network, our priority is architecting the best solution to meet the needs of each client. We offer an as-a-service engagement model and a suite of intelligent industry solutions that enable even greater agility and speed to market.

Q3. Why is machine learning (ML) such a big deal, where is it driving changes today, and what are the big opportunities for it that have not yet been tapped?

Narendra Mulani: Machine learning allows computers to discover hidden or complex patterns in data without explicit programming. The impact this has on the business is tremendous—it accelerates and augments insights discovery, eliminates tedious repetitive tasks, and essentially enables better outcomes. It can be used to do a lot of good for people, from reading a car’s license plate and forcing the driver to slow down, to allowing people to communicate with others regardless of the language they speak, and helping doctors find very early evidence of cancer.
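To make the “patterns without explicit programming” point concrete, here is a minimal scikit-learn sketch: the model learns a churn rule from a handful of labelled examples rather than being given the rule by a programmer. The tiny dataset, the feature names and the churn framing are invented for illustration and are not from the interview.

```python
# Minimal sketch: a model learns a decision rule from examples
# instead of being explicitly programmed with one.
# The tiny dataset below is invented purely for illustration.
from sklearn.tree import DecisionTreeClassifier

# Each row: [monthly_visits, support_tickets]; label: 1 = churned, 0 = retained
X = [[2, 5], [1, 7], [0, 9], [12, 1], [15, 0], [9, 2]]
y = [1, 1, 1, 0, 0, 0]

model = DecisionTreeClassifier(max_depth=2).fit(X, y)

# The learned pattern ("few visits + many tickets => churn") was never coded by hand.
print(model.predict([[3, 6], [14, 1]]))  # e.g. [1 0]
```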

While the potential we’re seeing for ML and AI in general is vast, businesses are still in the infancy of tapping it. Organizations looking to put AI and ML to use today need to be pragmatic. While it can amplify the quality of insights in many areas, it also increases complexity for organizations, in terms of procuring specialized infrastructure or in identifying and preparing the data to train and use AI, and with validating the results. Identifying the real potential and the challenges involved are areas where most companies today lack the necessary experience and skills and need a trusted advisor or partner.

Whenever we look at the potential AI and ML have, we should also be looking at the responsibility that comes with it. Explainable AI and AI transparency are top of mind for many computer scientists, mathematicians and legal scholars.
These are critical subjects for an ethical application of AI – particularly critical in areas such as financial services, healthcare and life sciences – to ensure that data use is appropriate, and to assess the fairness of derived algorithms.
We need to recognize that, while AI is science, and science is limitless, there are always risks in how that science is used by humans, and proactively identify and address issues this might cause for people and society.

————————————————


Narendra Mulani is Chief Analytics Officer of Accenture Analytics, a practice that his passion and foresight have helped shape since 2012.

A connector at the core, Narendra brings machine learning, data science, data engineers and the business closer together across industries and geographies to embed analytics and create new intelligence, democratize data and foster a data native culture.

He leads a global team of industry and function-specific analytics professionals, data scientists, data engineers, analytics strategy, design and visualization experts across 56 markets to help clients unlock trapped value and define new ways to disrupt in their markets. As a leader, he believes in creating an environment that is inspiring, exciting and innovative.

Narendra takes a thoughtful approach to developing unique analytics strategies and uncovering impactful outcomes. His insight has been shared with business and trade media including Bloomberg, Harvard Business Review, Information Management, CIO magazine, and CIO Insight. Under Narendra’s leadership, Accenture’s commitment and strong momentum in delivering innovative analytics services to clients was recognized in Everest Group’s Analytics Business Process Services PEAK Matrix™ Assessment in 2016.

Narendra joined Accenture in 1997. Prior to assuming his role as Chief Analytics Officer, he was the Managing Director – Products North America, responsible for delivering innovative solutions to clients across industries including consumer goods and services, pharmaceuticals, and automotive. He was also managing director of supply chain for Accenture Management Consulting where he led a global practice responsible for defining and implementing supply chain capabilities at a diverse set of Fortune 500 clients.

Narendra graduated with a Bachelor of Commerce degree at Bombay University, where he was introduced to statistics and discovered he understood probability at a fundamental level that propelled him on his destined career path. He went on to receive an MBA in Finance in 1982 as well as a PhD in 1985 focused on Multivariate Statistics, both from the University of Massachusetts. Education remains fundamentally important to him.

As one who logs too many frequent flier miles, Narendra is an active proponent of taking time for oneself to recharge and stay at the top of your game. He practices what he preaches through early rising and active mindfulness and meditation to keep his focus and balance at work and at home. Narendra is involved with various activities that support education and the arts, and is a music enthusiast. He lives in Connecticut with his wife Nita and two children, Ravi and Nikhil.

Resources

Accenture Invests in and Forms Strategic Alliance with Leading Quantum Computing Firm 1QBit

Accenture Forms Alliance with Paxata to Help Clients Build an Intelligent Enterprise by Putting Business Users in Control of Data

Apple & Accenture Partner to Create iOS Business Solutions

Accenture Completes Cloud-Based IT Transformation for Towergate, Helping Insurance Broker Improve Its Operations and Reduce Annual IT Costs by 30 Percent

Accenture Acquires Search Technologies to Expand Its Content Analytics and Enterprise Search Capabilities

Related Posts

How Algorithms can untangle Human Questions. Interview with Brian Christian. ODBMS Industry Watch, March 31, 2017

Big Data and The Great A.I. Awakening. Interview with Steve Lohr. ODBMS Industry Watch, December 19, 2016

Machines of Loving Grace. Interview with John Markoff. ODBMS Industry Watch, August 11, 2016

On Artificial Intelligence and Society. Interview with Oren Etzioni. ODBMS Industry Watch, January 15, 2016

Follow us on Twitter: @odbmsorg

##

Internet of Things: Safety, Security and Privacy. Interview with Vint G. Cerf
http://www.odbms.org/blog/2017/06/internet-of-things-safety-security-and-privacy-interview-with-vint-g-cerf/
Sun, 11 Jun 2017 17:06:03 +0000

” I like the idea behind programmable, communicating devices and I believe there is great potential for useful applications. At the same time, I am extremely concerned about the safety, security and privacy of such devices.” –Vint G. Cerf

I had the pleasure to interview Vinton G. Cerf. Widely known as one of the “Fathers of the Internet,” Cerf is the co-designer of the TCP/IP protocols and the architecture of the Internet. The main topic of the interview is the Internet of Things (IoT) and its challenges, especially the safety, security and privacy of IoT devices.
Vint is currently Chief Internet Evangelist for Google.
RVZ

Q1. Do you like the Internet of Things (IoT)?

Vint Cerf: This question is far too general to answer. I like the idea behind programmable, communicating devices and I believe there is great potential for useful applications. At the same time, I am extremely concerned about the safety, security and privacy of such devices. Penetration and re-purposing of these devices can lead to denial of service attacks (botnets), invasion of privacy, harmful dysfunction, serious security breaches and many other hazards. Consequently the makers and users of such devices have a great deal to be concerned about.

Q2. Who is going to benefit most from the IoT?

Vint Cerf: The makers of the devices will benefit if they become broadly popular and perhaps even mandated to become part of local ecosystem. Think “smart cities” for example. The users of the devices may benefit from their functionality, from the information they provide that can be analyzed and used for decision-making purposes, for example. But see Q1 for concerns.

Q3. One of the most important requirements for collections of IoT devices is that they guarantee physical safety and personal security. What challenges does the pervasive introduction of sensors and devices pose from a safety and privacy perspective? (e.g. at home, in cars, hospitals, wearables and ingestibles, etc.)

Vint Cerf: Access control and strong authentication of parties authorized to access device information or control planes will be a primary requirement. The devices must be configurable to resist unauthorized access and use. Putting physical limits on the behavior of programmable devices may be needed or at least advisable (e.g., cannot force the device to operate outside of physically limited parameters).

Q5. Consumers want privacy. With IoT physical objects in our everyday lives will increasingly detect and share observations about us. How is it possible to reconcile these two aspects?

Vint Cerf: This is going to be a tough challenge. Videocams that help manage traffic flow may also be used to monitor individuals or vehicles without their permission or knowledge, for example (cf: UK these days). In residential applications, one might want (insist on) the ability to disable the devices manually, for example. One would also want assurances that such disabling cannot be defeated remotely through the software.

Q6. Let’s talk more about security. It is reported that badly configured “smart devices” might provide a backdoor for hackers. What is your take on this?

Vint Cerf: It depends on how the devices are connected to the rest of the world. A particularly bad scenario would have a hacker taking over the operating system of 100,000 refrigerators. The refrigerator programming could be preserved but the hacker could add any of a variety of other functionality including DDOS capacity, virus/worm/Trojan horse propagation and so on.
One might want the ability to monitor and log the sources and sinks of traffic to/from such devices to expose hacked devices under remote control, for example. This is all a very real concern.

Q7. What measures can be taken to ensure a more “secure” IoT?

Vint Cerf: Hardware to inhibit some kinds of hacking (e.g. through buffer overflows) can help. Digital signatures on bootstrap programs checked by hardware to inhibit boot-time attacks. Validation of software updates as to integrity and origin. Whitelisting of IP addresses and identifiers of end points that are allowed direct interaction with the device.
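As a concrete illustration of one of these measures, validating software updates as to integrity and origin, the sketch below checks an Ed25519 signature before an update would be applied. It is a generic, simplified example, not a description of any particular device's boot or update chain; the key handling and key-distribution story are assumptions.

```python
# Sketch: verifying the integrity and origin of a software update
# before installing it. Secure distribution and storage of the vendor's
# public key (e.g. burned into device ROM) are out of scope here.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Vendor side (done once, offline): sign the update image.
vendor_key = Ed25519PrivateKey.generate()
update_image = b"...firmware bytes..."
signature = vendor_key.sign(update_image)

# Device side: verify against the embedded public key before applying.
public_key = vendor_key.public_key()
try:
    public_key.verify(signature, update_image)
    print("Update verified: safe to install")
except InvalidSignature:
    print("Rejecting update: signature check failed")
```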

Q8. Is there a danger that IoT evolves into a possible enabling platform for cyber-criminals and/or for cyber war offenders?

Vint Cerf: There is no question this is already a problem. The DYN Corporation DDOS attack was launched by a botnet of webcams that were readily compromised because they had no access controls or used well-known default usernames and passwords. This is the reason that companies must feel great responsibility and be provided with strong incentives to limit the potential for abuse of their products.

Q9. What are your personal recommendations for a research agenda and policy agenda based on advances in the Internet of Things?

Vint Cerf: Better hardware reinforcement of access control and use of the IOT computational assets. Better quality software development environments to expose vulnerabilities before they are released into the wild. Better software update regimes that reduce barriers to and facilitate regular bug fixing.

Q10. The IoT is still very much a work in progress. How do you see the IoT evolving in the near future?

Vint Cerf: Chaotic “standardization” with many incompatible products on the market. Many abuses by hackers. Many stories of bugs being exploited or serious damaging consequences of malfunctions. Many cases of “one device, one app” that will become unwieldy over time. Dramatic and positive cases of medical monitoring that prevents serious medical harms or signals imminent dangers. Many experiments with smart cities and widespread sensor systems.
Many applications of machine learning and artificial intelligence associated with IOT devices and the data they generate. Slow progress on common standards.

—————
Vinton G. Cerf co-designed the TCP/IP protocols and the architecture of the Internet and is Chief Internet Evangelist for Google. He is a member of the National Science Board and National Academy of Engineering and Foreign Member of the British Royal Society and Swedish Royal Academy of Engineering, and Fellow of ACM, IEEE, AAAS, and BCS.
Cerf received the US Presidential Medal of Freedom, US National Medal of Technology, Queen Elizabeth Prize for Engineering, Prince of Asturias Award, Japan Prize, ACM Turing Award, Legion d’Honneur and 29 honorary degrees.

Resources

European Commission, Internet of Things Privacy & Security Workshop’s Report, 10/04/2017

Securing the Internet of Things. US Homeland Security, November 16, 2016

Related Posts

Social and Ethical Behavior in the Internet of Things By Francine Berman, Vinton G. Cerf. Communications of the ACM, Vol. 60 No. 2, Pages 6-7, February 2017

Security in the Internet of Things, McKinsey & Company, May 2017

Interview to Vinton G. Cerf. ODBMS Industry Watch, July 27, 2009

Five Challenges to IoT Analytics Success. By Dr. Srinath Perera. ODBMS.org, September 23, 2016

Follow us on Twitter: @odbmsorg

##

Big Data and The Great A.I. Awakening. Interview with Steve Lohr
http://www.odbms.org/blog/2016/12/big-data-and-the-great-a-i-awakening-interview-with-steve-lohr/
Mon, 19 Dec 2016 08:35:56 +0000

“I think we’re just beginning to grapple with implications of data as an economic asset” –Steve Lohr.

My last interview for this year is with Steve Lohr, who has covered technology, business, and economics for the New York Times for more than twenty years. In 2013 he was part of the team awarded the Pulitzer Prize for Explanatory Reporting. We discussed Big Data and how it influences the new Artificial Intelligence awakening.

Wishing you all the best for the Holiday Season and a healthy and prosperous New Year!

RVZ

Q1. Why do you think Google (TensorFlow) and Microsoft (Computational Network Toolkit) are open-sourcing their AI software?

Steve Lohr: Both Google and Microsoft are contributing their tools to expand and enlarge the AI community, which is good for the world and good for their businesses. But I also think the move is a recognition that algorithms are not where their long-term advantage lies. Data is.

Q2. What are the implications of that for both business and policy?

Steve Lohr: The companies with big data pools can have great economic power. Today, that shortlist would include Google, Microsoft, Facebook, Amazon, Apple and Baidu.
I think we’re just beginning to grapple with implications of data as an economic asset. For example, you’re seeing that now with Microsoft’s plan to buy LinkedIn, with its personal profiles and professional connections for more than 400 million people. In the evolving data economy, is that an antitrust issue of concern?

Q3. In this competing world of AI, what is more important, vast data pools, sophisticated algorithms or deep pockets?

Steve Lohr: The best answer to that question, I think, came from a recent conversation with Andrew Ng, a Stanford professor who worked at GoogleX, is co-founder of Coursera and is now chief scientist at Baidu. I asked him why Baidu, and he replied there were only a few places to go to be a leader in A.I. Superior software algorithms, he explained, may give you an advantage for months, but probably no more. Instead, Ng said, you look for companies with two things — lots of capital and lots of data. “No one can replicate your data,” he said. “It’s the defensible barrier, not algorithms.”

Q4. What is the interplay and implications of big data and artificial intelligence?

Steve Lohr: The data revolution has made the recent AI advances possible. We’ve seen big improvements in the last few years, for example, in AI tasks like speech recognition and image recognition, using neural network and deep learning techniques. Those technologies have been around for decades, but they are getting a huge boost from the abundance of training data because of all the web image and voice data that can be tapped now.

Q5. Is data science really only a here-and-now version of AI?

Steve Lohr: No, certainly not only. But I do find that phrase a useful way to explain to most of my readers — intelligent people, but not computer scientists — the interplay between data science and AI. It helps convey that the rudiments of data-driven AI are already all around us. It’s not — surely not yet — robot armies and self-driving cars as fixtures of everyday life. But it is internet search, product recommendations, targeted advertising and elements of personalized medicine, to cite a few examples.

Q6. Technology is moving beyond increasing the odds of making a sale, to being used in higher-stakes decisions like medical diagnosis, loan approvals, hiring and crime prevention. What are the societal implications of this?

Steve Lohr: The new, higher-stakes decisions that data science and AI tools are increasingly being used to make — or assist in making — are fundamentally different than marketing and advertising. In marketing and advertising, a decision that is better on average is plenty good enough. You’ve increased sales and made more money. You don’t really have to know why.
But the other decisions you mentioned are practically and ethically very different. These are crucial decisions about individual people’s lives. Better on average isn’t good enough. For these kinds of decisions, issues of accuracy, fairness and discrimination come into play.
That, I think, argues for two things. First, some sort of auditing tool; the technology has to be able to explain itself, to explain how a data-driven algorithm came to the decision or recommendation that it did.
Second, I think it argues for having a “human in the loop” for most of these kinds of decisions for the foreseeable future.

Q7. Will data analytics move into the mainstream of the economy (far beyond the well known, born-on-the-internet success stories like Google, Facebook and Amazon)?

Steve Lohr: Yes, and I think we’re seeing that now in nearly every field — health care, agriculture, transportation, energy and others. That said, it is still very early. It is a phenomenon that will play out for years, and decades.
Recently, I talked to Jeffrey Immelt, the chief executive of General Electric, America’s largest industrial company. GE is investing heavily to put data-generating sensors on its jet engines, power turbines, medical equipment and other machines — and to hire software engineers and data scientists.
Immelt said if you go back more than a century to the origins of the company, dating back to Thomas Edison‘s days, GE’s technical foundation has been materials science and physics. Data analytics, he said, will be the third fundamental technology for GE in the future.
I think that’s a pretty telling sign of where things are headed.

—————————–
Steve Lohr has covered technology, business, and economics for the New York Times for more than twenty years and writes for the Times’ Bits blog. In 2013 he was part of the team awarded the Pulitzer Prize for Explanatory Reporting.
He was a foreign correspondent for a decade and served as an editor, and has written for national publications such as the New York Times Magazine, the Atlantic, and the Washington Monthly. He is the author of Go To: The Story of the Math Majors, Bridge Players, Engineers, Chess Wizards, Maverick Scientists, Iconoclasts—the Programmers Who Created the Software Revolution and Data-ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else.
He lives in New York City.

————————–

Resources

Google (TensorFlow): TensorFlow™ is an open source software library for numerical computation using data flow graphs.

Microsoft (Computational Network Toolkit): A free, easy-to-use, open-source, commercial-grade toolkit that trains deep learning algorithms to learn like the human brain.

Data-ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else, by Steve Lohr. HarperCollins Publishers, 2016

Related Posts

Don’t Fear the Robots. By Steve Lohr. The New York Times, SundayReview | News Analysis, Oct. 24, 2015

G.E., the 124-Year-Old Software Start-Up. By Steve Lohr. The New York Times, Technology, Aug. 27, 2016

Machines of Loving Grace. Interview with John Markoff. ODBMS Industry Watch, Published on 2016-08-11

Recruit Institute of Technology. Interview with Alon Halevy. ODBMS Industry Watch, Published on 2016-04-02

Civility in the Age of Artificial Intelligence, by STEVE LOHR, technology reporter for The New York Times, ODBMS.org

On Artificial Intelligence and Society. Interview with Oren Etzioni, ODBMS Industry Watch.

On Big Data and Society. Interview with Viktor Mayer-Schönberger, ODBMS Industry Watch.

Follow us on Twitter:@odbmsorg

##

How the 11.5 million Panama Papers were analysed. Interview with Mar Cabra
http://www.odbms.org/blog/2016/10/how-the-11-5-million-panama-papers-were-analysed-interview-with-mar-cabra/
Tue, 11 Oct 2016 17:54:36 +0000

“The best way to explore all The Panama Papers data was using graph database technology, because it’s all relationships, people connected to each other or people connected to companies.” –Mar Cabra.

I have interviewed Mar Cabra, head of the Data & Research Unit of the International Consortium of Investigative Journalists (ICIJ). The main subject of the interview is how the 11.5 million Panama Papers documents were analysed.

RVZ

Q1. What is the mission of the International Consortium of Investigative Journalists (ICIJ)?

Mar Cabra: Founded in 1997, the ICIJ is a global network of more than 190 independent journalists in more than 65 countries who collaborate on breaking big investigative stories of global social interest.

Q2. What is your role at ICIJ?

Mar Cabra: I am the Editor at the Data and Research Unit – the desk at the ICIJ that deals with data, analysis and processing, as well as supporting the technology we use for our projects.

Q3. The Panama Papers investigation was based on a 2.6 Terabyte trove of data obtained by Süddeutsche Zeitung and shared with ICIJ and a network of more than 100 media organisations. What was your role in this data investigation?

Mar Cabra: I co-ordinated the work of the team of developers and journalists that first got the leak from Süddeutsche Zeitung, then processed it to make it available online through secure platforms with more than 370 journalists.
I also supervised the data analysis that my team did to enhance and focus the stories. My team was also in charge of the interactive product that we produced for the publication stage of The Panama Papers, so we built an interactive visual application called the ‘Powerplayers’ where we detailed the main stories of the politicians with connections to the offshore world. We also released a game explaining how the offshore world works! Finally, in early May, we updated the offshore database with information about the Panama Papers companies, the 200,000-plus companies connected with Mossack Fonseca.

Q4. The leaked dataset are 11.5 million files from Panamanian law firm Mossack Fonseca. How was all this data analyzed?

Mar Cabra: We relied on Open Source technology and processes that we had worked on in previous projects to process the data. We used Apache Tika to process the documents and also to access them, and created a processing chain of 30 to 40 machines in Amazon Web Services which would process in parallel those documents, then index them onto a document search platform that could be used by 100s of journalists from anywhere in the world.
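A rough, single-machine sketch of that kind of pipeline is shown below: extract text with Apache Tika, fall back to Tesseract OCR for image-only files, and index the result into Solr, with a thread pool standing in for ICIJ's fleet of AWS machines. The file paths, Solr collection URL and helper functions are placeholders, not ICIJ's actual Extract code; it assumes a running Solr instance and the tika, pytesseract, Pillow and pysolr packages (Tika itself needs Java).

```python
# Rough sketch of a parallel extract-and-index pipeline (not ICIJ's actual code).
from concurrent.futures import ThreadPoolExecutor
from tika import parser          # Apache Tika bindings for text extraction
import pytesseract               # Tesseract OCR for image-only documents
from PIL import Image
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/documents")  # placeholder URL

def extract_text(path: str) -> str:
    parsed = parser.from_file(path)
    text = (parsed.get("content") or "").strip()
    if text:
        return text
    # No embedded text (e.g. a scanned image): fall back to OCR.
    # Scanned PDFs would first need conversion to images, omitted here.
    return pytesseract.image_to_string(Image.open(path))

def process(path: str) -> None:
    solr.add([{"id": path, "content": extract_text(path)}])

files = ["docs/contract-001.pdf", "docs/passport-scan.tif"]  # placeholder paths
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(process, files))
```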

Q5. Why did you decide to use a graph-based approach for that?

Mar Cabra: Inside the 11.5 million files in the original dataset given to us, there were more than 3 million that came from Mossack Fonseca’s internal database, which basically contained names of companies in offshore jurisdictions and the people behind them. In other words, that’s a graph! The best way to explore all The Panama Papers data was using graph database technology, because it’s all relationships, people connected to each other or people connected to companies.

Q6. What were the main technical challenges you encountered in analysing such a large dataset?

Mar Cabra: We had already used all the tools that we were using in this investigation, in previous projects. The main issue here was dealing with many more files in many more formats. So the main challenge was how can we make readable all those files, which in many cases were images, in a fast way.
Our next problem was how could we make them understandable to journalists that are not tech savvy. Again, that’s where a graph database became very handy, because you don’t need to be a data scientist to work with a graph representation of a dataset, you just see dots on a screen, nodes, and then just click on them and find the connections – like that, very easily, and without having to hand-code or build queries. I should say you can build queries if you want using Cypher, but you don’t have to.
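For readers who do want to write queries, a Cypher query of the kind mentioned here might look like the following, run through the official Neo4j Python driver. The node labels, relationship types, property names and connection details are invented for illustration and are not ICIJ's real schema.

```python
# Illustrative only: the labels and relationship types below are made up;
# they are not ICIJ's real Panama Papers schema.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (p:Person)-[:OFFICER_OF]->(c:Company)-[:REGISTERED_IN]->(j:Jurisdiction)
WHERE p.name CONTAINS $name
RETURN p.name AS person, c.name AS company, j.name AS jurisdiction
LIMIT 25
"""

with driver.session() as session:
    for record in session.run(query, name="Smith"):
        print(record["person"], "->", record["company"], "(", record["jurisdiction"], ")")

driver.close()
```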

Q7. What are the similarities with the way you analysed data in the Swiss Leaks story (exposing the fraudulent activity of 100,000 HSBC private bank clients in Switzerland)?

Mar Cabra: We used the same tools for that – a document search platform and a graph database – and we used them in combination to find stories. The baseline was the same but the complexity was 100 times more for the Panama Papers. So the technology is the same in principle, but because we were dealing with many more documents, much more complex data, in many more formats, we had to make a lot of improvements in the tools so they really worked for this project. For example, we had to improve the document search platform with a batch search feature, where journalists would upload a list of names and then get back a list of links to the documents where those names had hits.
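A stripped-down version of that batch-search idea against a Solr index could look like the sketch below; the collection URL, field names and query syntax are placeholders rather than ICIJ's implementation.

```python
# Sketch of a batch search: for each name on a list, return the documents
# that mention it. Collection URL and field names are placeholders.
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/documents")

def batch_search(names):
    hits = {}
    for name in names:
        # Quote the name so multi-word names are searched as a phrase.
        results = solr.search('content:"{}"'.format(name), rows=100)
        hits[name] = [doc["id"] for doc in results]
    return hits

for name, docs in batch_search(["John Doe", "Acme Holdings Ltd"]).items():
    print(name, "->", len(docs), "matching documents")
```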

Q8. Emil Eifrem, CEO, Neo Technology wrote: “If the Panama Papers leak had happened ten years ago, no story would have been written because no one else would have had the technology and skillset to make sense of such a massive dataset at this scale.” What is your take on this?

Mar Cabra: We would have done the Panama Papers differently, probably printing the documents – and that would have had a tremendous effect on the paper supplies of the world, because printing out all 11.5 million files would have been crazy! We would have published some stories and the public might have seen some names on the front page of a few newspapers, but the scale and the depth and the understanding of this complex world would not have been able to happen without access to the technology we have today. We would just have not been able to do such an in-depth investigation at a global scale without the technology we have access to now.

Q9. Whistleblowers take incredible risks to help you tell data stories. Why do they do it?

Mar Cabra: Occasionally, some whistleblowers have a grudge and are motivated in more personal terms. Many have been what we call in Spanish ‘widows of power’: people who have been in power and have lost it, and those who wish to expose the competition or have a grudge. Motivations of whistleblowers vary, but I think there is always an intention to expose injustice. ‘John Doe’ is the source behind the Panama Papers, and a few weeks after we published, he explained his motivation; he wanted to expose an unjust system.

————————–
Mar Cabra is the head of ICIJ’s Data & Research Unit, which produces the organization’s key data work and also develops tools for better collaborative investigative journalism. She has been an ICIJ staff member since 2011, and is also a member of the network.

Mar fell in love with data while being a Fulbright scholar and fellow at the Stabile Center for Investigative Journalism at Columbia University in 2009/2010. Since then, she’s promoted data journalism in her native Spain, co-creating the first ever master’s degree on investigative reporting, data journalism and visualisation, and the national data journalism conference, which gathers more than 500 people every year.

She previously worked in television (BBC, CNN+ and laSexta Noticias) and her work has been featured in the International Herald Tribune, The Huffington Post, PBS, El País, El Mundo and El Confidencial, among others.
In 2012 she received the Spanish Larra Award as the country’s most promising journalist under 30. (PGP public key)

Resources

– Panama Papers Source Offers Documents To Governments, Hints At More To Come. International Consortium of Investigative Journalists. May 6, 2016

The Panama Papers. ICIJ

– The two journalists from Süddeutsche Zeitung: Frederik Obermaier and Bastian Obermayer

– Offshore Leaks Database: Released in June 2013, the Offshore Leaks Database is a simple search box.

Open Source used for analysing the #PanamaPapers:

– Oxwall: We found an open source social network tool called Oxwall that we tweaked to our advantage. We basically created a private social network for our reporters.

– Apache Tika and Tesseract to do optical character recognition (OCR),

– We created a small program ourselves which we called Extract which is actually in our GitHub account that allowed us to do this parallel processing. Extract would get a file and try to see if it could recognize the content. If it couldn’t recognize the content, then we would do OCR and then send it to our document searching platform, which was Apache Solr.

– Based on Apache Solr, we created an index, and then we used Project Blacklight, another open source tool that was originally used for libraries, as our front-end tool. For example, Columbia University Library, where I studied, used this tool.

– Linkurious: Linkurious is software that allows you to visualize graphs very easily. You get a license, you put it in your server, and if you have a database in Neo4j you just plug it in and within hours you have the system set up. It also has this private system where our reporters can login or logout.

– Thanks to another open source tool – in this case Talend, an extract, transform and load (ETL) tool – we were able to easily transform our database into Neo4j, plug in Linkurious and get reporters to search.

Neo4j: Neo4j is a highly scalable, native graph database purpose-built to leverage not only data but also its relationships. Neo4j’s native graph storage and processing engine deliver constant, real-time performance, helping enterprises build intelligent applications to meet today’s evolving data challenges.

-The good thing about Linkurious is that the reporters or the developers at the other end of the spectrum can also make highly technical Cypher queries if they want to start looking more in depth at the data.

Related Posts

##

On Silos, Data Integration and Data Security. Interview with David Gorbet
http://www.odbms.org/blog/2016/09/on-silos-data-integration-and-data-security-interview-with-david-gorbet/
Fri, 23 Sep 2016 20:02:51 +0000

“Data integration isn’t just about moving data from one place to another. It’s about building an actionable, operational view on data that comes from multiple sources so you can integrate the combined data into your operations rather than just looking at it later as you would in a typical warehouse project.” — David Gorbet.

I have interviewed David Gorbet, Senior Vice President, Engineering, at MarkLogic. We cover several topics in the interview: silos, data integration, data quality, security, and the new features of MarkLogic 9.

RVZ

Q1. Data integration is the number one challenge for many organisations. Why?

David Gorbet: There are three ways to look at that question. First, why do organizations have so many data silos? Second, what’s the motivation to integrate these silos, and third, why is this so hard?

Our Product EVP, Joe Pasqua, did an excellent presentation on the first question at this year’s MarkLogic World. The spoiler is that silos are a natural and inevitable result of an organization’s success. As companies become more successful, they start to grow. As they grow, they need to partition in order to scale. To function, these partitions need to run somewhat autonomously, which inevitably creates silos.
Another way silos enter the picture is what I call “application accretion” or less charitably, “crusty application buildup.” Companies merge, and now they have two HR systems. Divisions acquire special-purpose applications and now they have data that exists only in those applications. IT projects are successful and now need to add capabilities, but it’s easier to bolt them on and move data back and forth than to design them into an existing IT system.

Two years ago I proposed a data-centric view of the world versus an application-centric view. If you think about it, most organizations have a relatively small number of “things” that they care deeply about, but a very large number of “activities” they do with these “things.”
For example, most organizations have customers, but customer-related activities happen all across the organization.
Sales is selling to them. Marketing is messaging to them. Support is helping solve their problems. Finance is billing them. And so on… All these activities are designed to be independent because they take place in organizational silos, and the data silos just reflect that. But the data is all about customers, and each of these activities would benefit greatly from information generated by and maintained in the other silos. Imagine if Marketing could know what customers use the product for to tailor the message, or if Sales knew that the customer was having an issue with the product and was engaged with Support? Sometimes dealing with large organizations feels like dealing with a crazy person with multiple personalities. Organizations that can integrate this data can give their customers a much better, saner experience.

And it’s not just customers. Maybe it’s trades for a financial institution, or chemical compounds for a pharmaceutical company, or adverse events for a life sciences company, or “entities of interest” for an intelligence or police organization. Getting a true, 360-degree view of these things can make a huge difference for these organizations.
In some cases, like with one customer I spoke about in my most recent MarkLogic World keynote who looks at the environment of potentially at-risk children, it can literally mean the difference between life and death.

So why is this so hard? Because most technologies require you to create data models that can accommodate everything you need to know about all of your data in advance, before you can even start the data integration project. They also require you to know the types of queries you’re going to do on that data so you can design efficient schemas and indexing schemes.
This is true even of some NoSQL technologies that require you to figure out sharding and compound indexing schemes in advance of loading your data. As I demonstrated in that keynote I mentioned, even if you have a relatively small set of entities that are quite simple, this is incredibly hard to do.
Usually it’s so hard that instead organizations decide to do a subset of the integration to solve a specific need or answer a specific question. Sadly, this tends to create yet another silo.

Q2. Integrate data from silos: how is it possible?

David Gorbet: Data integration isn’t just about moving data from one place to another. It’s about building an actionable, operational view on data that comes from multiple sources so you can integrate the combined data into your operations rather than just looking at it later as you would in a typical warehouse project.

How do you do that? You build an operational data hub that can consume data from multiple sources and expose APIs on that data so that downstream consumers, either applications or other systems, can consume it in real time. To do this you need an infrastructure that can accommodate the variability across silos naturally, without a lot of up-front data modeling, and without each silo having a ripple effect on all the others.
For the engineers out there (like me), think of this as trying to turn an O(n²) problem into an O(n) problem.
As the number of silos increases, most projects get exponentially more complex, since you can only have one schema and every new silo impacts that schema, which is shared by all data across all existing silos. You want a technology where adding a new data silo does not require re-doing all the work you’ve already done. In addition, you need a flexible technology that allows a flexible data model that can adapt to change. Change in both what data is used and in how it’s used. A system that can evolve with the evolving needs of the business.

MarkLogic can do this because it can ingest data with multiple different schemas and index and query it together.
You don’t have to create one schema that can accommodate all your data. Our built-in application services allow our customers to build APIs that expose the data directly from their data hub, and with ACID transactions these APIs can be used to build real operational applications.
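As a generic illustration of the operational data hub idea, an API that serves a combined view assembled from several silos, here is a small sketch using Flask with hard-coded stand-in data. It is not MarkLogic code; a real hub would expose such endpoints through MarkLogic's own application services, with the combined records stored and indexed in the database.

```python
# Generic sketch of a data-hub endpoint serving a combined customer view
# assembled from several source silos. Names and fields are invented.
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-ins for data already ingested from three silos, keyed by customer id.
CRM = {"c42": {"name": "Ada Lovelace", "segment": "enterprise"}}
SUPPORT = {"c42": {"open_tickets": 2}}
BILLING = {"c42": {"balance_due": 125.00}}

@app.route("/customers/<cid>")
def customer_view(cid):
    # Combine whatever each silo knows; missing silos simply contribute nothing.
    view = {"id": cid}
    for source in (CRM, SUPPORT, BILLING):
        view.update(source.get(cid, {}))
    return jsonify(view)

if __name__ == "__main__":
    app.run(port=8080)
```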

Q3. What is the problem with traditional solutions like relational databases, Extract Transform and Load (ETL) tools?

David Gorbet: To use a metaphor, most technology used for this type of project is like concrete. Now concrete is incredibly versatile. You can make anything you want out of concrete: a bench, a statue, a building, a bridge… But once you’ve made it, you’d better like it because if you want to change it you have to get out the jackhammer.

Many projects that use these tools start out with lofty goals, and they spend a lot of time upfront modeling data and designing schemas. Very quickly they realize that they are not going to be able to make that magical data model that can accommodate everything and be efficiently queried. They start to cut corners to make their problem more tractable, or they design flexible but overly generic models like tall thin tables that are inefficient to query. Every corner they cut limits the types of applications they can then build on the resulting integrated data, and inevitably they end up needing some data they left behind, or needing to execute a query they hadn’t planned (and built an index) for.

Usually at some point they decide to change the model from a hub-and-spoke data integration model to a point-to-point model, because point-to-point integrations are much easier. That, or it evolves as new requirements emerge, and it becomes impossible to keep up by jackhammering the system and starting over. But this just pushes the complexity out of these now point-to-point flows and into the overall system architecture. It also causes huge governance problems, since data now flows in lots of directions and is transformed in many ways that are generally pretty opaque and hard to trace. The inability to capture and query metadata about these data flows causes master-data problems and governance problems, to the point where some organizations genuinely have no idea where potentially sensitive data is being used. The overall system complexity also makes it hard to scale and expensive to operate.

Q4. What are the typical challenges of handling both structured, and unstructured data?

David Gorbet: It’s hard enough to integrate structured data from multiple silos. Everything I’ve already talked about applies even if you have purely structured data. But when some of your data is unstructured, or has a complex, variable structure, it’s much harder. A lot of data has a mix of structured data and unstructured text. Medical records, journal articles, contracts, emails, tweets, specifications, product catalogs, etc. The traditional solution to textual data in a relational world is to put it in an opaque BLOB or CLOB, and then surface its content via a search technology that can crawl the data and build indexes on it. This approach suffers from several problems.

First, it involves stitching together multiple different technologies, each of which has its own operational and governance characteristics. They don’t scale the same way. They don’t have the same security model (unless they have no security model, which is actually pretty common). They don’t have the same availability characteristics or disaster recovery model.
They don’t backup consistently with each other. The indexes are separate, so they can’t be queried together, and keeping them in sync so that they’re consistent is difficult or impossible.

Second, more and more text is being mined for structure. There are technologies that can identify people, places, things, events, etc. in freeform text and structure it. Sentiment analysis is being done to add metadata to text. So it’s no longer accurate to think of text as islands of unstructured data inside a structured record. It’s more like text and structure are inter-mixed at all levels of granularity. The resulting structure is by its nature fluid, and therefore incompatible with the up-front modeling required by relational technology.

Third, search engines don’t index structure unless you tell them to, which essentially involves explaining the “schema” of the text to them so that they can build facets and provide structured search capabilities. So even in your “unstructured” technology, you’re often dealing with schema design.

Finally, as powerful as it is, search technology doesn’t know anything about the semantics of the data. Semantic search enables a much richer search and discovery experience. Look for example at the info box to the right of your Google results. This is provided by Google’s knowledge graph, a graph of data using Semantic Web technologies. If you want to provide this kind of experience, where the system can understand concepts and expand or narrow the context of the search accordingly, you need yet another technology to manage the knowledge graph.

Two years ago at my MarkLogic World keynote I said that search is the query language for unstructured data, so if you have a mix of structured and unstructured data, you need to be able to search and query together. MarkLogic lets you mix structured and unstructured search, as well as semantic search, all in one query, resolved in one technology.

Q5. An important aspect when analysing data is Data Quality. How do you evaluate if the data is of good or of bad quality?

David Gorbet: Data quality is tough, particularly when you’re bringing data together from multiple silos. Traditional technologies require you to transform the data from one schema into another in order to move it from place to place. Every transformation leaves some data behind, and every one has the potential to be a point of data loss or data corruption if the transformation isn’t perfect. In addition, the lineage of the data is often lost. Where did this attribute of this entity come from? When was it extracted? What was the transform that was run on it? What did it look like before?
All of this is lost in the ETL process. The best way to ensure data quality is to always bring along with each record the original, untransformed data, as well as metadata tracing its provenance, lineage and context.
MarkLogic lets you do this, because our flexible schema accommodates source data, canonicalized (transformed) data, and metadata all in the same record, and all of it is queryable together. So if you find a bug in your transform, it’s easy to query for all impacted records, and because you have the source data there, you can easily fix it as well.
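One common way to realize this is an "envelope" around each record that carries the untouched source, the canonical (transformed) view and provenance metadata together. The sketch below shows the general shape in Python/JSON; the field names are illustrative, not a format mandated by MarkLogic.

```python
# Generic "envelope" sketch: keep the original source record, the transformed
# (canonical) form, and provenance metadata together in one queryable document.
# Field names are illustrative, not a MarkLogic-mandated layout.
import datetime, json

source_record = {"CUST_NM": "LOVELACE, ADA", "SSN_NO": "***-**-1234", "REGION_CD": "EMEA"}

envelope = {
    "headers": {
        "source_system": "legacy-crm",
        "ingested_at": datetime.datetime.utcnow().isoformat() + "Z",
        "transform": "crm-to-customer-v3",   # lets you find records if a transform had a bug
    },
    "instance": {                            # canonical, harmonized view used by applications
        "fullName": "Ada Lovelace",
        "region": "EMEA",
    },
    "attachments": source_record,            # untouched original, kept for traceability
}

print(json.dumps(envelope, indent=2))
```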

In addition, our Bitemporal feature can trace changes to a record over time, and let you query your data as it is, as it was, or as you thought it was at any given point in time or over any historical (or in some cases future) time range. So you have traceability when your data changes, and you can understand how and why it has changed.

Q6. Data leakage is another problem for many corporations that experienced high profile security incidents. What can be done to solve this problem?

David Gorbet: Security is another important aspect of data governance. And security isn’t just about locking all your data in a vault and only letting some people look at it. Security is more granular than that. There are some data that can be seen by just about anyone in your organization. Some that should only be seen by people who need it, and some that should be hidden from all but people with specific roles. In some cases, even users with a particular role should not see data unless they have a provable need in addition to the role required. This is called “compartment security,” meaning you have to be in a certain compartment to see data, regardless of your role or clearance overall.

There is a principle in security called “defense in depth.” Basically it means pushing the security to the lowest layer possible in the stack. That’s why it’s critically important that your DBMS have strong and granular security features.
This is especially true if you’re integrating data from silos, each of which may have its own security rules.
You need your integrated data hub to be able to observe and enforce those rules, regardless of how complex they are.

Increasingly the concern is over the so-called “insider threat.” This is the employee, contractor, vendor, managed service provider, or cloud provider who has access to your infrastructure. Another good reason not to implement security in your application, because if you do, any DBA will be able to circumvent it. Today, with the move to cloud and other outsourced infrastructure, organizations are also concerned about what’s on the file system. Even if you secure your data at the DBMS layer, a system administrator with file system access can still get at it. To counter this, more organizations are requiring “at rest” encryption of data, which means that the data is encrypted on the file system. A good implementation will require a separate role to manage encryption keys, different from the DBA or SA roles, along with a separate key management technology. In our implementation, MarkLogic never even sees the database encryption keys, relying instead on a separate key management system (KMS) to unlock data for us. This separation of concerns is a lot more secure, because it would require insiders to collude across functions and organizations to steal data. You can even keep your data in the cloud and your keys on-premises, or with another managed service provider.
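The separation of concerns described here is essentially envelope encryption: data is encrypted with a data key, and that key is itself wrapped by a key-management system the database administrator cannot reach. The sketch below illustrates the concept with a local stand-in for the KMS; it is a conceptual example, not MarkLogic's implementation.

```python
# Conceptual sketch of envelope encryption with an external key manager.
# The "KMS" here is a local stand-in; in practice it would be a separate
# service (and team) that the database administrator cannot access.
from cryptography.fernet import Fernet

class ToyKMS:
    """Stands in for an external key management system."""
    def __init__(self):
        self._master = Fernet(Fernet.generate_key())   # master key never leaves the KMS
    def wrap(self, data_key: bytes) -> bytes:
        return self._master.encrypt(data_key)
    def unwrap(self, wrapped: bytes) -> bytes:
        return self._master.decrypt(wrapped)

kms = ToyKMS()

# Database side: encrypt data with a fresh data key, store only the wrapped key.
data_key = Fernet.generate_key()
ciphertext = Fernet(data_key).encrypt(b"sensitive customer records")
wrapped_key = kms.wrap(data_key)
del data_key                                           # plaintext key is discarded

# To read the data later, the database must ask the KMS to unwrap the key.
plaintext = Fernet(kms.unwrap(wrapped_key)).decrypt(ciphertext)
print(plaintext)
```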

Q8. What is new in MarkLogic® 9 database? ?

David Gorbet: There’s so much in MarkLogic 9 it’s hard to cover all of it. That presentation I referenced earlier from Joe does a pretty good job of summarizing the features. Many of the features in MarkLogic 9 are designed to make data integration even easier. MarkLogic 9 has new ways of modeling data that can keep it in its flexible document form, but project it into tabular form for more traditional analysis (aggregates, group-bys, joins, etc.) using either SQL or a NoSQL API we call the Optic API. This allows you to define the structured parts of your data and let MarkLogic index it in a way that makes it most efficient to query and aggregate.
You can also use this technique to extract RDF triples from your data, giving you easy access to the full power of Semantics technologies.
We’re doing more to make it easier to get data into MarkLogic via a new data movement SDK that you can hook directly up to your data pipeline. This SDK can help orchestrate transformations and parallel loads of data no matter where it comes from.

We’re also doubling down on security. Earlier I mentioned encryption at rest. That’s a new feature for MarkLogic 9.
We’re also doing sub-record-level role- and compartment-based access control. This means that if you have a record (like a customer record) that you want to make broadly available, but there is some data in that record (like a SSN) that you want to restrict access to, you can easily do that. You can also obfuscate and transform data within a record to redact it for export or for use in a context that is less secure than MarkLogic.
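As a generic illustration of record-level redaction for export (not MarkLogic's redaction feature itself), the sketch below masks designated fields in a document before it leaves the secure environment; the field list and masking rules are invented.

```python
# Generic sketch: redact sensitive fields from a record before export.
# The field names and masking rules are illustrative only.
import copy

SENSITIVE_FIELDS = {"ssn": "###-##-####", "dateOfBirth": "REDACTED"}

def redact(record: dict) -> dict:
    exported = copy.deepcopy(record)
    for field, mask in SENSITIVE_FIELDS.items():
        if field in exported:
            exported[field] = mask
    return exported

customer = {"fullName": "Ada Lovelace", "ssn": "123-45-6789", "region": "EMEA"}
print(redact(customer))  # ssn is masked, other fields pass through unchanged
```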

Security is a governance feature, and we’re improving other governance features as well, with policy-based tiering for lifecycle management, and improvements to our Bitemporal feature that make it a full-fledged compliance feature.
We’re introducing new tools to help monitor and manage multiple clusters at a time. And we’re making many other improvements in many other areas, like our new geospatial region index that makes region-region queries much faster, improvements to tools like Query Console and MLCP, and many, many more.

One exciting feature that is a bit hard to understand at first is our new Entity Services feature. You can think of this as a catalog of entities. You can put whatever you want in this catalog. Entity attributes, relationships, etc. but also policies, governance rules, and other entity class metadata. This is a queryable semantic model, so you can query your catalog at runtime in your application. We’ll also be providing tools that use this catalog to help build the right set of indexes, indexing templates, APIs, etc. for your specific data. Over time, Entity Services will become the foundation of our vision of the “smart database.” You’ll hear us start talking a lot more about that soon.

—————–

David Gorbet, Senior Vice President, Engineering, MarkLogic.

David Gorbet has the best job in the world. As SVP of Engineering, David manages the team that delivers the MarkLogic product and supports our customers as they use it to power their amazing applications. Working with all those smart, talented engineers as they pour their passion into our product is a humbling experience, and seeing the creativity and vision of our customers and how they’re using our product to change their industry is simply awesome.

Prior to MarkLogic, David helped pioneer Microsoft’s business online services strategy by founding and leading the SharePoint Online team. In addition to SharePoint Online, David has held a number of positions at Microsoft and elsewhere with a number of enterprise server products and applications, and numerous incubation products.

David holds a Bachelor of Applied Science Degree in Systems Design Engineering with an additional major in Psychology from the University of Waterloo, and an MBA from the University of Washington Foster School of Business.

Resources

Join the Early Access program for a MarkLogic 9 introduction by visiting: ea.marklogic.com

-The MarkLogic Developer License is free to all who sign up and join the MarkLogic developer community.

Related Posts

– On Data Governance. Interview with David Saul. ODBMS Industry Watch,  2016-07-23

– On Data Interoperability. Interview with Julie Lockner. ODBMS Industry Watch, 2016-06-07

– On Data Analytics and the Enterprise. Interview with Narendra Mulani. ODBMS Industry Watch, 2016-05-24

Follow us on Twitter: @odbmsorg

##

On Data Analytics and the Enterprise. Interview with Narendra Mulani. http://www.odbms.org/blog/2016/05/on-data-analytics-and-the-enterprise-interview-with-narendra-mulani/ Tue, 24 May 2016 16:31:20 +0000

“A hybrid technology infrastructure that combines existing analytics architecture with new big data technologies can help companies to achieve superior outcomes.”–Narendra Mulani

I have interviewed Narendra Mulani, Chief Analytics Officer, Accenture Analytics. The main topics of our interview are: Data Analytics, Big Data, the Internet of Things, and their repercussions for the enterprise.

RVZ

Q1. What is your role at Accenture?

Narendra Mulani: I’m the Chief Analytics Officer at Accenture Analytics and I am responsible for building and inspiring a culture of analytics and driving Accenture’s strategic agenda for growth across the business. I lead a team of analytics professionals around the globe that are dedicated to helping clients transform into insight-driven enterprises and focused on creating value through innovative solutions that combine industry and functional knowledge with analytics and technology.

With the constantly increasing amount of data and new technologies becoming available, it truly is an exciting time for Accenture and our clients alike. I’m thrilled to be collaborating with my team and clients and taking part, first-hand, in the power of analytics and the positive disruption it is creating for businesses around the globe.

Q2. What are the main drivers you see in the market for Big Data Analytics?

Narendra Mulani: Companies across industries are fighting to secure or keep their lead in the marketplace.
To excel in this competitive environment, they are looking to exploit one of their growing assets: data.
Organizations see big data as a catalyst for their transformation into digital enterprises and as a way to secure an insight-driven competitive advantage. In particular, big data technologies give companies greater agility, helping them analyze data comprehensively and take more informed actions at a swifter pace. We’ve already passed the transition point with big data – instead of discussing the possibilities, many are already experiencing the actual insight-driven benefits, including increased revenues, a larger base of loyal customers, and more efficient operations. In fact, we see our clients looking for granular solutions that leverage big data, advanced analytics and the cloud to address industry-specific problems.

Q3. Analytics and Mobility: how do they correlate?

Narendra Mulani: Analytics and mobility are two digital areas that work hand-in-hand on many levels.
As an example, mobile devices and the increasingly connected world of the Internet of Things (IoT) have become two key drivers for big data analytics. As mobile devices, sensors, and the IoT constantly create new data sources and data types, big data analytics is being applied to transform the growing amount of data into actionable insight that can create new business opportunities and outcomes. The flow also works in reverse: analytics feeds insight to mobile devices such as tablets, enabling workers in offices or out in the field to make real-time decisions that could benefit their business.

Q4. Data explosion: What does it create ? Risks, Value or both?

Narendra Mulani: The data explosion that’s happening today and will continue to happen due to the Internet of Things creates a lot of opportunity for businesses. While organizations recognize the value that the data can generate, the sheer amount of data – internal data, external data, big data, small data, etc – can be overwhelming and create an obstacle for analytics adoption, project completion, and innovation. To overcome this challenge and pursue actionable insights and outcomes, organizations shouldn’t look to analyze all of the data that’s available, but identify the right data needed to solve the current project or challenge at hand to create value.

It’s also important for companies to manage the potential risk associated with the influx of data and take the steps needed to optimize and protect it. They can do this by aligning IT and business leads to jointly develop and maintain data governance and security strategies. At a high level, the strategies would govern who uses the data and how the data is analyzed and leveraged, define the technologies that would manage and analyze the data, and ensure the data is secured with the necessary standards. Suitable governance and security strategies should be requirements for insight-driven businesses. Without them, organizations could experience adverse and counter-productive results.

Q5. You introduced the concept of the “Modern Data Supply Chain”. How does it differ from a traditional supply chain?

Narendra Mulani: As companies’ data ecosystems are usually very complex with many data silos, a modern data supply chain helps them to simplify their data environment and generate the most value from their data. In brief, when data is treated as a supply chain, it can flow swiftly, easily and usefully through the entire organization— and also through its ecosystem of partners, including customers and suppliers.

To establish an effective modern data supply chain, companies should create a hybrid technology environment that enables a data service platform with emerging big data technologies. As a result, businesses will be able to access, manage, move, mobilize and interact with broader and deeper data sets across the organization at a much quicker pace than previously possible, and act on the resulting insights to deliver more effectively to their customers, develop innovative new solutions, and differentiate in their markets.

Q6. You talked about “Retooling the Enterprise”. What do you mean by this?

Narendra Mulani: Some businesses today are no longer just using analytics, they are taking the next step by transforming into insight-driven enterprises. To achieve “insight-driven enterprise” status, organizations need to retool themselves for optimization. They can pursue an insight-driven transformation by:

· Establishing a center of gravity for analytics – a center of gravity for analytics often takes the shape of a Center of Excellence or a similar concentration of talent and resources.
· Employing agile governance – build horizontal governance structures that are focused on outcomes and speed to value, and take a “test and learn” approach to rolling out new capabilities. A secure governance foundation could also improve the democratization of data throughout a business.
· Creating an inter-disciplinary high performing analytics team — field teams with diverse skills, organize talent effectively, and create innovative programs to keep the best talent engaged.
· Deploying new capabilities faster – deploy new, modern and agile technologies, as well as hybrid architectures and specifically designed toolsets, to help revolutionize how data has been traditionally managed, curated and consumed, to achieve speed to capability and desired outcomes. When appropriate, cloud technologies should be integrated into the IT mix to benefit from cloud-based usage models.
· Raising the company’s analytics IQ – define a vision of your “intelligent enterprise” and implement an Analytics Academy that provides analytics training for functional business resources in addition to the core management training programs.

Q7. What are the risks from the Internet of Things? And how is it possible to handle such risks?

Narendra Mulani: The IoT is prompting an even greater focus on data security and privacy. As a company’s machines, employees and ecosystems of partners, providers, and customers become connected through the IoT, securing the data that is flowing across the IoT grid can be increasingly complex. Today’s sophisticated cyber attackers are also amplifying this complexity as they are constantly evolving and leveraging data technology to challenge a company’s security efforts.

To establish a strong, effective real-time cyber defense strategy, security teams will need to employ innovative technologies to identify threat behavioral patterns – including artificial intelligence, automation, visualization, and big data analytics – and an agile, fluid workforce to leverage the opportunities presented by technology innovations. They should also establish policies to address privacy issues that arise out of all the personal data being collected. Through this combination of efforts, companies will be able to strengthen their approach to cyber defense in today’s highly connected IoT world and empower cyber defenders to help their companies better anticipate and respond to cyber attacks.
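
As a very rough, hypothetical sketch of what “identifying threat behavioral patterns” can mean in practice, the snippet below flags activity that deviates sharply from an account’s own historical baseline. The data and the threshold are invented for illustration and do not describe any particular vendor’s approach.

import statistics

# Hourly login counts for one account (hypothetical data); the latest hour spikes.
history = [4, 5, 3, 6, 4, 5, 4, 6, 5, 4, 3, 5, 4, 6, 5, 4, 5, 3, 4, 6, 5, 4, 5]
latest = 48

mean = statistics.mean(history)
stdev = statistics.pstdev(history)

# Flag behaviour that deviates strongly from the account's own baseline.
z_score = (latest - mean) / stdev if stdev else 0.0
if z_score > 3:
    print("anomalous activity: %d logins (z = %.1f)" % (latest, z_score))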

Q8. What are the main lessons you have learned in implementing Big Data Analytic projects?

Narendra Mulani: Organizations should explore the entire big data technology ecosystem, take an outcome-focused approach to addressing specific business problems, and establish precise success metrics before an analytics project even begins. The big data landscape is in a constant state of change with new data sources and emerging big data technologies appearing every day that could offer a company a new value-generating opportunity. A hybrid technology infrastructure that combines existing analytics architecture with new big data technologies can help companies to achieve superior outcomes.
An outcome-focused strategy is also very valuable: one that embraces analytics experimentation, explores the data and technology that can help a company meet its goals, and builds in checkpoints for measuring performance. Those checkpoints tell the analytics team whether to stay on course or make a correction to attain the desired outcome.

Q9. Is Data Analytics only good for businesses? What about using (Big) Data for Societal issues?

Narendra Mulani: Analytics is helping businesses across industries, and governments as well, make more informed decisions for effective outcomes, whether the goal is to improve customer experience, healthcare or public safety.
As an example, we’re working with a utility company in the UK to help them leverage analytics insights to anticipate equipment failures and respond in near real-time to critical situations, such as leaks or adverse weather events. We are also working with a government agency to analyze its video monitoring feeds to identify potential public safety risks.

Qx Anything else you wish to add?

Narendra Mulani: Another area that’s on the rise is Artificial Intelligence – we define it as a collection of multiple technologies that enable machines to sense, comprehend, act and learn, either on their own or to augment human activities. These technologies include machine learning, deep learning, natural language processing, video analytics and more. AI is disrupting how businesses operate and compete, and we believe it will also fundamentally transform and improve how we work and live. When an organization pursues an AI project, it’s our belief that it should be business-oriented, people-focused, and technology-rich for it to be most effective.

———

As Chief Analytics Officer and Head Geek – Accenture Analytics, Narendra Mulani is responsible for creating a culture of analytics and driving Accenture’s strategic agenda for growth across the business. He leads a dedicated team of 17,000 Analytic professionals that serve clients around the globe, focusing on value creation through innovative solutions that combine industry and functional knowledge with analytics and technology.

Narendra has held a number of leadership roles within Accenture since joining in 1997. Most recently, he was the managing director – Products North America, where he was responsible for creating value for our clients across a number of industries. Prior to that, he was managing director – Supply Chain, Accenture Management Consulting, leading a global practice responsible for defining and implementing supply chain capabilities at a diverse set of Fortune 500 clients.

Narendra graduated from Bombay University in 1978 with a Bachelor of Commerce, and received an MBA in Finance in 1982 as well as a PhD in 1985 focused on Multivariate Statistics, both from the University of Massachusetts.

Outside of work, Narendra is involved with various activities that support education and the arts. He lives in Connecticut with his wife Nita and two children, Ravi and Nikhil.

———-

Resources

– Ducati is Analytics Driven. Analytics takes Ducati around the world with speed and precision.

Accenture Analytics. Launching an insights-driven transformation.  Download the point of view on analytics operating models to better understand how high performing companies are organizing their capabilities.

– Accenture Cyber Intelligence Platform. Analytics helping organizations to continuously predict, detect and combat cyber attacks.

–  Data Acceleration: Architecture for the Modern Data Supply Chain, Accenture

Related Posts

– On Big Data and Data Science. Interview with James Kobielus. Source: ODBMS Industry Watch, 2016-04-19

– On the Internet of Things. Interview with Colin Mahony. Source: ODBMS Industry Watch, 2016-03-14

– A Grand Tour of Big Data. Interview with Alan Morrison. Source: ODBMS Industry Watch, 2016-02-25

– On the Industrial Internet of Things. Interview with Leon Guzenda. Source: ODBMS Industry Watch, 2016-01-28

– On Artificial Intelligence and Society. Interview with Oren Etzioni. Source: ODBMS Industry Watch, 2016-01-15

 

Follow us on Twitter: @odbmsorg

##

Recruit Institute of Technology. Interview with Alon Halevy. http://www.odbms.org/blog/2016/04/recruit-institute-of-technology-interview-with-alon-halevy/ Sat, 02 Apr 2016 15:10:02 +0000

” A revolution will happen when tools like Siri can truly serve as your personal assistant and you start relying on such an assistant throughout your day. To get there, these systems need more knowledge about your life and preferences, more knowledge about the world, better conversational interfaces and at least basic commonsense reasoning capabilities. We’re still quite far from achieving these goals.”–Alon Halevy

I have interviewed Alon Halevy, CEO at Recruit Institute of Technology.

RVZ

Q1. What is the mission of the Recruit Institute of Technology?

Alon Halevy: Before I describe the mission, I should introduce our parent company, Recruit Holdings, to those who may not be familiar with it. Recruit, founded in 1960, is a leading “life-style” information services and human resources company in Japan, with services in the areas of recruitment, advertising, employment placement, staffing, education, housing and real estate, bridal, travel, dining, beauty, automobiles and others. The company is currently expanding worldwide and operates similar businesses in the U.S., Europe and Asia. In terms of size, Recruit has over 30,000 employees and its revenues are similar to those of Facebook at this point in time.

The mission of R.I.T is threefold. First, being the lab of Recruit Holdings, our goal is to develop technologies that improve the products and services of our subsidiary companies and create value for our customers from  the vast collections of data we have. Second, our mission is to advance scientific knowledge by contributing to the research community through publications in top-notch venues. Third, we strive to use technology for social good. This latter goal may be achieved through contributing to open-source software, working on digital artifacts that would be of general use to society, or even working with experts in a particular domain to contribute to a cause.

Q2. Isn’t it similar to the mission of the Allen Institute for Artificial Intelligence?

Alon Halevy: The Allen Institute is a non-profit whose admirable goal is to make fundamental contributions to Artificial Intelligence. While R.I.T strives to make fundamental contributions to A.I and related areas such as data management, we plan to work closely with our subsidiary companies and to impact the world through their products.

Q3. Driverless cars, digital Personal Assistants (e.g. Siri), Big Data, the Internet of Things, Robots: Are we on the brink of the next stage of the computer revolution?

Alon Halevy: I think we are seeing many applications in which AI and data (big or small) are starting to make a real difference and affecting people’s lives. We will see much more of it in the next few years as we refine our techniques. A revolution will happen when tools like Siri can truly serve as your personal assistant and you start relying on such an assistant throughout your day. To get there, these systems need more knowledge about your life and preferences, more knowledge about the world, better conversational interfaces and at least basic commonsense reasoning capabilities. We’re still quite far from achieving these goals.

Q4. You were for more than 10 years senior staff research scientist at Google, leading the Structured Data Group in Google Research. Was it difficult to leave Google?

Alon Halevy: It was extremely difficult leaving Google! I struggled with the decision for quite a while, and waving goodbye to my amazing team on my last day was emotionally heart wrenching. Google is an amazing company and I learned so much from my colleagues there. Fortunately, I’m very excited about my new colleagues and the entrepreneurial spirit of Recruit.
One of my goals at R.I.T is to build a lab with the same culture as that of Google and Google Research. So in a sense, I’m hoping to take Google with me. Some of my experiences from a decade at Google that are relevant to building a successful research lab are described in a blog post I contributed to the SIGMOD blog in September, 2015.

Q5. What is your vision for the next three years for the Recruit Institute of Technology?

Alon Halevy: I want to build a vibrant lab with world-class researchers and engineers. I would like the lab to become a world leader in the broad area of making data usable, which includes data discovery, cleaning, integration, visualization and analysis.
In addition, I would like the lab to build collaborations with disciplines outside of Computer Science where computing techniques can make an even broader impact on society.

Q6. What are the most important research topics you intend to work on?

Alon Halevy: One of the roadblocks to applying AI and analysis techniques more widely within enterprises is data preparation.
Before you can analyze data or apply AI techniques to it, you need to be able to discover which datasets exist in the enterprise, understand the semantics of a dataset and its underlying assumptions, and to combine disparate datasets as needed. We plan to work on the full spectrum of these challenges with the goal of enabling many more people in the enterprise to explore their data.
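
As a minimal illustration of that first data-preparation step, the sketch below profiles a made-up dataset with pandas to see which columns exist, how complete they are, and where suspicious values hide. It is an assumption about tooling for illustration only, not a description of R.I.T’s own stack.

import pandas as pd

# A hypothetical extract from one of many datasets scattered across an enterprise.
df = pd.DataFrame({
    "cust_id": [1, 2, 2, 4],
    "zip": ["02139", "99999", "10001", None],
    "signup": ["2015-01-03", "2015-02-11", "2015-02-11", "bad-date"],
})

# A first profiling pass: which columns exist, how complete are they, and how
# many distinct values do they hold? This is raw material for understanding a
# dataset's semantics before trying to combine it with others.
profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing_ratio": df.isna().mean(),
    "distinct_values": df.nunique(),
})
print(profile)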

Recruit being a lifestyle company, another fundamental question we plan to investigate is whether technology can help people make better life decisions. In particular, can technology take into account the many factors in your life as you make decisions, and steer you towards decisions that will make you happier over time? Clearly, we’ll need more than computer scientists to even ask the right questions here.

Q7. If we delegate decisions to machines, who will be responsible for the consequences? What are the ethical responsibilities of designers of intelligent systems?

Alon Halevy: You got an excellent answer from Oren Etzioni to this question in a recent interview. I agree with him fully and could not say it any better than he did.

Qx Anything you wish to add?

Alon Halevy: Yes. We’re hiring! If you’re a researcher or strong engineer who wants to make real impact on products and services in the fascinating area of lifestyle events and decision making, please consider R.I.T!

———-

Alon Halevy is the Executive Director of the Recruit Institute of Technology. From 2005 to 2015 he headed the Structured Data Management Research group at Google. Prior to that, he was a professor of Computer Science at the University of Washington in Seattle, where he founded the Database Group. In 1999, Dr. Halevy co-founded Nimble Technology, one of the first companies in the Enterprise Information Integration space, and in 2004, Dr. Halevy founded Transformic, a company that created search engines for the deep web, and was acquired by Google.
Dr. Halevy is a Fellow of the Association for Computing Machinery, received the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2000, and was a Sloan Fellow (1999-2000). Halevy is the author of the book “The Infinite Emotions of Coffee”, published in 2011, and serves on the board of the Alliance of Coffee Excellence.
He is also a co-author of the book “Principles of Data Integration”, published in 2012.
Dr. Halevy received his Ph.D in Computer Science from Stanford University in 1993 and his Bachelors from the Hebrew University in Jerusalem.

Resources

– Civility in the Age of Artificial Intelligence, by Steve Lohr, technology reporter for The New York Times, ODBMS.org

– The threat from AI is real, but everyone has it wrong, by Robert Munro, CEO Idibon, ODBMS.org

Related Posts

– On Artificial Intelligence and Society. Interview with Oren Etzioni, ODBMS Industry Watch.

– On Big Data and Society. Interview with Viktor Mayer-Schönberger, ODBMS Industry Watch.

Follow us on Twitter: @odbmsorg

##

On Big Data and Society. Interview with Viktor Mayer-Schönberger. http://www.odbms.org/blog/2016/01/on-big-data-and-society-interview-with-viktor-mayer-schonberger/ Fri, 08 Jan 2016 09:06:10 +0000

“There is potentially too much at stake to delegate the issue of control to individuals who are neither aware nor knowledgeable enough about how their data is being used to raise alarm bells and sue data processors.”–Viktor Mayer-Schönberger.

On Big Data and Society, I have interviewed Viktor Mayer-Schönberger, Professor of Internet Governance and Regulation at Oxford University (UK).

Happy New Year!

RVZ

Q1. Is big data changing people’s everyday world in a tangible way?

Viktor Mayer-Schönberger: Yes, of course. Most of us search online regularly. Internet search engines would not work nearly as well without Big Data (and those of us old enough to remember the Yahoo menus of the 1990s know how difficult it was then to find anything online). We would not have recommendation engines helping us find the right product (and thus reducing inefficient transaction costs), nor would flying in a commercial airplane be nearly as safe as it is today.

Q2. You mentioned in your recent book with Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work and Think, that the fundamental shift is not in the machines that calculate data but in the data itself and how we use it. But what about people?

Viktor Mayer-Schönberger: I do not think data has agency (in contrast to Latour), so of course humans are driving the development. The point we were making is that the source of value isn’t the huge computing cluster or the smart statistical algorithm, but the data itself. So when for instance asking about the ethics of Big Data it is wrong to focus on the ethics of algorithms, and much more appropriate to focus on the ethics of data use.

Q3. What is more important people`s good intention or good data?

Viktor Mayer-Schönberger: This is a bit like asking whether one prefers apples or sunshine. Good data (being comprehensive and of high quality) reflects reality and thus can help us gain insights into how the world works. That does not make such discovery ethical, even though the discovery is correct. Good intentions point towards an ethical use of data, which helps protect us against unethical data uses, but does not prevent false big data analysis. This is a long way of saying we need both, albeit for different reasons.

Q4. What are your suggestion for concrete steps that can be taken to minimize and mitigate big data’s risk?

Viktor Mayer-Schönberger: I have been advocating ex ante risk assessments of big data uses, rather than (as at best we have today) ex post court action. There is potentially too much at stake to delegate the issue of control to individuals who are neither aware nor knowledgeable enough about how their data is being used to raise alarm bells and sue data processors. This is not something new. There are many areas of modern life that are so difficult and opaque for individuals to control that we have delegated control to competent government agencies.
For instance, we don’t test the food in supermarkets ourselves for safety, nor do we crash-test cars before we buy them (or TV sets, washing machines or microwave ovens), or run our own drug trials.
In all of these cases we put in place stringent regulation that has at its core a suitable process of risk assessment, and a competent agency to enforce it. This is what we need for Big Data as well.

Q5. Do you believe is it possible to ensure transparency, guarantee human freewill, and strike a better balance on privacy and the use of personal information?

Viktor Mayer-Schönberger: Yes, I do believe that. Clearly, today we are getting not enough transparency, and there aren’t sufficiently effective guarantees for free will and privacy in place. So we can do better. And we must.

Q6. You coined in your book the terms “propensity” and “fetishization” of data. What do you mean with these terms?

Viktor Mayer-Schönberger: I don’t think we coined the term “propensity”. It’s an old term denoting the likelihood of something happening. With the “fetishization of data” we meant the temptation (in part caused by our human bias towards causality – understanding the world around us as a sequence of causes and effects) to imbue the results of Big Data analysis with more meaning than they deserve, especially suggesting that they tell us why when they only tell us what.

Q7. Can big and open data be effectively used for the common good?

Viktor Mayer-Schönberger: Of course. Big Data is at its core about understanding the world better than we do today. I would not be in the academy if I did not believe strongly that knowledge is essential for human progress.

Q8. Assuming there is a real potential in using data–driven methods to both help charities develop better services and products, and understand civil society activity. What are the key lessons and recommendations for future work in this space?

Viktor Mayer-Schönberger: My sense is that we need to hope for two developments. First, that more researchers team up with decision makers in charities, and more broadly civil society organizations (and the government) to utilize Big Data to improve our understanding of the key challenges that our society is facing. We need to improve our understanding. Second, we also need decision makers and especially policy makers to better understand the power of Big Data – they need to realize that for their decision making data is their friend; and they need to know that especially here in Europe, the cradle of enlightenment and modern science, data-based rationality is the antidote to dangerous beliefs and ideologies.

Q9. What are your current areas of research?

Viktor Mayer-Schönberger: I have been working on how Big Data is changing learning and the educational system, as well as how Big Data changes the process of discovery, and how this has huge implications, for instance in the medical field.

——————
Viktor Mayer-Schönberger is Professor of Internet Governance and Regulation at Oxford University. In addition to the best-selling “Big Data” (with Kenneth Cukier), Mayer-Schönberger has published eight books, including the award-winning “Delete: The Virtue of Forgetting in the Digital Age”, and is the author of over a hundred articles and book chapters on the information economy. He is a frequent public speaker, and his work has been featured in (among others) The New York Times, Wall Street Journal, Financial Times, The Economist, Nature and Science.

Books
Mayer-Schönberger, V. and Cukier, K. (2013) Big Data: A Revolution That Will Transform How We Live, Work and Think. John Murray.

Mayer-Schönberger, V. (2009) Delete – The Virtue of Forgetting in the Digital Age. Princeton University Press.

Related Posts

Have we closed the “digital divide”, or is it just getting wider? Andrea Powell, CIO, CABI. ODBMS.org, January 1, 2016

How can Open Data help to solve long-standing problems in agriculture and nutrition? By Andrea Powell, CIO, CABI. ODBMS.org, December 7, 2015

Big Data and Large Numbers of People: the Need for Group Privacy by Prof. Luciano Floridi, Oxford Internet Institute, University of Oxford. ODBMS.org, March 2, 2015

——————
Follow ODBMS.org on Twitter: @odbmsorg.

##

On Data Curation. Interview with Andy Palmer. http://www.odbms.org/blog/2015/01/interview-andy-palmer-tamr/ Wed, 14 Jan 2015 09:07:47 +0000

“We propose more data transparency not less.”–Andy Palmer

I have interviewed Andy Palmer, a serial entrepreneur who co-founded Tamr with database scientist and MIT professor Michael Stonebraker.

Happy and Peaceful 2015!

RVZ

Q1. What is the business proposition of Tamr?

Andy Palmer: Tamr provides a data unification platform that reduces by as much as 90% the time and effort of connecting and enriching multiple data sources to achieve a unified view of silo-ed enterprise data. Using Tamr, organizations are able to complete data unification projects in days or weeks versus months or quarters, dramatically accelerating time to analytics.
This capability is particularly valuable to businesses: they can get a 360-degree view of the customer, unify their supply chain data (e.g. parts catalogs and supplier lists) to reduce costs or risk, and speed up conversion of clinical trial data for submission to the FDA.

Q2. What are the main technological and business challenges in producing a single, unified view across various enterprise ERPs, Databases, Data Warehouses, back-office systems, and most recently sensor and social media data in the enterprise?

Andy Palmer: Technological challenges include:
– Silo-ed data, stored in varying formats and standards
– Disparate systems, instrumented but expensive to consolidate and difficult to synchronize
– Inability to use knowledge from data owners/experts in a programmatic way
– Top-down, rules-based approaches not able to handle the extreme variety of data typically found, for example, in large PLM and ERP systems.

Business challenges include:
– Globalization, where similar or duplicate data may exist in different places in multiple divisions
– M&As, which can increase the volume, variety and duplication of enterprise data sources overnight
– No complete view of enterprise data assets
– “Analysis paralysis,” the inability of business people to access the data they want/need because IT people are in the critical path of preparing it for analysis

Tamr can connect and enrich data from internal and external sources, from structured data in relational databases, data warehouses, back-office systems and ERP/PLM systems to semi- or unstructured data from sensors and social media networks.

Q3. How do you manage to integrate various part and supplier data sources to produce a unified view of vendors across the enterprise?

Andy Palmer: Patent-pending technology using machine learning algorithms performs most of the work, unifying up to 90% of supplier, part and site entities by:

– Referencing each transaction and record across many data sources

– Building correct supplier names, addresses, ID’s, etc. for a variety of analytics

– Cataloging into an organized inventory of sources, entities, and attributes

When human intervention is necessary, Tamr generates questions for data experts, aggregates responses, and feeds them back into the system. This feedback enables Tamr to continuously improve its accuracy and speed.
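
A toy sketch of this pattern, and not Tamr’s actual implementation: score candidate record pairs, unify the confident matches automatically, and route only the uncertain pairs to data experts as simple yes/no questions. The similarity measure and thresholds below are placeholders.

from difflib import SequenceMatcher

AUTO_ACCEPT, AUTO_REJECT = 0.9, 0.5   # invented thresholds

def score(a, b):
    # Placeholder similarity; a production system would use trained models.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

candidate_pairs = [
    ("Acme Corporation", "ACME Corporation"),
    ("Acme Corporation", "Acme Corp (EMEA)"),
    ("Acme Corporation", "Bolt Industries"),
]

merged, expert_queue = [], []
for a, b in candidate_pairs:
    s = score(a, b)
    if s >= AUTO_ACCEPT:
        merged.append((a, b))            # unified automatically
    elif s >= AUTO_REJECT:
        expert_queue.append((a, b, s))   # asked as a simple yes/no question
    # below AUTO_REJECT: treated as distinct entities

print("auto-merged:", merged)
print("sent to experts:", expert_queue)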

Q4. Who should be using Tamr?

Andy Palmer: Organizations whose business and profitability depend on being able to do analysis on a unified set of data, and ask questions of that data, should be using Tamr.

Examples include:
– a manufacturer that wants to optimize spend across supply chains, but lacks a unified view of parts and suppliers.

– a biopharmaceutical company that needs to achieve a unified view of diverse clinical trials data to convert it to mandated CDISC standards for ongoing submissions to the FDA – but lacks an automated and repeatable way to do this.

– a financial services company that wants to achieve a unified view of its customers – but lacks an efficient, repeatable way to unify customer data across multiple systems, applications, and its consumer banking, loans, wealth management and credit card businesses.

– the research arm of a pharmaceutical company that wants to unify data on bioassay experiments across 8,000 research scientists, to achieve economies, avoid duplication of effort and enable better collaboration.

Q5. “Data transparency” is not always welcome in the enterprise, mainly due to non-technical reasons. What do you suggest to do in order to encourage people in the enterprise to share their data?

Andy Palmer: We propose more data transparency not less.
This is because in most companies, people don’t even know what data sources are available to them, let alone have insight into them or use of them. With Tamr, companies can create a catalog of all their enterprise data sources; they can then choose how transparent to make those individual data sources, by showing meta data about each. Then, they can control usage of the data sources using the enterprise’s access management and security policies/systems.
On the business side, we have found that people in enterprises typically want an easier way to share the data sources they have built or nurtured ─ a way that gets them out of the critical path.
Tamr makes people’s data usable by many others and for many purposes, while eliminating the busywork involved.

Q6. What is Data Curation and why is it important for Big Data?

Andy Palmer: Data Curation is the process of creating a unified view of your data with the standards of quality, completeness, and focus that you define. A typical curation process consists of:

Identifying data sets of interest (whether from inside the enterprise or outside),

Exploring the data (to form an initial understanding),

Cleaning the incoming data (for example, 99999 is not a valid ZIP code),

Transforming the data (for example, to remove phone number formatting),

Unifying it with other data of interest (into a composite whole), and

Deduplicating the resulting composite.

Data Curation is important for Big Data because people want to mix and match from all the data available to them ─ external and internal ─ for analytics and downstream applications that give them competitive advantage. Tamr is important because traditional, rule-based approaches to data curation are not sufficient to solve the problem of broad integration.
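
The following is a minimal, hypothetical sketch of the cleaning, transformation and deduplication steps listed above, using pandas; the columns and rules are invented for illustration and are not Tamr’s.

import pandas as pd

suppliers = pd.DataFrame({
    "name":  ["Acme Corp", "ACME Corporation", "Bolt Industries"],
    "zip":   ["02139", "99999", "10001"],             # 99999 is not a valid ZIP
    "phone": ["(617) 555-0101", "617.555.0101", "212-555-0199"],
})

# Clean: drop clearly invalid ZIP codes.
suppliers.loc[suppliers["zip"] == "99999", "zip"] = None

# Transform: strip phone-number formatting down to digits.
suppliers["phone"] = suppliers["phone"].str.replace(r"\D", "", regex=True)

# Unify and deduplicate: normalize names, then keep one row per name/phone pair.
suppliers["name"] = (suppliers["name"].str.upper()
                     .str.replace(r"\bCORPORATION\b", "CORP", regex=True))
print(suppliers.drop_duplicates(subset=["name", "phone"]))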

Q7. What does it mean to do “fuzzy” matches between different data sources?

Andy Palmer: Tamr can make educated guesses that two similar fields refer to the same entity even though the fields describe it differently: for example, Tamr can tell that “IBM” and “International Business Machines” refer to the same company.
In Supply Chain data unification, fuzzy matching is extremely helpful in speeding up entity and attribute resolution between parts, suppliers and customers.
Tamr’s secret sauce: Connecting hundreds or thousands of sources through a bottom-up, probabilistic solution reminiscent of Google’s approach to web search and connection.
Tamr’s upside: it becomes the Google of Enterprise Data, using probabilistic data source connection and curation to revolutionize enterprise data analysis.
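
As a simple illustration of fuzzy matching (an assumed sketch, not Tamr’s probabilistic model), the helper below treats two names as a likely match when one is the acronym of the other, as in the “IBM” example above, or when their character-level overlap is high:

from difflib import SequenceMatcher

def acronym(name):
    return "".join(word[0] for word in name.lower().split())

def similarity(a, b):
    # Educated guess that two fields refer to the same entity: exact acronym
    # matches score 1.0, otherwise fall back to character-level similarity.
    a_l, b_l = a.lower(), b.lower()
    if a_l == acronym(b_l) or b_l == acronym(a_l):
        return 1.0
    return SequenceMatcher(None, a_l, b_l).ratio()

print(similarity("IBM", "International Business Machines"))   # 1.0
print(similarity("Acme Corp.", "ACME Corporation"))           # about 0.7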

Q8. What is data unification and how effective is it to use Machine Learning for this?

Andy Palmer: Data Unification is part of the curation process, during which related data sources are connected to provide a unified view of a given entity and its associated attributes. Tamr’s application of machine learning is very effective: it can get you 90% of the way to data unification in many cases, then involve human experts strategically to guide unification the rest of the way.

Q9. How do you leverage the knowledge of existing business experts for guiding/ modifying the machine learning process?

Andy Palmer: Patent-pending technology using machine learning algorithms performs most of the data integration work. When human intervention is necessary, Tamr generates questions for data experts, sends them simple yes-no questions, aggregates their responses, and feeds them back into the system. This feedback enables Tamr to continuously improve its accuracy and speed.
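
To illustrate the feedback step, here is a toy sketch (again, not Tamr’s implementation) of turning experts’ yes/no answers into labels that can be fed back to the matching system; the pairs and votes are made up.

from collections import Counter

# Each candidate match was shown to several experts as a yes/no question.
expert_answers = {
    ("Acme Corp", "ACME Corporation"): ["yes", "yes", "no"],
    ("Acme Corp", "Acme Staffing"):    ["no", "no", "yes"],
}

training_labels = []
for pair, votes in expert_answers.items():
    label, count = Counter(votes).most_common(1)[0]   # majority vote
    training_labels.append({
        "pair": pair,
        "match": label == "yes",
        "confidence": count / len(votes),
    })

# These labels would be fed back to retrain or recalibrate the matching model.
print(training_labels)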

Q10. With Tamr you claim that less human involvement is required as the systems “learns.” What are in your opinion the challenges and possible dangers of such an “automated” decision making process if not properly used or understood? Isn’t there a danger of replacing the experts with intelligent machines?

Andy Palmer: We aren’t replacing human experts at all: we are bringing them into the decision-making process in a high-value, programmatic way. And there are data stewards and provenance and governance procedures in place that control how this is done. For example: in one of our pharma customers, we’re actually bringing the research scientists who created the data into the decision-making process, capturing their wisdom in Tamr. Before, they were never asked: some guy in IT was trying to guess what each scientist meant when he created his data. Or the scientists were asked via email, which, due to the nature of the biopharmaceutical industry, required printing out the emails for audit purposes.

Q11. How do you quantify the cost savings using Tamr?

Andy Palmer: The biggest savings aren’t from the savings in data curation (although these are significant), but from the opportunities for savings uncovered through analysis of unified data ─ opportunities that wouldn’t otherwise have been discovered. For example, by being able to create and update a ‘golden record’ of suppliers across different countries and business groups, Tamr can provide a more comprehensive view of supplier spend.
You can use this view to identify long-tail opportunities for savings across many smaller suppliers, instead of the few large vendors visible to you without Tamr.
In the aggregate, these long-tail opportunities can easily account for 85% of total spend savings.

Q12. Could you give us some examples of use cases where Tamr is making a significant difference?

Andy Palmer: Supply Chain Management, for streamlining spend analytics and spend management. Unified views of supplier and parts data enable optimization of supplier payment terms, identification of “long-tail” savings opportunities in small or outlier suppliers that were not easily identifiable before.

Clinical Trials Management, for automated conversion of multi-source /multi-standard CDISC data (typically stored in SaS databases) to meet submission standards mandated by regulators.
Tamr eliminates manual methods, which are usually conducted by expensive outside consultants and can result in additional, inflexible data stored in proprietary formats; and provides a scalable, repeatable process for data conversion (IND/NDA programs necessitate frequent resubmission of data).

Sales and Marketing, for achieving a unified view of the customer.
Tamr enables the business to connect and unify customer data across multiple applications, systems and business units, to improve segmentation/targeting and ultimately sell more products and services.

——————–

Andy Palmer, Co-Founder and CEO, Tamr Inc.

Andy Palmer is co-founder and CEO of Tamr, Inc. Palmer co-founded Tamr with fellow entrepreneur Michael Stonebraker, PhD. Previously, Palmer was co-founder and founding CEO of Vertica Systems, a pioneering big data analytics company (acquired by HP). During his career as an entrepreneur, Palmer has served as founder, founding investor, BOD member or advisor to more than 50 start-up companies. He also served as Global Head of Software Engineering and Architecture at Novartis Institutes for BioMedical Research (NIBR) and as a member of the start-up team and Senior Vice President of Operations and CIO at Infinity Pharmaceuticals (NASDAQ: INFI). He earned undergraduate degrees in English, history and computer science from Bowdoin College, and an MBA from the Tuck School of Business at Dartmouth.
————————–
-Resources

Data Science is mainly a Human Science. ODBMS.org, October 7, 2014

Big Data Can Drive Big Opportunities, by Mike Cavaretta, Data Scientist and Manager at Ford Motor Company. ODBMS.org, October 2014.

Big Data: A Data-Driven Society? by Roberto V. Zicari, Goethe University, Stanford EE Computer Systems Colloquium, October 29, 2014

-Related Posts

On Big Data Analytics. Interview with Anthony Bak. ODBMS Industry Watch, December 7, 2014

Predictive Analytics in Healthcare. Interview with Steve Nathan. ODBMS Industry Watch, August 26, 2014

-Webinar
January 27th at 1PM
Webinar: Toward Automated, Scalable CDISC Conversion
John Keilty, Third Rock Ventures | Timothy Danford, Tamr, Inc.

During a one-hour webinar, join John Keilty, former VP of Informatics at Infinity Pharmaceuticals, and Timothy Danford, CDISC Solution Lead for Tamr, as they discuss some of the key challenges in preparing clinical trial data for submission to the FDA, and the problems associated with current preparation processes.

Follow ODBMS.org on Twitter: @odbmsorg
