ODBMS Industry Watch » privacy
Trends and Information on Big Data, New Data Management Technologies, Data Science and Innovation.
http://www.odbms.org/blog

Challenges and Opportunities of The Internet of Things. Interview with Steve Cellini
http://www.odbms.org/blog/2015/10/challenges-and-opportunities-of-the-internet-of-things-interview-with-steve-cellini/
Wed, 07 Oct 2015

“The question of ‘who owns the data’ will undoubtedly add requirements on the underlying service architecture and database, such as the ability to add meta-data relationships representing the provenance or ownership of specific device data.”–Steve Cellini

I have interviewed Steve Cellini, Vice President of Product Management at NuoDB. We covered the challenges and opportunities of The Internet of Things, seen from the perspective of a database vendor.


Q1. What are in your opinion the main Challenges and Opportunities of The Internet of Things (IoT) seen from the perspective of a database vendor?

Steve Cellini: Great question. With the popularity of Internet of Things, companies have to deal with various requirements, including data confidentiality and authentication, access control within the IoT network, privacy and trust among users and devices, and the enforcement of security and privacy policies. Traditional security counter-measures cannot be directly applied to IoT technologies due to the different standards and communication stacks involved. Moreover, the high number of interconnected devices leads to scalability issues; therefore a flexible infrastructure is needed to be able to deal with security threats in such a dynamic environment.

If you think about IoT from a data perspective, you’d see these characteristics:
• Distributed: lots of data sources, and consumers of workloads over that data are cross-country, cross-region and worldwide.
• Dynamic: data sources come and go, data rates may fluctuate as sets of data are added, dropped or moved into a locality. Workloads may also fluctuate.
• Diverse: data arrives from different kinds of sources.
• Immediate: some workloads, such as monitoring, alerting and exception handling, require near-real-time access to data for analytics, especially if you want to spot trends before they become problems, identify outliers by comparison to current norms, or feed a real-time dashboard.
These issues represent opportunities for the next generation of databases. For instance, the need for immediacy turns into a strong HTAP (Hybrid Transactional and Analytic Processing) requirement: the database must support those analytic workloads alongside the real-time consumption of the raw data from all the devices.
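
As an illustration of the "Immediate" point, identifying outliers by comparison to current norms can be sketched as a sliding-window check. This is a minimal sketch, not anything from NuoDB: the class name, window size, threshold and the sample readings are all assumptions made up for the example.

```python
from collections import deque
from statistics import mean, pstdev


class RollingOutlierDetector:
    """Flags readings that deviate strongly from a sliding window of recent values.

    Hypothetical helper for illustration; parameters are arbitrary defaults.
    """

    def __init__(self, window_size=20, threshold=3.0, min_samples=5):
        self.window = deque(maxlen=window_size)
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, value):
        """Return True if value is an outlier relative to the current window."""
        is_outlier = False
        if len(self.window) >= self.min_samples:
            mu = mean(self.window)
            sigma = pstdev(self.window)
            if sigma > 0:
                is_outlier = abs(value - mu) / sigma > self.threshold
            else:
                is_outlier = value != mu
        # Only normal readings update the norm, so one spike does not mask the next.
        if not is_outlier:
            self.window.append(value)
        return is_outlier


# Made-up temperature readings from a single device; 35.0 is the spike.
detector = RollingOutlierDetector(window_size=10, threshold=3.0)
readings = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.0, 20.0]
flags = [detector.observe(r) for r in readings]
```

In a real deployment the "current norm" would more likely be computed by the database over data from many devices; the sketch only shows the comparison itself.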

Q2. Among the key challenge areas for IoT are Security, Trust and Privacy. What is your take on this?

Steve Cellini: IoT scenarios often involve human activities, such as tracking utility usage in a home or recording motion received from security cameras. The data from a single device may be by itself innocuous, but when the data from a variety of devices is combined and integrated, the result may be a fairly complete and revealing view of one’s activities, and may not be anonymous.

With this in mind, the associated data can be thought of as “valuable” or “sensitive” data, with attendant requirements on the underlying database, not dissimilar from, say, the kinds of protections you’d apply to financial data — such as authentication, authorization, logging or encryption.

Additionally, data sovereignty or residency regulations may also require that IoT data for a given class of users be stored in a specific region only, even as workloads that consume that data might be located elsewhere, or may in fact roam in other regions.

There may be other requirements, such as the need to be able to track and audit intermediate handlers of the data, including IoT hubs or gateways, given the increasing trend to closely integrate a device with a specific cloud service provider, which intermediates general access to the device. Also, the question of ‘who owns the data’ will undoubtedly add requirements on the underlying service architecture and database, such as the ability to add meta-data relationships representing the provenance or ownership of specific device data.
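
One way to picture the meta-data relationships mentioned above is a reading that carries its owner and the chain of intermediaries that handled it. The sketch below is only an assumption about what such a record might look like; the class names, field names and identifiers are hypothetical and do not come from any particular database or IoT platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Handler:
    """One intermediary (hub, gateway, cloud service) that touched a reading."""
    name: str
    role: str
    handled_at: datetime


@dataclass
class DeviceReading:
    """A device reading plus ownership and provenance metadata (hypothetical schema)."""
    device_id: str
    owner_id: str
    value: float
    recorded_at: datetime
    provenance: list = field(default_factory=list)

    def record_handler(self, name, role, when=None):
        """Append an intermediate handler so the path of the data can be audited."""
        when = when or datetime.now(timezone.utc)
        self.provenance.append(Handler(name, role, when))

    def audit_trail(self):
        """Return the handlers in the order they touched the reading."""
        return [f"{h.role}:{h.name}" for h in self.provenance]


# Made-up identifiers, for illustration only.
reading = DeviceReading(
    device_id="thermostat-42",
    owner_id="user-7",
    value=21.5,
    recorded_at=datetime(2015, 10, 7, tzinfo=timezone.utc),
)
reading.record_handler("home-hub-1", "gateway")
reading.record_handler("example-cloud", "cloud-service")
trail = reading.audit_trail()
```

In a relational schema the same idea would typically become an owner column plus a separate handlers table keyed by reading.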

Q3. What are the main technical challenges to keep in mind while selecting a database for today’s mobile environment?

Steve Cellini: Mobile users represent sources of data and transactions that move around, imposing additional requirements on the underlying service architecture. One obvious requirement is to enable low-latency access to a fully active, consistent, and up-to-date view of the database, for both mobile apps and their users, and for backend workloads, regardless of where users happen to be located. These two goals may conflict if the underlying database system is locked to a single region, or if it’s replicated and does not support write access in all regions.

It can also get interesting when you take into account the growing body of data sovereignty or residency regulations. Even as your users are traveling globally, how do you ensure that their data-at-rest is being stored in only their home region?

If you can’t achieve these goals without a lot of special-case coding in the application, you are going to have a very complex, error-prone application and service architecture.
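
To make the residency constraint concrete, the rule "process near the user, persist only at home" can be written down as a small policy function. This is a sketch under stated assumptions: the region names, the `User` type and both functions are hypothetical, and it is not how NuoDB or any specific database exposes this. The point above is that a database should enforce such a rule itself, so that the application does not have to carry this logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    home_region: str


# Hypothetical regions in which storage nodes exist.
STORAGE_REGIONS = {"eu-west", "us-east", "ap-south"}


def storage_region_for(user):
    """Data at rest always goes to the user's home region, wherever the request came from."""
    if user.home_region not in STORAGE_REGIONS:
        raise ValueError(f"no storage available in {user.home_region}")
    return user.home_region


def plan_write(user, request_region):
    """Return where a write is processed and where it is persisted."""
    return {
        "process_in": request_region,             # low latency: serve near the user
        "persist_in": storage_region_for(user),   # residency: store only at home
    }


# A European user traveling in the US (made-up example).
traveler = User("user-7", "eu-west")
plan = plan_write(traveler, request_region="us-east")
```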

Q4. You define NuoDB as a scale-out SQL database for global operations. Could you elaborate on the key features of NuoDB?

Steve Cellini: NuoDB offers several key value propositions to customers: the ability to geo-distribute a single logical database across multiple data centers or regions, arbitrary levels of continuous availability and storage redundancy, elastic horizontal scale out/in on commodity hardware, automation, ease and efficiency of multi-tenancy.
All of these capabilities enable operations to cope flexibly, efficiently and economically as the workload rises and dips around the business lifecycle, or expands with new business requirements.

Q5. What are the typical customer demands that you are responding to?

Steve Cellini: NuoDB is the database for today’s on-demand economy. Businesses have to respond to their customers who demand immediate response and expect a consistent view of their data, whether it be their bank account or e-commerce apps — no matter where they are located. Therefore, businesses are looking to move their key applications to the cloud and ensure data consistency – and that’s what is driving the demand for our geo-distributed SQL database.

Q6. Who needs a geo-distributed database? Could you give some example of relevant use cases?

Steve Cellini: A lot of our customers come to us precisely for our geo-distributed capability – by which I mean our ability to run a single unified database spread across multiple locations, accessible for querying and updating equally in all those locations. This is important where applications have mobile users who switch the location they connect to. That happens a lot in the telecommunications industry. Or they’re operating ‘follow the sun’ services where a user might need to access any data from anywhere; that’s a pattern with global financial services customers. Or they just want to offer the same low-latency service everywhere. That’s what we call “local everywhere”, which means you don’t see increasing delays as you travel further from the central database.

Q7. You performed recently some tests using the DBT2 Benchmark. Why are you using the DBT2 Benchmark and what are the results you obtained so far?

Steve Cellini: The DBT2 benchmark, which is derived from the TPC-C specification, is a good test for an operational database, because it simulates a real-world transactional workload.
Our focus on DBT2 hasn’t been on achieving a new record for absolute NOTPM rates, but rather to explore one of our core value propositions — horizontal scale out on commodity hardware. We recently passed the 1 million NOTPM mark on a cluster of 50 low-cost machines and we are very excited about it.

Q8. How is your offering in the area of automation, resiliency, and disaster recovery different (or comparable) with some of the other database competitors?

Steve Cellini: We’ve heard from customers who need to move beyond the complexity, pain and cost of their disaster recovery operations, such as expanding from a typical two data center replication operation to three or more data centers, or addressing lags in updates to the replica, or moving to active/active.

With NuoDB, you use our automation capability to dynamically expand the number of hosts and regions a database operates in, without any interruption of service. You can dial in the level of compute and storage redundancy required and there is no single point of failure in a production NuoDB configuration. And you can update in every location – which may be more than two, if that’s what you need.

Steve Cellini, VP, Product Management, NuoDB
Steve joined NuoDB in 2014 and is responsible for Product Management and Product Support, as well as helping with strategic partnerships.

In his 30-year career, he has led software and services programs at various companies – from startups to Fortune 500 – focusing on bringing transformational technology to market. Steve started his career building simulator and user interface systems for electrical and mechanical CAD products and currently holds six patents.

Prior to NuoDB, Steve held senior technical and management positions on cloud, database, and storage projects at EMC, Mozy, and Microsoft. At Microsoft, Steve helped launch one of the first cloud platform services and led a company-wide technical evangelism team. Steve has also built and launched several connected mobile apps. He also managed Services and Engineering groups at two of the first object database companies – Ontos (Ontologic) and Object Design.

Steve holds a Sc.B in Engineering Physics from Cornell University.


DBT-2 Clone from SourceForge

Setting up DBT-2 for NuoDB, Github

One Million NOTPM DBT2 Benchmark on NuoDB 2.3 By Dai Klegg, NuoDB, Sr Director of Product Marketing. ODBMS.org

Hybrid Transaction and Analytical Processing with NuoDB. Technical Whitepaper, NuoDB. ODBMS.org

Related Posts

Big Data, Analytics, and the Internet of Things. Mohak Shah, analytics leader and research scientist at Bosch Research, USA, ODBMS.org

SMART DATA: Running the Internet of Things as a Citizen Web. by Dirk Helbing , ETH Zurich. ODBMS.org

On Big Data and the Internet of Things. Interview with Bill Franks. ODBMS Industry Watch, March 9, 2015

Follow ODBMS.org on Twitter: @odbmsorg


The other side of Big Data. Interview with Michael L. Brodie.
http://www.odbms.org/blog/2014/04/side-big-data-interview-michael-l-brodie/
Sat, 26 Apr 2014

“So it is not even the volume of data that imputes political or economic value. Hence, it is clear that data has enormous political and economic value. Given the increasing digitization of our world it seems inevitable that our legal, economic, and political systems, amongst others, will ascribe to formal measures of value for data.” –Michael L. Brodie.

What is the other side of Big Data? What are the societal benefits, risks, and values of Big Data? These are difficult questions to answer.
On this topic, I have interviewed Dr. Michael L. Brodie, Research Scientist at MIT Computer Science and Artificial Intelligence Laboratory. Dr. Brodie has over 40 years experience in research and industrial practice.


Q1. You recently wrote [7] that “we are in the midst of two significant shifts – the shift to Big Data requiring new computational solutions, and the more profound shift in societal benefits, risks, and values”. Can you please elaborate on this?

Michael L. Brodie: The database world deals with data that is bounded, even if vast and growing beyond belief, and used for known, discrete models of our world most of which support a single version of truth. While Big Data expands the existing scale (volume, velocity, variety) it does far more as it takes us into a world that we experience in life but not in computing. I call the vision, the direction that Big Data is taking us, Computing Reality. A simple explanation is that in the database world, we work top-down with schemas that define how the data should behave. For example, Telecom billing systems are essentially all in an equivalence class of the same billing model and require that billing data conform. Billing databases have a single version of truth so that telecom bills have justifiable charges. Not so with Big Data.

If we impose a model or our biases on the data we may preclude the very value that we are trying to discover.
In Big Data worlds, as in life, there is not a single version of truth over the data but multiple perspectives each with a probability of being true or reasonable. We are probably not looking for one likely model but an ensemble of models each of which provides a different perspective and discloses some discoveries in the data that we otherwise would not have found.
So the one paradigm shift is from small data that involve discrete, bounded, top-down approaches to computing to big data that require bottom-up approaches that tend to be vague or probabilistic, unbounded, and support multiple perspectives. I call this latter approach Computing Reality, reflecting the vagueness and unboundedness of reality.

A second, related shift – from why to what – can be understood in terms of Scientific Discovery. The history of scientific and Western thought, starting before Aristotle and Plato, has matured into what we know today as the Scientific Method in which one makes observations about a phenomenon, e.g., sees some data, hypothesizes a model, and determines if the model makes sense over the observed data.

This process is What: What are the correlations in the data that might explain the phenomenon.
A reasonable model over the data leads to Why: the Holy Grail of Science – causation – Why does the phenomenon occur.

For over 2,000 years a little What has guided Why – Scientific Discovery through empiricism.
Big Data has the potential of turning scientific discovery on its ear. Big Data is leading to a shift from Why to What.
The value of Big Data and the emergence of Big Data analytics may shift the preponderance of scientific discovery to What, since it is so much cheaper than Why – clinical studies that take vast resources and years of careful work. Here is the challenge.
Why – causation – cannot be deduced from What. It is not clear that Big Data practitioners understand the tenuous link between What and Why. Massive Big Data blunders [1, 2] suggest that this is the case.
My research into Computing Reality explores this link with the hopes of providing guidance for Big Data tools and techniques and, even cooler than that, of accelerating Scientific Discovery by adding mechanisms and metrics of veracity to Big Data and its symbiosis with empiricism.
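
The gap between What (correlation) and Why (causation) can be illustrated with a small simulation. This is a toy sketch on synthetic data, not part of the research described above; the variable names, noise levels and sample size are arbitrary assumptions. A hidden factor `z` drives both `x` and `y`, so they correlate strongly in observational data even though `x` has no effect on `y`. Once `x` is set independently, as a controlled experiment would do, the correlation disappears.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Observational data: the hidden factor z drives both x and y; x has no effect on y.
z = [rng.gauss(0, 1) for _ in range(n)]
x_obs = [zi + rng.gauss(0, 0.3) for zi in z]
y_obs = [zi + rng.gauss(0, 0.3) for zi in z]
r_observed = pearson(x_obs, y_obs)

# Intervention: x is set independently of z; y is still driven by z only.
x_set = [rng.gauss(0, 1) for _ in range(n)]
y_after = [zi + rng.gauss(0, 0.3) for zi in z]
r_intervened = pearson(x_set, y_after)

print(f"observed correlation:   {r_observed:.2f}")
print(f"after setting x freely: {r_intervened:.2f}")
```

The first number is large and the second is close to zero: the data alone showed a strong What, but there was never a Why linking `x` to `y`.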

Q2. You also wrote [7] that with Big Data “more than ever before, technology is far ahead of the law”. What do you mean with this?

Michael L. Brodie: The Fourth Amendment of the United States Constitution was tested many times by technology, including when electronic techniques could be used to determine activities inside a citizen’s home. When the constitution was written in 1787 electronic surveillance could never have been anticipated.

Today, the laws of search and seizure, based on the Fourth Amendment, permit those with a warrant to acquire all of your electronic devices so the government can examine everything on those devices, although it appears that the intent of the law was to permit search and seizure of evidence relevant to the suspected offence. That is, the current laws were perfectly rational when written; however, technology has so changed the world we live in that the law, interpreted simply, allows the government to look at your entire digital life, which for many of us is much of our lives, thus minimizing or eliminating the protections of the Fourth Amendment. The simple matter is that technology will always be ahead of the law.
So we must constantly balance current and unforeseen consequences of technology advance on our lives and societies.
Since time immemorial, and as observed by Benjamin Franklin, we have had to judiciously balance freedom and security; you can’t have both. Technology, more than many domains, pushes this balance.

Q3. John Podesta, Obama’s Counselor and study lead, asked the following question during a workshop [4]: “Does our legal privacy framework support and balance safety and freedom?”
What is your personal view on this especially related to the ongoing discussion on an open and free Internet and Big Data?

Michael L. Brodie: What a great question, worthy of serious pursuit, more than I will pursue here. A fundamental part of your question is that of a free and open Internet.
While it is debatable as to whether computing or the Internet has created economic growth and increased productivity, it is fair to say that our economies have become so dependent on computing that Balkanizing the Internet, as exemplified recently by Turkey, China, Brazil, and even Switzerland, will surely cause major economic disruption.

Not only does a significant portion of our existing economy ride on a currently open and free Internet, that platform has been and will continue to be a fountain of innovation and potential economic growth, and, ideally, increased productivity; not to mention the daily lives of billions of people on the planet. As we have seen in Tunisia, Egypt, North Korea, China, Syria, and other constrained countries, an open and free Internet, e.g., Twitter, is becoming a means for democratic expression and constraint on totalitarian behaviour. Much is at stake to maintain an open and free Internet.

This should encourage a robust debate of the various Internet Bills of Rights [8] currently on offer. The Snowden-NSA incidents and the resulting events in the White House, the Supreme Court, and the US Congress clearly indicate that our legal privacy framework is inadequate. The more interesting question is what changes are required to permit a balance of freedom and safety. Such a framework should result from a robust, informed public debate on the issues. Hopefully these discussions will start in earnest. The workshop is an example of the White House’s commitment to such a discussion.

Q4. What would be the discussion on an open and free Internet, while balancing safety with freedom, if Edward Snowden had not disclosed the NSA surveillance?

Michael L. Brodie: What great questions with profound implications, clearly beyond my skills, but fun to poke at. Let me add to the question: Is Snowden a Whistle Blower or terrorist? Is he working to uphold the constitution or undermine it?

I happen to have had some direct experience on this issue. From April 2006 to January 2008 I served on the United States of America National Academies Committee on Technical and Privacy Dimensions of Information for Terrorism Prevention and other National Goals, co-chaired by Dr. Charles Vest, president of the National Academy of Engineering and Dr. William Perry, former US Secretary of Defense, that was commissioned by the Department of Homeland Security and the National Science Foundation.

The recent White House Investigation prompted by Snowden’s disclosures heavily cited the commission’s report [3].
The 21-month investigation by 20 experts chosen by the academy uncovered some aspects of what Snowden’s disclosures led to, but it did not uncover the scope and scale of the NSA actions that emerged from Snowden’s disclosures.
It is not until you discover the actions that you question the relevant laws or as the White House justifiably asked, the legal privacy framework to support and balance safety and freedom.

As I said in the piece that you reference, the White House and Snowden are asking exactly the same questions. Snowden has said that he saw it as his obligation to do what he did given his oath to uphold the constitution. Hence, such a discussion could emerge without Snowden in the next decade, but it would not have emerged at the moment without his actions.

Would that it had emerged in 2006 or as a consequence of the many other similarly intended investigations.
It seems to me that Snowden blew the whistle on NSA.

Q5. De facto, the Internet is becoming a new battlefield among different political and economic systems in the world. What is the political and economic value of data?

Michael L. Brodie: Again a grand question for my betters. This is another profound question that I am not skilled to answer. But why let that stop me?
Our economic system is based on commodities, goods and services, with almost no means of attributing economic value to data. Indirectly, data is valued at inconceivably high values according to many Internet company acquisitions, especially Facebook’s recent $16 Billion acquisition of WhatsApp, which appears to be acquiring people and their data by the network effect.
How do you ascribe value to data? Who owns data? 
Does it age and does time reduce or raise its value? If it has economic value, then what legal jurisdiction governs data? What is the political value of data?
For one example look at Europe’s solicitation of business away from the United States based on data, data ownership, and data governance.

Another example is that President Lyndon Johnson achieved the US Civil Rights Bill because of data – he knew where all the bodies were buried. What is the value of data there?

So it is not even the volume of data that imputes political or economic value. Hence, it is clear that data has enormous political and economic value. Given the increasing digitization of our world it seems inevitable that our legal, economic, and political systems, amongst others, will ascribe to formal measures of value for data.

Q6. There has been a claim that “Big data” has rendered obsolete the current approach to protecting privacy and civil liberties [5]. Is this really so?

Michael L. Brodie: Without question, expanding beyond bounded, discrete, top-down models of the world to a vastly larger, more complex digital version of the world requires a reevaluation of previous approaches to computing problems, including privacy and civil liberties. The quote is from Craig Mundie [5], who makes the observation from a policy and strategy point of view. A recent report on machine learning and curly fries claims that organizations, e.g., marketing, can create complete profiles of individuals without their permission and presumably use them in many ways, e.g., to refuse a loan. Does that threaten privacy and civil liberties?

While I quoted Mundie concerning civil liberties, my knowledge is in computing and databases. My reference concerns the fact that current solutions will simply not scale to the world of Big Data and Computing Reality.
It seems a safe statement since Butler Lampson and Mike Stonebraker have both said the same thing. Simply stated, we cannot anticipate every attack, what combination of data accesses could be used to deduce private information.
A famous case is to use Netflix movie selection data to identify private patient information from anonymized Medicare data. So while you may do a top-down job applying existing protection mechanisms, your only hope is to detect violations and stop further such attacks, as has been claimed for Heartbleed.

As Butler Lampson said [6]:

“It’s time to change the way we think about computer security: instead of trying to prevent security breaches, we should focus on dealing with them after they happen.
Today computer security depends on access control, and it’s been a failure. Real world security, by contrast, is mainly retroactive: the reason burglars don’t break into my house is that they are afraid of going to jail, and the financial system is secure mainly because almost any transaction can be undone.
There are many ways to make security retroactive:
• Track down and punish offenders.
• Selectively undo data corruption caused by malware.
• Require applications and online services to respect people’s ownership of their personal data.
Access control is still needed, but it can be much more coarse-grained, and therefore both more reliable and less intrusive. Authentication and auditing are the most important features. Retroactive security will not be perfect, but perfect security is not to be had, and it will be much better than what we have now.”
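
Two of the ideas in the quote, auditing and selectively undoing corruption, can be sketched together. The store below is a hypothetical illustration written for this post's readers, not code from Lampson's talk: it logs every write with the actor that made it, so that one actor's changes can later be reverted without touching anyone else's. The class name, the actor names and the sample data are all made up.

```python
class AuditedStore:
    """Key-value store that logs every write so one actor's changes can be undone later."""

    _MISSING = object()  # marks "key did not exist before this write"

    def __init__(self):
        self.data = {}
        self.log = []  # audit entries: (actor, key, previous_value, new_value)

    def write(self, actor, key, value):
        """Apply a write and record who made it and what it replaced."""
        previous = self.data.get(key, self._MISSING)
        self.log.append((actor, key, previous, value))
        self.data[key] = value

    def undo_actor(self, bad_actor):
        """Revert bad_actor's writes, newest first.

        A write is skipped if someone else has overwritten the key since,
        so legitimate later changes are preserved.
        """
        reverted = []
        for actor, key, previous, new in reversed(self.log):
            if actor != bad_actor or self.data.get(key, self._MISSING) != new:
                continue
            if previous is self._MISSING:
                del self.data[key]
            else:
                self.data[key] = previous
            reverted.append(key)
        return reverted


store = AuditedStore()
store.write("alice", "balance", 100)
store.write("malware", "balance", 0)
store.write("malware", "backdoor", "enabled")
store.write("bob", "note", "hello")
reverted = store.undo_actor("malware")
```

After the undo, the balance is restored, the planted key is gone, and the unrelated write survives. The audit log itself is left intact, since it is what makes tracking down the offender possible.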


[1]  D. Leinweber. Stupid data miner tricks: How quants fool themselves and the economic indicator in your pants. Forbes, July 2012.

[2]  G. Marcus and E. Davis. Eight (no, nine!) problems with big data. New York Times, August 

[3]    Protecting Individual Privacy in the Struggle Against Terrorism: A Framework for Program Assessment, Committee on Technical and Privacy Dimensions of Information for Terrorism Prevention and Other National Goals, National Research Council, Washington, D.C. 2008. ISBN-10: 0-309-12488-3 ISBN-13: 978-0-309-12488-1

[4]    John Podesta, White House Counselor, White House-MIT Big Data Privacy Workshop: Advancing the State of the Art in Technology and Practice, March 4, 2014, MIT, Cambridge, MA http://web.mit.edu/bigdata-priv/agenda.html

[5]    Craig Mundie, Privacy Pragmatism: Focus on Data Use, Not Data Collection, Foreign Affairs, March/April 2014.

[6]    Butler Lampson, Retroactive Security Microsoft Research, and MIT, New England Database Summit 2014, MIT, Cambridge, MA, January 31, 2014

[7]    White House-MIT Big Data Privacy Workshop: A Personal View, Dr. Michael L. Brodie, Computer Science and Artificial Intelligence Laboratory, MIT, March 24, 2014 http://www.odbms.org/2014/04/white-house-mit-big-data-privacy-workshop/

[8]    Tim Berners-Lee, Online Magna Carta, aka Internet Bill of Rights, aka a Global Constitution.

Dr. Michael L. Brodie
Dr. Brodie has over 40 years experience in research and industrial practice in databases, distributed systems, integration, artificial intelligence, and multi-disciplinary problem solving. He is concerned with the Big Picture aspects of information ecosystems including business, economic, social, application, and technical.
Dr. Brodie is a Research Scientist, MIT Computer Science and Artificial Intelligence Laboratory; advises startups; serves on Advisory Boards of national and international research organizations; and is an adjunct professor at the National University of Ireland, Galway. For over 20 years he served as Chief Scientist of IT, Verizon, a Fortune 20 company, responsible for advanced technologies, architectures, and methodologies for Information Technology strategies and for guiding industrial scale deployments of emergent technologies, most recently Cloud Computing and Big Data and start ups Jisto.com and data-tamer.com. He has served on several National Academy of Science committees.
Dr. Brodie holds a PhD in Databases from the University of Toronto and a Doctor of Science (honoris causa) from the National University of Ireland.


– “White House-MIT Big Data Privacy Workshop: A Personal View”
By Dr. Michael L. Brodie, Computer Science and Artificial Intelligence Laboratory, MIT, March 24, 2014

– “What versus Why Towards Computing Reality”
Michael L. Brodie, Jennie Duggan, Computer Science and Artificial Intelligence Laboratory (CSAIL) | MIT April 17, 2014

