The Perils of Big Data
By Ramesh Chitor, Senior Business Development Manager, Cisco Systems.
July 16, 2014- ODBMS.org Expert Article.
Q: Where are we in the evolution of Big Data technologies?
A: You don’t frequently see open disagreement between experts on where we are right now in the technology explosion, so it is intriguing to find Geoffrey Moore, the marketing expert and author of the runaway bestseller, Crossing the Chasm (of IT innovation), crossing comments with Derrick Harris, a prolific technology journalist at Gigaom, the IT research group, earlier this year.
After Harris had waxed eloquent in May about IT being at “a real turning point for Big Data,” referring to sensors and quantum computing, Moore jumped in to comment, “I love the technology explosion and the fountain of ideas it is unleashing, but given the amount of disruption and make-it-yourself going on, we are a long, long way from even reaching the chasm [in the technology adoption lifecycle between early adopters and the early majority, explained in his book] much less crossing it. This is still the Early Market, a time for visionary sponsors to underwrite game-changing projects to garner dramatic competitive advantage.”
To which Harris responded, “Well, far be it from me to disagree with the guy who wrote the book on crossing chasms 😉 I don’t know the exact time frame (commercial quantum computing, for example, is a few years off, at least), but I do think the applications for big data are becoming much more clear, and the technologies are easier to consume and able to do more of what mainstream companies will expect. As the tech gets commercialized, companies will be able to move from idea to pilot pretty easily.”
Differences aside, the two share common ground: dealing with the escalating volume, velocity, variety and veracity of information (the so-called four V’s of Big Data) will be the major business challenge for the remainder of this decade, if not beyond.
Q: We’re in a period of unstoppable information explosion due to the Internet and the proliferation of devices connected to the Internet. What are the main implications for data management?
A: The rise of what is known as the Internet of Everything (IoE), as Cisco has dubbed it, means we’re awash in data, as the New York Times put it. IDC researchers estimate that worldwide digital data will grow from 2.7 zettabytes to 35 ZB in 2020. (A zettabyte [ZB] equals about 250 billion DVDs, said Cisco’s Thomas Barnett Jr. in 2011.)
Before then, in 2015, according to Barnett, video (for example, Internet video to PC, Internet video to TV, and mobile video) will account for 61 percent of global Internet traffic, bringing new considerations for storage and retrieval.
The way to effectively manage all this, many analysts agree, is to store and retrieve it efficiently. To address this, a growing number of purpose-built storage systems and extremely fast data-crunching machines are appearing. But apart from the quantity of data being created (some say it’s doubling every two years), another aspect to manage is the sources of data: external ones, such as the Internet and digital channels, and internal ones, such as the transactional and operating systems supporting marketing, sales fulfillment, and services, as well as an organization’s own business intelligence reporting.
Beyond these is the matter of choosing the right sensors (in smart cars, for example) to deliver the needed data.
Organizations will also be looking to develop customized offers and solutions by pulling medical data in from individuals, public sources and research reports (obtained from wearable technologies, for example). Marketers will also look to data from devices, maps, and GPS signals, smartphone check-ins, social networks and content sites.
Interpreting and analyzing all this meaningfully is making analytics the differentiator. In a growing number of industries, analytics is what separates winners from losers. In important decisions, relying on traditional assumptions can be counterproductive; what’s needed for competitive advantage are insights into what is really happening, which frequently contradict those assumptions.
An example is the Oakland A’s, the Major League Baseball team, researched by Michael Lewis for his insightful 2003 book, Moneyball, later made into a movie. As with a slew of industries built on conventional thinking, the A’s used long-established criteria to value players. That is, until Billy Beane, the general manager, applied analytics to fault many of those assumptions. The A’s redefined their outmoded benchmarks to re-evaluate overlooked players and hire some to create a championship quality team for one third of the budget of the New York Yankees — $41 million to $125 million — with equal wins and places in the division and 50 fewer runs allowed. A clear case of competitive advantage for the A’s.
“Business analytics is the key,” say the authors of an article in the Ozean Journal of Social Sciences in 2012. “By bringing optimized performance, informed decisions, actionable insights and information, business analytics can help your company outperform and outmanoeuver the competition and answer fundamental questions such as, ‘How are we doing?’, ‘Why?’ and, ‘What should we be doing?’”
Gerald Naidoo, CEO of Logikal Consulting, also told the journal that, “Today’s algorithms and powerful computers can reveal new insights that would previously have remained hidden, but one in three managers frequently make critical decisions without the information they need, one in two don’t have access to the information across their company needed to do their jobs, and three in four business leaders say more predictive information would drive better decisions.”
Incoming data that is immediate and constant means organizations will need to track complex events, keep up with breaking news, and spot and analyze emerging trends in real time: for example, the effect of the Texas drought on beef prices, or a rising incidence of Google queries about flu implying a regional outbreak. Organizations that can respond quickly, by recognizing common relationships in the data, for example, will enjoy competitive advantage.
Cloud computing can provide the scalability and agility that are key to managing the massive amounts of data companies seek to mine. It lets them pay for data services as they go rather than invest millions of dollars in infrastructure. On top of this, data storage costs are notably lower.
The drive to cloud computing has to do with velocity, one of the four V’s that companies must get a handle on for processing and solving Big Data issues. Early majority organizations will require a much faster way to deliver IT infrastructure applications, and the cloud offers a way for that capacity to be delivered.
As the early majority of information providers follow the early adopters across the chasm and prepare to face the Big Data evolution, they will need to build the necessary technology stacks.
Some of the necessary components in this architecture stack, according to Cisco, will include the Hadoop Distributed File System (HDFS) [Hadoop is a well-known open source framework for distributed Big Data applications, created in 2005] to efficiently process massive amounts of data by moving computing to the data; hybrid non-relational databases to provide flexible schema, true scalability and awareness of multiple data centers; and next-generation databases that are non-relational, distributed, open-source and horizontally scalable.
Using service-oriented architecture (SOA), these efficiencies will enhance unstructured and structured data management and, by integrating widely disparate applications for a Web-based environment and using multiple implementation platforms, they will prepare providers for the future.
Data warehouse appliances and utilities that extend beyond a traditional data warehouse and provide robust business intelligence will be valued for being affordable, easy, powerful and optimized for intensive analysis and performance, with quick time-to-value.
The Hadoop ecosystem is not a direct replacement for online analytical processing (OLAP) or for the Enterprise Data Warehouse (EDW) systems that have been around since the 1990s. Hadoop is a disruptive, cheaper addition rather than a replacement for these technologies, which will continue to have their place. There is overlap, certainly, but each platform can do things the other can’t. In other words, there is room here for the two to co-exist.
Entering the Internet of Everything to allow people to remotely control interfaces to access multiple needs and wants also means opening up to abusers. Organizations must put security and privacy at the top of the list of concerns to prevent ID loss, sabotage, theft, denials of service, and cyberspying.
What all this adds up to is that organizations can take advantage of the opportunities available in Big Data only when they have the processes and solutions that can embrace the four V’s — volume, variety, velocity and veracity.
Q: Big Data is certainly big, but is it opening too many doors? What about handling fraud and security and privacy issues?
A: Ian Urbina, a New York Times investigative reporter, wrote in January, 2014, “The perpetual cat-and-mouse game between computer hackers and their targets is getting nastier.” He went on to describe broken firewalls and antivirus programs, cyberspying by the Chinese Army that stole millions of dollars worth of data from military contractors and research companies.
Late last year there was the Target credit card attack involving the theft of details from up to 70 million accounts. That same year, companies spent about $1.3 billion on insurance to help cover expenses associated with data theft.
The problem in this post-Edward Snowden world is that people seek to satisfy conflicting IT imperatives: maximum security and maximum privacy.
In a June 2014 McKinsey & Company article, the authors drew attention to the fact that while companies are aware of the growing risk of cyberattacks, few are doing what is necessary to protect private information.
There are multiple elements to securing the Internet of Everything, but here are some to consider:
Companies must accept a certain level of cyberattack risk by making sophisticated trade-offs between risks and customer expectations. The fact is that it’s not possible to keep the Internet of Everything completely secure, so systems must be designed assuming that anything can be compromised. There must be a zero trust model at all points of the system. The experience learned from protecting the edges of enterprises with firewalls is not enough. Organizations need to be able to quickly recognize when a breach has occurred and stop it before it can cause more damage.
Product development decisions frequently raise the amount of sensitive customer information collected, while procurement decisions can create the risk that vendors will treat critical intellectual property too carelessly.
User behavior is hard to change: connected devices can be the weakest link. Device security starts with making them tamper-resistant and encrypting the information in them. SMS-based two-factor authentication (2FA), the process in which a user registers a mobile phone number and confirms identity via a one-time password (OTP), is becoming the security “go-to” method. In a recent survey, 90 percent of global IT managers said they plan to adopt or are considering adopting OTPs in 2014 to validate end users, increase conversion rates and improve the customer experience.
Even a simple sensor can be used to surmount firewalls. Hackers have many ways to figure out private information (like listening in on a smart energy meter to surmise that a home occupant is out), or even to infiltrate entire networks.
Social networking services like Google, Facebook, and Twitter record granular details of individuals that have the potential to be misused.
Big Data is important in helping predict epidemics and in the early detection necessary to minimize their impact, whether it’s flu or cholera. Much of the data used for this is private personal information and needs protection.
Big Data encourages the indiscriminate assembly and over-assembly of data. While data offers new insights and the opportunity to create new services and products, one of the first rules of good data hygiene is: don’t assemble and hoard unneeded data, especially the private stuff. In the Big Data age, does this rule go by the board? Is there a risk of limiting the potential of Big Data by limiting the data collected?
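To make the SMS one-time-password step mentioned above more concrete, here is a minimal sketch in Python. It is an illustration only, not any vendor's implementation: the function names `issue_otp` and `verify_otp`, the in-memory store, the six-digit length and the five-minute lifetime are all assumptions, and a real service would hand the code to an SMS gateway instead of returning it.

```python
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple

OTP_DIGITS = 6         # assumption: six-digit codes
OTP_TTL_SECONDS = 300  # assumption: codes expire after five minutes

# Hypothetical in-memory store: phone number -> (code, expiry timestamp).
_pending: Dict[str, Tuple[str, float]] = {}


def issue_otp(phone: str, now: Optional[float] = None) -> str:
    """Create a random code for a registered phone number and remember it.

    A real service would pass the code to an SMS gateway; returning it
    here keeps the sketch self-contained.
    """
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10 ** OTP_DIGITS):0{OTP_DIGITS}d}"
    _pending[phone] = (code, now + OTP_TTL_SECONDS)
    return code


def verify_otp(phone: str, submitted: str, now: Optional[float] = None) -> bool:
    """Accept the code at most once, and only before it expires."""
    now = time.time() if now is None else now
    entry = _pending.pop(phone, None)  # pop makes every code single-use
    if entry is None:
        return False
    code, expires_at = entry
    if now > expires_at:
        return False
    # Constant-time comparison avoids leaking how many digits matched.
    return hmac.compare_digest(code.encode(), submitted.encode())
```

Because `verify_otp` removes the stored code on every attempt, a wrong guess also forces a new code to be issued. That is a deliberately strict choice for this sketch; a production system would more likely allow a small number of retries.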
Q: Is there a need to create a new role inside the organization responsible for assessing privacy risks? How do organizations allow users to manage and control their own data?
A: Larry Page, Google CEO, reacted to Europe’s “right to be forgotten” ruling, as reported in the Financial Times, by saying it risked damaging the next generation of Internet start-ups and emboldening repressive governments seeking to restrict online communications.
In late June, 2014 Al Jazeera reported that Google had removed some search results in answer to the European Union’s ruling backing citizens’ rights to have objectionable personal information about them hidden from search engines.
Several weeks after the May ruling by the European Court of Justice on the so-called “right to be forgotten,” Google set up an online interface for users to register their complaints.
Al Jazeera reported that national governments recently moved toward extending Europe’s strict data protection rules to all companies, not just European ones.
Good information governance will only exist when a business knows what data it collects, who and where the data comes from, where it is stored, how it is used and what it is used for.
Q: Are we at the beginning of a wireless and mobile revolution? What are the implications?
A: Cellphones are ubiquitous. In the terms of the technology adoption lifecycle described by Geoffrey Moore in his runaway bestseller, Crossing the Chasm: Marketing and Selling Disruptive Products to Mainstream Customers, we’re probably at the beginning of the Early Majority segment of the mobile revolution (the third of five segments, with the Late Majority and Laggards still to come).
With wireless technology estimated to have penetrated 90% of the U.S. market, the revolution, clearly, has already happened. It will continue apace. With a penetration level of 133% in industrialized countries, and estimates that this will reach 500%, it seems we’ll eventually be knee-deep in cell phones. The U.S., then, still has some way to go. Don’t take these numbers to mean that in future everyone but the very young and very old will carry five cell phones. They mean, in fact, that in time five devices and sensors per person will communicate through wireless networks.
As driverless cars, refrigerators, homes, dogs, hospital patients, videos and tweets come online in vastly increasing numbers, mobile data traffic is projected by Cisco to grow 18-fold between 2011 and 2016, reaching 10.8 exabytes per month by 2016. An exabyte is a unit of information or computer storage equal to 1 quintillion bytes.
That said, the enterprise is said by analysts to be the greatest underserved opportunity in mobile. With a service supported only by a wireless network architecture, mobile will have a notable impact on business productivity and on value creation once it takes off.
Managed effectively, machine-generated data offers a goldmine of intelligence that businesses can use to gain perspectives into subscriber behavior and customer churn, for example, and to improve service quality and billing accuracy. But if organizations can’t handle the volume, dealing with telecommunications data could be an expensive drain on resources. A smart data management approach, especially when it comes to database selection, can have a big impact on the ability of a communications service provider (CSP) to compete and thrive in the fast-paced and continually evolving telecommunications market.
Cisco sees an annual run rate of 130 exabytes of mobile data traffic by 2016, equivalent to:
- 33 billion DVDs.
- 4.3 quadrillion MP3 files (music/audio).
- 813 quadrillion short message service (SMS) text messages.
This increase represents a compound annual growth rate (CAGR) of 78 percent over the period. The incremental amount of traffic being added to the mobile Internet between 2015 and 2016 alone is approximately three times the estimated size of the entire mobile Internet in 2012.
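The growth figures quoted here can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the quoted numbers, not a Cisco method; it assumes the 78 percent CAGR compounds over the five years from 2011 to 2016 (the period used for the other Cisco projections in this article), and the function names are illustrative.

```python
# Figures quoted in the article (Cisco projections).
CAGR = 0.78             # compound annual growth rate
YEARS = 5               # assumption: the period runs from 2011 to 2016
MONTHLY_EB_2016 = 10.8  # exabytes per month projected for 2016


def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` for `years` years."""
    return (1 + cagr) ** years


def annual_run_rate(monthly_volume: float) -> float:
    """Annualize a monthly traffic figure."""
    return monthly_volume * 12


multiple = growth_multiple(CAGR, YEARS)
run_rate = annual_run_rate(MONTHLY_EB_2016)
implied_2011_monthly = MONTHLY_EB_2016 / multiple

print(f"Growth over {YEARS} years: {multiple:.1f}x")
print(f"Annual run rate in 2016: {run_rate:.1f} EB")
print(f"Implied 2011 traffic: {implied_2011_monthly:.2f} EB per month")
```

Under that assumption, a 78 percent CAGR compounds to a multiple just under 18, which matches the 18-fold growth figure, and 10.8 exabytes per month annualizes to roughly the 130 exabytes quoted above.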
According to Cisco, these increases are driven by:
More Streamed Content: On-demand or streamed content for viewers versus simply downloaded content, will grow mobile cloud traffic 28-fold from 2011 to 2016, a CAGR of 95 percent.
More Connections: In 2016, more than 10 billion mobile Internet-connected devices, in a world population projected at 7.3 billion, will be in use, including machine-to-machine (M2M) modules. (One M2M application is updating digital billboards wirelessly. Advertisers will put up different ads by time of day or day of week and produce ads able to quickly reflect global shifts, such as gas prices.)
Enhanced Computing of Devices: More powerful mobile devices will consume and generate more data traffic. Tablets, for example, will generate traffic that will grow 62-fold from 2011 to 2016. Mobile data traffic generated by tablets in 2016 (1 exabyte per month) will be four times the total amount of monthly global mobile data traffic in 2010 (237 petabytes per month).
Faster Speeds: A prime enabler for mobile data traffic growth is mobile network connection speed growth. Faster means more consumption, and Cisco projects mobile speeds (including 2G, 3G and 4G networks) to jump nine-fold from 2011 to 2016.
More Video: Mobile users will likely want more mobile video, which will comprise 71 percent of all mobile data traffic by 2016.
The Cisco study also projects that 71 percent of all smartphones and tablets (1.6 billion) could be capable of connecting to an Internet Protocol version 6 (IPv6) mobile network by 2016. From a broader perspective, 39 percent of all global mobile devices (more than 4 billion), could be IPv6-capable by 2016.
More wireless devices and nodes accessing mobile networks worldwide will be the primary contributor to traffic growth. By 2016, more than 8 billion handheld or personal mobile-ready devices and nearly 2 billion machine-to-machine connections will be out there, including GPS in cars, asset tracking systems in shipping and manufacturing sectors and medical applications for making patient records more readily available.
Smartphones, laptops and other portable devices will drive about 90 percent of global mobile data traffic by 2016.
M2M traffic will represent 5 percent of 2016 global mobile data traffic while residential broadband mobile gateways will account for the remaining 5 percent of global mobile data traffic.
To address the rise in demand for the mobile Internet, service providers are increasingly looking to offload traffic to fixed/Wi-Fi networks.
In 2011, 11 percent, or 72 petabytes, per month of total mobile data traffic was offloaded. By 2016, 22 percent, or 3.1 exabytes, per month of total mobile data traffic will be offloaded.
The average mobile connection speed doubled in 2012 and Cisco expects that to increase nine-fold by 2016. Mobile connection speeds are a key factor in supporting and accommodating mobile data traffic growth.
Q: Big Data is a disruptive phenomenon that has emerged in recent years. What are the main challenges and opportunities when managing and analyzing Big Data?
A: Analytics provides insights using a wide sweep of mathematical and Big Data processing and analysis technologies. To benefit from it means becoming familiar enough with the IT ecosystem to choose the right mix of analytics techniques to achieve a desired result.
Industry watchers report that innovative early-adopter IT businesses are trending toward shared infrastructure. These organizations are moving from information silos to shared infrastructures, then to virtualized environments, and finally to the cloud, to become more agile in the marketplace and to lower costs.
To this end, Cisco is leading the way in the server market with its Big Data-focused Unified Computing System (UCS). This next-generation data center platform is optimized for data-intensive Big Data applications, with a focus on data analytics and the capacity to create heterogeneous ecosystems globally in complex Web service environments. UCS unites compute, network, flash storage, virtualization, and private cloud into a single system designed to reduce total cost of ownership (TCO) and increase business agility.
UCS accelerates the movement of data from storage to the compute domain using a solid-state application, and it provides the ability to quickly add assets and rapidly provision and re-provision resources to simplify IT management. Being electronic rather than mechanical, flash memory storage consumes only 20 percent of the power of traditional hard drives while reading more than 100 times faster.
This is a boon to data managers seeking ways to counter the energy drain of hard drives and create green data center standards. Organizations with active input-output applications, such as credit cards processing, are reporting significantly increased efficiencies in latency and operations while lowering costs with flash storage.
The benefit of UCS in reducing latency in processing all data updates leads to higher revenues, increased customer satisfaction and greater competitive advantage for an organization. Such improvements are real. Google is reported to have suffered a 20 percent drop in revenue when the time to display search results increased by as little as 500 milliseconds. Amazon reported a 1 percent sales drop for an additional delay of as little as 100 milliseconds.
The aim of the UCS flash array is to help customers handle data-intensive workloads, and specifically to deal with the problem of bringing flash storage to support real-time analytics.
Q: How will Big Data technology evolve in future?
A: No one can predict where the information age will lead us. Even Padmasree Warrior, Cisco’s chief technology and strategy officer, who must keep an eye on the near future at least, said last year, “Who knows what we’ll see in the next ten years?”
“No one can reliably predict whether technology will ultimately enhance or impair the human dimension of senior leadership but one thing is sure-technology is here to stay. Either we master technology or technology will master us,” says a report from the U.S. National Defense University Information Age and Strategic Decision Making.
Warrior sees a lot more work being done in the future to provide individuals with a choice between opting in or opting out on data issues. This will require very sophisticated analytics, she said in 2013, or a different way to present data. “Until now, we’ve mostly been creatively centered around the user experience, just the experience of how information is moving, not how it’s being presented back to you,” adds Warrior. “So I think there’s going to be lots of shifts in the way we deal with technology in the next three to five years.”
Looking to the future, Derrick Harris, of Gigaom, writes that artificial intelligence is finally here.
“We have the computers, we have the data, and we have the algorithms: so we now have the artificial intelligence. No, it’s not yet the fear-mongering stuff of science-fiction or the human-replacing stuff of Her, but AI is finally for real. Thanks to advances in machine learning, we have smartphones that can recognize voice commands, media services that predict what movies we’ll like, software that can identify the relationships among billions of data points, and apps that can predict when we’re most fertile.
“Looking forward, the work being done in areas like deep learning will make our AI systems more useful and more powerful. Set loose on complex datasets, these models are able to extract and identify the features of what they’re analyzing at a level that can’t be programmed. Without human supervision, deep learning projects have figured out what certain objects look like, mapped across word usage across languages and even learned the rules of video games. All of a sudden, it looks possible to automate certain tasks such as tagging content to make it searchable, or predicting with high accuracy what someone’s words means or what they’ll type next.”
Q: What are the benefits of Big Data technologies?
A: Building a robust, agile data and transactional system is fundamental to benefiting from the advantages of Big Data. Here are some reasons for doing so:
Dialoguing with consumers: Social networking and search engines have made today’s customers better informed and more opinionated than those of the past, even of the more recent past like your mother’s generation. Big Data with its internal and external data sources can give you a full 360 degree view of these customers: what they’re interested in; what their buying triggers are; how they prefer to shop; why they shop around; why they switch; what they’ll buy next; and what motivates them to recommend a company to others. It’s almost possible to engage in one-on-one conversations in real time with them to effectively draw insights from consumer habits and to generate greater impact from marketing campaigns.
Performing risk analysis: Predictive analytics on economic data, fueled by Big Data, can help financial managers better understand systemic risk in the financial sector. Furthermore, scanning and analyzing breaking news events or social media feeds makes it possible to keep up to speed on the latest developments in retail business, industrial, health, agricultural and financial environments. Detailed health tests on suppliers and customers are also possible with Big Data, allowing quick action if one of them is at risk of defaulting.
Detecting, reducing and remediating financial fraud: Not a day goes by without a thief attempting to defraud a company by an ever evolving cycle of schemes and strategies, even if it merely means stealing and cloning credit cards. Analytics and Big Data systems are sifting immense data volumes to expose secretive patterns, events and trends that are a sign of criminal fraud. Flagging a transaction for a book purchase in Richmond, Virginia at the same time as someone uses the card to pay for gas in Munich, Germany and putting a hold on the card to prevent unauthorized activity is an example of a fast database response in a virtually instantaneous fraud detection scheme. It’s a matter of staying a step ahead of criminals. Big Data analytics can flag any situation where 16 digit numbers – potentially credit card data – are used and lead to an investigation.
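The two fraud checks described above, a card used in two distant places at nearly the same time and a scan for 16-digit numbers that may be card data, can be sketched in a few lines of Python. This is a hedged illustration, not a production fraud engine: the `Transaction` type, the function names and the 900 km/h speed threshold are assumptions, and the Luhn checksum filter is an addition not mentioned in the article, used here only to cut down on false positives.

```python
import math
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List

MAX_PLAUSIBLE_KMH = 900.0  # assumption: roughly airliner cruising speed
CARD_PATTERN = re.compile(r"(?<!\d)(?:\d[ -]?){15}\d(?!\d)")  # 16 digits


@dataclass
class Transaction:
    card_id: str
    when: datetime
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def is_impossible_travel(a: Transaction, b: Transaction) -> bool:
    """Flag two uses of one card that no traveller could have made."""
    if a.card_id != b.card_id:
        return False
    hours_apart = abs((b.when - a.when).total_seconds()) / 3600
    hours_needed = haversine_km(a.lat, a.lon, b.lat, b.lon) / MAX_PLAUSIBLE_KMH
    return hours_apart < hours_needed


def luhn_valid(digits: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return total % 10 == 0


def find_card_like_numbers(text: str) -> List[str]:
    """Return 16-digit sequences in `text` that pass the Luhn check."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


# The article's example: a purchase in Richmond, Virginia, and a gas
# payment in Munich, Germany, five minutes apart (coordinates approximate).
richmond = Transaction("card-1", datetime(2014, 7, 1, 12, 0), 37.54, -77.44)
munich = Transaction("card-1", datetime(2014, 7, 1, 12, 5), 48.14, 11.58)
print(is_impossible_travel(richmond, munich))
```

A real system would run checks like these against a stream of transactions and combine them with many other signals; the point here is only that each rule reduces to a fast lookup and comparison, which is why a low-latency database matters.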
Q: As the promise of better and faster data-driven decision making with Big Data and Analytics (BDA) appears on executive radars, what are the compelling and promising advances for storage and computing architecture?
A: According to IDC, organizations must concentrate heavily on data center solutions that allow them to deliver scalable, reliable, and flexible infrastructure for BDA environments. They should aggressively add integrated systems to lift business agility and lower the cost of doing business. Additionally, they must make information as easily available as possible, and as they mine it make certain that it is as tightly secure as technology allows.
Analysts agree that Cisco has responded quickly to this call. IDC says, for example, that solutions like Cisco’s Common Platform Architecture based on UCS with Intel Xeon processors provide a predictable, flexible, and open infrastructure on which companies can build a broad portfolio of BDA solutions while minimizing capital and operating costs.
The Cisco Unified Computing System™ (Cisco UCS®) Invicta C3124SA Appliance accelerates data-intensive workloads. Customers using it can aggressively converge performance and efficiency as needed to make decisions in real time to drive their businesses. When a business expands, customers can transition from the Cisco UCS Invicta C3124SA to the Cisco UCS Invicta Scaling System, the first truly enterprise-class, scalable solid-state system.
This technology is known to outperform similar systems. That’s because it is explicitly designed for NAND flash memory to sustain high throughput, high I/O operations per second (IOPS), and ultra-low latency while overcoming the write performance and longevity challenges typically associated with multilevel-cell (MLC) flash memory.
Customers using it will experience faster mission-critical application workloads. Due to Cisco’s application-centric approach, customers will quickly and efficiently configure IT infrastructure to support resource intensive applications, real-time analytics, and decision making. Who wouldn’t want business transactions, analysis, batch jobs, desktops, and other important processes consistently performed at maximum speed?
Here’s how Cisco’s solid-state systems improve workload performance:
● Analytics and intelligence: Extract, integrate, and analyze data up to 10 times faster.
● Batch processing: Run batches without interrupting other workflow.
● E-mail: Reduce time delays by a factor of up to 50.
● Online transaction processing: Remove performance bottlenecks between servers and memory.
● Video: Complete more transcoding tasks in significantly less time.
● Virtual desktops: Improve overall user experience with desktops that launch faster and respond quickly while virus scanning.
● Database loads: Dramatically reduce query response times.
● High-performance computing (HPC): Leverage low latency IO requests to speed time sensitive applications.
As business demands and technology requirements rise, customers say they need to process and react to data faster. Applications must perform better, and IT departments need infrastructure that quickly adapts and re-adapts when demand is there. Here’s where Cisco shines.
Q: Organizations will increasingly ask how to overcome challenges for extending Big Data and Analytics (BDA). What will they need to help them navigate standards, facilities design and staff retraining?
A: It’s early days in the incorporation of BDA into most organizations’ data management and business processes. It is likely to be close to the end of this decade before it is complete. A priority now is to reconsider how the data center infrastructure is designed, deployed, and managed. Many organizations are now assessing how to make greater use of converged infrastructure systems like Cisco’s UCS to improve agility, availability, and operating efficiency.
Cisco, as a leading provider of core elements in current data centers, will continue to expand its role in helping companies make the transition as painless and flexible as possible. Cisco will stay focused on rolling out some key technical enhancements to existing product lines such as UCS, created in 2009, to achieve this.
On top of this, the IT ecosystem has a responsibility to educate customers about the broad set of professional services it offers to help them navigate this evolutionary change.
Many analysts see flash-memory technology in the form of solid-state drives (SSDs) as an effective up-and-comer to help accelerate both storage and computing architecture. Unfortunately, deploying SSDs within traditional architectures limits their effectiveness across the data center as a whole. Some researchers think it’s time to move past adding SSDs to hard-disk-drive (HDD)–based storage systems, or even to individual computing platforms, and instead to deploy flash-memory technology as a fundamental component of a unified data center fabric.
Bridging the gap between the rising application performance required of servers and the performance of traditional storage systems requires placing solid-state memory systems closer to the application. With its recently launched UCS Invicta Series, Cisco has enhanced the successful Cisco Unified Computing System™ (Cisco UCS®) platform by moving solid-state memory into the computing domain. In addition to addressing intensive workloads like virtual desktop infrastructure (VDI), databases, and analytics, solid-state memory promises to extract even faster answers from BDA deployments, all the while saving money and valuable data center space, power, and cooling resources. With Cisco’s unified systems-based approach, application performance has moved to a new, higher level, increasing efficiency and return on investment (ROI) and transforming business operations.