“Container and orchestration technologies have made a quantum leap in manageability for microservice architectures. Kubernetes is the clear winner in this space. It’s taken a little longer, but recently Kubernetes has turned a corner in its maturity and readiness to handle stateful workloads, so you’re going to see 2020 be the year of Kubernetes adoption in the database space in particular.” – Jonathan Ellis.
I have interviewed Jonathan Ellis, Co-Founder and CTO at DataStax. We talked about Kubernetes, Hybrid and Multi-cloud. In addition, Jonathan tells us his 2020 predictions and thoughts around migrating from relational to NoSQL.
Happy and Healthy New Year! RVZ
Q1. Hybrid cloud vs. multi-cloud: What’s the difference?
Jonathan Ellis: Both hybrid and multi-cloud involve spreading your data across more than one kind of infrastructure. As most people use the terms, the difference is that hybrid cloud involves a mix of public cloud services and self-managed data center resources, while multi-cloud involves using multiple public cloud services together, like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
Importantly, multi-cloud is more than using multiple regions within one cloud provider’s infrastructure. Multiple regions can provide resiliency and distribution of your data (although outages with a large enough blast radius can still affect multiple regions, like Azure’s global DNS outage earlier this year), but you’re still limited to the features of a single provider rather than a true multi-cloud environment.
Q2. What is your advice: When is it better to use on-prem, or hybrid, or multi-cloud?
Jonathan Ellis: There are three main areas to consider when evaluating the infrastructure options for an application. The best approach will depend on what you want to optimize for.
The first thing to consider is agility—cloud services offer significant advantages on how quickly you can spin infrastructure up and down, allowing you to concentrate on creating value on the software and data side. But the flip side of this agility is our second factor, which is cost. The agility and convenience of cloud infrastructure comes with a price premium that you pay over time, particularly for “higher level” services than raw compute and storage.
The third factor is control. If you want full control over the hardware or network or security environment that your data lives in, then you will probably want to manage that on-premises.
A hybrid cloud strategy can let you take advantage of the agility of the cloud where speed is the most important factor, while optimizing for cost or for control where those are more critical. This approach is popular for DataStax customers in the financial services sector, for instance. They like the flexibility of cloud, but they also want to retain control over their on-premises data center environment. We have partnered with VMware on delivering the best experience for public/private cloud deployments here.
DataStax builds on Apache Cassandra™ technology to provide fine-grained control over data distribution in hybrid cloud deployments. DataStax Enterprise (DSE) adds performance, security and operational management tools to help enterprises improve time-to-market and TCO.
Q3. IT departments are facing an uphill battle of managing hybrid, multi-cloud environments. Why does building scalable modern applications in the cloud remain a challenge?
Jonathan Ellis: Customers of modern, cloud-native applications expect quick response times and 100% availability, no matter where they are in the world. This means your data layer needs the ability to scale both in a single location and across datacenters. Relational databases and other systems built on master/slave architectures can’t deliver this combination of features. That’s what Cassandra was created for.
Cloud vendors have started trying to tackle these market requirements, but by definition their products are single-cloud only. DSE not only provides a data layer that can run anywhere, but it can actually run on a single cluster that spans machines on-premises and in the cloud, or across multiple public clouds.
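As an illustration of a single cluster that spans on-premises and cloud datacenters, Cassandra lets each keyspace set a replica count per datacenter through its NetworkTopologyStrategy replication class. The sketch below is not from the interview: it only builds the CQL text in Python, and the keyspace name, the datacenter names (`onprem_dc`, `cloud_dc`) and the replica counts are hypothetical.

```python
from typing import Dict


def create_keyspace_cql(keyspace: str, replicas_per_dc: Dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement with one replica count per datacenter."""
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for datacenter, count in replicas_per_dc.items():
        if count < 1:
            raise ValueError(f"replica count for {datacenter} must be positive")
        parts.append(f"'{datacenter}': {count}")
    options = ", ".join(parts)
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{{options}}};"
    )


# Hypothetical layout: three replicas on-premises, two in a public cloud.
statement = create_keyspace_cql("orders", {"onprem_dc": 3, "cloud_dc": 2})
print(statement)
```

The datacenter names in the statement have to match the names the cluster itself reports, so a real deployment would take them from its own configuration rather than from constants like these.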
Q4. Securing a multi-cloud strategy can be difficult due to a lack of visibility across hosts. What is your take on this?
Jonathan Ellis: Security for a multi-cloud architecture is more complex than security for a single cloud and has unique challenges. Security is required at multiple levels in the cloud and often involves compliance with regulatory standards. While security vendors are trying to solve this problem across clouds, the current tooling is limited and the feature sets vary, so the ability to have a cohesive view of the underlying IaaS across clouds is not optimal. This implies a need for IT teams to have skill sets for each cloud in their architecture, while relying on the AWS, GCP or Azure specific security, monitoring, alerting and analytics services to provide visibility. (As applications and databases move to managed Kubernetes platforms like GKE, EKS and AKS, some of the security burden for host-level security shifts to the cloud providers who manage and secure these instances at different levels.)
These challenges are not stopping companies from moving forward with a multi-cloud strategy, driven by the advantages of avoiding vendor lock-in and improved efficiency from a common data layer across their infrastructure, as well as by non-technical factors such as acquisitions.
DataStax provides capabilities that enable companies to improve their security posture and help with the security challenges. At the data security level, DSE advanced security allows companies to minimize risk, achieve granular access control, and help with regulatory compliance. It does this with functionality like unified authentication, end-to-end encryption, and enhanced data auditing. We are also developing a next-generation cloud-based monitoring tool that will have a unified view across all of your Cassandra deployments in the cloud and will be able to provide visibility into the underlying instances running the cluster. Finally, DataStax managed services offerings like Apollo (see below) will also provide some relief to this problem.
Q5. You recently announced early access to the DataStax Change Data Capture (CDC) Connector for Apache Kafka®. What are the benefits of bridging Apache Kafka with Apache Cassandra?
Jonathan Ellis: Event streaming is a great approach for applications where you want to take actions in realtime. Apache Kafka was developed by the technology team at LinkedIn to manage streaming data and events for these scenarios.
Cassandra is the perfect fit for event streaming data because it was built for the same high ingest rates that are common for streaming platforms such as Kafka. DataStax makes it easier to bring these two technologies together so that you can do all of your real-time streaming operations in Kafka and then serve your application APIs with a highly available, globally distributed database. This defines a future proof architecture that handles any needs that microservices and associated applications throw at it.
It’s important to recognise what Kafka does really well in streaming, and what Cassandra does well in data management. Bringing these two projects together allows you to do things that you can’t do with either by itself.
Q6. DataStax recently announced a production partnership with VMware in support of their VMware vSAN to include hybrid and multi-cloud configurations. Can you please elaborate on this?
Jonathan Ellis: We have worked with VMware for years on how to support hybrid cloud environments, and this partnership is the result. VMware and DataStax have a lot of customers in common, and for a lot of those customers, the smoothest path to cloud is to use VMware to provide a common substrate across their on-premises and cloud deployments. Partnering with VMware allows DataStax to provide improved performance and operational experience for these enterprises.
Q7. What are your 2020 predictions and thoughts around migrating from relational to NoSQL?
Jonathan Ellis: Container and orchestration technologies have made a quantum leap in manageability for microservice architectures. Kubernetes is the clear winner in this space. It’s taken a little longer, but recently Kubernetes has turned a corner in its maturity and readiness to handle stateful workloads, so you’re going to see 2020 be the year of Kubernetes adoption in the database space in particular. (Kubernetes support for DSE is available on our Labs site.)
In terms of moving from relational to NoSQL, there’s still a gap that exists in terms of awareness and understanding around how best to build and run applications that can really take advantage of what Cassandra can offer. Our work in DataStax Academy for Cassandra training will continue in 2020, educating people on how to best make use of Cassandra and get started with their newest applications. This investment in education and skills development is essential to helping the Cassandra community develop, alongside the drivers and other contributions we make on the code side.
Q8. What is the road ahead for Apache Cassandra?
Jonathan Ellis: I was speaking to the director of applications at a French bank recently, and he said that while he thought the skill level for developers had gone up massively overall, he also thought that skills specifically around databases and data design have remained fairly static, if not down over time. To address this skills gap, and to take advantage of cloud-based agility, we’ve created the Apollo database (now in open beta) as a cloud-native service based on Cassandra. This makes the operational complexities of managing a distributed system a complete non-problem.
Our goal is to continue supporting Cassandra as the leading platform for delivering modern applications across hybrid and multi-cloud environments. For companies that want to run at scale, it’s the only choice that can deliver availability and performance together in the cloud.
Jonathan is a co-founder of DataStax. Before DataStax, Jonathan was Project Chair of Apache Cassandra for six years, where he built the Cassandra project and community into an open-source success. Previously, Jonathan built an object storage system based on Reed-Solomon encoding for data backup provider Mozy that scaled to petabytes of data and gigabits per second throughput.
– DataStax Enterprise (DSE)
– Apollo database
– The Global AI Index 2019, ODBMS.org DEC. 17, 2019
– Look ahead to 2020 in-memory computing, ODBMS.org DEC. 27, 2019
Follow us on Twitter: @odbmsorg
Follow us on: LinkedIn
“We find ourselves in a new era of patient-driven innovation, which drives far better design and fosters collaboration between stakeholders.” — Amy Tenderich.
I have interviewed Amy Tenderich, journalist / blogger, well known patient advocate, and founder and editor of DiabetesMine.
Q1. You are one of the leading advocates for the diabetic community. In 2007, you wrote an open letter to Steve Jobs that went viral, asking Apple to apply the same design skills to medical devices that Apple devoted to its consumer products. What happened since then?
Amy Tenderich: There has been a true Revolution in Diabetes Technology and the “consumerization” of medical devices in general… and I’m thrilled to be part of it! As I laid out in my “10 Years Later” post, the biggest milestones are:
- Upsurge of patient involvement in innovation/design
- Shift to data-driven disease care that increasingly prioritizes Interoperability of devices and data
- US FDA forging a path for open, candid interaction between the regulatory agency and the patient community – which we had a hand in (very exciting!)
- Consumer giants like Apple, Google, Microsoft, Samsung and others getting involved in healthcare, and diabetes specifically — which changes the landscape and mindset for products and services
Q2. At that time you wrote that the devices the diabetic community had to live with were “stuck in a bygone era”, created in an “engineering-driven, physician-centered bubble.” How is the situation now?
Amy Tenderich: With the help of our prodding, medical products are now designed to be more compact, more comfortable, more aesthetic and more personalizable than ever before. In other words, they’re now keeping pace with consumer tech products.
For examples, see the Tandem t:slim insulin pump and the One Drop glucose meter – which both resemble Apple products – the Quell pain relief solution, and the dynamic, fun-to-use MySugr diabetes data app.
Q3. Why is it so hard to bring the tech and pharma worlds together?
Amy Tenderich: Good question! Check out the 2012 Atlantic article titled, “The Reason Silicon Valley Hasn’t Built a Good Health App.” It basically outlines how technology companies tend to focus on the tech itself, without understanding the real-world use case.
Also, tech companies tend to develop and iterate at breakneck speed, whereas the healthcare world – especially big legacy pharma companies – is burdened by loads of regulations and has historically moved at a glacial pace.
The good thing is, these two worlds are inching closer together as:
- Pharma companies are by necessity transforming themselves into digital organizations that deal in software and innovate more rapidly, and
- Tech companies are “getting religion” on understanding the real-world aspects of people’s health and disease care.
Q4. Who are the key diabetes “stakeholders”?
Amy Tenderich: Patients and caregivers, first and foremost, as the people literally “living this illness.” Then of course: Pharma and Medtech companies, FDA regulators, clinicians, researchers, other healthcare providers (e.g. Certified Diabetes Educators), non-profit advocacy groups, health data platform and app developers, and healthcare designers.
Q5. Artificial Intelligence and Machine Learning (ML) are becoming widely discussed and employed in the diabetes tech world. What is your take on this?
Amy Tenderich: Indeed, AI/ML appear to be the wave of the future. All data-driven tools for diabetes care – including new Artificial Pancreas tech on the horizon – are based on these advanced computing techniques.
Q6. When using AI for diabetes: what are the main new regulatory and ethical issues that need to be faced?
His slide on “Seven Threats to AI” laid out the following:
AI/ML STARTUPS AND PRACTITIONERS:
- Over-focusing on “shiny objects” vs. the UX and business value.
- Smart algorithms are being trained on dumb and dirty data.
- Practitioners are building “black boxes” even they can’t understand.
ENTERPRISE LEADERS:
- Though they’re the key customers, most enterprise organizations don’t know where to begin.
- Major incumbents possess—but fail to capitalize on—the most valuable commodity: Data.
INVESTORS:
- Hype allows some companies to masquerade as “AI” companies.
REGULATORS:
- Regulation of AI/ML still needs to come into focus.
Evans and Rock Health have actually been instrumental in helping the US FDA decide how to approach regulation of AI and Machine Learning in Healthcare. Their work focuses on gaining consensus around “ground truth data.” You can read all about it and even weigh in here.
Q7. Which do you care more about: Accelerating medical advances or protecting data rights?
Amy Tenderich: The hope is that these are not mutually exclusive. But if you ask people in the Diabetes Community, I believe they would almost always prioritize accelerating medical advances.
That’s because type 1 diabetes is a potentially deadly disease that requires 24/7 effort just to stay out of the hospital. Data privacy seems a small trade-off for many people to get better tools that aid in our survival and reduce the disease burden.
Q8. Many in the Diabetes Community are turning to DIY tech to create their own data-sharing tools and so-called Automated Insulin Delivery (or “closed loop”) systems. Can you please explain what this means? Is it legal?
Amy Tenderich: I’m proud to say that we at DiabetesMine helped launch the #WeAreNotWaiting community rallying around this DIY tech.
That is, the now-hashtag “We Are Not Waiting” was the result of a group discussion at the very first DiabetesMine D-Data ExChange technology forum in November 2013 at Stanford University. We gathered some of the early tech-savvy patient pioneers who were beginning to reverse-engineer existing products and develop their own platforms, apps and cloud-based solutions to help people with diabetes better utilize devices and health data for improved outcomes.
Today, there is a vibrant community of thousands of patients using (and iterating on) their own homemade “closed loop systems” around the world. These systems connect a continuous glucose monitor (CGM) with an insulin pump via a sophisticated algorithm that essentially automates insulin dosing. Current systems do still require some user intervention (so the loop is not completely “closed”), but they greatly improve overall glucose control and quality of life for patients.
These DIY systems have not been approved by the FDA for safety and effectiveness, but they are by no means illegal. In fact, the results have been so powerful that no fewer than six companies are seeking FDA approval for commercial systems with the same functionality. And one popular DIY model called Loop has been taken up by an outfit called Tidepool for conversion into a commercial, FDA-scrutinized product.
Q9. Is it possible to use Social Media for real Health Impact?
Amy Tenderich: Most certainly, yes. There is a growing body of evidence showing real-world impact on improved health outcomes. See for example this recent eVariant article that cites the benefits of patient-powered research networks, and states, “There’s no question that patients use the Internet to take control of their own health.”
See also, original research from our DiabetesMine team, published in the Journal of Diabetes Science and Technology (Nov 2018): “Findings indicate that social media provides a significant source not only of moral support and camaraderie, but also critical education on thriving with diabetes. Importantly, we observed strong evidence of peer influence on patients’ therapy and diabetes technology purchasing decisions.”
Q10. What is the FDA mHealth Pre-certification Program, and what does it mean for diabetes?
Amy Tenderich: This is the FDA’s revolutionary move to change how it reviews mobile apps and digital health software to accelerate the regulatory process and get these products out there for people to start using ASAP.
The agency announced its Pre-Certification for Software Pilot Program in July 2017. Its role is to evaluate and dub certain companies as “trustworthy,” to fast track their regulatory review process.
For the pilot, the FDA chose nine companies out of more than 100 applicants, and notably for our Diabetes Community, seven of the nine have direct ties to diabetes!
See our coverage here for more details.
Qx. Anything else you wish to add?
Amy Tenderich: We find ourselves in a new era of patient-driven innovation, which drives far better design and fosters collaboration between stakeholders. There are so many exciting examples of this – in telemedicine, at the Mayo Clinic, and at Novo Nordisk, to name just a few.
Amy is the Founder and Editor of DiabetesMine.com, a leading online information destination that she launched after her diagnosis with type 1 diabetes in 2003. The site is now part of San Francisco-based Healthline Media, where Amy also serves as Editorial Director, Diabetes & Patient Advocacy.
Amy is a journalist / blogger and nationally known patient advocate who hosts her own series of thought leadership events (the annual DiabetesMine Innovation Summit and biannual DiabetesMine D-Data ExChange) that bring patient entrepreneurs together with the medical establishment to accelerate change.
She is an active advisor to the American Association of Diabetes Educators (AADE) and medtech consultant, along with a frequent speaker at policy and digital health events.
As a pioneer in the Diabetes Online Community (DOC), Amy has conducted numerous patient community research projects, and authored articles for Diabetes Spectrum, the American Journal of Managed Care and the Journal of Diabetes Science and Technology.
Amy is also the proud mom of three amazing young adult daughters. In her “free time,” she enjoys hiking, biking, leisure travel, good wine and food, and just about anything relaxing done under the California sun.
– On gaining Knowledge of Diabetes using Graphs. Interview with Alexander Jarasch, ODBMS Industry Watch, February 4, 2019.
– On using AI and Data Analytics in Pharmaceutical Research. Interview with Bryn Roberts ODBMS Industry Watch, September 10, 2018
“I think Redis is entering a new stage where there are a number of persons that now actively daily contribute to the open source. It’s not just “mostly myself”, and that’s great.” –Salvatore Sanfilippo
I have interviewed Salvatore Sanfilippo, the original developer of Redis. Redis is an open source in-memory database that persists on disk.
Q1. What is new in the Redis 6 release?
Salvatore Sanfilippo: The main new features are ACLs, SSL, I/O threading, the new protocol called RESP3, assisted client side caching support, a ton of new modules capabilities, new cluster tools, diskless replication, and other things, a very long list indeed.
Q2. Can you tell us a bit more about the new version of the Redis protocol (RESP3)? What is it, and why is it important?
Salvatore Sanfilippo: It’s just an incremental improvement over RESP2. The main goal is to make it more semantic. In terms of aggregate data types, RESP2 is only able to represent arrays. With RESP3 we instead have sets, hashes, and so forth. This makes it simpler for client libraries to understand how to report the command reply back to the client, without having a conversion table from the array to the target type in the library’s language.
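To make the difference concrete, here is a minimal Python sketch of a reply parser, written for illustration and not taken from any client library. It assumes the wire markers of the RESP specification (`*` for arrays, `%` for RESP3 maps, `~` for RESP3 sets, `$` for bulk strings, `:` for integers, `+` for simple strings) and ignores nulls, errors and the other RESP3 types. The two sample replies are hand-written and describe the same hypothetical hash.

```python
from typing import Any, Tuple


def parse(buf: bytes, pos: int = 0) -> Tuple[Any, int]:
    """Parse one RESP value starting at pos; return (value, next position)."""
    end = buf.index(b"\r\n", pos)
    kind, header = buf[pos:pos + 1], buf[pos + 1:end]
    pos = end + 2
    if kind == b"+":  # simple string
        return header.decode(), pos
    if kind == b":":  # integer
        return int(header), pos
    if kind == b"$":  # bulk string: length header, then payload and CRLF
        length = int(header)
        return buf[pos:pos + length].decode(), pos + length + 2
    if kind in (b"*", b"~"):  # array, or set (RESP3 only)
        items = []
        for _ in range(int(header)):
            item, pos = parse(buf, pos)
            items.append(item)
        return (set(items) if kind == b"~" else items), pos
    if kind == b"%":  # map (RESP3 only): the header counts key/value pairs
        result = {}
        for _ in range(int(header)):
            key, pos = parse(buf, pos)
            value, pos = parse(buf, pos)
            result[key] = value
        return result, pos
    raise ValueError(f"unsupported RESP type byte: {kind!r}")


# The same hash, as a RESP2 flat array and as a RESP3 map.
resp2_reply = b"*4\r\n$4\r\nname\r\n$5\r\nredis\r\n$4\r\nport\r\n$4\r\n6379\r\n"
resp3_reply = b"%2\r\n$4\r\nname\r\n$5\r\nredis\r\n$4\r\nport\r\n$4\r\n6379\r\n"

flat, _ = parse(resp2_reply)
mapping, _ = parse(resp3_reply)

# With RESP2 the client must already know that this command returns a hash
# in order to pair up the flat items; with RESP3 the reply says so itself.
converted = dict(zip(flat[0::2], flat[1::2]))
print(flat, converted, mapping)
```

The `dict(zip(...))` step stands in for the per-command conversion table mentioned in the answer: it is only needed on the RESP2 path.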
Q3. You have recently implemented client-side caching for Redis 6. What are the main benefits of this?
Salvatore Sanfilippo: Most big shops using Redis end up having some way to memorize some information directly in the client. Imagine a social network that caches things in Redis, where the same post is displayed many times because it is about a very famous person. Fetching it from Redis every time is a lot of wasted effort and cache traffic. So many inevitably end up creating protocols to retain very popular items directly in the memory of the front-end systems, inside the application memory space. To do that you need to handle the invalidation of the cached keys. Redis’s new client-side caching is server-side “help” to accomplish this goal. It is able to track which keys a given client memorized, and inform it when such keys get modified, so that the client can invalidate them.
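The tracking idea can be sketched as a small in-process simulation. This is a toy model under stated assumptions, not Redis code: the `Server` and `CachingClient` classes and their methods are hypothetical, and the invalidation message is modeled as a direct method call rather than a message on the connection.

```python
from collections import defaultdict
from typing import Dict, Optional


class Server:
    """Toy key-value server that remembers which clients may cache each key."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.tracking = defaultdict(set)  # key -> clients that fetched it
        self.reads = 0  # number of reads that actually reached the server

    def get(self, client: "CachingClient", key: str) -> Optional[str]:
        self.reads += 1
        self.tracking[key].add(client)
        return self.data.get(key)

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        # Notify every client that may hold the old value, then forget them.
        for client in self.tracking.pop(key, set()):
            client.invalidate(key)


class CachingClient:
    """Toy client that keeps fetched values in local application memory."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.local: Dict[str, Optional[str]] = {}

    def get(self, key: str) -> Optional[str]:
        if key not in self.local:
            self.local[key] = self.server.get(self, key)
        return self.local[key]

    def invalidate(self, key: str) -> None:
        self.local.pop(key, None)


server = Server()
server.set("post:1", "hello")
client = CachingClient(server)
for _ in range(3):
    client.get("post:1")  # only the first call reaches the server
server.set("post:1", "edited")  # triggers invalidation of the local copy
print(client.get("post:1"), server.reads)
```

The `tracking` table is also where the extra server memory goes: it grows with the number of keys that clients have fetched and may still be holding.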
Q4. Are there any drawbacks as well?
Salvatore Sanfilippo: Sure, more caching layers, more invalidation, more complexity. Also more memory used by Redis to track the client keys.
Q5. The “Streams” data structure was introduced in Redis 5. What is it? How does it differ from other open source streaming frameworks such as Apache Pulsar or Kafka?
Salvatore Sanfilippo: A Redis stream is basically a “log” of items, where each item is a small dictionary composed of keys and values. On top of that simple data structure, which is very very memory efficient, we do other things that are more messaging and less data structure: consume a stream via a consumer group, block for new messages, and so forth.
There are use cases that can be solved with both Redis Streams and Pulsar or Kafka, but I’m against product comparisons; it’s up to the users to understand what they need.
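The log-of-dictionaries model and the consumer group behaviour described above can be sketched as a toy in-memory structure. This is an illustration under simplifying assumptions rather than how Redis implements streams: the `ToyStream` class is hypothetical, entry IDs are plain sequential integers instead of Redis’s time-based IDs, and blocking reads are not modeled. The method names loosely mirror the XADD, XGROUP CREATE, XREADGROUP and XACK commands.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, Dict[str, str]]


class ToyStream:
    """Append-only log of small dictionaries with simple consumer groups."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self.next_index: Dict[str, int] = {}  # group -> index of next entry
        self.pending: Dict[str, Dict[int, str]] = {}  # group -> {id: consumer}

    def add(self, fields: Dict[str, str]) -> int:
        entry_id = len(self.entries) + 1
        self.entries.append((entry_id, dict(fields)))
        return entry_id

    def create_group(self, group: str) -> None:
        self.next_index[group] = 0  # the group starts at the beginning of the log
        self.pending[group] = {}

    def read_group(self, group: str, consumer: str, count: int = 1) -> List[Entry]:
        # Each entry is handed to exactly one consumer within the group.
        start = self.next_index[group]
        batch = self.entries[start:start + count]
        self.next_index[group] = start + len(batch)
        for entry_id, _ in batch:
            self.pending[group][entry_id] = consumer
        return batch

    def ack(self, group: str, entry_id: int) -> bool:
        # Acknowledging removes the entry from the group's pending list.
        return self.pending[group].pop(entry_id, None) is not None


stream = ToyStream()
for temperature in ("21", "22", "23"):
    stream.add({"sensor": "a", "temp": temperature})
stream.create_group("workers")
first = stream.read_group("workers", "consumer-1", count=2)
second = stream.read_group("workers", "consumer-2", count=2)
stream.ack("workers", first[0][0])
print(first, second, stream.pending["workers"])
```

The pending table is what makes this more messaging than data structure: it records which consumer received each entry until that entry is acknowledged.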
Q6. What are you working on at present?
Salvatore Sanfilippo: On finalizing the Redis 6 release, adding many new module APIs, and also porting the Disque project (https://github.com/antirez/disque) as a Redis module.
Q7. What is your vision ahead for Redis?
Salvatore Sanfilippo: I think Redis is entering a new stage where there are a number of persons that now actively daily contribute to the open source. It’s not just “mostly myself”, and that’s great.
Redis modules are playing an interesting role. We see Redis Labs creating modules, but judging from the bug reports in the GitHub repository, I think there are also people writing modules to specialize Redis for their own uses, which is great.
LEAD, OPEN SOURCE REDIS DEVELOPMENT, Redis Labs.
Salvatore started his career in 1997 as a security researcher, writing the hping (https://en.wikipedia.org/wiki/Hping) security tool and inventing the Idle Scan (https://en.wikipedia.org/wiki/Idle_scan). Later he worked on embedded systems, focusing on programming languages research and creating a small footprint Tcl interpreter, which is still in active development. With a colleague, Salvatore created the first two Italian social applications in partnership with Telecom Italia. After this experience, he decided to explore new ways to improve web app development, and ended up writing the first alpha version of Redis and open sourcing it. Since 2009, he has dedicated most of his time to developing Redis open source code.
Over the years, Salvatore has also created a number of other open source projects ranging from software defined radio, to line editing tools, to children development environments. He lives in Sicily, Italy.
Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker.
“Corporations can have the best technology, the best digital infrastructure, but if they cannot excite people to work with it and to see it not only as a tool, but a vehicle for innovation and development that can massively empower the greater vision of the company they are part of, technology will only reach half its potential.” –Rahmyn Kress
I have interviewed Rahmyn Kress, Chairman of the Digital Executive Committee at Henkel, and founder of Henkel X, an open-innovation platform accelerating Henkel’s entrepreneurial transformation.
Q1. We are seeing a new wave of disruption through digital technology. What are the main challenges and opportunities?
Rahmyn Kress: I personally think the biggest challenge of digital disruption is not finding and implementing new technologies, but rather familiarizing employees with them so they will accept the change.
Corporations can have the best technology, the best digital infrastructure, but if they cannot excite people to work with it and to see it not only as a tool, but a vehicle for innovation and development that can massively empower the greater vision of the company they are part of, technology will only reach half its potential.
That is why it is so important to link the topic of digitization with something positive and make it come alive through dialogue and interaction.
We, at Henkel X, are doing this through various activities: Our CDO+1 lunch takes place every two weeks and gives employees the opportunity to ask questions about recent trends, disruptive technologies and Henkel X projects. We also introduced our Henkel X App, which is filled with curated content and offers an opportunity to chat with coworkers from around the world.
Furthermore, we launched our Digital BaseFit initiative to provide employees with the basic digital knowledge they need to know today. And there is also the opportunity to attend our Show & Tell events where startups pitch their ideas to Henkel – a total of 12,000 participants from various backgrounds in Henkel have dialled in or attended the events in person. All these initiatives make it much easier for us to address new technologies and issues.
Q2. You have founded “Henkel X”. Can you please explain how it works?
Rahmyn Kress: When Marius (Swart) and I founded Henkel X in February 2018, we designed it not only to accelerate Henkel’s entrepreneurial transformation, but to provide an open innovation platform that could act as a catalyst of industrial change, drive innovations and create disruptive business models for the whole industry. In order to do that, we established and operate several impact driven programs for its members based on three pillars: The biggest value lies in our Ecosystem integrating a strong, diverse network of partners and experts sharing knowledge, views and ideas. On top of that we create Experiences, that means we organize and host events to foster collaboration and innovation, and finally we facilitate Experimentation to boost new ways of working and building minimum viable products (MVPs) fast, in an agile environment.
Q3. How do you create a curious fail fast culture in the enterprise?
Rahmyn Kress: Through the establishment of Henkel X we are trying to introduce a culture that enables employees to generate ideas, act quickly and thus achieve rapid success – or fail fast. We really try to create a vibrant field for experimentation and encourage our business units not to shy away from contact and cooperation with startups. This approach carries the risk of failure, but other paths can quickly be taken so that the teams can implement their projects successfully in the shortest possible time. As Steve Jobs once said: “Sometimes when you innovate, you make mistakes. It is best to admit them quickly, and get on with improving your other innovations.” Speed and the willingness to fail are key in order to drive digital innovation and stay competitive in the market.
Q4. Isn’t this culture dependent?
Rahmyn Kress: Yes, it totally is. And it is one of the most difficult points in the digital innovation process. In order for corporates to adapt to the new technologies we definitely need to cultivate a stronger trial and error mentality. In digital innovation, for example, Germany lags five years behind the United States – even though, among all European countries it is Germany in particular which has a huge amount of potential: outstanding tech know-how, financially powerful brands and corporations, and a large number of globally leading research institutes focused on new technologies and digitization. We should make use of these resources – and with Henkel X that’s precisely what we’re doing.
Q5. What are the main lessons you have learned while working at Accenture and Universal Music?
Rahmyn Kress: As the EVP of Physical and Digital Transformation, Supply Chain and Operations at Universal Music Group I quickly built a wealth of experience in digital transformation, just as the first wave of disruption hit the media industry. It was an early wake-up call that taught me how to pivot and adapt in an industry entering the early stages of change. Initially, digital only accounted for a small percentage of the music industry’s total revenue, but it suddenly became clear that if digital continued to prevail then manufacturing and logistics would become a commodity. I am not suggesting for one second that this is true for consumer goods, but we have so many examples of rapid change that the signs of digital transformation must be taken very seriously. This mainly affects how we handle products and the way we orient ourselves towards services instead of products. I saw this during my time at Accenture as well, where I created and essentially optimized digital supply chains and helped large corporates in their efforts to pivot their business towards the digital transformation.
Q6. What are your current projects as Chairman of the Digital Executive Committee at Henkel?
Rahmyn Kress: I see myself as a catalyst who strengthens the entrepreneurial spirit and questions existing structures and processes: To make our internal digital upskilling program as effective as possible, for example, we created a digital glossary that ensures we speak a common language. Also, my team put together the digital technology stack to help us communicate with our audience that is using the Henkel brands and products. By having a common architecture throughout the organisation we can move faster when it comes to adaptation and enhancements going forward. Most importantly, we have the opportunity to capture data that we can later extract value from – be it in supply chain optimisation or understanding emerging customer and consumer trends.
But our efforts in rolling out the digital transformation don’t stop here: As Henkel X also operates as an open innovation platform, we initiated Henkel X Partners, a network forum in which we bring local entrepreneurs, industry partners, VCs, influential family businesses, startups, and thought leaders together. As collaborating partners they form part of our ecosystem, which we intend to grow and strengthen across Europe. Last month, for example, we launched Henkel X Partners in Barcelona to extend the Henkel X activities into Spain and build this regional extension. In October we are going to do the same in Milan in close cooperation with Bocconi.
Q7. You have set up a priority to accelerate digitisation in your company. What are the main stumbling blocks, since you are not Google?
Rahmyn Kress: The biggest challenge does not lie in digitisation itself, but in how we use it to change the way we work and the way we do business, and in what new business areas and models we are developing. We have to think long-term, like visionaries. This means asking ourselves, for example, “Will there be any washing powders and adhesives as we know them today at all in the future?” and “Will this still be our core business?”
In order to find the right answers and move forward in the right direction, I think in three different dimensions, which can be seen as three interconnected horizons: Horizon 1 focuses on the constant optimisation of the core business through digital technology to fund the growth of the incremental innovation. Horizon 2 is about transforming and finding new business models. Perhaps in the future we will sell washing powder like coffee capsules?
Nowadays, we are still talking about our ‘traditional products’, which may be consumed completely differently in the future. And this brings us to Horizon 3, which is about actual disruptive innovations, the so-called ‘moon shots’. Here, completely new business models are conceivable. The most important thing is to engage in all three horizons at the same time. Therefore each organisation needs to decide for itself how much it wants to invest in each of them, taking into account the challenges, opportunities and threats in the marketplace, as well as its own digital maturity.
Q8. You are a member of the World Economic Forum for the “Platform Economy”. What are the key insights you have gained out of this activity?
Rahmyn Kress: We are moving from a product-focused to a much more platform-focused world. Platform business models have higher barriers to entry, but once they’re established and operating, they are very difficult to compete against. Many organizations struggle with the rate of external innovation; they feel they can’t keep up. That is why they need to start thinking more about ways to collaborate than about how to compete with each other. Business as usual is a thing of the past: It is no longer big versus small, but rather slow versus fast. Economic platforms are a promising answer to this ongoing shift.
Q9. Artificial Intelligence is on the rise. Is AI part of your strategy, and if yes, can you please explain what you expect out of AI?
Rahmyn Kress: We see AI as an essential part of our strategy. Just recently, we entered into a partnership with CognitionX to make AI adoption more efficient and to drive transformation. Henkel X will use CognitionX, a fascinating AI news and advice platform, to engage with the Henkel X community through information, and to provide a network of expertise and knowledge around the deployment of artificial intelligence. Furthermore, we will start to roll out the Enterprise Edition of CognitionX’s AI Advice Platform, to access a Wiki of AI products. AI is great and we should make use of it!
Qx. Anything else you wish to add?
Rahmyn Kress: It is definitely time that we start to consider our industrial peers as business partners instead of competitors. Of course, there are areas of rivalry, especially in relation to products. But when it comes to innovation, we should work, think, and develop together. Here we can also learn from the music industry which demonstrates how important common platforms are. Digital transformation is a joint responsibility and our goal should be to enhance future growth, build reliable ecosystems across our value chains and drive digital innovation forward. What we need are places, digital or physical, to exchange and discuss ideas to hit our targets before others do – that is exactly what Henkel X aims to achieve.
Dr. Rahmyn Kress is Chairman of the Digital Executive Committee at Henkel and founder of Henkel X, an open-innovation platform accelerating Henkel’s entrepreneurial transformation.
Previously, he was President and CEO of DigiPlug, a tech company acquired by Accenture. Kress then joined Accenture Ventures as their lead for Europe, Latin America and Africa.
Today, Kress is an active member in the venture capital and start-up community as mentor and angel investor and a member of several executive advisory boards, including the World Economic Forum Platform Economy advisory board.
Most recently, he founded thebeautifulminds.club, an initiative that unites entrepreneurs, artists, business leaders, investors and strategists to raise awareness of, and provide support around, neurodiverse conditions such as ADHD and dyslexia, so that they are recognized as unique skills in an entrepreneurial world.
On Digital Transformation, Big Data, Advanced Analytics, AI for the Financial Sector. Interview with Kerem Tomak, ODBMS Industry Watch, 2019-07-08
Follow us on Twitter: @odbmsorg
“Cameras are now everywhere. Large-scale video processing is a grand challenge representing an important frontier for analytics, what with videos from factory floors, traffic intersections, police vehicles, and retail shops. It’s the golden era for computer vision, AI, and machine learning – it’s a great time now to extract value from videos to impact science, society, and business!” — Ganesh Ananthanarayanan
I have interviewed Ganesh Ananthanarayanan. We talked about his projects at Microsoft Research.
Q1. What is your role at Microsoft Research?
Ganesh Ananthanarayanan: I am a Researcher at Microsoft Research. Microsoft Research is a research wing within Microsoft, and my role is to watch out for key technology trends and work on large-scale networked systems.
Q2. Your current research focus is to democratize video analytics. What is it?
Ganesh Ananthanarayanan: Cameras are now everywhere. Large-scale video processing is a grand challenge representing an important frontier for analytics, what with videos from factory floors, traffic intersections, police vehicles, and retail shops. It’s the golden era for computer vision, AI, and machine learning – it’s a great time now to extract value from videos to impact science, society, and business!
Project Rocket’s goal is to democratize video analytics: build a system for real-time, low-cost, accurate analysis of live videos. This system will work across a geo-distributed hierarchy of intelligent edges and large clouds, with the ultimate goal of making it easy and affordable for anyone with a camera stream to benefit from video analytics. Easy in the sense that any non-expert in AI should be able to use video analytics and derive value. Affordable because the latest advances in CV are still very resource intensive and expensive to use.
Q3. What are the main technical challenges of large-scale video processing?
Ganesh Ananthanarayanan: In the hotly growing “Internet of Things” domain, cameras are the most challenging of “things” in terms of data volume, (vision) processing algorithms, response latencies, and security sensitivities. They dwarf other sensors in data sizes and analytics costs, and analyzing videos will be a key workload in the IoT space. Consequently, we believe that large-scale video analytics is a grand challenge for the research community representing an important and exciting frontier for big data systems.
Unlike text or numeric processing, video processing requires high bandwidth (e.g., up to 5 Mbps for HD streams), fast CPUs and GPUs, richer query semantics, and tight security guarantees. Our goal is to build and deploy a highly efficient distributed video analytics system. This will entail new research on (1) building a scalable, reliable and secure systems framework for capturing and processing video data from geographically distributed cameras; (2) efficient computer vision algorithms for detecting objects, performing analytics and issuing alerts on streaming video; and (3) efficient monitoring and management of computational and storage resources over a hybrid cloud computing infrastructure by reducing data movement, balancing loads over multiple cloud instances, and enhancing data-level parallelism.
Q4. What are the requirements posed by video analytics queries for systems such as IoT and edge computing?
Ganesh Ananthanarayanan: Live video analytics pose the following stringent requirements:
1) Latency: Applications require processing the video at very low latency because the output of the analytics is used to interact with humans (such as in augmented reality scenarios) or to actuate some other system (such as intersection traffic lights).
2) Bandwidth: High-definition video requires large bandwidth (5 Mbps, or even 25 Mbps for 4K video), and streaming a large number of video feeds directly to the cloud might be infeasible. When cameras are connected wirelessly, such as inside a car, the available uplink bandwidth is very limited.
3) Provisioning: Using compute at the cameras allows for correspondingly lower provisioning (or usage) in the cloud. Also, uninteresting parts of the video can be filtered out, for example, using motion-detection techniques, thus dramatically reducing the bandwidth that needs to be provisioned.
Besides low latency and efficient bandwidth usage, another major consideration for continuous video analytics is the high compute cost of video processing. Because of the high data volumes, compute demands, and latency requirements, we believe that large-scale video analytics may well represent the killer application for edge computing.
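As a rough illustration of the bandwidth requirement above, the following Python sketch estimates the aggregate uplink needed to stream camera feeds to the cloud. The per-stream rates (5 Mbps for HD, 25 Mbps for 4K) are the ones quoted in the answer; the function name, the camera counts, and the fraction of video kept after edge-side filtering are hypothetical values chosen only for illustration.

```python
# Back-of-the-envelope uplink estimate for streaming camera feeds to the cloud.
HD_MBPS = 5.0       # per-stream rate quoted above for HD video
UHD_4K_MBPS = 25.0  # per-stream rate quoted above for 4K video


def uplink_mbps(num_hd, num_4k, kept_fraction=1.0):
    """Return the aggregate uplink bandwidth in Mbps.

    kept_fraction is the (hypothetical) share of video that survives
    edge-side filtering such as motion detection; 1.0 means no filtering.
    """
    if not 0.0 <= kept_fraction <= 1.0:
        raise ValueError("kept_fraction must be between 0 and 1")
    raw = num_hd * HD_MBPS + num_4k * UHD_4K_MBPS
    return raw * kept_fraction


if __name__ == "__main__":
    # 100 HD cameras and 20 4K cameras: illustrative numbers only.
    print(uplink_mbps(100, 20))       # everything streamed to the cloud
    print(uplink_mbps(100, 20, 0.1))  # only 10% kept after edge filtering
```

Even a modest deployment reaches the gigabit range without filtering, which is why discarding uninteresting video at the edge matters so much for provisioning.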
Q5. Can you explain how Rocket allows programmers to plug in vision algorithms while scaling across a hierarchy of intelligent edges and the cloud?
Ganesh Ananthanarayanan: Rocket (http://aka.ms/rocket) is an extensible software stack for democratizing video analytics: making it easy and affordable for anyone with a camera stream to benefit from computer vision and machine learning algorithms. Rocket allows programmers to plug in their favorite vision algorithms while scaling across a hierarchy of intelligent edges and the cloud.
The figure above shows our video analytics stack, Rocket, that supports multiple applications including traffic camera analytics for smart cities, retail store intelligence scenarios, and home assistants. The “queries” of these applications are converted into a pipeline of vision modules by the video pipeline optimizer to process live video streams. The video pipeline consists of multiple modules including the decoder, background subtractor, and deep neural network (DNN) models.
Rocket partitions the video pipeline across the edge and the cloud. For instance, it is preferable to run the heavier DNNs on the cloud where the resources are plentiful. Rocket’s edge-cloud partitioning ensures that: (i) the compute (CPU and GPU) on the edge device is not overloaded and only used for cheap filtering, and (ii) the data sent between the edge and the cloud does not overload the network link. Rocket also periodically checks the connectivity to the cloud and falls back to an “edge-only” mode when disconnected. This avoids any disruption to the video analytics but may produce outputs of lower accuracy due to relying only on lightweight models. Finally, Rocket piggybacks on the live video analytics to use its results as an index for after-the-fact interactive querying on stored videos.
More details can be found in our recent MobiSys 2019 work.
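The edge-cloud split and the “edge-only” fallback described above can be sketched in a few lines of Python. This is not Rocket’s actual API: every class and function name here is hypothetical, the motion score stands in for a background subtractor, and connectivity is checked per frame rather than periodically to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    frame_id: int
    motion_score: float  # hypothetical output of a cheap background subtractor


def cheap_edge_filter(frame, threshold=0.2):
    """Cheap edge-side filtering: keep only frames with enough motion."""
    return frame.motion_score >= threshold


def light_edge_model(frame):
    """Stand-in for a lightweight model that runs on the edge device."""
    return {"frame_id": frame.frame_id, "model": "edge-light"}


def heavy_cloud_dnn(frame):
    """Stand-in for a heavier DNN that runs in the cloud."""
    return {"frame_id": frame.frame_id, "model": "cloud-dnn"}


def process(frames, cloud_reachable):
    """Run the pipeline over frames.

    cloud_reachable is a callable returning True when the cloud can be
    reached; a real system would check this periodically, not per frame.
    """
    results = []
    for frame in frames:
        if not cheap_edge_filter(frame):
            continue  # uninteresting frame: it never leaves the edge
        if cloud_reachable():
            results.append(heavy_cloud_dnn(frame))
        else:
            results.append(light_edge_model(frame))  # "edge-only" fallback
    return results
```

Calling `process(frames, lambda: False)` simulates a disconnected edge: analytics continue, but every result comes from the lightweight model, which mirrors the accuracy trade-off mentioned in the answer.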
Q6. One of the verticals your project is focused on is video streams from cameras at traffic intersections. Can you please tell us more how this works in practice?
Ganesh Ananthanarayanan: As we embarked on this project, two key trends stood out: (i) cities were already equipped with a lot of cameras and had plans to deploy many more, and (ii) traffic-related fatalities were among the top 10 causes of death worldwide, which is terrible! So, in partnership with my colleague (Franz Loewenherz) at the City of Bellevue, we asked the question: can we use traffic video analytics to improve traffic safety, traffic efficiency, and traffic planning? We understood that most jurisdictions have little to no data on the continuous trends in directional traffic volumes; accident near-misses; pedestrian, bike & multi-modal volumes, etc. Data on these is usually obtained by commissioning an agency to count vehicles once or twice a year for a day.
We have built technology that analyzes traffic camera feeds 24X7 at low cost to power a dashboard of directional traffic volumes. The dashboard raises alerts on traffic congestion & conflicts. Such a capability can be vital towards traffic planning (of lanes), traffic efficiency (light durations), and safety (unsafe intersections).
A key aspect is that we do our video analytics using existing cameras and consciously decided to shy away from installing our own cameras. Check out this project video on Video Analytics for Smart Cities.
Q7. What are the lessons learned so far from your on-going pilot in Bellevue (Washington) for active traffic monitoring of traffic intersections live 24X7? Does it really help prevent traffic-related accidents? Does the use of your technology help your partners with jurisdictions to identify traffic details that impact traffic planning and safety?
Ganesh Ananthanarayanan: Our traffic analytics dashboard runs 24X7 and accumulates data non-stop that the officials didn’t have access to before. It helps them understand instances of unexpectedly high traffic volumes in certain directions. It also generates alerts on traffic volumes to help dispatch personnel accordingly. We also used the technology for planning a bike corridor in Bellevue. The objective was to do a before/after study of the bike corridor to help understand the impact of the corridor on driver behavior. The City plans to use the results to decide on bike corridor designs.
Our goal is to make the roads considerably safer and more efficient with affordable video analytics. We expect that video analytics will be able to drive decisions of cities precisely in these directions, towards how they manage their lights, lanes, and signs. We also believe that data regarding traffic volumes from a dense network of cameras will be able to power & augment routing applications for better navigation.
As the number of cities that deploy the solution increases, the accuracy of the computer vision models will only improve thanks to better training data, thus leading to a nice virtuous cycle.
Qx. Anything else you wish to add?
Ganesh Ananthanarayanan: So far I’ve described our video analytics solution on how it uses video cameras to continuously analyze and get data. One thing I am particularly excited to make happen is to “complete the loop”. That is, take the output from the video analytics and in real-time actuate it on the ground to users. For instance, if we predict an unsafe interaction between a bicycle & car, send a notification to one or both of them. Pedestrian lights can be automatically activated and even extended for people with disabilities (e.g., in a wheelchair) to enable them to safely cross the road (see demo). I believe that the infrastructure will be sufficiently equipped for this kind of communication in a few years. Another example of this is warning approaching cars when they cannot spot pedestrians between parked cars on the road.
I am really excited about the prospect of the AI analytics interacting with the infrastructure and people on the ground and I believe we are well on track for it!
Ganesh Ananthanarayanan is a Researcher at Microsoft Research. His research interests are broadly in systems & networking, with recent focus on live video analytics, cloud computing & large scale data analytics systems, and Internet performance. His work on “Video Analytics for Vision Zero” on analyzing traffic camera feeds won the Institute of Transportation Engineers 2017 Achievement Award as well as the “Safer Cities, Safer People” US Department of Transportation Award. He has published over 30 papers in systems & networking conferences such as USENIX OSDI, ACM SIGCOMM and USENIX NSDI. He has collaborated with and shipped technology to Microsoft’s cloud and online products like the Azure Cloud, Cosmos (Microsoft’s big data system) and Skype. He is a member of the ACM Future of Computing Academy. Prior to joining Microsoft Research, he completed his Ph.D. at UC Berkeley in Dec 2013, where he was also a recipient of the UC Berkeley Regents Fellowship. For more details: http://aka.ms/ganesh
– Rocket (http://aka.ms/rocket)
– On Amundsen. Q&A with Li Gao tech lead at Lyft, ODBMS.org Expert Article, JUL 30, 2019
– On IoT, edge computing and Couchbase Mobile. Q&A with Priya Rajagopal, ODBMS.org Expert article, JUL 25, 2019
“The best possible database migration is when you are able to move all your data and stored procedures unchanged(!) to the new system.” — Michael Widenius.
I have interviewed Michael “Monty” Widenius, Chief Technology Officer at MariaDB Corporation.
Monty is the “spiritual father” of MariaDB, a renowned advocate for the open source software movement and one of the original developers of MySQL.
Q1. What is adaptive scaling and why is it important for a database?
Michael Widenius: Adaptive scaling automatically changes a system’s behavior in order to use available resources as efficiently as possible as demand grows or shrinks. For a database, it means the ability to dynamically configure resources, adding or deleting data nodes and processing nodes according to demand. This provides both scale up and scale out in an easy manner.
Many databases can do part of this manually; a few can do it semi-automatically. When it comes to read scaling with replication, there are a few solutions, like Oracle RAC, but there are very few relational database systems that can handle true write scaling while preserving true ACID properties. This is a critical need for any company that wants to compete in the data space. That’s one of the reasons why MariaDB acquired ClustrixDB last year.
Q2. Technically speaking, how is it possible to adjust scaling so that you can run the database in the background on a desktop with very few resources, and up to a multi-node cluster with petabytes of data with read and write scaling?
Michael Widenius: Traditionally databases are optimized for one particular setup. It’s very hard to be able to run efficiently both with a very small footprint, which is what desktop users are expecting, and yet provide extreme scale out.
The reason we can do that in MariaDB Platform is thanks to the unique separation between the query processing and data storage layers (storage engines). One can start by using a storage engine that requires a relatively small footprint (Aria or InnoDB) and, when demands grow, with a few commands move all or just part of the data to distributed storage with MariaDB ColumnStore, Spider, MyRocks or, in the future, ClustrixDB. One can also very easily move to a replication setup where you have one master for all writes and any number of read replicas. MariaDB Cluster can be used to provide a fully functional master-master network that can be replicated to remote data centers.
My belief is that MariaDB is the most advanced database in existence, when it comes to providing complex replication setups and very different ways to access and store data (providing OLTP, OLAP and hybrid OLTP/OLAP functionalities) while still providing one consistent interface to the end user.
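The replication setup mentioned above, with one master for all writes and any number of read replicas, implies that something has to decide where each statement goes. The Python sketch below shows that idea only; it is not a MariaDB component, the class name and server labels are hypothetical, and classifying a statement by its first keyword is a deliberate simplification (for example, it ignores transactions and `SELECT ... FOR UPDATE`).

```python
import itertools


class ReadWriteRouter:
    """Sends writes to the master and spreads reads over the replicas."""

    # Simplification: only these leading keywords are treated as reads.
    READ_VERBS = {"SELECT", "SHOW"}

    def __init__(self, master, replicas):
        self.master = master
        # With no replicas configured, reads fall back to the master.
        self._readers = itertools.cycle(replicas or [master])

    def route(self, sql):
        """Return the server label that should execute this statement."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in self.READ_VERBS:
            return next(self._readers)  # round-robin over read replicas
        return self.master              # every write goes to the master


if __name__ == "__main__":
    router = ReadWriteRouter("master", ["replica1", "replica2"])
    for statement in ["INSERT INTO t1 VALUES (1)", "SELECT * FROM t1"]:
        print(statement, "->", router.route(statement))
```

In practice this decision is usually made by a proxy or connector rather than by application code, but the routing rule is the same.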
Q3. How do you plan to use ClustrixDB distributed database technology for MariaDB?
Michael Widenius: We will add this as another storage engine for the user to choose from. What it means is that if one wants to switch a table called t1 from InnoDB to ClustrixDB, the only command the user needs to run is:
ALTER TABLE t1 ENGINE=ClustrixDB;
The interesting thing with ClustrixDB is not only that it’s distributed and can automatically scale up and down based on demands, but also that a table on ClustrixDB can be accessed by different MariaDB servers. If you create a ClustrixDB table on one MariaDB server, it’s at once visible to all other MariaDB servers that are attached to the same cluster.
Q3. Why is having Oracle compatibility in MariaDB a game changer for the database industry?
Michael Widenius: MariaDB Platform is the only enterprise open source database that supports a significant set of Oracle syntax. This makes it possible for the first time to easily move Oracle applications to an open source solution, get rid of single-vendor lock-in and leverage existing skill sets. MariaDB Corporation is also the best place to get migration help as well as enterprise features, consultative support and maintenance.
Q4. How do you manage with MariaDB to parse, depending on the case, approximately 80 percent of the legacy Oracle PL/SQL without rewriting the code?
Michael Widenius: Oracle PL/SQL was originally based on the same standard that created SQL; however, Oracle decided to use different syntax than what’s used in ANSI SQL. Fortunately, most of the logical language constructs are the same. This made it possible to provide a mapping from most of the PL/SQL constructs to ANSI.
What we did:
– Created a new parser, sql_yacc_ora.yy, which understands the PL/SQL constructs and maps the PL/SQL syntax to existing MariaDB internal structures.
– Added support for SQL_MODE=ORACLE mode, to allow the user to switch which parser to use. The mode is stored as part of SQL procedures to allow users to run a stored procedure without having to know if it’s written in ANSI SQL or PL/SQL.
– Extended MariaDB with new Oracle compatibility that we didn’t have before such as SEQUENCES, PACKAGES, ROW TYPE etc.
You can read all about the Oracle compatibility functionality that MariaDB supports here.
Q5. When embarking on a database migration, what are the best practices and technical solutions you recommend?
Michael Widenius: The best possible database migration is when you are able to move all your data and stored procedures unchanged(!) to the new system.
That is our goal when we are supporting a migration from Oracle to MariaDB. This usually means that we are working closely with the customer to analyze the difficulty of the migration and determine a migration plan. It also helps that MariaDB supports MariaDB SQL/PL, a compatible subset of Oracle PL/SQL language.
If MariaDB is fairly new to you, then it’s best to start with something small that only uses a few stored procedures to give DBAs a chance to get to know MariaDB better. When you’ve succeeded in moving a couple of smaller installations, then it’s time to start with the larger ones. Our expert migration team is standing by to assist you in any way possible.
Q6. Why did you combine your transactional and analytical databases into a single platform, MariaDB Platform X3?
Michael Widenius: Because, thanks to the storage engine interface, it’s easy for MariaDB to provide both transactional and analytical storage with one interface. Today it’s not efficient or desirable to have to move between databases just because your data needs grow. MariaDB can also provide the unique capability of using different storage engines on master and replicas. This allows you to have your master optimized for inserts while some of your replicas are optimized for analytical queries.
Q7. You also launched a managed service supporting public and hybrid cloud deployments. What are the benefits of such service to enterprises?
Michael Widenius: Some enterprises find it hard to find the right DBAs (these are still a scarce resource) and would rather focus on their core business than on managing their databases. The managed service is there so that these enterprises do not have to think about how to keep the database servers up and running. Maintenance, upgrading and optimizing of the database will instead be done by people who are the definitive experts in this area.
Q8. What are the limitations of existing public cloud service offerings in helping companies succeed across their diverse cloud and on-prem environments?
Michael Widenius: Most of the existing cloud services for databases only ensure that the “database is up and running”. They don’t provide database maintenance, upgrading, optimization, consultative support or disaster management. More importantly, you’re only getting a watered-down version of MariaDB in the cloud rather than the full-featured version you get with MariaDB Platform. If you encounter performance problems, serious bugs, crashes or data loss, you are on your own. You also don’t have anyone to talk with if you need new features for your database that your business requires.
Q9. How does MariaDB Platform Managed Service differ from existing cloud offerings such as Amazon RDS and Aurora?
Michael Widenius: In our benchmarks that we shared at our MariaDB OpenWorks conference earlier this year, we showed that MariaDB’s Managed Service offering beats Amazon RDS and Aurora when it comes to performance. Our managed service also unlocks capabilities such as columnar storage, data masking, database firewall and many more features that you can’t get in Amazon’s services. See the full list here for a comparison.
Q10. What are the main advantages of using a mix of cloud and on-prem?
Michael Widenius: There are many reasons why a company will use a mix of cloud and on-prem. Cloud is where all the growth is and many new applications will likely go to the cloud. At the same time, this will take time and we’ll see many applications stay on-prem for a while. Companies may decide to keep applications on-prem for compliance and regulatory reasons as well. In general, it’s not good for any company to have a vendor that totally locks them into one solution. By ensuring you can run the exact same database both on-prem and in the cloud, including ensuring that you have all your data in both places, you can be sure your company will not have a single point of failure.
Michael “Monty” Widenius, Chief Technology Officer, MariaDB.
Monty is the “spiritual father” of MariaDB, a renowned advocate for the open source software movement and one of the original developers of MySQL, the predecessor to MariaDB. In addition to serving as CTO for the MariaDB Corporation, he also serves as a board member of the MariaDB Foundation. He was a founder at SkySQL, and the CTO of MySQL AB until its sale to Sun Microsystems (now Oracle). Monty was also the founder of TCX DataKonsult AB, a Swedish data warehousing company. He is the co-author of the MySQL Reference Manual and was awarded in 2003 the Finnish Software Entrepreneur of the Year prize. In 2015, Monty was selected as one of the 100 most influential persons in the Finnish IT market. Monty studied at Helsinki University of Technology and lives in Finland.
“LeanXcale is the first startup that instead of going to market with a single innovation or know-how, is going to market with 10 disruptive innovations that are making it really differential for many different workloads and extremely competitive on different use cases.” — Patrick Valduriez.
I have interviewed Patrick Valduriez and Ricardo Jimenez-Peris. Patrick is a well-known database researcher and, since 2019, the scientific advisor of LeanXcale. Ricardo is the CEO and Founder of LeanXcale. We talked about NewSQL, Hybrid Transaction and Analytics Processing (HTAP), and LeanXcale, a startup that offers an innovative HTAP database.
Q1. There is a class of new NewSQL databases in the market, called Hybrid Transaction and Analytics Processing (HTAP) – a term created by Gartner Inc. What is special about such systems?
Patrick Valduriez: NewSQL is a recent class of DBMS that seeks to combine the scalability of NoSQL systems with the strong consistency and usability of RDBMSs. An important class of NewSQL is Hybrid Transaction and Analytics Processing (HTAP) whose objective is to perform real-time analysis on operational data, thus avoiding the traditional separation between operational database and data warehouse and the complexity of dealing with ETLs.
Q2. HTAP functionality is offered by several database companies. How does LeanXcale compare with respect to other HTAP systems?
Ricardo Jimenez-Peris: HTAP covers a large spectrum that has three dimensions. One dimension is the scalability of the OLTP part. This is where we excel. We scale out linearly to hundreds of nodes. The second dimension is the ability to scale out OLAP. This is well-known technology from the last two decades. Some systems are mostly centralized, but those that are distributed should be able to handle the OLAP part reasonably well. The third dimension is the efficiency of the OLAP part. This is where we are still working to improve the optimizer, so the expectation is that we will become pretty competitive in the next 18 months. Patrick’s expertise in distributed query processing will be key. I would also like to note that, for recurrent aggregation analytical queries, we are really unbeatable thanks to a new invention that enables us to update these aggregations in real time, so these aggregation analytical queries become costless since they just need to read a single row from the relevant aggregation table.
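The idea of keeping aggregations up to date at write time, so that an aggregate query only reads a single row, can be illustrated with a small Python sketch. It shows the general technique of incrementally maintained aggregates and is not LeanXcale’s implementation; the class, its methods, and the sample data are all hypothetical, and concurrency and transactions are left out entirely.

```python
from collections import defaultdict


class OnlineAggregate:
    """Keeps a per-key COUNT and SUM up to date as rows are inserted."""

    def __init__(self):
        self.rows = []  # base table: list of (key, amount) tuples
        # Aggregation table: one entry per key, updated on every insert.
        self.agg = defaultdict(lambda: {"count": 0, "total": 0.0})

    def insert(self, key, amount):
        """Insert a row and update its aggregate entry in the same step."""
        self.rows.append((key, amount))
        entry = self.agg[key]
        entry["count"] += 1
        entry["total"] += amount

    def total(self, key):
        """Answer SUM(amount) for a key by reading one aggregate entry."""
        return self.agg[key]["total"] if key in self.agg else 0.0

    def total_by_scan(self, key):
        """The same answer computed the expensive way, by scanning all rows."""
        return sum(amount for k, amount in self.rows if k == key)


if __name__ == "__main__":
    sales = OnlineAggregate()
    sales.insert("madrid", 10.0)
    sales.insert("paris", 5.0)
    sales.insert("madrid", 2.5)
    print(sales.total("madrid"), sales.total_by_scan("madrid"))
```

The cost moves from query time to insert time: each write does a little extra work so that the recurrent aggregation query no longer depends on the size of the base table.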
Q3. Patrick, you wrote in a blog that “LeanXcale has a disruptive technology that can make a big difference on the DBMS market”. Can you please explain what is special about LeanXcale?
Patrick Valduriez: I believe that LeanXcale is at the forefront of the HTAP movement, with a disruptive technology that provides ultra-scalable transactions (see Q4), key-value capabilities (see Q5), and polyglot capabilities. On one hand, we support polyglot queries that allow integrating data coming from different data stores, such as HDFS, NoSQL and SQL systems. On the other hand, we already support SQL and key-value functionality on the same database, and soon we will support JSON documents in a seamless manner, so we are becoming a polystore.
LeanXcale is the first startup that instead of going to market with a single innovation or know-how, is going to market with 10 disruptive innovations that are making it really differential for many different workloads and extremely competitive on different use cases.
Q4. What are the basic principles you have used to design and implement LeanXcale as a distributed database that allows scaling transactions from 1 node to thousands?
Ricardo Jimenez-Peris: LeanXcale solves the traditional transaction management bottleneck with a new invention that lies in a distributed processing of the ACID properties, where each ACID property is scaled out independently but in a composable manner. LeanXcale’s architecture is based on three layers that scale out independently: 1) KiVi, the storage layer that is a relational key-value data store, 2) the distributed transactional manager that provides ultra-scalable transactions, and 3) the distributed query engine that makes it possible to scale out both OLTP and OLAP workloads. KiVi comes with 8 disruptive innovations that provide dynamic elasticity, online aggregations, push down of all algebraic operators but join, active-active replication, simultaneous efficiency for both ingesting data and range queries, efficient execution in NUMA architectures, costless multiversioning, hybrid row-columnar storage, vectorial acceleration, and so on.
Q5. The LeanXcale database offers a so-called dual interface, key-value and SQL. How does it work and what is it useful for?
Ricardo Jimenez-Peris: (how does it work): The storage layer is a proprietary relational key-value data store, called KiVi, which we have developed. Unlike traditional key-value data stores, KiVi is not schemaless, but relational. Thus, KiVi tables have a relational schema, but can also have a part that is schemaless. The relational part enabled us to enrich KiVi with predicate filtering, aggregation, grouping, and sorting. As a result, we can push down all algebraic operators below a join to KiVi and execute them in parallel, thus avoiding the movement of a very large fraction of rows between the storage layer and the query engine layer. Furthermore, KiVi has a direct API that allows doing everything that SQL can do except joins, but without the cost of SQL. In particular, it can ingest data as efficiently as the most efficient key-value data stores, but the data is stored in relational tables in a fully ACID way and is accessible through SQL. This greatly reduces the hardware footprint of the database for workloads where data ingestion represents a large fraction of the work.
Patrick Valduriez: (what is it useful for): As for RDBMSs, the SQL interface allows rapid application development and remains the preferred interface for BI and analytics tools. The key-value interface is complementary and allows the developer to have better control of the integration of application code and database access, for higher performance. This interface also allows easy migration from other key-value stores.
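To make the idea of operator push-down concrete, here is a minimal sketch in Python. It is not KiVi's API: the partition layout and the function names (scan_partition, pushed_down_sum) are hypothetical, chosen only to show how filtering and partial aggregation can run next to the data so that only small partial results travel to the query engine.

```python
from collections import defaultdict

# Hypothetical data: each inner list stands for the rows held by one storage node.
PARTITIONS = [
    [{"store": "A", "item": "chair", "qty": 3}, {"store": "A", "item": "desk", "qty": 1}],
    [{"store": "B", "item": "chair", "qty": 5}, {"store": "B", "item": "lamp", "qty": 2}],
]


def scan_partition(rows, predicate, group_key, value_key):
    """Storage-side work: filter rows, then aggregate them locally."""
    partial = defaultdict(int)
    for row in rows:
        if predicate(row):
            partial[row[group_key]] += row[value_key]
    return dict(partial)


def pushed_down_sum(partitions, predicate, group_key, value_key):
    """Query-engine-side work: merge the small per-partition results."""
    total = defaultdict(int)
    for rows in partitions:
        partial = scan_partition(rows, predicate, group_key, value_key)
        for key, subtotal in partial.items():
            total[key] += subtotal
    return dict(total)


# Roughly: SELECT item, SUM(qty) FROM t WHERE qty >= 2 GROUP BY item
result = pushed_down_sum(PARTITIONS, lambda r: r["qty"] >= 2, "item", "qty")
print(result)
```

In this sketch each partition returns at most one entry per group instead of every matching row, which is the saving in data movement that the answer above describes.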
Q6. You write that LeanXcale could be used in different ways. Can you please elaborate on that?
Ricardo Jimenez-Peris: LeanXcale can be used in many different ways: as an operational database (thanks to transaction scalability), as a data warehouse (thanks to our distributed OLAP query engine), as a real-time analytics platform (due to our HTAP capability), as an ACID key-value data store (using KiVi and our ultra-scalable transactional management), as a time series database (thanks to our high ingestion capabilities), as an integration polyglot query engine (based on our polyglot capabilities), as an operational data lake (combining the volume scalability of a data lake with operational capabilities at any scale), as a fast data store (using KiVi as standalone), as an IoT database (deploying KiVi in IoT devices), and as an edge database (deploying KiVi on IoT devices and the edge, and the full LeanXcale database on the cloud with georeplication).
Thanks to all our innovations and our efficiency and flexible architecture, we can compete in many different scenarios.
Q7. The newly defined SQL++ language allows adding a JSON data type in SQL. N1QL for Analytics is the first commercial implementation of SQL++ (**). Do you plan to support SQL++ as well?
Ricardo Jimenez-Peris and Patrick Valduriez: Yes, but within SQL, as we don’t think any language will replace SQL in the near future. Over the last 30 years, there have been many claims that new languages would (“soon”) replace SQL, e.g., object query languages such as OQL in the 1990s or XML query languages such as XQuery in the 2000s. But this did not happen for three main reasons. First, SQL’s data abstraction (table) is ubiquitous and simple. Second, the language is easy to learn, powerful and has been adopted by legions of developers. And it is a (relatively) standard language, which makes it a good interface for tool vendors. This being said, the JSON data model is important to manage documents and SQL++ is a very nice SQL-like language for JSON. In LeanXcale, we plan to support a JSON data type in SQL columns and have a seamless integration of SQL++ within SQL, with the best of both (relational and document) worlds. Basically, each row can be relational or JSON and SQL statements can include SQL++ statements.
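As a rough illustration of a JSON value living in an ordinary SQL column, the sketch below uses Python's built-in sqlite3 module. It is neither SQL++ nor LeanXcale syntax: the table, the sample rows, and the doc_field helper are all assumptions made for the example, and the helper is registered as a user-defined SQL function so that a query can mix relational columns with fields taken from the JSON document.

```python
import json
import sqlite3


def doc_field(doc, key):
    """Hypothetical helper: return one top-level field of a JSON document, or None."""
    if doc is None:
        return None
    return json.loads(doc).get(key)


conn = sqlite3.connect(":memory:")
# Expose the helper to SQL under the (made-up) name doc_field.
conn.create_function("doc_field", 2, doc_field)

conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, profile TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        (1, "Ana", json.dumps({"city": "Madrid", "tier": "gold"})),
        (2, "Luc", json.dumps({"city": "Montpellier"})),
        (3, "Eva", None),  # a purely relational row with no JSON part
    ],
)

# One statement that filters on a JSON field and on a relational column.
rows = conn.execute(
    "SELECT name, doc_field(profile, 'city') FROM customer "
    "WHERE doc_field(profile, 'tier') = 'gold' OR id = 3 ORDER BY id"
).fetchall()
print(rows)
```

The point of the sketch is only that rows with and without a JSON part can sit in the same table and be queried by the same statement.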
Q8. What are the typical use cases for LeanXcale? and what are the most interesting verticals for you?
Ricardo Jimenez-Peris: Too many. Basically, all data intensive use cases. We are ideal for the new technological verticals such as traveltech, adtech, IoT, smart-*, online multi-player games, eCommerce, …. But we are also very good and cost-effective for traditional use cases such as Banking/Finance, Telco, Retail, Insurance, Transportation, Logistics, etc.
Q9. Patrick, as Scientific Advisor of LeanXcale, what is your role? What are you working on at present?
Patrick Valduriez: My role is as a sort of consulting chief architect for the company, providing advice on architectural and design choices as well as implementation techniques. I will also do what I like most, i.e., teach the engineers the principles of distributed database systems, do technology watch, write white papers and blog posts on HTAP-related topics, and do presentations at various venues. We are currently working on query optimization, based on the Calcite open source software, where we need to improve the optimizer cost model and search space, in particular, to support bushy trees in parallel query execution plans. Another topic is to add the JSON data type in SQL in order to combine the best of relational DBMS and document NoSQL DBMS.
Q10. What is the role that Glenn Osaka is having as an advisor for LeanXcale?
Ricardo Jimenez-Peris: Glenn is an amazing guy and a successful Silicon Valley entrepreneur (CEO of Reactivity, sold to Cisco). He was an advisor to Peter Thiel at Confinity, who later merged his company with Elon Musk’s X.com to create PayPal, and continued to be an advisor there until it was sold to eBay.
He is guiding us in the strategy to become a global company. The main challenge for a company selling B2B to enterprises is the slowness of enterprise sales, and with his advice we have built a strategy to overcome it.
Q11. You plan to work with Ikea. Can you please tell us more?
Ricardo Jimenez-Peris: Ikea has isolated ERPs per store. Thus, the main issue is that when a customer wants to buy an item at a store and there is not enough stock, this isolation prevents them from selling using stock from other stores. Similarly, orders for new stock are not optimized since they are made based on the local store view. We are providing them with a centralized database that keeps the stock across all stores, which solves both problems. We are also working with them on a proximity marketing solution to offer customers coupon-based discounts as they go through the store.
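The stock problem described above can be sketched in a few lines of Python. Everything here is hypothetical (store names, items, quantities, and the function name); it only illustrates why a single view of stock across stores answers a question that an isolated per-store ERP cannot.

```python
# Hypothetical central stock table: (store, item) -> units on hand.
STOCK = {
    ("store-1", "bookcase"): 0,
    ("store-2", "bookcase"): 4,
    ("store-3", "bookcase"): 1,
    ("store-2", "lamp"): 7,
}


def stores_that_can_fulfil(stock, item, quantity, requesting_store):
    """Return the other stores holding at least `quantity` units of `item`."""
    return sorted(
        store
        for (store, stocked_item), units in stock.items()
        if stocked_item == item and store != requesting_store and units >= quantity
    )


# store-1 is out of bookcases; the central view shows who can cover the sale.
candidates = stores_that_can_fulfil(STOCK, "bookcase", 2, "store-1")
print(candidates)
```

With isolated per-store systems, store-1 would only see its own zero balance and the sale would be lost.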
Qx Anything else you wish to add?
Patrick Valduriez: Well, the adventure just got started and it is already a lot of fun. It is a great opportunity for me, and probably the right time, to go deeper in applying the principles of distributed and parallel databases on real-world problems. The timing is perfect as the new (fourth) edition of the book “Principles of Distributed Database Systems“, which I co-authored with Professor Tamer Özsu, is in production at Springer. As a short preview, note that there is a section on LeanXcale’s ultra-scalable transaction management approach in the transaction chapter and another section on LeanXcale’s architecture in the NoSQL/NewSQL chapter.
Ricardo Jimenez-Peris: It is a really exciting moment now that we are going to market. We managed to build an amazing team able to make the product strong and go to market with it. We believe we are the most innovative startup in the database arena and our objective is to become the next global database company. There is still a lot of work and there are exciting challenges ahead. We are now working on our managed cloud database service, which will be delivered on Amazon, hopefully by the end of the year.
Dr. Patrick Valduriez is a senior scientist at Inria in France. He was a scientist at Microelectronics and Computer Technology Corp. in Austin (Texas) in the 1980s and a professor at University Pierre et Marie Curie (UPMC) in Paris in the early 2000s. He has also consulted for major companies in the USA (HP Labs, Lucent Bell Labs, NERA, LECG, Microsoft), Europe (ESA, Eurocontrol, Ask, Shell) and France (Bull, Capgemini, Matra, Murex, Orsys, Schlumberger, Sodifrance, Teamlog). Since 2019, he has been the scientific advisor of the LeanXcale startup.
He is currently the head of the Zenith team (between Inria and University of Montpellier, LIRMM) that focuses on data science, in particular data management in large-scale distributed and parallel systems and scientific data management. He has authored and co-authored many technical papers and several textbooks, among which “Principles of Distributed Database Systems” with Professor Tamer Özsu. He currently serves as associate editor of several journals, including the VLDB Journal, Distributed and Parallel Databases, and Internet and Databases. He has served as PC chair of major conferences such as SIGMOD and VLDB. He was the general chair of SIGMOD04, EDBT08 and VLDB09.
He received prestigious awards and prizes. He obtained several best paper awards, including VLDB00. He was the recipient of the 1993 IBM scientific prize in Computer Science in France and the 2014 Innovation Award from Inria – French Academy of Science – Dassault Systems. He is an ACM Fellow.
Dr. Ricardo Jimenez was a professor and researcher at the Technical University of Madrid (Universidad Politécnica de Madrid – UPM) and left his academic career to bring an ultra-scalable database to market. At UPM, he already sold technology to European enterprises such as Ericsson, Telefonica, and Bull. He has been a member of the Advisory Committee on Cloud Computing for the European Commission.
He is co-inventor of the two patents already granted in US and Europe and of 8 new patent applications that are being prepared. He is co-author of the book “Replicated Databases” and more than 100 research papers and articles.
He has been invited to present LeanXcale technology in the headquarters of many tech companies in Silicon Valley such as Facebook, Twitter, Salesforce, Heroku, Greenplum (now Pivotal), HP, Microsoft, etc.
He has coordinated (as overall coordinator or technical coordinator) over 10 European projects. One of them, LeanBigData, was awarded with the “Best European project” award by the Madrid Research Council (Madri+d).
– LeanXcale was awarded with the “Best SME” award by the Innovation Radar of the European Commission in Nov. 2017 recognizing it as the most innovative European startup. LeanXcale has been identified as one of the innovator startups in the NewSQL arena by Bloor market analyst, and has been identified as one of the companies in the HTAP arena by 451 Research market analyst.
Follow us on Twitter: @odbmsorg
“A lot of times we think of digital transformation as a technology dependent process. The transformation takes place when employees learn new skills, change their mindset and adopt new ways of working towards the end goal.”–Kerem Tomak
I have interviewed Kerem Tomak, Executive VP, Divisional Board Member, Big Data-Advanced Analytics-AI, at Commerzbank AG. We talked about Digital Transformation, Big Data, Advanced Analytics and AI for the financial sector.
Commerzbank AG is a major German bank operating as a universal bank, headquartered in Frankfurt am Main. In the 2019 financial year, the bank was the second largest in Germany by balance sheet total. The bank is present in more than 50 countries around the world and provides almost a third of Germany’s trade finance. In 2017, it served nearly 13 million customers in Germany and more than 5 million customers in Central and Eastern Europe. (source: Wikipedia).
Q1. What are the key factors that need to be taken into account when a company wants to digitally transform itself?
Kerem Tomak: It starts with a clear and coherent digital strategy. Depending on the level of the company, this can vary from operational efficiencies as the main target to disrupting and changing the business model altogether. Having a clear scope and clear objectives for the digital transformation is key to its success.
A lot of times we think of digital transformation as a technology-dependent process. The transformation takes place when employees learn new skills, change their mindset and adopt new ways of working towards the end goal. Digital enablement, together with a company-wide upgrade or replacement of legacy technologies with new ones like Cloud, API, IoT etc., is the next step towards becoming a digital company. With all this comes the most important ingredient: thinking outside the box and taking risks. One of the key success criteria in becoming a digital enterprise is a true and speedy “fail fast, learn and optimize” mentality. Avoiding (calculated) risks, especially at the executive level, will limit growth and hinder transformation efforts.
Q2. What are the main lessons you have learned when establishing strategic, tactical and organizational direction for digital marketing, big data and analytics teams?
Kerem Tomak: For me, culture eats strategy. Efficient teams build a culture in which they thrive. Innovation is fueled by teams which constantly learn and share knowledge, take risks and experiment. Aside from cultural aspects, there are three main lessons I learned over the years.
First: Top down buy-in and support is key. Alignment with internal and external key stakeholders is vital – you cannot create impact without them taking ownership and being actively involved in the development of use cases.
Second: Clear prioritization is necessary. Resources are limited, both in the analytics teams and with the stakeholders. OKRs provide very valuable guidance on steering the teams forward and setting priorities.
Third: Building solutions which can scale over a stable and scalable infrastructure. Data quality and governance build clean input channels to analytics development and deployment. This is a major requirement and biggest chunk of the work. Analytics capabilities then guide what kind of tools and technologies can be used to make sense of this data. Finally, integrating with execution outlets such as a digital marketing platform creates a feedback loop that teams can learn and optimize against.
Q3. What are the main challenges (both technical and non) when managing mid and large-size analytics teams?
Kerem Tomak: Again, building a culture in which teams thrive independent of size is key. For analytics teams, constantly learning and testing new techniques and technologies is an important aspect of job satisfaction for the first few years out of academia. A clear promotion path and the availability of a “skills matrix” make it easy to understand what leadership values in employees and provide guidance on future growth opportunities. I am not a believer in hierarchical organizations, so keeping the number of job levels as low as possible is necessary for speed and delivery. Hiring and retaining the right skills in analytics teams is not easy, especially in hot markets like Silicon Valley. Most analytics employees follow leaders and generally stay loyal to them. The head of an analytics team plays an extremely important role, one that will “make it or break it” for the team. Finally, analytics platforms with the right tools and scale are critical for the teams’ success.
Q4. What does it take to successfully deliver large scale analytics solutions?
Kerem Tomak: First, one needs a flexible and scalable analytics infrastructure – this can comprise on-premise components, like a chatbot for example, as well as shared components via a public cloud. Secondly, it takes end-to-end automation of processes in order to attain scale fast and on demand. Last but not least, companies need an accurate sense of customers’ needs and requirements to ensure that the developed solution will be adopted.
Q5. What parameters do you normally use to define if an analytics solution is really successful?
Kerem Tomak: An analytics solution is successful if it has a high impact. Some key parameters are usage, increased revenues and reduced costs.
Q6. Talking about Big Data, Advanced Analytics and AI: Which companies are benefiting from them at present?
Kerem Tomak: Maturity of Big Data, AA and AI differs across industries. Leading the pack are Tech, Telco, Financial Services, Retail and Automotive. In each industry there are leaders and laggards. There are fewer and fewer companies untouched by BDAA and AI.
Q7. Why are Big Data and Advanced Analytics so important for the banking sector?
Kerem Tomak: This has (at least) two dimensions. First: Like any other company that wants to sell products or services, we must understand our clients’ needs. Big Data and Advanced Analytics can give us a decisive advantage here. For example – with our customers’ permission of course – we can analyze their transactions and thus gain useful information about their situation and learn what they need from their bank. Simply put: A person with a huge amount of cash in their account obviously has no need for consumer credit at the moment. But the same person might have a need for advice on investment opportunities. Data analysis can give us very detailed insights and thus help us to understand our customers better.
This leads to the second dimension, which is risk management. As a bank we are risk-taking specialists. The better the bank understands the risks it takes, the more efficiently it can act to counterbalance those risks. The benefits are a lower rate of credit defaults as well as more accurate credit pricing. This is in the interest of both the bank and its customers.
Data is the fabric which new business models are made of but Big Data does not necessarily mean Big Business: The correct evaluation of data is crucial. This will also be a decisive factor in the future as to whether a company can hold its own in the market.
Q8. What added value can you deliver to your customers with them?
Kerem Tomak: Well, for starters, Advanced Analytics helps us to prevent fraud. In 2017, Commerzbank used algorithms to stop fraudulent payments in excess of EUR 100 million. Another use case is the liquidity forecast for small and medium-sized enterprises. Our Cash Radar runs in a public cloud and generates forecasts for the development of the business account. It can therefore warn companies at an early stage if, for example, an account is in danger of being underfunded. So with the help of such innovative data-driven products, the bank obviously can generate added customer value, but also drive its growth and set itself apart from its competitors.
Additionally, Big Data and Advanced Analytics generate significant internal benefits. For example, Machine Learning is providing us with efficient support to prevent money laundering by automatically detecting conspicuous payment flows. Another example: Chatbots already regulate part of our customer communication. Also, Commerzbank is the first German financial institution to develop a data-based pay-per-use investment loan. The redemption amount is calculated from the use of the capital goods – in this case the utilization of the production machines, which protects the liquidity of the user and gives us the benefit of much more accurate risk calculations.
When we bear in mind that the technology behind examples like these is still quite new, I am confident that we will see many more use cases of all kinds in the future.
Q9. It seems that Artificial Intelligence (AI) will revolutionize the financial industry in the coming years. What is your take on this?
Kerem Tomak: When we talk about artificial intelligence today, we basically still mean machine learning, not generalized artificial intelligence in its original sense. It is about applications that recognize patterns and learn from these occurrences. Tying these capabilities to applications that support decisions and provide services is what makes AI (aka Machine Learning) a unique field. Even though the field of data modelling has developed rapidly in recent years, we are still a long way from the much-discussed generalized artificial intelligence, whose goal was outlined in 1965 as “machines will be capable, within twenty years, of doing any work a man can do”. With the technology available today, we can think of the financial industry finding new ways of generating, transferring and accumulating wealth that we have not seen before, all predicated upon individual adoption and trust.
Q10. You have been working for many years in the US. What are the main differences you have discovered now that you are working in Europe?
Kerem Tomak: Europeans are very sensitive to privacy and data security. The European Union has set a high global standard with its General Data Protection Regulation (GDPR). In my opinion, Data protection “made in Europe” is a real asset and has the potential to become a global blueprint.
Also, Europe is very diverse – from language and culture to different market environments and regulatory issues. Even though immense progress has been made in the course of harmonization in the European Union, a level playing field remains one of the key issues in Europe, especially for banks.
Technology adoption is lagging in some parts of Europe. Bigger infrastructure investments, wider adoption of public cloud and 5G deployment are needed to stay competitive and relevant in global markets, which are increasingly dominated by the US and China. This is both an opportunity and a risk. I see tremendous opportunities everywhere, from IoT to AI-driven B2B and B2C apps for example. If adoption of public cloud lags any further, I see the risk of the EU falling behind on AI development and innovation.
Finally, I truly enjoy the family oriented work-life balance here which in turn increases work productivity and output.
Dr. Kerem Tomak, Executive VP, Divisional Board Member, Big Data-Advanced Analytics-AI, Commerzbank AG
Kerem brings more than 15 years of experience as a data scientist and an executive. He has expertise in the areas of omnichannel and cross-device attribution, price and revenue optimization, assessing promotion effectiveness, yield optimization in digital marketing and real-time analytics. He has managed mid and large-size analytics and digital marketing teams in Fortune 500 companies and delivered large-scale analytics solutions for marketing and merchandising units. His out-of-the-box thinking and problem-solving skills led to 4 patent awards and numerous academic publications. He is also a sought-after speaker on Big Data and BI Platforms for Analytics.
“At some point, most companies come to the realization that the advanced technologies and innovation that allow them to improve business operations also generate increased amounts of data that existing legacy technology is unable to handle, resulting in the need for more new technology. It is a cyclical process that CIOs need to prepare for.” –Scott Gnau
Last month, InterSystems appointed Scott Gnau as Head of its Data Platforms Business Unit. I asked Scott a number of questions about data management: what his advice is for Chief Information Officers, how the InterSystems IRIS™ family of data platforms is positioned, and what the technology vision is for the company’s Data Platforms business unit.
Q1. What are the main lessons you have learned in more than 20 years of experience in the data management space?
Scott Gnau: The data management space is a people-centric business, whether you are dealing with long-time customers or developers and architects. The formation of a trusted relationship can be the difference between a potential customer selecting one vendor’s technology which comes with the benefit of partnering for long term success, over a similar competitor’s technology.
Throughout my career, I have also learned how risky data management projects can be. They essentially ensure the security, cleanliness and accuracy of an organization’s data. They are then responsible for scaling data-centric applications, which helps inform important business decisions. Data management is a very competitive space which is only becoming more crowded.
Q2. What is your most important advice for Chief Information Officers?
Scott Gnau: At some point, most companies come to the realization that the advanced technologies and innovation that allow them to improve business operations also generate increased amounts of data that existing legacy technology is unable to handle, resulting in the need for more new technology. It is a cyclical process that CIOs need to prepare for.
Phenomena such as big data, the internet of things (IoT), and artificial intelligence (AI) are driving the need for this modern data architecture and processing, and CIOs should plan accordingly. For the last 30 years, data was primarily created inside data centers or firewalls, was standardized, kept in a central location and managed. It was fixed and simple to process.
In today’s world, most data is created outside the firewall and outside of your control. The data management process is now reversed – instead of starting with business requirements, then sourcing data and building and adjusting applications, developers and organizations load the data first and reverse engineer the process. Now data is driving decisions around what is relevant and informing the applications that are built.
Q3. How do you position the InterSystems IRIS™ family of data platforms with respect to other similar products on the market?
Scott Gnau: The data management industry is crowded, but the InterSystems IRIS data platform is like nothing else on the market. It has a unique, solid architecture that attracts very enthusiastic customers and partners, and plays well in the new data paradigm. There is no requirement to have a schema to leverage InterSystems IRIS. It scales unlike any other product in the data management marketplace.
InterSystems IRIS has unique architectural differences that enable all functions to run in a highly optimized fashion, whether it be supporting thousands of concurrent requests, automatic and easy compression, or highly performant data access methods.
Q4. What is your strategy with respect to the Cloud?
Scott Gnau: InterSystems has a cloud-first mentality, and with the goal of easy provisioning and elasticity, we offer customers the choice for cloud deployments. We want to make the consumption model simple, so that it is frictionless to do business with us.
InterSystems IRIS users have the ability to deploy across any cloud, public or private. Inside the software it leverages the cloud infrastructure to take advantage of the new capabilities that are enabled because of cloud and containerized architectures.
Q5. What about Artificial Intelligence?
Scott Gnau: AI is the next killer app for the new data paradigm. With AI, data can tell you things you didn’t already know. While many of the mathematical models that AI is built on are on the older side, it is still true that the more data you feed them the more accurate they become (which fits well with the new paradigm of data). Generating value from AI also implies real time decisioning, so in addition to more data, more compute and edge processing will define success.
Q6. How do you plan to help the company’s customers to a new era of digital transformation?
Scott Gnau: My goal is to help make technology as easy to consume as possible and to ensure that it is highly dependable. I will continue to work in and around vertical industries where solutions are easily replicable.
Q7. What customers are asking for is not always what customers really need. How do you manage this challenge?
Scott Gnau: Disruption in the digital world is at an all-time high, and for some, impending change is sometimes too hard to see before it is too late. I encourage customers to be ready to “rethink normal,” while putting them in the best position for any transitions and opportunities to come. At the same time, as trusted partners we also are a source of advice to our customers on mega trends.
Q8. What is your technology vision ahead for the company’s Data Platforms business unit?
Scott Gnau: InterSystems continues to look for ways to differentiate how our technology creates success for our customers. We judge our success on our customers’ successes. Our unique architecture and overall performance envelope play very well into data-centric applications across multiple industries, including financial services, logistics and healthcare. With connected devices and the requirement for augmented transactions, we play nicely into the future high-value application space.
Q9. What do you expect from your new role at InterSystems?
Scott Gnau: I expect to have a lot of fun because there is an infinite supply of opportunity in the data management space due to the new data paradigm and the demand for new analytics. On top of that, InterSystems has many smart, passionate and loyal customers, partners and employees. As I mentioned up front, it’s about a combination of great tech AND great people that drives success. Our ability to invest in the future is extremely strong – we have all the key ingredients.
Scott Gnau joined InterSystems in 2019 as Vice President of Data Platforms, overseeing the development, management, and sales of the InterSystems IRIS™ family of data platforms. Gnau brings more than 20 years of experience in the data management space helping lead technology and data architecture initiatives for enterprise-level organizations. He joins InterSystems from Hortonworks, where he served as chief technology officer. Prior to Hortonworks, Gnau spent two decades at Teradata in increasingly senior roles, including serving as president of Teradata Labs. Gnau holds a Bachelor’s degree in electrical engineering from Drexel University.
– On AI, Big Data, Healthcare in China. Q&A with Luciano Brustia ODBMS.org, 8 APR, 2019.