DATE: April 7, 2020
Jonathan Lacefield, Product DataStax
DataStax has announced the general availability (GA) of DSE 6.8.
This is a pivotal release for DataStax users: it is the first production-ready version of Cassandra that is truly cloud-native, thanks to the inclusion of the Cassandra Kubernetes Operator. In addition, with improvements like 4X faster node additions and recovery through Zero Copy Streaming, incremental NodeSync, Guardrails, and our new Graph engine, DataStax Enterprise (DSE) 6.8 helps users scale out their data workloads like never before.
Here’s a quick introduction to all of the great features DataStax is providing in the DSE 6.8 release.
You can also register here for our deep-dive April 21 webinar where Ed Anuff and I will go into the following in greater detail.
Last week, DataStax announced the Cassandra Kubernetes Operator, which included production support for DSE and experimental support for Cassandra clusters. As my colleague Chris Bradford stated in his great blog post on this topic:
“We created the operator because we found that deploying DataStax Enterprise (DSE) on-premises or in containers was often difficult and daunting. We took action to automate the process by moving to Kubernetes and building the operator.”
With DSE 6.8, users have a fully production-supported, cloud-native Cassandra database option that simplifies scaling and deployments.
Cassandra is a database purpose-built for distributed workloads across any infrastructure type, whether that infrastructure is on-prem, in the cloud, or both. One could say Cassandra is the canonical cloud database. Our years of experience building a distributed cloud database have allowed us to optimize the elasticity of C*. With the release of DSE 6.8, we are pleased to provide users with an optimization aimed at significantly reducing the time it takes Cassandra to scale out during peak periods of demand or to perform business continuity tasks. We call this optimization Zero Copy Streaming.
Our internal testing shows that Zero Copy Streaming is up to 4 times as fast as previous versions of Cassandra at adding nodes to a cluster.
There’s no configuration needed to take advantage of this optimization, though settings are provided to tune streaming for your specific use case. Simply upgrade to DSE 6.8.
Cassandra is sometimes compared to the superpowers Spider-Man acquires in the comic book series: “with great power comes great responsibility”. Having spent over 6 years helping users be successful with Cassandra, I understand this analogy first-hand. The flexibility of Cassandra, coupled with its scalability, can cause issues for new users when they port RDBMS thinking to Cassandra. To date, the Cassandra community and DataStax have tried to help Cassandra users, with a focus on developers, avoid known anti-patterns through enablement and education, with an emphasis on data modeling best practices.
For example, Patrick McFadin’s seminal Cassandra Data Modeling video from 2014 is still one of the most popular videos on Cassandra with over 50,000 views.
Here at DataStax, we’ve spent a considerable amount of time thinking about this dynamic, i.e., using human interaction to mitigate known anti-patterns, and we realized there’s a better way forward: configurable trip-wires in Cassandra that either warn about, or reject with an error, any operation that violates known anti-patterns. We call this mechanism Guardrails. In DSE 6.8, DataStax is releasing the first set of Guardrails, which codify best practices such as:
- Consistency levels allowed
- Payload sizes
- Column sizes
- Collection sizes
- Number of indices
- Number of materialized views
- And more
With DSE 6.8, you can have confidence that Cassandra will prevent anti-patterns at any scale.
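To make the warn-or-block idea concrete, here is a minimal Python sketch of a threshold-style guardrail. It is not DSE's implementation or its configuration format. The `Guardrail` class, the `GuardrailViolation` exception, and the numeric limits are hypothetical names and values chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


class GuardrailViolation(Exception):
    """Raised when an operation crosses a guardrail's fail threshold."""


@dataclass(frozen=True)
class Guardrail:
    name: str
    warn_threshold: Optional[int] = None  # None disables the warning
    fail_threshold: Optional[int] = None  # None disables the hard stop

    def check(self, value: int) -> Optional[str]:
        """Return a warning message, or None if within limits; raise on failure."""
        if self.fail_threshold is not None and value > self.fail_threshold:
            raise GuardrailViolation(
                f"{self.name}: {value} exceeds fail threshold {self.fail_threshold}"
            )
        if self.warn_threshold is not None and value > self.warn_threshold:
            return f"{self.name}: {value} exceeds warn threshold {self.warn_threshold}"
        return None


# Hypothetical limits, for illustration only (not DSE defaults).
INDEXES_PER_TABLE = Guardrail("indexes_per_table", warn_threshold=1, fail_threshold=3)
```

In this sketch, a schema change that would leave a table with two indexes yields a warning, while one that would leave it with four is rejected outright. Separating the warn and fail thresholds lets an operator surface a risky pattern before choosing to block it.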
In DSE 6.0, DataStax released the first iteration of NodeSync. For background, NodeSync is an easy-to-use, continuous background repair service that has low overhead, provides consistent performance, and virtually eliminates the manual effort of running repair operations in a DataStax cluster. With DSE 6.8, we’ve created an optimized version of NodeSync that is smart enough to recognize when data has already been synchronized across a cluster and doesn’t attempt to resync that data. The results of this optimization are pretty impressive.
Our initial testing results show that once a table has been synchronized, the amount of time additional synchronization routines need to run is near constant.
The end result for you is a distributed database that remains consistent and available with little overhead on your servers, allowing you to keep costs low and your users happy.
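The "don't resync what is already in sync" idea can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not NodeSync's actual algorithm, which this post does not describe. The `Segment` type, its logical timestamps, and the `incremental_sync` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    name: str
    last_write: int            # logical time of the most recent write
    last_validated: int = -1   # logical time of the last sync; -1 means never


def incremental_sync(segments: List[Segment], now: int) -> List[str]:
    """Sync only segments written since their last validation; return their names."""
    synced = []
    for seg in segments:
        if seg.last_write > seg.last_validated:
            # Placeholder for the real work of comparing and repairing replicas.
            seg.last_validated = now
            synced.append(seg.name)
    return synced
```

In this model, a second pass over unchanged data does no work at all, which is one way a repeat synchronization could stay near constant in cost regardless of table size.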
In DSE 5.0, DataStax announced the first iteration of DSE Graph. DSE Graph broke boundaries and barriers in distributed, labeled property graphs. By building a graph database on top of a purpose-built cloud database platform with integrated search and analytics capabilities, DSE Graph helped users solve problems that were previously unsolvable.
With DSE 6.8, DataStax is taking the next step in the evolution of distributed graphs by moving the graph storage engine deeper into Cassandra. Graph in DSE 6.8 is a cloud-native graph database that gives users the freedom of choice to move between graph and Cassandra native APIs on a single set of data. We call this concept One Model and when combined with Cassandra’s ability to scale, Graph in DSE 6.8 is a unique solution that gives developers the best option to use cloud scale, native graph data patterns like unbounded trees, paths, and stars.
Graph in DSE 6.8 works because the graph data model itself is now embedded in Cassandra. To show you what we mean, here’s a very simple graph showing the relationship between people and the software they create. This simple graph has 2 vertex graph objects, person and software, and 1 edge graph object, Created.
In DSE Graph 6.8 you can use Gremlin to define the graph.
A vertex would be defined with this schema.
And here’s what the edge DDL looks like in Gremlin.
Now, here’s the cool part. Inside of Cassandra, matching CQL tables are created. They look like the following.
The Person Vertex
CREATE TABLE test.person (
    ...
    PRIMARY KEY (personKey, lastName)
) WITH VERTEX LABEL person_label;
The Created Edge
CREATE TABLE test."personCreatedSoftware" (
    ...
    PRIMARY KEY (personKey, swKey)
) WITH EDGE LABEL created
Notice the syntax that follows the WITH keyword: that’s the added Graph metadata that tells Cassandra these CQL tables are also graph objects. This means users can write data to these Cassandra-native tables using CQL and then read them as graph objects using Gremlin, or vice versa, or both. Cassandra users with existing databases can add the power of graph simply by issuing ALTER statements against their keyspaces and tables.
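As a rough illustration of this "one set of data, two views" idea, the Python sketch below models the two tables as plain lists of rows and reads them both ways. It involves no DSE, CQL, or Gremlin calls. The row values and the helper functions `select_person` and `out_created` are hypothetical; only the column names `personKey`, `lastName`, and `swKey` come from the table definitions above.

```python
# Rows as they might sit in the two CQL tables (values are made up).
person_rows = [
    {"personKey": "p1", "lastName": "Doe"},
    {"personKey": "p2", "lastName": "Roe"},
]
created_rows = [
    {"personKey": "p1", "swKey": "s1"},
    {"personKey": "p1", "swKey": "s2"},
    {"personKey": "p2", "swKey": "s2"},
]


def select_person(person_key):
    """Table view: fetch rows by key, much as a CQL SELECT would."""
    return [row for row in person_rows if row["personKey"] == person_key]


def out_created(person_key):
    """Graph view: follow 'created' edges from a person to software keys."""
    return [row["swKey"] for row in created_rows if row["personKey"] == person_key]
```

Both functions read the same underlying rows. The edge table is simply a table whose primary key pairs the two vertex keys, which is why the same data can serve table lookups and graph traversals.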
In DSE 6.8, DataStax has made graph data scale natively like any other Cassandra data type.
This streamlined and highly scalable approach to graph provides Cassandra developers with the flexibility to join data across partitions while giving graph developers the power to work with real-time, dynamic graphs at scales that were not previously seen as possible.
We’ve also made it over 10 times faster compared to previous versions of DataStax Graph.
We’re so excited about Graph in DSE 6.8 that our colleagues, Dr. Denise Gosnell and Dr. Matthias Broecheler, wrote a book about the awesome power it provides developers, called The Practitioner’s Guide to Graph Data.
New Secondary Indexing Preview
We’re also excited to introduce a preview of our new secondary indexing engine called Storage-Attached Indexing. This is a new engine built from the ground up to provide a better secondary indexing experience that is designed for scale, performance, and reliability. Out of the box it has:
- Support for the =, >, <, and AND operators
- Support for all Apache Cassandra consistency levels
- Full authentication support, including RLAC
- Zero-copy streaming support (we don’t need to rebuild indices when bootstrapping/decommissioning)
- System views with the information any user or support person might need to help understand the status and metadata of an index.
- Snapshotting/restore support
- We automatically pick the index (numeric or text) based on the Apache Cassandra column type
- No specific configs outside of whether the user wants to honor case sensitivity
- Automatic index corruption detection and resolution (should rarely happen)
We’re excited for our users to provide feedback on our new indexing engine as we rapidly update it with new features and functionality.
In addition to production Kubernetes support of Cassandra, faster scaling, faster and more lightweight data consistency, Guardrails, and Graph, DataStax has also provided many other enhancements to Search, Analytics, and Cassandra in DSE 6.8.
For example, we’ve published DataStax Desktop as an officially supported artifact that enables developers to work offline with different versions of Cassandra and DSE. We’ve also created a new, CQL-based Backup and Restore service as a part of the Kubernetes Operator.
DSE 6.8 is the cloud-native, scale-out choice for today’s enterprises looking for a production-ready version of Cassandra.