Introducing Vertica 9

by Roberto Zicari · September 27, 2017

Unified advanced analytics database features advancements in in-database Machine Learning, direct querying of Parquet data on AWS S3, support for Google Cloud Platform and Azure Power BI, and a beta release of cloud-optimized separation of compute and storage

Read the Press Release

The latest release includes improvements across Vertica’s four core areas of platform development, as well as the beta release of Vertica in Eon Mode, which enables organizations to evaluate the separation of compute and storage for Amazon Web Services (AWS) deployments:

Analyze in the Right Place
Machine Learning and Advanced Analytics
Freedom from Underlying Infrastructure
Performance at Exabyte Scale

Vertica 9 will be generally available in October 2017. Questions about Vertica 9? Check out the FAQ Page and Solution Brief.

Analyze in the Right Place

Query Parquet Data from S3 Data Lake

Challenge: Highly concurrent, interactive analysis of the exploding volume of data stored on AWS S3
Solution: External tables allow customers to query Parquet data stored in AWS S3 directly from Vertica
Benefits: Ability to explore S3 Data Lake with advanced SQL analytics and extreme performance

Data Access Restriction with Security Realms

Challenge: Managing security concerns on Hadoop so different groups of users have access to select sets of data
Solution: Leverage Kerberos Realms to support different data access policies for various groups
Benefit: Enhanced control and data access restriction over different groups of users

Integration with Apache Sentry

Challenge: Companies that have centralized the management of user permissions want Hadoop usernames to pass through when using Vertica to access data on HDFS
Solution: Vertica integrated with Apache Sentry so that the privileges associated with a Cloudera username can govern access control in Vertica
Benefit: Reduced operational burden with centralized security policies

Machine Learning and Advanced Analytics

Convert Categorical Data to Numerical Data

Challenge: When preparing data for statistical analysis, users must manually convert categorical data to numerical data
Solution: A built-in function automatically converts a given categorical column in Vertica into numerical one-hot encoded columns, also known as dummy variables
Benefit: Less time spent on manual data preparation and custom conversion scripts
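
To make the idea of one-hot encoding concrete, here is a minimal Python sketch of the transformation performed outside the database. It is illustrative only: the function name one_hot_encode and the sample column are assumptions made for this example, and the sketch does not reproduce Vertica's built-in function or its output format.

```python
def one_hot_encode(values):
    """Return (categories, rows): one 0/1 indicator column per distinct category."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one indicator is set per input value
        rows.append(row)
    return categories, rows


# Hypothetical categorical column.
colors = ["red", "green", "red", "blue"]
categories, encoded = one_hot_encode(colors)
```

Each input value becomes a row with a single 1 in the column for its category, which is the numerical form most statistical routines expect.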

Cross-Validate Machine Learning Models

Challenge: Data scientists need to review the comparative performance of various algorithms and choose from multiple possible hyper-parameter values
Solution: Cross-validation function that enables more accurate evaluation of a model’s performance by training the model with more varied subsets of data
Benefit: Data scientists can more easily compare various models and avoid overfitting
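
As a rough illustration of what cross-validation does, the following Python sketch splits a data set into k folds, trains on all folds but one, and scores the model on the held-out fold. The helper names (k_fold_indices, cross_validate) and the mean-predicting baseline model are hypothetical and chosen for this example; they are not Vertica's cross-validation function or its parameters.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of rows")
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, fit, predict, k=5):
    """Return the mean squared error measured on each held-out fold."""
    n = len(xs)
    scores = []
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = fit(train_x, train_y)
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in fold]
        scores.append(sum(errors) / len(errors))
    return scores


# Hypothetical baseline model: always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


xs = list(range(10))
ys = [2.0 * x for x in xs]
scores = cross_validate(xs, ys, fit_mean, predict_mean, k=5)
```

Comparing the per-fold scores of two candidate models, or of one model under different hyper-parameter values, gives a fairer picture than a single train/test split.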

Export Machine Learning Models

Challenge: Users with multiple Vertica clusters or separate development and production clusters need to train a machine learning model in one cluster and then move it to other clusters for scoring
Solution: Ability to import/export machine learning models across Vertica clusters
Benefit: Less time spent duplicating model building and training

Vertica’s new in-database machine learning capabilities are like gold! We are extremely excited to train our Machine Learning models on our data in Vertica and ship them with our platform to run on our customers’ clusters. This is something that is much harder with any other tool. Vertica’s in-database machine learning will improve our ability to offer new predictive analytics features quickly and easily to our growing customer base. It will improve our competitive positioning.

– Abhishek Sharma, Data Scientist, Fidelis Cybersecurity

Freedom from Underlying Infrastructure

Available on Google Cloud Marketplace “Launcher”

Challenge: Companies want more freedom from underlying infrastructure and to avoid being locked into one cloud
Solution: Regular publication of Vertica template images in the Google Cloud Marketplace that launch a guided provisioning process
Benefits: Vertica users running in the AWS or Azure clouds, or on premises, can more easily deploy their workloads to Google Cloud

Microsoft Power BI Certification

Challenge: Previous integration between Microsoft Power BI and Vertica limited the scale and performance of data loading and analytics
Solution: Power BI now connects to Vertica via a new DirectConnect approach
Benefits: Faster, more scalable and more secure data analytics with Vertica and Microsoft Power BI

Cloud Provisioning with Management Console (MC)

Challenge: Scripting requirements of cloud vendors’ provisioning tools make it difficult to get started in the cloud
Solution: Augment Cloud Service Provider (CSP) provisioning tools with a user-friendly GUI wizard in the Vertica Management Console, including post-provisioning steps such as data loading and querying
Benefits: Easier to get Vertica up and running in the cloud, saving time and resources

Eon Mode Beta: Separation of Compute and Storage

Challenge: Variable-demand workloads need to scale up for peak demand and scale down during periods of low activity
Solution: Separation of compute and storage so that compute can be reduced during low-demand periods
Benefits: Rapid elasticity and reduced infrastructure spending

To join the Vertica Eon Mode Beta Program, visit the sign-up page.

Performance at Exabyte Scale

Hierarchical Partition Management

Challenge: Partitioning data into slices can significantly improve query execution because the Vertica optimizer can isolate the relevant storage containers and eliminate the rest
Solution: Users can now create a custom, hierarchical definition of partition structure
Benefits: Faster queries at Petabyte scale
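
One way to picture a hierarchical partition definition is as a function that maps each row's date to a partition group whose granularity depends on the age of the data. The Python sketch below is an assumption-laden illustration: the function partition_group, its thresholds, and the day/month/year scheme are invented for this example and do not describe Vertica's actual syntax or defaults.

```python
from datetime import date


def partition_group(d, today, active_months=2, active_years=2):
    """Map a date to a partition group key: recent data by day, older by month, oldest by year."""
    months_old = (today.year - d.year) * 12 + (today.month - d.month)
    years_old = today.year - d.year
    if months_old < active_months:
        return d.isoformat()                   # day-level group, e.g. "2017-09-01"
    if years_old < active_years:
        return f"{d.year:04d}-{d.month:02d}"   # month-level group, e.g. "2017-03"
    return f"{d.year:04d}"                     # year-level group, e.g. "2014"


today = date(2017, 9, 27)
recent = partition_group(date(2017, 9, 1), today)
older = partition_group(date(2017, 3, 10), today)
oldest = partition_group(date(2014, 5, 5), today)
```

Grouping older data more coarsely keeps the number of partitions manageable while recent, frequently queried data stays finely partitioned.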

Universal Unique Identifier (UUID) Data Type

Challenge: Storing UUID data as text strings is an inefficient use of space
Solution: Allow customers to store UUID columns as a more space-efficient data type
Benefits: Data stored more efficiently
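
The space difference is easy to see with Python's standard uuid module: the canonical text form of a UUID is 36 characters, while the same value fits in 16 bytes as binary. The sample UUID is arbitrary, and the sketch only illustrates the general text-versus-binary trade-off; it does not describe Vertica's internal storage format.

```python
import uuid

# Arbitrary sample value used only for illustration.
u = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

as_text = str(u)     # canonical hyphenated hexadecimal form
as_bytes = u.bytes   # the same 128-bit value as raw bytes

text_size = len(as_text)
binary_size = len(as_bytes)

# The binary form round-trips to the same UUID, so no information is lost.
restored = uuid.UUID(bytes=as_bytes)
```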

Flattened Tables

Challenge: Many queries involve joins between a large fact table and multiple dimension tables, which increases query overhead and reduces performance
Solution: Flattened tables include columns that get their values by querying other tables, offering a flattened view of data stored in complex schemas
Benefits: Less overhead and faster query performance
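
The general idea behind flattening can be sketched in Python: dimension values are looked up once, when rows are loaded, and stored alongside the fact data so that later queries need no join. The table contents, the flatten helper, and the column names below are hypothetical and serve only to illustrate the concept, not Vertica's implementation.

```python
# Hypothetical dimension table: customer_id -> customer name.
customers = {1: "Alice", 2: "Bob"}

# Hypothetical fact table rows.
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 25.0},
    {"order_id": 101, "customer_id": 2, "amount": 40.0},
    {"order_id": 102, "customer_id": 1, "amount": 15.0},
]


def flatten(fact_rows, dimension, key, new_column):
    """Return copies of the fact rows with the dimension value stored in a new column."""
    return [{**row, new_column: dimension.get(row[key])} for row in fact_rows]


flat_orders = flatten(orders, customers, "customer_id", "customer_name")

# A query over the flattened rows reads the name directly, with no join at query time.
total_by_name = {}
for row in flat_orders:
    name = row["customer_name"]
    total_by_name[name] = total_by_name.get(name, 0.0) + row["amount"]
```

The cost of the lookup moves from every query to the load step, which is the trade-off that makes joins against large fact tables cheaper to avoid.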
