Continuing our quest for the perfect big data analytics platform
This blog post was authored by Steve Sarsfield.
The Vertica development team has just released Version 9.0. Every major release gives me a chance not only to look back at what was developed during the cycle, but also to look at the entire timeline. I joined the Vertica team about 4 years ago in the midst of the “Dragline” release. For those of you who may not be aware, Vertica gives its major releases internal names taken from construction equipment, in alphabetical order. I’ve personally seen Crane, Dragline, Excavator, Front Loader and now version 9, the Grader release. Each of these releases has delivered amazing technology.
As a level-set: I’m sure most readers who find this blog are familiar with Vertica, but it is worth remembering that the foundation of this database was built on some (at the time) ground-breaking technology. The combination of MPP architecture, columnar storage and very efficient compression was unheard of around 2005, when the company started. In fact, it was rare in those days to have even a terabyte of data, and Vertica sold mostly to companies with extreme use cases. Times have changed significantly, and data warehouses with 100 TB or even a petabyte are now more common.
Over the years that I have been associated with Vertica, development has concentrated on three things:
• Analyzing Data in the Right Place – when you have big data, you don’t want to have to move it around a lot, so analyzing it where it sits is a key capability. Granted, you will get some performance benefits when you move the data into our format and into our storage layer. However, you shouldn’t have to move it to analyze it. You can keep your hot data hot in Vertica native formats and your cold data cold in other storage mechanisms like Hadoop and Amazon S3. Vertica supports both analytics that need to complete right now and analytics that can take their time.
• Freedom from Underlying Infrastructure – which is somewhat related to the above. I still hear people mistakenly associate us with a data warehouse appliance, but our philosophy for on-premises implementations has always been to run on any hardware. Now more than ever, companies want to use a combination of on-premises, Hadoop and the cloud for performing analytics. Vertica has made great strides in supporting different deployment environments and bringing analytics to the data.
• In-database Machine Learning – today’s analysts have questions that reach beyond standard SQL. Sure, there are advanced analytics like time series, for which SQL functions are built into the database. There’s geospatial, where you can do some analytics based on lat/long. These days, analysts need predictive analytics, too. They can do this in Vertica using R, Python and, most recently, in-database machine learning. I believe no other MPP database offers ML algorithms that can fully utilize a cluster the way we do. SQL analytics and ML algorithms can run side-by-side to offer very advanced analytics to your crew.
I should also mention the performance enhancements that come with each and every release. Vertica gets faster and faster as our engineers shave microseconds off a code base that already works well. The impact of that microsecond-shaving multiplies when you’re dealing with terabytes or petabytes of data. It’s remarkable to see the benchmarks and the continuing performance improvement of Vertica over the years.
So welcome version 9 of the Vertica platform, also known as the Grader release. If you want to see exactly what’s in it, check out our documentation and release notes. A webinar is also available if you want to listen.
Sponsored by Vertica