Encrypt with Voltage to protect sensitive data within Vertica analytics platform
by Roberto Zicari ·
Big data is in the news for many reasons. Much is being written on how to capture, store, use and protect it. In that vein, I want to point out that great things are happening with Voltage SecureData in Big Data. Check out this integration blog post that originally appeared on the Vertica blog discussing the integration of Voltage SecureData, an encryption platform for data protection in Vertica 9.1, delivering an architecture optimized for cloud economics and dynamic workloads.
Vertica Version 9.1 introduces a new integration with Voltage SecureData. This feature lets you use encryption to protect sensitive data within your Vertica database. It works within your existing database schema and clients. You can even automate data decryption. This automation makes decryption transparent and removes the need to change your existing SQL queries.
What is Voltage SecureData?
Voltage (which, like Vertica, is part of Micro Focus, so you know they’re awesome) has developed SecureData to provide data encryption for systems that lack their own integrated encryption. You install SecureData as an appliance running on its own host. Applications query the appliance for keys to encrypt or decrypt the data. SecureData itself does not store the data in any way.
One key feature of SecureData is Format-Preserving Encryption (FPE). When you encrypt data using an FPE format, encrypted values have the same format as the unencrypted values. For example, you can define an FPE format for Social Security Numbers (SSN) which are often used in the United States to identify individuals. These numbers are in the format nnn-nn–nnnn. Then you send SecureData a value such as 123-45-6789 to encrypt using the SSN format. It replies with an encrypted value such as 453-32-2349. This encrypted value resembles the unencrypted value.
Secure Data even lets you configure an FPE format to leave parts of the value unencrypted. For example, some organizations use the last four digits of the SSN when asking someone to verify their identity. You can create an FPE SSN format that leaves the last four digits of the SSN unencrypted, so the SSN value 123-45-5678 could be encrypted as 453-32-5678.
The beauty of the FPE format is that you do not need to change the way your application handles the encrypted data. For Vertica, this means you do not need to change your table schemas to store encrypted values. Other encryption solutions usually return encrypted values as binary blobs. Using them requires you to change your table schemas and the way you display data.
Another great feature of FPE is that you can have anyone who is not authorized to decrypt the data just see the encrypted values instead. As these values are in the same format as the original value, your reports and client applications can transparently handle the encrypted values. As you’ll see, you can use this feature to set up access policies in Vertica that automate the decryption of data for authorized users, and leave it encrypted for unauthorized users. Your client application never needs to worry about decrypting the data. Vertica and SecureData handle it behind the scenes.
See the Voltage SecureData home page for more information about SecureData.
Setting Up the SecureData Integration
The SecureData integration feature consists of several functions in a User-Defined Extension (UDx) library. This library is pre-installed in Vertica. Before you can use these functions, you must take three steps:
- Verify that Vertica has the right root certificate authority (CA) to authenticate with SecureData.
- Grant users access to the SecureData library.
- Configure parameters to tell Vertica how to access SecureData.
Verifying Vertica can Authenticate with SecureData
Vertica must have a copy of the root certificate authority (CA) and any intermediate authority used to sign your appliance’s encryption certificate to authenticate with your SecureData appliance. Vertica ships with many standard root certificates. If your appliance’s certificate was signed by a well-known certificate authority, it’s likely Vertica already has correct CA. If you used a self-signed root CA to create your appliance’s certificate or you have issues getting Vertica to authenticate with SecureData, you must add the correct CA to Vertica. See the Verifying the Vertica Server’s Access to the SecureData CA Certificate topic in the Vertica documentation for instructions.
Granting Users Access to the SecureData Functions
By default, only database superusers have access to the SecureData functions. You must grant non-privileged users access to the functions before they can call them. The best practice is to grant a role access to these functions. Then grant the role to the users who need to encrypt and decrypt data.
Here’s an example of creating a role and granting it access to the functions to encrypt and decrypt data. The example then grants the role the user named Alice and sets it as her default:
Configure Parameters for SecureData Access
You must set several parameters that the SecureData integration functions need to access the SecureData appliance. Which parameters you need to set depends on whether your SecureData Appliance uses LDAP or shared secret (a password set in SecureData) authentication.
The parameters you can set are:
policy_url: the URL of the SecureData policy file. Vertica uses this file to determine the address of the SecureData Appliance as well as many of its settings.
username: the user credentials for connecting to SecureData. This value is an LDAP username and is only used when using LDAP or LDAP and shared secret authentication.
identity: the SecureData identity to use. See the SecureData Administrator Guide for more information about identities in SecureData. This value takes the form of an email address. When SecureData uses LDAP authentication, your LDAP account must have access to this identity.
shared_secret: a password set in SecureData. Only set this parameter when SecureData uses shared secret or LDAP and shared secret authentication.
password: the LDAP password to use. Only set this parameter if SecureData is using LDAP authentication.
allow_short_fpe and enable_file_cache are two optional values. See Configuring Access to SecureData in the Vertica documentation for details.
You can configure these parameters two ways: through session variables or by creating a configuration file stored in Vertica’s distributed file system. The best way to configure these values is to use have the database administrator create a central configuration file for the values that must be the same for all users. Individual users then set the remaining values for themselves.
The policy_url, allow_short_fpe, and enable_file_cache parameters must always be the same for all users. The database administrator can set these values in a central configuration file named /voltagesecure/conf. The database administrator uses the VoltageSecureConfigure function to create this file. The following example demonstrates creating this file by setting the policy_url parameter (and using the default values for
allow_short_fpe and enable_file_cache):
With the policy_url set, users set the remaining parameters in session variables using the ALTER SESSION statement. They set these parameters for the SecureData library, named voltagesecurelib. For example, here’s how the user Alice could set these parameters to be able to use SecureData:
Choosing an FPE Format for Encrypting and Decrypting Data
One key thing to keep in mind about using SecureData is the format you want to use when encrypting and decrypting data.
As mentioned earlier, you can tell SecureData how it should encrypt specific types of data (such as Social Security Numbers) so that their format is maintained. The formats also ensure that SecureData always returns valid encrypted values for the format. For example, a date such as 07/04/2018 has restrictions on the values that can occur in each part of the date. The month value cannot be over 12. When you set up a format, you define the values that SecureData can return in an encrypted value.
You can also choose to use the FPE format named auto, which has SecureData try to figure out on its own how to encrypt the value. Auto is not a good choice for values that have format restrictions such as dates or Social Security Numbers. For example, if you use the auto FPE format to encrypt a date, it is likely SecureData will return random values for the date fields, rather than restricting them to values that are valid for dates. For example, it may return 30 as the month portion of the date.
Finally, you must always use the same FPE format when encrypting data in a table column. If you do not, you will lose your ability to reliably decrypt your data. For example, suppose you load one batch of data into a table column using an SSN format that encrypts the entire value. Later, you load another batch of data into the same column using an FPE format that does not encrypt the last four digits of the SSN value. Now you have the problem that you cannot determine which values in the column were encrypted which FPE format. If you decrypt a value with the wrong FPE format, SecureData does not return an error. Instead, it gives you a value that looks like an SSN value but is not correctly decrypted. SecureData just operates on the values you pass it based on the FPE format you give it. There is no way for it to tell which FPE you used to originally encrypt a value.
You must also use the same format to decrypt a value as you used to encrypt it.
Encrypting and Decrypting Data
OK, now we’re ready to encrypt and decrypt data! To encrypt a value, call the VoltageSecureProtect function with the VARCHAR value you want to encrypt, the format parameter set to the FPE format to use, and config_dfs_file set to the path of the central configuration file. The Vertica SecureData integration functions only encrypt and decrypt VARCHAR values. You must cast any other values (such as numeric or date values) to VARCHARs before passing them to VoltageSecureProtect.
Note that changing the FPE format completely changes the encrypted value.
To decrypt a value, call VoltageSecureAccess with the encrypted value, the FPE format, and the configuration file:
Directly calling these functions isn’t very useful. Here’s an example of encrypting a column while loading data from a comma-separated file:
And here’s an example of using VoltageSecureAccess in a query to decrypt the ssn column in the previous example:
Using Access Policies to Automate Decryption
You can create an access policy for a table column that automates the decryption of encrypted values for specific users. This feature lets you implement decryption without having to change the queries a client uses.
For example, suppose you wanted to create an access policy on the SSN column of the customers table from the previous examples that lets users with the see_ssn role view the unencrypted values. Then you could follow these steps to create a role, grant it to the user named Alice, and create the access policy to decrypt the column value.
Once you create the access policy, you can test it by switching to a user that has the role you created. This example switches to the user Alice and queries the customers table twice: first without having the see_ssn role set and then with it set:
Where to Go from Here
That covers the basics of using the new Voltage SecureData integration feature in Vertica. There are more things you can do with the SecureData integration functions. For example, you can create access policies that mask unencrypted semi-sensitive data from users who should not have full access to it. For more information about the new Voltage SecureData integration, see the Vertica documentation’s Integrating with Voltage SecureData section.
Sponsored by Vertica