Vertica Integration with Talend: Connection Guide
For Vertica 8.0.0
HPE Vertica connection guides provide basic information about setting up connections to Vertica from software that our technology partners create. These documents provide guidance using one specific version of Vertica and one specific version of the third-party vendor’s software.
Other versions of the third-party product may work with Vertica. However, Hewlett Packard Enterprise may not have tested these other versions.
This document provides guidance using the latest versions of Vertica and Talend as of October, 2016.
Talend is an open source ETL tool that you can use to perform extract, transform, and load operations on your data. You can load the transformed data into many databases and file formats.
Hewlett Packard Enterprise has tested Vertica Server 8.0.0 and the Vertica JDBC Driver 7.0.x with Talend 6.2.1 (Windows Server 2012 r2).
To download the latest version of Talend:
- Go to https://www.talend.com/download.
- Under Big Data Integration, click Try Now.
- Follow the instructions.
Talend uses JDBC to connect to your Vertica database. The Talend installation installs the supported Vertica JDBC driver, so you do not need to install the driver separately.
When you install Talend, the installer extracts these application files:
To start Talend, run the application appropriate to your Windows operating system.
The following topics describe how to create a project that connects to Vertica, and create a job that when executed, performs the desired extraction, transformation, and load.
A project in Talend is a top-level structure that contains objects, job designs, and metadata. Create a separate Talend project for each business purpose.
The following steps show how to create a project:
- Launch the appropriate Talend application (32-bit or 64-bit).
- On the dialog box that opens, select Create a new project and enter a name for your project.
- Click Create. The main Talend window opens.
- On the Repository tab, expand Metadata.
- Right-click and select Db Connections > Create connection.
The first Database Connection window appears.
- Enter information about the database connection from which you want to extract data. In this example, the source database name is DB_SRC.
- Click Next.
The second Database Connection window opens.
- Enter credentials for the Vertica database you want to extract the data from, including the connection information.
- Click Check to have Talend confirm that the connection is valid.
- After the Check Connection dialog box opens to tell you that the connection was successful, click OK.
If the connection is not successful, check the credentials that you entered.
- Click Finish.
You have created the data source.
- To define a target for the extracted and tranformed data, repeat steps 1–7 for the target database or file format.
After you define the source and target connections, they appear under Repository > Metadata > Db Connections with the names you designated (DB_SRC and DB_TGT). Talend appends “0.1” to their names.
- To select the data from your source database that you want to transform, right-click DB_SRC and selectRetrieve Schema.
- Talend displays the schema and tables that you can use.
- Select the tables to extract from your source Vertica database. The example shows BULKTEST and Char4k_Table selected.
A job contains the details that Talend needs to execute the workflow that you define for the ETL operations.
To create a job to perform these operations, follow these steps:
- On the Repository tab, right-click Job Designs and select Create job.
The New Job window appears.
- Next to Name, enter the name of the job.
- Click Finish.
- Under Repository > Metadata > Db Connections > DB_SRC 0.1 > Table schemas, select the desired table, in this case, Char4k_Table.
- To add the table to the Job Designer, drag the table onto the canvas.
The Job Designer provides a high-level view of the workflow components and connections.
- Select the table to extract data from, in this case, Char4k_Table. The Components window appears.
- The Components window lists the types of components you can create in the Job Designer. Select tVerticaInput and click OK to add the tVerticaInput component to the Job Designer.
- To view the component properties, double-click the component in the Job Designer.
The properties include source database connection information and the SQL query that Talend will use to extract the data.
- In the properties window, specify the source data transformation.
- Similarly for the target, in the Repository, select the target table (Char_4kTable) from Target DB (DB_TGT) and drag it into the Job Designer.
- In the Components window, select the tVerticaOutput component and click OK.
The tVerticaOutput component appears in the Job Designer.
- Double-click the tVerticaOutput component.
- In the properties window, configure the transformation for the target data.
- To map the two components, right-click the tVerticaInput object and select Row > Main.
- Select the tVerticaOutput component and left-click.
The line snaps to the tVerticaOutput component:
- To view the tVerticaOutput schema, double-click the tVerticaOutput component in the Job Designer.
- On the tVerticaOutput properties window, to the right of Edit schema, click the box with the three dots in it.
- In the window that opens, select View schema.
- The schemas for the input and output components display.
In this example, the input and output schemas are identical:
Now you can run the ETL job. To start the job, click the Run icon (at the top of the window) or press F6.
When the job completes, Talend displays a message like this:
The following limitations apply to the Vertica integration with Talend.
At the time of this writing (January, 2017), Talend does not work correctly with the Vertica JDBC 7.1.x driver that it offers under the Vertica Version field of the Connection definition dialog box.
Choose one of the older versions of the Vertica JDBC driver when you define a connection. Vertica engineers tested the Vertica integration with Talend using the Vertica JDBC driver 7.0.x.
In a Vertica table, the maximum and minimum values for an INTEGER column are:
- Maximum: 922337203685477580
- Minimum: –9223372036854775807
When you try to load these values, Talend loads the values as 1 and –1, respectively.
To work around this problem, before loading the data, change the column’s data type in Talend to LONG. Then Talend saves the correct values.
For more information, see the following links:
- Talend Open Studio
- Vertica Community
- Vertica Documentation
- Big Data and Analytics Community
- Hands on Vertica
Sponsored by HPE Vertica