Loading ZIP Code Boundaries from a Shapefile in Three Easy Steps!

Posted February 4, 2020 by Bryan HergerVertica Big Data Solution Architect at Micro Focus

Are you working on a project where you’d like to classify position data into a geographic region? The US Census Bureau provides shapefiles for ZIP code tracts that you can ingest into Vertica and use to identify the ZIP code for given coordinates. In this exercise, we’ll load the shapefile into Vertica, examine the contents, and try matching coordinates to a ZIP code.

Here is the shapefile source: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html

Obtain the file for ZIP Code Tabulation Areas (ZCTAs): cb_2018_us_zcta510_500k.zip [59 MB] (2018 file shown; you can use a newer year, if available)

I unpacked the shapefile download to /data1/bryan/geo/

Listing:

-rw-rw---- 1 bryan bryan 165 Apr 15 2019 cb_2018_us_zcta510_500k.prj
-rw-rw---- 1 bryan bryan 5 Apr 15 2019 cb_2018_us_zcta510_500k.cpg
-rw-rw---- 1 bryan bryan 265252 Apr 15 2019 cb_2018_us_zcta510_500k.shx
-rw-rw---- 1 bryan bryan 91072224 Apr 15 2019 cb_2018_us_zcta510_500k.shp
-rw-rw---- 1 bryan bryan 1756826 Apr 15 2019 cb_2018_us_zcta510_500k.dbf
-rwxrwxrwx 1 bryan bryan 34290 Apr 15 2019 cb_2018_us_zcta510_500k.shp.iso.xml
-rwxrwxrwx 1 bryan bryan 9052 Apr 15 2019 cb_2018_us_zcta510_500k.shp.ea.iso.xml

Be sure to chmod or chown files so they are readable by Vertica!

Follow the documentation: https://www.vertica.com/docs/Latest/HTML/Content/Authoring/AnalyzingData/Geospatial/GeospatialAnalytics/LoadingSpatialDataFromShapefiles.htm

Step 1: Use STV_ShpCreateTable to generate a CREATE TABLE statement.

dbadmin=> SELECT STV_ShpCreateTable ( USING PARAMETERS file = '/data1/bryan/geo/cb_2018_us_zcta510_500k.shp') OVER() AS zcta;
zcta
CREATE TABLE cb_2018_us_zcta510_500k(
gid IDENTITY(64) PRIMARY KEY,
ZCTA5CE10 VARCHAR(5),
AFFGEOID10 VARCHAR(14),
GEOID10 VARCHAR(5),
ALAND10 INT8,
AWATER10 INT8,
geom GEOMETRY(99582)
);
9 rows)

Step 2: Create the table.

dbadmin=> CREATE TABLE cb_2018_us_zcta510_500k(
dbadmin(> gid IDENTITY(64) PRIMARY KEY,
dbadmin(> ZCTA5CE10 VARCHAR(5),
dbadmin(> AFFGEOID10 VARCHAR(14),
dbadmin(> GEOID10 VARCHAR(5),
dbadmin(> ALAND10 INT8,
dbadmin(> AWATER10 INT8,
dbadmin(> geom GEOMETRY(99582)
dbadmin(> );
CREATE TABLE

Step 3: Load the shapefile.

dbadmin=> COPY cb_2018_us_zcta510_500k WITH SOURCE STV_ShpSource(file='/data1/bryan/geo/cb_2018_us_zcta510_500k.shp') PARSER STV_ShpParser();
Rows Loaded
33144
(1 row)

That was quick and easy! Let’s look at a record:

dbadmin=> select gid,ZCTA5CE10,AFFGEOID10,GEOID10,ALAND10,AWATER10,ST_AsText(geom) from cb_2018_us_zcta510_500k where zcta5ce10 = '11554' limit 5;
gid | ZCTA5CE10 | AFFGEOID10 | GEOID10 | ALAND10 | AWATER10 | ST_AsText (* geom field, truncated for publication) ST_AsText
------+-----------+----------------+---------+----------+----------+-----------------------------------------------------------------------------
1712 | 11554 | 8600000US11554 | 11554 | 16186098 | 61309 | POLYGON ((-73.587642 40.747566, -73.579641 40.749524, -73.578799 40.746505,
(1 row)

The key fields we’ll look at are the ZIP code, “ZCTA5CE10”, and the geometry of the ZIP code tract, “geom”.

Now let’s test the data by verifying the ZIP code of my ADS-B receiver, which is at longitude (X) -73.54, latitude (Y) 40.71:

dbadmin=> select gid,ZCTA5CE10,AFFGEOID10,GEOID10,ALAND10,AWATER10 from cb_2018_us_zcta510_500k where ST_Within(STV_GeometryPoint(-73.54,40.71),geom) = 1 limit 5;
gid | ZCTA5CE10 | AFFGEOID10 | GEOID10 | ALAND10 | AWATER10
------+-----------+----------------+---------+----------+----------
1712 | 11554 | 8600000US11554 | 11554 | 16186098 | 61309
(1 row)

ZIP 11554 is East Meadow, NY, which is correct! You could also JOIN a table mapping ZIP codes to place names such as this one: https://simplemaps.com/data/us-zips

Vertica offers full support for standard WGS84 geometry and geography types, geospatial functions, and import and export to WKB, WKT, and shapefiles as shown here.

All of these features are included in the free Vertica Community Edition. Try it out on your data!

Helpful Links:
Data load: https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/AnalyzingData/Geospatial/GeospatialAnalytics/LoadingSpatialDataFromShapefiles.htm
ST_Within: https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Geospatial/ST_Within.htm
Defining a point to use in ST_Within:
https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Geospatial/STV_GeometryPoint.htm

About the Author
Bryan Herger

Bryan Herger
Vertica Big Data Solution Architect at Micro Focus

My background is analytical chemistry and current interests and work are in finance, adtech, health information, and real-time streaming analytics.

Sponsored by Vertica

You may also like...