In part 1 of this blog series, we looked at how Snowflake supports the GEOGRAPHY geospatial data type, which works with the earth as an ellipsoid, measuring distances over a curvature and plotting objects using the latest World Geodetic System, WGS84. We also explained that when you look at the flight path of an airplane on a two-dimensional plane, it doesn’t look like the plane is flying in a straight line due to the curvature of the flight path relative to the two-dimensional map view you’re looking at it from. But what if your geospatial analysis needs that two-dimensional representation? What if we don’t want to represent a portion of the earth as an ellipsoid when performing our geospatial calculations? Enter the GEOMETRY data type, which was announced as a private preview feature at our 2022 Snowflake Summit.
GEOMETRY basics (aka, when a map is more than a map)
If GEOGRAPHY is a globe sitting on our teacher’s desk, then GEOMETRY is the book of maps you kept in your car before the days of GPS and cell phones. When using that book to figure out directions, you probably never put an ounce of thought into the fact that a map from the book you’re looking at could look different depending on what spatial reference system it was built from. “Wait a minute,” you say, “isn’t a map just a map?” Consider the two maps below:
The one on the left may look “normal” to you, whereas the one on the right may look distorted, almost as if someone took the image of the map on the left and stretched the horizontal width without changing the height. But look more closely—do you notice in the map on the left how Alaska curves northward and how the Canadian islands toward the Arctic Ocean point straight north? And how Alaska points due west and the Canadian islands spread more eastward in the map on the right? One map isn’t a distortion of the other; rather, they are each using different mathematical projections to transform the curved earth into a two-dimensional plane.
Measurements and relationships would work differently on each map as well. East-west measurements would certainly be longer in the map on the right, and notice how the Alaskan islands would be roughly on the same longitude of southern Canada, whereas they would be more aligned with northern Canada in the map on the left? Many things about these two maps would be different if we performed the same geospatial analysis on each, but one is not more correct than the other. Instead, your use case will determine which map you should use.
Believe it or not, there are many different “maps” out there and they are cataloged by something called a Spatial Reference System. Each unique projection of the earth is given an identifier called a Spatial Reference Identifier, or SRID. An SRID will often have several unique characteristics about it:
- A coordinate system, which can be different from the standard longitude and latitude
- A datum, which binds points on the spherical earth to the coordinate system
- A projection, which transforms all coordinates from a sphere to a plane
- A unit of measure for that coordinate system, which can be something other than meters
Taking our maps above and adding two more, we can see the visual application of different SRIDs below (click the links to get more information about each SRID):
But how is this useful in practice in geospatial analysis? To answer that, we have to consider some of the specific reasons for using the GEOMETRY data type over GEOGRAPHY. For example, when focusing our analysis on a smaller, more local area, we might not want to use a longitude- and latitude-based coordinate system. Or we might need a different unit of measurement. Think about the advancements in sports analytics: if we measure where a ball is kicked to on a football/soccer field, or where a tennis ball is hit to on a tennis court, or where a basketball is shot from on a basketball court, it wouldn’t make sense to plot those locations as longitude and latitude coordinates, and we may need a different unit of measure than meters depending on the precision we need. Or think about foot traffic in a shopping mall, plant crops in a field, energy management within a facility, or reforestation efforts—all of these deal with understanding spatial measurements and relationships on a scale that is more focused than “the entire earth” and thus can benefit from the use of GEOMETRY over GEOGRAPHY.
Orientation is also important. There are differences between true north, grid north, and magnetic north, and each will be applicable in different situations, which is one reason why you see some of the maps above “lean” a little in one direction or another. All of these examples add up to the reasons why we need GEOMETRY in addition to GEOGRAPHY.
Examples of how GEOMETRY differs from GEOGRAPHY
Now let’s dig into how GEOMETRY works in Snowflake, and how it differs from GEOGRAPHY. Let’s examine an image from part 1 of the blog series:
The SQL statement that produces the image above is the following:
SELECT * FROM WORLD_COUNTRIES_GEOM
WHERE ST_INTERSECTS(BORDER, TO_GEOMETRY('LINESTRING(13.4814 52.5015, -121.8212 36.8252)'));
There are two noteworthy items in this SQL statement:
- TO_GEOMETRY is a function that is unique to the GEOMETRY data type. Like TO_GEOGRAPHY, it converts an input from a string to a GEOMETRY data type.
- ST_INTERSECTS is the same function we saw with the GEOGRAPHY data type and it performs the same relationship comparison, regardless of which geospatial data type you use. Many geospatial functions can work interchangeably with GEOGRAPHY and GEOMETRY in Snowflake and do not need any special naming distinction.
As mentioned above, the introduction of GEOMETRY means we now have to care about which SRID our data is using. In the above query, we can use the ST_SRID function to extract the SRID for a given geospatial column, regardless of which type it is. GEOGRAPHY will always be SRID 4326, whereas a GEOMETRY column can vary based on the data set. In this case, we have an OSM_BAY_AREA table that has a GEOGRAPHY column and a GEOMETRY column for the same object. The SRID for the GEOMETRY column is 3493.
Always SET the SRID for Undefined Objects
But the introduction of alternative SRID support means we always need to be cognizant of which SRID a given function is using, because at this point in time, Snowflake does not yet support the ability to convert from one SRID to another. So in the above example, the query on lines 17-18 will fail because the POLYGON string does not have an SRID defined for it. Snowflake assumes an SRID of 0 in this instance, and because the GEOM column is in SRID 3493, Snowflake can’t resolve the function and will issue an error.
We can resolve this error using either of the two approaches in lines 21-22 or 25-26. The first approach in lines 21-22 uses the EWKT syntax to define the SRID of the POLYGON right in the string. The second approach in lines 25-26 uses ST_SETSRID to set the SRID of the TO_GEOMETRY output to 3493 using an additional argument.
The impact on measurements
Shifting to the measurement functions ST_AREA and ST_PERIMETER, which again can be used interchangeably with GEOGRAPHY and GEOMETRY, demonstrates how measurements can produce slightly different results between the two data types, all other things being equal. Both SRID 4326 and SRID 3493 use meters as their unit of measure, so we can see how the area and perimeter measurements of Santa Clara County California in the query above produce slightly different results when calculating measurements over a curvature (GEOGRAPHY) versus an SRID 3493 plane (GEOMETRY).
Working with different coordinate systems
The conversation shifts from similarities to vast differences when considering the coordinate system. Using the ST_YMIN and ST_YMAX functions in the query above, we can extract examples of the coordinate systems used in each SRID. As you can see above, the GEOGRAPHY column uses standard longitude and latitude as its coordinate system, but SRID 3493 (GEOMETRY) uses a very different coordinate system, with much larger numbers. This demonstrates how significantly different the coordinate systems can be between SRIDs and how using GEOGRAPHY isn’t appropriate for all geospatial use cases.
Respecting coordinate systems in constructors
The coordinate system defined in a GEOMETRY column is also respected by constructor functions, as shown above with ST_MAKEPOLYGON. In the results above, you can see how the GEOMETRY_POLYGON created for the same building as the GEOGRAPHY_POLYGON uses the much larger coordinate system values in the polygon definition instead of longitude and latitude. All constructor and transform functions will always respect the coordinate system used in the SRID of a GEOMETRY column.
Snowflake’s commitment to geospatial support
Over the last year, Snowflake has made significant investments to support deeper geospatial analysis in the Data Cloud, beginning with GEOGRAPHY support and continuing with the introduction at our June 2022 Snowflake Summit of GEOMETRY support, which we’re happy to announce is now in public preview. We will continue to enhance geospatial support in the coming months by simplifying data ingestion by supporting more data and storage formats, adding new functions, adding conversion from one SRID to another, and enabling spatial intelligence by providing visualization capabilities.
Hopefully these two blog posts have given you a good overview of what geospatial data is, how it works, and how you can make use of it in Snowflake. Now that GEOGRAPHY support is generally available and GEOMETRY support is in public preview, you are free to go test these great new features for yourself!