Spatial Analytics vs. Spatial Search

Random Thoughts for the Data Analytics Practitioner or Software Architect


Hello all. PlanetRisk has a massive amount of spatial data. Jim Stokes, Mark Dumas, and I have a lot of experience with spatial analytics (and Paul McQuillan always tries to get us to speak in understandable terms about it). As I continuously plot the future of PlanetRisk’s technology throughout my days meandering from one whiteboard to another, I often find myself philosophizing about spatial analytics vs spatial search. In the last few years I have found some misconceptions about both, or at least some varying perceptions of the meaning of “analytics” when it comes to spatial data (and often very Big spatial data). I have found it important to understand the distinction between spatial search and spatial analytics because it can, and possibly should, affect your strategy of what technologies you choose to fulfill your spatial analytics dreams.

First let me describe a basic spatial search use case. Let’s say I have a large quantity of point data, say tweets with geolocations (who doesn’t have a couple million of those lying around?). Let’s say I have a web app that uses the mapping control du jour, like ESRI or Leaflet or OpenLayers, and I want users to be able to search for tweets, but only get tweets back if they are geotagged to the area of the earth they are zoomed in to on the map. In this case, the UI grabs the bounding box of the map control in coordinates, puts together a GeoJSON or WKT geometry, and passes that to the backend for use as a filter. The backend then uses something like an INTERSECTS function to return only the records that geometrically intersect the bounding box. The points are returned and plotted on the map (or a sample of them, due to volume), and the user can then look at them. Pretty simple, right? This is possible in many open source technologies such as SOLR, ElasticSearch, and a slew of others. To get a little fancier, we could perform a rollup of content based on the spatial bounding box, or we can even automatically group the data within the bounding box into a static grid system and use each grid cell as a filter, like ElasticSearch does with its geohash aggregations. Essentially, spatial search is simply a type of filter, a good old fashioned “where clause” if you will, but based on geometry. Spatial search is pretty easy to scale, because the filter geometry just needs to be passed to the cluster, where each node can apply it to its own share of the data.
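To make the “where clause” idea concrete, here is a minimal sketch in plain Python. It is not how SOLR or ElasticSearch implement things internally; the sample tweets, the `in_bbox` and `grid_cell` helpers, and the 0.5 degree cell size are all made up for illustration, and the fixed grid is only a simplified stand-in for a real geohash.

```python
from collections import Counter

# Hypothetical sample data: (tweet_id, longitude, latitude)
tweets = [
    ("t1", -77.04, 38.90),
    ("t2", -77.01, 38.89),
    ("t3", -73.99, 40.73),
    ("t4", -77.46, 38.30),
]


def in_bbox(lon, lat, bbox):
    """Spatial search as a filter: is the point inside the bounding box?"""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def grid_cell(lon, lat, cell_size):
    """Snap a point to a fixed grid cell (simplified stand-in for a geohash)."""
    return (int(lon // cell_size), int(lat // cell_size))


# Bounding box the UI would send: (min_lon, min_lat, max_lon, max_lat)
map_bbox = (-78.0, 38.0, -76.5, 39.5)

# The "where clause": keep only tweets that fall inside the map view.
hits = [t for t in tweets if in_bbox(t[1], t[2], map_bbox)]

# The "rollup": count the matching tweets per grid cell.
rollup = Counter(grid_cell(lon, lat, 0.5) for _, lon, lat in hits)

print([t[0] for t in hits])
print(dict(rollup))
```

Notice that nothing here creates new geometry. The bounding box only decides which existing records come back, which is why this kind of work is easy to spread across a cluster.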

Spatial Search is fairly straightforward. Spatial Analytics…well… not so much.

Spatial analysis is what you mostly see in GIS systems. It is very different from spatial search, but a lot of people think that if a technology “does spatial,” then it does everything spatial. Not true in my opinion. Moreover, doing descriptive analytics with a geospatial filter is still not really spatial analytics IMO. Here are some examples of spatial analytics use cases:

Let’s say we’re looking for a place to put a new store in a new state or province. Let’s say we think it should be near a major road, near a populated place, far away from any competitor stores, but close to restaurants. In this case we have point, line, and polygon data in the form of thematic layers, and our desire is to get back something like a heat map that shows us “suitability” as a score distribution, and let’s say we expect to deliver this type of analysis on the fly. Whoa… this is heavy, right? Here’s another, more basic example. Imagine I have a layer of roads and a layer of wetland areas. Each layer has thousands of features (shapes). I want a new set of geometries (shapes) that constitute the intersection of anywhere any road buffer crosses a wetland area. I’m not looking for a wetland feature that touches a road; I want a new set of geometries of exactly where the overlap is occurring across N shapes (features) in each layer.
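Here is a toy sketch of the store suitability idea in plain Python, just to show the shape of the problem. Everything in it is an assumption for illustration: the layers are tiny hand-made lists in planar coordinates (think kilometers), the `1 / (1 + distance)` scoring and the equal weighting are arbitrary choices, and a real GIS would do this with proper layers, projections, and rasters.

```python
import math

# Hypothetical thematic layers in planar coordinates (e.g., kilometers).
road = ((0.0, 2.0), (10.0, 2.0))        # line layer: one road segment
towns = [(3.0, 3.0)]                    # point layer: populated places
competitors = [(8.0, 2.5)]              # point layer: competitor stores
restaurants = [(2.5, 2.5), (4.0, 3.5)]  # point layer: restaurants


def dist_point_segment(p, a, b):
    """Shortest distance from point p to the segment a-b (a and b differ)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def nearest(p, points):
    """Distance from p to the closest point in a layer."""
    return min(math.dist(p, q) for q in points)


def closeness(d):
    """Map a distance to a score in (0, 1]; 1 means right on top of it."""
    return 1.0 / (1.0 + d)


def suitability(p):
    """Average of four criteria, each scored between 0 and 1."""
    near_road = closeness(dist_point_segment(p, *road))
    near_town = closeness(nearest(p, towns))
    near_food = closeness(nearest(p, restaurants))
    far_from_rival = 1.0 - closeness(nearest(p, competitors))
    return (near_road + near_town + near_food + far_from_rival) / 4.0


# Score every cell of a coarse grid; this is the raw material for a heat map.
grid = {(x, y): suitability((x, y)) for x in range(11) for y in range(6)}
best_cell = max(grid, key=grid.get)
print(best_cell, round(grid[best_cell], 3))
```

Even in this toy form, every cell has to be compared against every layer, and that is the part that gets heavy when the layers hold thousands of complex shapes and the answer is expected on the fly.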

Ok, so you can hopefully see the difference in use cases, but let me summarize some fundamental differences I see in how these two puppies work:

-       Spatial search largely has no need for the concept of a “Layer,” but layers are essential to spatial analysis (even if they are defined on the fly with a query)

-       Spatial analysis uses complex shapes, and distance matrices for multi layer analysis (sometimes even vector to distance-raster operations are performed for summing cell values along a classification scheme). Spatial search might be able to give you some of the material for this, but it doesn’t solve the whole problem.

-       Spatial analysis often uses intersectION, while spatial search just needs intersectS. The subtlety here is that an intersectION would return you the actual polygon for the inside of a two circle Venn diagram, while spatial search’s intersectS just says “yup, they overlap! Want one?”

There are more, but I don’t want to spend all day writing this article :-)
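The intersectS vs. intersectION point can be sketched with nothing but axis-aligned rectangles. This is a deliberately simplified illustration: the `Rect` type and both functions are hypothetical, and real road buffers and wetlands would be arbitrary polygons handled by a proper geometry engine.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle, a simplified stand-in for a real polygon."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: Rect, b: Rect) -> bool:
    """Spatial search style: a yes/no answer, usable as a filter."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    """Spatial analysis style: return the new shape where a and b overlap."""
    if not intersects(a, b):
        return None
    return Rect(max(a.min_x, b.min_x), max(a.min_y, b.min_y),
                min(a.max_x, b.max_x), min(a.max_y, b.max_y))


road_buffer = Rect(0, 0, 4, 2)
wetland = Rect(3, 1, 6, 5)
far_away = Rect(10, 10, 12, 12)

print(intersects(road_buffer, wetland))    # True
print(intersection(road_buffer, wetland))  # Rect(min_x=3, min_y=1, max_x=4, max_y=2)
print(intersection(road_buffer, far_away)) # None
```

The first function only tells you which existing features to return. The second one manufactures a geometry that was not in either layer, and that is the capability to check for when you evaluate a technology. (In this sketch, rectangles that merely touch along an edge count as intersecting and yield a zero-area result.)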

So, since the goal of this article is to increase awareness so leaders can make the right technology choices, I recommend asking these questions of your spatial use case:

-       Do you need the concept of layers? Why? Are you sure you know what a layer is?

-       Do you have polygon and line data? And what do you want to do with them?

1.     Get back data within a buffered line?

2.     Get back data inside a polygon?

3.     Or do you want to intersect your polygons with your buffered lines and get back the actual shape where they overlapped?

4.     BTW, geohashing only works on points…so do you think you’re going to use geohashing as a summary layer if you have other than point data? How?

-       Look ahead…if you’re just doing spatial search now, do you think spatial analysis use cases will eventually present themselves?

-       Do you need instant feedback on a query, or are you just trying to do batch analytics to get some insights?


You get the point… be inquisitive, and be informed about what you ponder…


Now, of the technologies you think you might use, try asking these questions about them:

-       Does it allow on the fly buffering of geometries? (buffering meaning turning a point or line into a polygon based on a distance)

-       Does it allow distance functions?

-       Does it allow spatial aggregate functions (e.g., convex hull, union)?

-       Does it do intersectION or just intersectS?

-       Do you need raster data type support? (if yes...take a deep breath, you're about to enter another world)

-       Does it have the concept of a layer for doing analytics? Not just a UI “layer” but an actual concept of a thematic spatial layer?

-       Can it accomplish a layer at query time?

-       . . . think really hard for more
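If terms like buffering and spatial aggregates feel abstract, here is a small plain Python sketch of two of them. Both functions are illustrative assumptions rather than any product’s API: `buffer_point` approximates a circular buffer with a regular polygon in planar units, and `convex_hull` uses the standard monotone chain method to collapse many points into one enclosing shape.

```python
import math


def buffer_point(x, y, distance, segments=16):
    """Buffering: turn a point into a polygon approximating a circle."""
    return [
        (x + distance * math.cos(2 * math.pi * i / segments),
         y + distance * math.sin(2 * math.pi * i / segments))
        for i in range(segments)
    ]


def convex_hull(points):
    """Spatial aggregate: many points in, one enclosing polygon out.

    Andrew's monotone chain; returns hull vertices counter-clockwise.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # Drop the last point of each half; it repeats the start of the other.
    return lower[:-1] + upper[:-1]


store_buffer = buffer_point(1.0, 1.0, 5.0)
hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
print(len(store_buffer), hull)
```

A technology that supports spatial analytics should be able to do this kind of thing inside the query itself, on real geometries, rather than leaving it to your application code.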


So based on all this, here are my personal recommendations for technologies that support spatial search and spatial analytics (and of course this is subject to all problems inherent in human epistemology, including inter-subjectivity). I only list things I’ve rolled real solutions with, not toys I ran a hello world script on.

-       For spatial analysis

1.     Oracle Spatial and Graph. Sorry to all the open sourcers out there (I’m one of them too), but IMO nothing can touch Oracle Spatial, especially if it’s clustered on Real Application Clusters (RAC), because then you solve the big data and the spatial analytics problem in one shot. Layers are just queries. Yep, it’s expensive.

2.     PostgreSQL/PostGIS. This puppy screams with spatial analytics, but it’s too bad it doesn’t really have a distributed query capability like Oracle RAC. If we had more bandwidth at PlanetRisk, we’d be working on this in the open right now.

3.     ESRI ArcGIS Server suite. ESRI is fundamentally a GIS, and is insanely awesome at spatial analytics and visualization, and you can do a lot of it without writing any code at all.

4.     SQL Server Spatial. Not as capable as Oracle and Postgres, but it has a lot of good features for spatial analytics.

-       Spatial Search

1.     Oracle Spatial, PostgreSQL, ESRI, and SQL Server are good here too, although search is kind of by-layer in ESRI (at least the last time I used it).

2.     SOLR and SOLR Cloud. Great spatial search capability. I once had a 40 node cluster handling spatial search over 40+ million records for a few hundred users and it worked pretty well. A few stability problems, but this was a few years ago when SOLR Cloud was fairly new.

3.     ElasticSearch. Has great spatial search, and has a gem of a feature with geohash aggregations for doing rollups of categories or statistics of point data. I have used ElasticSearch at very large scale…billions of shapes.

4.     There are many others claiming to do spatial, but I haven’t used them for anything real.


Hopefully you learned that spatial search and spatial analysis are different (or you could disagree of course), and that you might want to think really hard about what your use case truly demands, and will demand in the future. And let me know if you disagree with anything here.

Also, to learn more about what PlanetRisk can do for you in the spatial analytics and search realms just ping me: we know our spatial stuff.


./MG

Have you worked strictly with RAC for Oracle, or have you been able to work with the Hadoop or NoSQL platforms of Oracle Spatial and Graph? If you have, have you noticed a difference/benefit to any of them? Do you see a benefit in staying within the RDBMS world for geospatial analytics?

Not bad for an old NCO. LOL

Another good article, Mark. I think for spatial analytics you hit the key spatial concepts (intersects, disjoint, within and contains). APIs should support that as well as the UI. Location "qualifiers" are often missing from raw and processed data, which makes for poorly composed data. Example qualifiers might be "command post" vs "residence". I'm a bit off your topic, but I think most data engineers would read this article and appreciate what data structures are necessary to support spatial search and analytics.
