Access Overture's Places Data

Overture Maps Foundation released the alpha version of their open data, and I couldn’t wait to get my hands on it. When I went to their download page, I saw that they have released the data not in any of the standard GIS formats, nor in OSM’s PBF format, but in the Apache Parquet format.

The kind folks at Overture have a write-up on how you can access it here, but it is far too technical for someone who isn’t familiar with the Parquet format, and hence not too easy for us GIS folk to follow.

So let us go step by step, and figure out how to access and download this data.

Before we get started, let’s understand a couple of things:

  • Apache Parquet (https://parquet.apache.org/) is a highly compressed, column-oriented data format, which makes files small in size and quick to parse. However, this also means that you can’t open these files in a text editor, or in standard GIS software like QGIS. It is meant for programmatic access by big-data applications. 
  • DuckDB (https://duckdb.org/) is a serverless, in-process database for Online Analytical Processing (OLAP), or, as they rightly claim on their website, DuckDB offers ‘All the benefits of a database, none of the hassle’. If you look at the documentation, you will see that it can read Parquet files, among many other formats, so we will use DuckDB to access this data. It also has a spatial extension, which we will use for spatial querying and for writing the data out to GeoJSON.
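
To see DuckDB’s Parquet support in action before touching any remote data, here is a minimal, self-contained sanity check you can paste into the DuckDB shell. It writes a tiny table to a local Parquet file and reads it back; the file name demo.parquet is just an example:

```sql
-- Write a one-row table to a local Parquet file ...
COPY (SELECT 1 AS id, 'cafe' AS category) TO 'demo.parquet' (FORMAT PARQUET);

-- ... and read it back with the same read_parquet() function
-- we will later point at Overture's files on S3.
SELECT * FROM read_parquet('demo.parquet');
```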

Here are the steps to get the POIs for a small area on to our local system.

  • The first step is to download and install DuckDB on our system. Follow the steps given on this page (https://duckdb.org/docs/installation/) to download and install the DuckDB CLI executable on your system.
  • Now start DuckDB by running a command like `duckdb` on the command line, from the folder where it was extracted or installed.
  • We will need two extensions to make our lives easier: httpfs, which is used for streaming files from a remote store like Amazon’s S3, and spatial, which enables the spatial functionality we will use to query the data and save it in a spatial format. You can install them by running the following commands within DuckDB: 

INSTALL httpfs;
INSTALL spatial;

  • Now you are ready to access data.
  • Go to a site like http://bboxfinder.com/ and draw a polygon for your Area of Interest; the bbox coordinates are shown at the bottom of the page. In my case I have selected an area around Pune.

[Screenshot: bounding box drawn around Pune on bboxfinder]

  • Let’s load the required extensions, and also set the AWS region that we will be using. This can be done with the following commands in DuckDB:

LOAD spatial;
LOAD httpfs;
SET s3_region='us-west-2';

  • Before we query the data, we need to understand the columns. To do this, we can run a command like:

DESCRIBE
SELECT *
FROM read_parquet('s3://overturemaps-us-west-2/release/2023-07-26-alpha.0/theme=places/type=*/*', filename=true, hive_partitioning=1);

  • This will show us the columns within the data, like this:

[Screenshot: DESCRIBE output listing the columns]

  • We only want some of the columns from the data, and some of them are nested structures, so we will have to convert them to see them. Let’s query a couple of records, and confirm that we get the right data.

SELECT
    id,
    JSON(names) AS names,
    JSON(categories) AS categories,
    JSON(brand) AS brand,
    JSON(addresses) AS addresses,
    ST_GeomFromWKB(geometry) AS geom
FROM read_parquet('s3://overturemaps-us-west-2/release/2023-07-26-alpha.0/theme=places/type=*/*', filename=true, hive_partitioning=1)
LIMIT 2;

  • This will fetch only two records, and convert the names, categories, brand and addresses columns to JSON, so that we can then extract values from them.
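
Once a column is cast to JSON, DuckDB’s JSON functions let you drill into it. As an illustration, a query like the following could pull the first common name out of the names structure; note that the '$.common[0].value' path is my assumption about the alpha schema, so verify it against your own DESCRIBE output first:

```sql
SELECT
    id,
    -- assumed path into the names struct; adjust to the actual schema
    json_extract_string(JSON(names), '$.common[0].value') AS primary_name
FROM read_parquet('s3://overturemaps-us-west-2/release/2023-07-26-alpha.0/theme=places/type=*/*', filename=true, hive_partitioning=1)
LIMIT 2;
```
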
  • Now I want to get only the data for our bounding box, so let us filter on the bbox column. We can do this by running the following query:

SELECT
    id,
    JSON(names) AS names,
    JSON(categories) AS categories,
    JSON(brand) AS brand,
    JSON(addresses) AS addresses,
    ST_GeomFromWKB(geometry) AS geom
FROM read_parquet('s3://overturemaps-us-west-2/release/2023-07-26-alpha.0/theme=places/type=*/*', filename=true, hive_partitioning=1)
WHERE
    bbox.minX > 73.77 AND
    bbox.maxX < 73.955 AND
    bbox.minY > 18.43 AND
    bbox.maxY < 18.61
LIMIT 2;

  • This will show you two records which match our query parameters.
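
Before exporting everything, it can be useful to count how many POIs fall inside the bounding box, so you know roughly how large the output will be:

```sql
SELECT count(*) AS poi_count
FROM read_parquet('s3://overturemaps-us-west-2/release/2023-07-26-alpha.0/theme=places/type=*/*', filename=true, hive_partitioning=1)
WHERE
    bbox.minX > 73.77 AND
    bbox.maxX < 73.955 AND
    bbox.minY > 18.43 AND
    bbox.maxY < 18.61;
```
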
  • The last step is to write a query which will save this data to a GeoJSON file. This can be done with the following query:

COPY (
    SELECT
        id,
        JSON(names) AS names,
        JSON(categories) AS categories,
        JSON(brand) AS brand,
        JSON(addresses) AS addresses,
        ST_GeomFromWKB(geometry) AS geom
    FROM read_parquet('s3://overturemaps-us-west-2/release/2023-07-26-alpha.0/theme=places/type=*/*', filename=true, hive_partitioning=1)
    WHERE
        bbox.minX > 73.77 AND
        bbox.maxX < 73.955 AND
        bbox.minY > 18.43 AND
        bbox.maxY < 18.61
) TO 'poi_pune.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON');

  • Do note that this query might take some time to run; depending on your configuration and internet speed, it might even take a few hours.
  • Once the query has run, the data will be written to a GeoJSON file, which you can open in a GIS software like QGIS.

[Screenshot: the exported POIs displayed in QGIS]
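
Since the export goes through the spatial extension’s GDAL integration, other vector formats should work too. For example, the following sketch writes a subset of the same data to a GeoPackage instead; driver availability can depend on how your DuckDB spatial extension was built, so treat this as an untested variant:

```sql
COPY (
    SELECT
        id,
        JSON(names) AS names,
        ST_GeomFromWKB(geometry) AS geom
    FROM read_parquet('s3://overturemaps-us-west-2/release/2023-07-26-alpha.0/theme=places/type=*/*', filename=true, hive_partitioning=1)
    WHERE
        bbox.minX > 73.77 AND
        bbox.maxX < 73.955 AND
        bbox.minY > 18.43 AND
        bbox.maxY < 18.61
) TO 'poi_pune.gpkg'
WITH (FORMAT GDAL, DRIVER 'GPKG');
```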



