Regional Open Data – an emerging model

David Doyle

Published Oct 6, 2017

In October 2016, I attended my first ever open data conference (the Sunlight Foundation “TransparencyCamp” in Cleveland, OH) and was exposed to a huge array of topics and insights that had my mind buzzing. It wasn’t until the end-of-conference gathering, on a lake-front barge where we could enjoy fabulous views of the city skyline on a balmy Fall evening, that I heard about a relatively new open data model that became my key takeaway from the whole event.

I struck up a conversation with a fellow attendee who asked me if I had ever heard of the Western Pennsylvania Regional Data Center (WPRDC). As my mouth was full of pizza at the time, I shook my head to signify that I had not – and so began my introduction to regional (open) data centers, which have fascinated me ever since.

What do I mean by “regional” open data?

If you listen closely to public officials speak about certain challenging issues, such as homelessness or transit, you will often hear them refer to these issues as being "regional" in nature and therefore requiring a "regional" approach when it comes to attempting to solve those problems or improve certain situations. Yet, our local government open data is almost always siloed by City, County, or State with little or no formalized sharing of data between our respective organizations. It is true that there are some sites that consolidate data from various governmental sources, but those are typically done at a national level and usually without the direct involvement of the data owners.

Policy diffusion

Added to that, most local governments simply don’t have the resources to spin up an open data portal of their own. In a 2015 article, an analysis of the local government landscape within the US provides some insights into why the notion of a more regional approach towards releasing local government open data might be gaining traction:

“It’s a practical impossibility for most municipal governments to publish open data. There are 39,044 local governments in the U.S., but those with open data programs number in the dozens. To pick an quasi-arbitrary cap, for governments with fewer than perhaps 100,000 citizens—or 98.7% of subcounty municipal governments—it’s beyond their capabilities. They lack in-house technical expertise. They lack a budget for specialized staff, a data repository, ETL solutions, etc. They’re saddled with lousy, specialized software that has no ability to export data in an open format. Worst of all, they lack clear business cases for why they should open their data holdings.”

Taking that analysis into account (and adjusting for the increase in open data portals coming onstream in the two years since this article was written), it is easy to see why smaller cities and counties might look to adopt a model that lets them publish open data while greatly reducing the burden of doing so. This is the opportunity gap that these early regional open data portals appear to be attempting to fill.

If we look at how Regional Open Data portals have been diffusing across the US, it appears that the majority that exist today follow a City/County partnership model. The aforementioned WPRDC is probably the best example, but there are other regional portals in existence such as data.indy.gov.

Why is the WPRDC model so interesting?

What was it about the WPRDC model that was so interesting to me, and how could it help us to start to think differently about the future structure of open data programs?

Resiliency

The first thing that immediately caught my attention was that this data center was being hosted and managed by a university, specifically the University of Pittsburgh’s Center for Social and Urban Research, in partnership with Allegheny County and the City of Pittsburgh. Typically, open data portals are “hosted” by the relevant local government itself (usually on a hosted cloud service like the Socrata platform).

In the immediate aftermath of the 2016 US Presidential election, concerns about the resiliency of open government programs quickly came to the fore. Grassroots efforts such as “DataRefuge”, which aimed to archive vast quantities of federal open data, quickly emerged to act as a protection against the expected future removal of some open government data sources. Aside from the political ideologies at play here, it also raised questions about the reliability of sources of funding for open data programs.

The resiliency question is an intriguing one, especially when we think about the various forces that shape programs in a local government versus another public institution like a university. Like governments, universities and colleges are resilient entities and often have access to deep pools of funding, from grant-funded programs (such as with the WPRDC) to major endowments and gifts that are targeted at specific outcomes. Having public universities host regional open data centers where they can harvest open data from multiple governmental agencies and jurisdictions could act as a buffer against changing political environments and volatility in local government budgeting. Hosting data from several local governments under this model could create the conditions whereby smaller local governments could more easily commit to publishing open data in the longer term than if they were attempting to stand up their own instance of an open data portal. Additionally, once a regional open data center has been established and has proven its value in terms of positively influencing policy and decision-making at the local and regional level, its resiliency will increase.

Data center versus open data portal

The second thing that intrigued me was that it referred to itself as a “data center”, and not an “open data portal”, even though that is essentially what it acts as. Personally, I quite like this description and believe that it better reflects their mission – creating a center where data from regional entities can be hosted. In terms of the Seattle open data program that I manage, I now almost always substitute the word “platform” in place of “portal” when referring to http://data.seattle.gov. Why? I view the work we are doing as creating a rich platform of data that is used to power all kinds of apps, services, research & analysis, advocacy, etc.; whereas in my mind the term “portal” merely conjures an image of a catalog, like how we used to think about the Yahoo! portal 10 or 15 years ago. These may be small subtleties in language, but I believe they are important considerations that shape how we think about our open data as a key pillar of “Government as a Platform”.

Other possible advantages

Having regional open data centers use a different hosting model, in this case a university, could be advantageous when we think about how to solve other common problems within the open data spectrum:

Driving awareness of open data: this is a major challenge for local governments; and such a model could help increase awareness of open data and open government with the next generation(s) as they enter third level education. Having regional open data centers hosted by universities could also lead to increased awareness among other cohorts of the college population (schools, programs, staff and students) that may not be traditionally aware of open data and its potential benefits for them, thereby enabling a broader network effect in terms of driving awareness.
Data literacy: like the awareness challenge, this could help advance the thinking and practices around data literacy efforts, something which local governments are typically not well equipped to manage effectively. This model could help expand and scale existing partnerships between university Information schools and local governments, especially for smaller local governments that may not have stood up their own open data portals.
Advanced opportunities for research: while we have many partnerships between cities and universities that involve open data, this model would mean that the universities have the data centrally stored to begin with, and would have more control over the collection and quality of the data. This could lead to easier identification of projects, accelerated research timelines, deeper insights being developed using broader platforms of data, and so on.
Data standards: Consuming data from several regional open data sources could aid with efforts to better standardize naming conventions, usage of metadata standards, etc. and make it easier to join and/or federate datasets.
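
To make the data standards point a little more concrete, here is a minimal sketch of what standardizing naming conventions across publishers could look like. It is not how the WPRDC or any particular portal works; the field names, the alias table and the function names are all hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical mapping from publisher-specific column names to a shared
# regional vocabulary. These names are illustrative assumptions only.
FIELD_ALIASES = {
    "parcelid": "parcel_id",
    "pin": "parcel_id",
    "zip": "zip_code",
    "zipcode": "zip_code",
}


def normalize_field(name: str) -> str:
    """Lower-case a column name, turn runs of punctuation or spaces into
    underscores, then map known aliases to the shared name."""
    key = re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")
    return FIELD_ALIASES.get(key, key)


def normalize_record(record: dict, source: str) -> dict:
    """Rename a record's fields to the shared vocabulary and tag its publisher."""
    normalized = {normalize_field(k): v for k, v in record.items()}
    normalized["source"] = source
    return normalized


def federate(datasets: dict) -> list:
    """Combine records from several publishers into one list with shared field names."""
    combined = []
    for source, records in datasets.items():
        combined.extend(normalize_record(r, source) for r in records)
    return combined


# Example: two publishers describing the same kind of record differently.
regional = federate({
    "city": [{"Parcel ID": "1", "ZipCode": "15213"}],
    "county": [{"PIN": "2", "Zip": "15106"}],
})
```

Once every publisher's records share field names such as parcel_id and zip_code, joining or federating them becomes a much simpler exercise. In practice the hard part is agreeing on the shared vocabulary, not writing the code.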

But we already have an open data program…

Regional open data centers, like the WPRDC, appear to have been created in regions that didn't have an existing open data portal. However, could established open data programs, or regions that already have several major open data portals in existence, merge datasets into a regional open data model in the future? I believe so, and here are some top-of-mind considerations to be evaluated before embarking on such an integration.

A regional open data center could act as an extension of existing open data programs - e.g. local open data portals could feed into one regional open data center, as an extension of their existing publication operations. This would likely require some changes to existing open data policies and their subsequent policy implementation processes. While it would be potentially tricky to manage this across multiple jurisdictions, it shouldn’t be insurmountable, especially if this effort were tied to specific regional goals (homelessness, transit, affordability, etc.).

The very important question of privacy protections would need serious consideration. Individual open data programs will have their own privacy review processes in place, but these will vary in scope and capability. It will be important to recognize that as more and more datasets from multiple local sources get added to the data center, and as it becomes easier to join those datasets, the risks of re-identification and potential harm to individuals will likely grow.
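
As a rough illustration of how joined datasets raise re-identification risk, one simple and well-known heuristic is a k-anonymity check: count how many records share each combination of quasi-identifying fields and flag the dataset if any group is smaller than k. The sketch below is an assumption on my part about what a basic check could look like, not a description of any program's actual privacy review; the field names and the default threshold of k=5 are hypothetical.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    values for the given quasi-identifying fields (0 if there are no records)."""
    groups = Counter(tuple(r.get(q) for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k=5):
    """True if every combination of quasi-identifier values appears at least
    k times. An empty dataset is treated as trivially k-anonymous."""
    if not records:
        return True
    return smallest_group(records, quasi_identifiers) >= k


# Hypothetical joined records: adding the age band makes one person unique.
joined = [
    {"zip_code": "15213", "age_band": "30-39"},
    {"zip_code": "15213", "age_band": "30-39"},
    {"zip_code": "15213", "age_band": "40-49"},
]
zip_only_ok = is_k_anonymous(joined, ["zip_code"], k=3)
zip_and_age_ok = is_k_anonymous(joined, ["zip_code", "age_band"], k=3)
```

The same three records pass the check on zip code alone but fail once the age band is added, which is the pattern to watch for as more datasets become joinable. A check like this is only a starting point and does not replace a proper privacy review.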

Related to that will be the decisions on the integration of other external data sources. If we look at the WPRDC, we see that its model is "Open to Government, nonprofit, and academic publishers". Decisions on where to enforce privacy and quality standards in that pipeline are crucial.

What’s next?

As existing open data programs across the country continue to expand and mature, and more and more new programs get initiated, it seems logical to expect some diffusion towards a more regional model. One interesting thing to watch will be how broadly these regions will be defined. As referenced earlier, currently we see a "region" being mostly a City/County integration. I believe in time we will see regional open data centers expanding their catchment areas (even beyond State boundaries – think “Cascadia”), and the types of agencies that will take advantage of these regional open data centers will also likely expand beyond what we currently think of as the main suppliers of open data.

Additionally, regional open data portals could act as an intermediate step for larger open data consolidation efforts, such as USAFacts.org (e.g. Local/County/Regional/State/National or Local/County/State/Regional/National).

I, for one, will be watching the development of these regional data centers with great interest.

Jason Hare 7y

This is what we are doing in Wake, Orange and Johnston counties here in North Carolina. Very happy to see my home town of Pasco, WA firmly in the central region.

Rebecca Matsco 8y

This important concept is gaining a foothold in Beaver County PA as our community responds to a rapidly-changing environment that demands data-driven decision-making in every sector - government, education, non-profit and business.

Carol Yvonne Dyar-Eaton 8y

When will the National Information Exchange Model (NIEM) become prevalent in addressing the needs of intake for housing the homeless?

Patrick Atwater 8y

CC Varun Adibhatla check this out

Chip Battoe 8y

This is excellent Dave! Well written article, too. Thanks. As you know I'm working with a few different regions. I've been looking at federating all transit data for the Bay Area into a regional view, vs multiple views limited to smaller boundaries. This makes perfect sense as the roads are all connected. Obviously there are other examples. But I really like the idea of providing a means for those smaller or underfunded areas of government. You're right that right now it's focused on departments within a city/state/county. Some depts have much more funding than others. So we're essentially already doing this on a smaller scale. Looking forward to seeing where this goes and being a part of it.
