
Geodata Collection Foundation Phase Report

Summary: This document reports on the progress of the BTAA-GIN Geodata Collection as of June 2026.

Q1 2025 - Q1 2026

Key accomplishments

  • 144 datasets archived
  • 4 municipal data provider partnerships established
  • A standardized workflow centered on open source formats
  • Expanded visibility through outreach, publications, and presentations
  • A tested foundation for scaling the collection in the next phase

The Geodata Foundations Phase (January 2025-March 2026) established the first curated datasets for the BTAA-GIN Geodata Collection and moved the program from a pilot to an operational model.

The primary effort in this phase was the Urban Base Layers project, which functioned as a trial for core components of the program, including data acquisition, provider engagement, metadata standardization, and access formats. This work also clarified how public geodata is structured and distributed, including common constraints related to formats, metadata, and access.

Throughout the phase, the team tested both direct data provider engagement and independent collection from open data portals, comparing how each approach performed in practice. In addition, we established key technical and workflow decisions, including the use of GeoPackage as a standard format, expanded metadata structures, and a repeatable processing pipeline.

This phase represents the third stage in a five-phase implementation plan:

  1. Blueprints - Research and initial program design
  2. Groundwork - Development of core infrastructure and pilot testing with sample data
  3. Foundation - Creation of initial collections using real-world data and provider engagement
  4. Framework - Scaling the collection through broader intake and more efficient workflows
  5. Lantern - Expanding outreach, access, and program visibility

Geodata Collection Roadmap

Geodata Collection Workgroup final report

Curation Plan, v1

Urban Base Layers curation strategy

Geodata Collection Groundwork (Phase 2) Report

  • Published 31 Urban Base Layer datasets as the first official deposits to the Geodata Collection, sourced from three municipal providers.
  • Added 11 historical layers from an existing dark archive.
  • Identified and collected 99 federal datasets through the Federal Data Rescue Sprint.
  • Engaged four municipal providers through the Urban Base Layers effort.
  • Received new datasets from three partner organizations.

Used four collection pathways

  • Collected data from six sources using four collection pathways.
  • These pathways reflected different levels of provider engagement, interest, and data licensing.

Created a standard data processing pipeline

  • Built a repeatable workflow for converting files to GeoPackage.
  • Applied consistent metadata.
  • Preserved original source files.
  • Expanded metadata to better track where data came from and how it changes over time.
  • Added methods to embed metadata directly into distribution files.
  • Defined priorities for future expansion.
  • Identified next steps, including routine collection cycles, wider geographic coverage, and a more scalable engagement model.

The Geodata Curation Workgroup was convened to guide the transition of the Geodata Collection from an internal pilot to implementation. The group consisted of two staff and four Team Members and met regularly from June 2025 through February 2026. See the workgroup’s final report for additional details.

The workgroup’s responsibilities included making decisions and recommendations about scope and methods, overseeing the Urban Base Layers project, and reviewing datasets and metadata prior to publication.

In March 2025, the Knowledge Committee initiated a federal data rescue sprint to address gaps in existing preservation efforts. While many national efforts focused on .gov websites, this initiative targeted federal geodata hosted on third-party platforms: ArcGIS Hubs and OSF. Through this effort, 99 datasets were identified and archived for long-term access through the BTAA Geoportal.

In November 2025, 19 BTAA-GIN Team Members convened for an in-person strategic retreat to reflect on program history and establish shared priorities for the next 3 years. As part of the retreat:

  • The Geodata Curation Workgroup presented progress and findings from the Foundation Phase.
  • A new priority area was identified: curating collections of historical geodata.

The Geodata Collection initiative was shared through a series of presentations and outreach activities.

The curation toolkit established during the Groundwork Phase was expanded to support a more standardized and scalable data processing pipeline. Enhancements focused on three areas: source format strategy, format standardization, and metadata automation.

The approach to input formats was revised following the pilot phase. Earlier workflows relied heavily on Shapefiles; however, further assessment identified several limitations, including loss of schema complexity, field name truncation, and limited support for embedded metadata. To address these issues, File Geodatabases were established as the preferred source format. In most cases, they represent the original data structure more accurately, retaining field definitions and the original coordinate reference system.

A key decision was to adopt GeoPackage as the primary access and distribution format, while retaining original source files for reference and long-term access. GeoPackage was selected because it preserves full attribute schemas without loss, supports embedded metadata within the file structure, and is interoperable with widely used open-source tools.

All datasets are now converted to GeoPackage at the beginning of the curation process, resulting in a more consistent and repeatable workflow:

  1. Ingest source data (preferably File Geodatabase)
  2. Convert to GeoPackage (one layer per file)
  3. Apply standardized metadata (automated where possible)
  4. Retain original source files alongside curated outputs
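The conversion step in this workflow can be sketched roughly as follows. This is an illustration only, not the team's actual script: it assumes GDAL's `ogr2ogr` command-line tool is the converter, and the function name `plan_conversion`, the layer names, and the output folder are hypothetical. The function only builds the commands, so the plan can be reviewed before anything is run, and the source File Geodatabase is never modified.

```python
from pathlib import Path


def plan_conversion(source_gdb: str, layers: list[str], out_dir: str) -> list[list[str]]:
    """Build one ogr2ogr command per layer (hypothetical helper).

    Each command writes a single layer to its own GeoPackage, matching the
    "one layer per file" step. The source geodatabase is only read.
    """
    commands = []
    for layer in layers:
        # One output file per layer, named after the layer.
        target = (Path(out_dir) / f"{layer}.gpkg").as_posix()
        # ogr2ogr -f GPKG <destination> <source> <layer>
        commands.append(["ogr2ogr", "-f", "GPKG", target, source_gdb, layer])
    return commands


if __name__ == "__main__":
    for cmd in plan_conversion("city.gdb", ["parks", "zoning"], "curated"):
        print(" ".join(cmd))
```

On a machine with GDAL installed, each command list could be passed to `subprocess.run(cmd, check=True)`.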

Additional metadata fields were introduced to enable tracking of dataset lifecycle and support future workflow automations. Key additions include provenance fields for documenting source and accrual method, and lifecycle fields for tracking update frequency and review schedules.

A structured metadata format was developed to embed metadata directly within GeoPackage files. Initial implementation used QGIS for manual metadata entry. A scripted workflow was later developed to batch embed metadata from a CSV source.
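A batch embed of this kind might look like the sketch below. The report does not describe the project's structured metadata format, so everything specific here is an assumption: the tables are modeled (in simplified form) on the GeoPackage metadata extension, the CSV column names (`layer`, `source`, `accrual_method`, `update_frequency`) are hypothetical, and the metadata is stored as JSON under a placeholder schema URI. A complete implementation would also register the extension in `gpkg_extensions`.

```python
import csv
import io
import json
import sqlite3

# Simplified table layouts modeled on the GeoPackage metadata extension.
SCHEMA = """
CREATE TABLE IF NOT EXISTS gpkg_metadata (
  id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  md_scope TEXT NOT NULL DEFAULT 'dataset',
  md_standard_uri TEXT NOT NULL,
  mime_type TEXT NOT NULL DEFAULT 'text/xml',
  metadata TEXT NOT NULL DEFAULT ''
);
CREATE TABLE IF NOT EXISTS gpkg_metadata_reference (
  reference_scope TEXT NOT NULL,
  table_name TEXT,
  column_name TEXT,
  row_id_value INTEGER,
  timestamp DATETIME NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now')),
  md_file_id INTEGER NOT NULL,
  md_parent_id INTEGER
);
"""


def embed_metadata(conn, csv_text, standard_uri="https://example.org/hypothetical-schema"):
    """Insert one metadata record per CSV row and link it to its layer.

    Returns the number of rows embedded. The CSV must have a 'layer' column
    (an assumed name); all other columns are stored as a JSON object.
    """
    conn.executescript(SCHEMA)
    count = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        layer = row.pop("layer")
        cur = conn.execute(
            "INSERT INTO gpkg_metadata (md_scope, md_standard_uri, mime_type, metadata) "
            "VALUES (?, ?, ?, ?)",
            ("dataset", standard_uri, "application/json", json.dumps(row, sort_keys=True)),
        )
        # Link the metadata record to the layer's table.
        conn.execute(
            "INSERT INTO gpkg_metadata_reference (reference_scope, table_name, md_file_id) "
            "VALUES ('table', ?, ?)",
            (layer, cur.lastrowid),
        )
        count += 1
    conn.commit()
    return count
```

Because a GeoPackage is a SQLite database, the same function could be pointed at a real file with `sqlite3.connect("parks.gpkg")`. An in-memory database is enough to try the logic.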

During the Foundation Phase, we tested multiple approaches for acquiring geodata. While the phase began with an emphasis on having local government partners submit their data directly to us, the work showed that geodata collection can follow several pathways depending on provider engagement, data access conditions, and institutional context.

| Pathway | Description | Foundation Phase example |
| --- | --- | --- |
| Submitted | A provider transfers files directly to us. | Baltimore and Columbus |
| Referred | A provider engages with the BTAA-GIN but ultimately directs us to public download sources. | Philadelphia |
| Harvested | We collect publicly available data without direct provider participation. | Federal data rescue |
| Curated | We identify and process geodata already held within a member institution or legacy collection. | Minneapolis |
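For record keeping, the four pathways could be encoded as a small controlled vocabulary, for example to populate an accrual-method field. The sketch below is one possible reading of the pathway definitions, not a rule adopted by the program: the three yes/no inputs, the function name `classify_pathway`, and the order in which the checks are applied are all assumptions.

```python
from enum import Enum


class Pathway(Enum):
    SUBMITTED = "Submitted"
    REFERRED = "Referred"
    HARVESTED = "Harvested"
    CURATED = "Curated"


def classify_pathway(held_by_member: bool, provider_engaged: bool, files_transferred: bool) -> Pathway:
    """Map three assumed yes/no facts about an acquisition to a pathway."""
    if held_by_member:
        # Data already held by a member institution or legacy collection.
        return Pathway.CURATED
    if provider_engaged and files_transferred:
        # The provider sent files directly.
        return Pathway.SUBMITTED
    if provider_engaged:
        # The provider engaged but pointed to public downloads.
        return Pathway.REFERRED
    # Public data collected without provider participation.
    return Pathway.HARVESTED
```

Under these assumptions, a provider that met with the team but pointed to its open data portal would be classified as `Pathway.REFERRED`.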

The largest focus of the Foundation Phase was building the Urban Base Layers collection, a 2025 snapshot of core geodata across major metro areas in the Big Ten region. The scope was intentionally narrow and repeatable, focusing on six datasets commonly maintained by cities:

  • Address points
  • Building footprints
  • Municipal boundaries
  • City parks
  • Street centerlines
  • Zoning districts

The team identified 13 cities, one major city per state. Six cities were selected based on existing team relationships. Five cities were contacted, four participated in initial meetings, and three contributed data to the collection.

Of the participating providers, two submitted data directly as File Geodatabases. Philadelphia represented a high-touch provider referral path: the team held multiple meetings, discussed data management and metadata practices, and published a blog post highlighting the partnership, but the data itself was downloaded from the city’s open data portal.

In total, 31 Urban Base Layer datasets representing 2025 were collected from three municipal providers and published in the Geodata Collection.

The project also incorporated 11 legacy municipal datasets from 2011 and 2019 that had been retained in a participating BTAA library’s dark archive.

6. Evaluation of Foundation Phase Findings

The Foundation Phase prioritized a relationship-first approach to engagement. This proved valuable in several ways.

First, the project expanded and strengthened relationships between the BTAA-GIN and municipal data providers. It established new connections and created opportunities for continued communication. In several cases, discussions with providers led to practical exchanges of information, including approaches to data distribution and methods for improving metadata visibility and access.

Second, engagement improved the team’s understanding of municipal data ecosystems. Conversations with providers clarified how municipal geodata is managed and distributed, including platform constraints, ArcGIS Hub workflows, update cycles, and metadata practices.

Third, the project helped position us as a partner supporting long-term access and research use. These interactions established a foundation for future collaboration, even when immediate data acquisition was limited or when the final data transfer occurred through an open data portal rather than a direct deposit.

Submitted data deposits should be selectively sought

Datasets obtained directly from the provider are more likely to preserve fuller technical detail, including complete field names and original coordinate systems. At the same time, this pathway required the highest level of coordination. Outreach, meetings, follow-up, data requests, review, and publication all required extra staff time. Provider capacity also limited what could be accomplished, especially when contacts managed centralized systems but were not the original data creators.

Harvests of public open data should be pursued when permission is clearly stated

The federal data rescue work showed that we can collect, review, and prepare public datasets for long-term access without direct provider participation. This pathway may be especially useful for federal sources and for state, regional, or local sources with clear open data licenses or public reuse permissions. This method can reduce coordination time and support more scalable collecting.

Legacy collections can add historical depth

The Minneapolis legacy datasets showed that valuable geospatial materials may already exist within BTAA institutions, even when they are not currently discoverable or accessible. This pathway is especially important for older datasets that may no longer be available through current public portals. The main challenge is that legacy materials may lack complete metadata or clear provenance.

Data quality differences were real, but limited

A central assumption of the Foundation Phase was that direct provider deposits would consistently produce higher-quality data and metadata than public downloads. The results were more nuanced.

Provider-supplied File Geodatabases generally preserved more technical detail than portal-generated exports, especially Shapefiles. However, metadata differences were often less substantial. Titles, descriptions, and attribute-level documentation were frequently similar across provider-supplied and portal-derived datasets.

Several challenges appeared across all pathways:

  • Minimal or inconsistent descriptive and lineage metadata
  • Limited or absent attribute-level documentation
  • Unclear temporal coverage
  • Variability in coordinate reference systems
  • Uncertainty about versioning and update history

These issues appear to be systemic in public geodata publishing. Provider engagement can clarify some issues, but it does not always resolve them.

During the Framework Phase, we will refine the use of the four Collection Pathways, develop lightweight guidance for when and how each pathway should be used, and identify where additional documentation, rights review, provider communication, or platform support is needed. The phase will also support more proactive acquisition from open data sources, targeted provider engagement, and continued incorporation of institutional, historical, and legacy geospatial collections.

This work will align with ongoing development of the new geoportal platform and management tools, especially where system features can support ingest, publication, asset management, provenance tracking, and collection review.

| Period | General Focus |
| --- | --- |
| April-June 2026 | Refine the collection framework, clarify pathway-based workflows, and identify priority sources and collection cycle needs. |
| July-September 2026 | Test and document routine acquisition workflows, continue targeted provider engagement, and coordinate with platform development. |
| October-December 2026 | Evaluate initial workflows, expand collection activity across multiple pathways, and prepare an updated curation plan for ongoing implementation. |