Geodata Collection Foundation Phase Report
Summary: This document reports on the progress of the BTAA-GIN Geodata Collection as of June 2026.
Q1 2025 - Q1 2026
Key accomplishments
- 144 datasets archived
- 4 municipal data provider partnerships established
- A standardized workflow centered on open source formats
- Expanded visibility through outreach, publications, and presentations
- A tested foundation for scaling the collection in the next phase
1. Introduction
Summary
The Geodata Foundation Phase (January 2025-March 2026) established the first curated datasets for the BTAA-GIN Geodata Collection and moved the program from a pilot to an operational model.
The primary effort in this phase was the Urban Base Layers project, which functioned as a trial for core components of the program, including data acquisition, provider engagement, metadata standardization, and access formats. This work also clarified how public geodata is structured and distributed, including common constraints related to formats, metadata, and access.
Throughout the phase, the team tested both direct data provider engagement and independent collection from open data portals, comparing how each approach performed in practice. In addition, we established key technical and workflow decisions, including the use of GeoPackage as a standard format, expanded metadata structures, and a repeatable processing pipeline.
Context
This phase represents the third stage in a five-phase implementation plan:
- Blueprints - Research and initial program design
- Groundwork - Development of core infrastructure and pilot testing with sample data
- Foundation - Creation of initial collections using real-world data and provider engagement
- Framework - Scaling the collection through broader intake and more efficient workflows
- Lantern - Expanding outreach, access, and program visibility
Supporting Documents
Geodata Collection Workgroup final report
Urban Base Layers curation strategy
Geodata Collection Groundwork (Phase 2) Report
Foundation phase outcomes
Collected and published datasets
- Published 31 Urban Base Layer datasets as the first official deposits to the Geodata Collection, sourced from three municipal providers.
- Added 11 historical layers from an existing dark archive.
- Identified and collected 99 federal datasets through the Federal Data Rescue Sprint.
Built data provider partnerships
- Engaged four municipal providers through the Urban Base Layers effort.
- Received new datasets from three partner organizations.
Used four collection pathways
- Collected data from six sources using four collection pathways.
- These pathways reflected different levels of provider engagement, interest, and data licensing.
Created a standard data processing pipeline
- Built a repeatable workflow for converting files to GeoPackage.
- Applied consistent metadata.
- Preserved original source files.
Improved metadata and automation
- Expanded metadata to better track where data came from and how it changes over time.
- Added methods to embed metadata directly into distribution files.
Set up the next phase for growth
- Defined priorities for future expansion.
- Identified next steps, including routine collection cycles, wider geographic coverage, and a more scalable engagement model.
2. BTAA-GIN program activities
Geodata Curation Workgroup
The Geodata Curation Workgroup was convened to guide the transition of the Geodata Collection from an internal pilot to implementation. The group consisted of 2 staff and 4 Team members and met regularly from June 2025 through February 2026. See the workgroup’s final report for additional details.
The workgroup’s responsibilities included making decisions and recommendations about scope and methods, overseeing the Urban Base Layers project, and reviewing datasets and metadata prior to publication.
Federal Data Rescue Sprint
In March 2025, the Knowledge Committee initiated a federal data rescue sprint to address gaps in existing preservation efforts. While many national efforts focused on .gov websites, this initiative targeted federal geodata hosted on third-party platforms: ArcGIS Hubs and OSF. Through this effort, 99 datasets were identified and archived for long-term access through the BTAA Geoportal.
BTAA-GIN Strategic Retreat
In November 2025, 19 BTAA-GIN Team Members convened for an in-person strategic retreat to reflect on program history and establish shared priorities for the next 3 years. As part of the retreat:
- The Geodata Curation Workgroup presented progress and findings from the Foundation Phase.
- A new priority area was identified: curating collections of historical geodata.
Presentations and event participation
The Geodata Collection initiative was shared through a series of presentations and outreach activities:
- Geo4LibCamp presentation on the Geodata Collection Pilot (May 2025)
- “Sustaining Future Historians: The Case for Preserving Ephemeral Geodata” presentation at the IASSIST International Conference (May 2025) and a longer webinar at Ohio State University (June 2025)
- Participation in the Minnesota Geospatial Advisory Committee’s Archiving Committee
- Blog post highlighting our collaboration with the City of Philadelphia (March 2026)
- Panel presentation at the National Geospatial Advisory Committee (March 2026)
3. Data processing and metadata framework
The curation toolkit established during the Groundwork Phase was expanded to support a more standardized and scalable data processing pipeline. Enhancements focused on three areas: source format strategy, format standardization, and metadata automation.
Data Processing
Geodatabases as a source format
The approach to input formats was revised following the pilot phase. Earlier workflows relied heavily on Shapefiles; however, further assessment identified several limitations, including loss of schema complexity, field name truncation, and limited support for embedded metadata. To address these issues, File Geodatabases were established as the preferred source format. In most cases, they represent the original data structure more accurately, retaining field definitions and the original coordinate reference system.
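The field name truncation issue can be illustrated with a small sketch. Shapefile attribute tables use the dBASE format, which limits field names to 10 characters, so longer names from a source schema can become indistinguishable. The field names below are hypothetical and are not taken from any dataset in the collection.

```python
from collections import defaultdict

# dBASE (.dbf) attribute tables, used by Shapefiles, cap field names at 10 characters.
DBF_FIELD_NAME_LIMIT = 10


def truncation_collisions(field_names):
    """Group field names that become identical after 10-character truncation."""
    groups = defaultdict(list)
    for name in field_names:
        groups[name[:DBF_FIELD_NAME_LIMIT]].append(name)
    # Keep only truncated names shared by more than one original field.
    return {short: names for short, names in groups.items() if len(names) > 1}


# Hypothetical field names, for illustration only.
fields = ["parcel_owner_name", "parcel_owner_address", "zoning_code"]
collisions = truncation_collisions(fields)
```

Here both "parcel_owner" fields shorten to the same 10 characters. Export tools typically resolve such collisions by renaming one of the fields, which moves the output further from the original schema.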
GeoPackage as access format
A key decision was to adopt GeoPackage as the primary access and distribution format, while retaining original source files for reference and long-term access. GeoPackage was selected because it preserves full attribute schemas without loss, supports embedded metadata within the file structure, and is interoperable with widely used open-source tools.
All datasets are now converted to GeoPackage at the beginning of the curation process, resulting in a more consistent and repeatable workflow:
- Ingest source data (preferably File Geodatabase)
- Convert to GeoPackage (one layer per file)
- Apply standardized metadata (automated where possible)
- Retain original source files alongside curated outputs
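The conversion step above can be sketched as follows. The report does not name the conversion tool, so this example assumes GDAL's `ogr2ogr` command-line utility; the function name, paths, and layer names are hypothetical. The sketch only builds the command lines, one GeoPackage per layer, and leaves the source geodatabase untouched.

```python
from pathlib import Path


def build_conversion_commands(source_gdb, layer_names, output_dir):
    """Return one ogr2ogr argument list per layer (one GeoPackage per layer)."""
    commands = []
    for layer in layer_names:
        target = Path(output_dir) / f"{layer}.gpkg"
        # ogr2ogr -f GPKG <destination> <source> <layer>
        # writes the named source layer to a new GeoPackage file.
        commands.append(
            ["ogr2ogr", "-f", "GPKG", str(target), str(source_gdb), layer]
        )
    return commands


# Hypothetical inputs, for illustration only.
commands = build_conversion_commands(
    "city.gdb", ["address_points", "zoning_districts"], "curated"
)
```

Each argument list could then be passed to `subprocess.run(command, check=True)` on a machine with GDAL installed.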
Metadata
Expanded metadata fields
Additional metadata fields were introduced to enable tracking of dataset lifecycle and support future workflow automations. Key additions include provenance fields for documenting source and accrual method, and lifecycle fields for tracking update frequency and review schedules.
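As a rough illustration of how provenance and lifecycle fields might support automation, the sketch below models a record and computes its next review date. The field names, values, and the day-based review interval are assumptions made for this example, not the collection's actual schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DatasetRecord:
    title: str
    # Provenance fields (hypothetical names)
    source: str
    accrual_method: str
    # Lifecycle fields (hypothetical names)
    update_frequency: str
    last_reviewed: date
    review_interval_days: int

    def next_review(self) -> date:
        """Date on which the record is next due for review."""
        return self.last_reviewed + timedelta(days=self.review_interval_days)


# Hypothetical record, for illustration only.
record = DatasetRecord(
    title="Street centerlines",
    source="Municipal GIS office",
    accrual_method="Submitted",
    update_frequency="Annual",
    last_reviewed=date(2026, 1, 15),
    review_interval_days=365,
)
due = record.next_review()
```

A scheduled job could compare `due` with the current date to flag records for a routine collection cycle.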
Embedded metadata
A structured metadata format was developed to embed metadata directly within GeoPackage files. Initial implementation used QGIS for manual metadata entry. A scripted workflow was later developed to batch embed metadata from a CSV source.
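A minimal sketch of batch embedding is shown below; it is not the team's script. Because a GeoPackage is a SQLite database, the example uses Python's standard `sqlite3` module and the two tables defined by the GeoPackage metadata extension. The CSV columns, the JSON encoding, and the standard URI are assumptions, and an in-memory database stands in for a real `.gpkg` file. A complete implementation may also need to register the extension in `gpkg_extensions`, which this sketch omits.

```python
import csv
import io
import json
import sqlite3

# Table definitions follow the GeoPackage metadata extension.
METADATA_TABLES = """
CREATE TABLE IF NOT EXISTS gpkg_metadata (
  id INTEGER CONSTRAINT m_pk PRIMARY KEY ASC NOT NULL,
  md_scope TEXT NOT NULL DEFAULT 'dataset',
  md_standard_uri TEXT NOT NULL,
  mime_type TEXT NOT NULL DEFAULT 'text/xml',
  metadata TEXT NOT NULL DEFAULT ''
);
CREATE TABLE IF NOT EXISTS gpkg_metadata_reference (
  reference_scope TEXT NOT NULL,
  table_name TEXT,
  column_name TEXT,
  row_id_value INTEGER,
  timestamp DATETIME NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now')),
  md_file_id INTEGER NOT NULL,
  md_parent_id INTEGER,
  CONSTRAINT crmr_mfi_fk FOREIGN KEY (md_file_id) REFERENCES gpkg_metadata(id),
  CONSTRAINT crmr_mpi_fk FOREIGN KEY (md_parent_id) REFERENCES gpkg_metadata(id)
);
"""

# Hypothetical identifier for the metadata profile (placeholder, not a real URI).
MD_STANDARD_URI = "https://example.org/hypothetical-metadata-profile"


def embed_metadata(conn, layer_table, record):
    """Store one metadata record as JSON and link it to a layer table."""
    conn.executescript(METADATA_TABLES)
    cur = conn.execute(
        "INSERT INTO gpkg_metadata (md_scope, md_standard_uri, mime_type, metadata) "
        "VALUES ('dataset', ?, 'application/json', ?)",
        (MD_STANDARD_URI, json.dumps(record)),
    )
    conn.execute(
        "INSERT INTO gpkg_metadata_reference (reference_scope, table_name, md_file_id) "
        "VALUES ('table', ?, ?)",
        (layer_table, cur.lastrowid),
    )
    conn.commit()


# Hypothetical CSV source: one row per layer.
csv_text = (
    "layer_table,title,source\n"
    "address_points,Address Points 2025,City open data portal\n"
)

# In-memory database standing in for a GeoPackage file.
conn = sqlite3.connect(":memory:")
for row in csv.DictReader(io.StringIO(csv_text)):
    layer = row.pop("layer_table")
    embed_metadata(conn, layer, row)

stored = conn.execute(
    "SELECT r.table_name, m.metadata FROM gpkg_metadata m "
    "JOIN gpkg_metadata_reference r ON r.md_file_id = m.id"
).fetchall()
```

For real files, the connection would be opened with `sqlite3.connect(path_to_gpkg)`, with one CSV column identifying the target file.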
4. Collection Pathways
During the Foundation Phase, we tested multiple approaches for acquiring geodata. While the phase began with an emphasis on having local government partners submit their data directly to us, the work showed that geodata collection can follow several pathways depending on provider engagement, data access conditions, and institutional context.
| Pathway | Description | Foundation Phase example |
|---|---|---|
| Submitted | A provider transfers files directly to us. | Baltimore and Columbus |
| Referred | A provider engages with the BTAA-GIN but ultimately directs us to public download sources. | Philadelphia |
| Harvested | We collect publicly available data without direct provider participation. | Federal data rescue |
| Curated | We identify and process geodata already held within a member institution or legacy collection. | Minneapolis |
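The four pathways in the table can be expressed as a simple classification, sketched below. The three yes/no inputs and the order in which they are checked are assumptions drawn from the table descriptions, not a documented decision procedure.

```python
from enum import Enum


class Pathway(Enum):
    SUBMITTED = "Submitted"
    REFERRED = "Referred"
    HARVESTED = "Harvested"
    CURATED = "Curated"


def classify_pathway(held_by_member_institution, provider_engaged,
                     provider_transferred_files):
    """Map three yes/no facts about an acquisition to a pathway."""
    if held_by_member_institution:
        # Data already sits in a member institution or legacy collection.
        return Pathway.CURATED
    if provider_transferred_files:
        # The provider sent files directly.
        return Pathway.SUBMITTED
    if provider_engaged:
        # The provider engaged but pointed to public download sources.
        return Pathway.REFERRED
    # Publicly available data collected without provider participation.
    return Pathway.HARVESTED
```

For example, an acquisition where the provider met with the team but no files were transferred would be classified as `Pathway.REFERRED`.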
5. Urban Base Layers Collection
The largest focus of the Foundation Phase was building the Urban Base Layers collection, a 2025 snapshot of core geodata across major metro areas in the Big Ten region. The scope was intentionally narrow and repeatable, focusing on six datasets commonly maintained by cities:
- Address points
- Building footprints
- Municipal boundaries
- City parks
- Street centerlines
- Zoning districts
The team identified 13 cities, one major city per state. Six cities were selected based on existing team relationships. Five cities were contacted, four participated in initial meetings, and three contributed data to the collection.
Of the participating providers, two submitted data directly as File Geodatabases. Philadelphia represented a high-touch provider referral path: the team held multiple meetings, discussed data management and metadata practices, and published a blog post highlighting the partnership, but the data itself was downloaded from the city’s open data portal.
In total, 31 Urban Base Layer datasets representing 2025 were collected from three municipal providers and published in the Geodata Collection.
The project also incorporated 11 legacy municipal datasets from 2011 and 2019 that had been retained in a participating BTAA library’s dark archive.
6. Evaluation of Foundation Phase Findings
Data provider engagement is valuable
The Foundation Phase prioritized a relationship-first approach to engagement. This proved valuable in several ways.
First, the project expanded and strengthened relationships between the BTAA-GIN and municipal data providers. It established new connections and created opportunities for continued communication. In several cases, discussions with providers led to practical exchanges of information, including approaches to data distribution and methods for improving metadata visibility and access.
Second, engagement improved the team’s understanding of municipal data ecosystems. Conversations with providers clarified how municipal geodata is managed and distributed, including platform constraints, ArcGIS Hub workflows, update cycles, and metadata practices.
Third, the project helped position us as a partner supporting long-term access and research use. These interactions established a foundation for future collaboration, even when immediate data acquisition was limited or when the final data transfer occurred through an open data portal rather than a direct deposit.
Submitted data deposits should be selectively sought
Datasets obtained directly from the provider are more likely to preserve fuller technical detail, including complete field names and original coordinate systems. At the same time, this pathway required the highest level of coordination. Outreach, meetings, follow-up, data requests, review, and publication all required extra staff time. Provider capacity also limited what could be accomplished, especially when contacts managed centralized systems but were not the original data creators.
Harvests of public open data should be pursued when permission is clearly stated
The federal data rescue work showed that we can collect, review, and prepare public datasets for long-term access without direct provider participation. This pathway may be especially useful for federal sources and for state, regional, or local sources with clear open data licenses or public reuse permissions. This method can reduce coordination time and support more scalable collecting.
Legacy collections can add historical depth
The Minneapolis legacy datasets showed that valuable geospatial materials may already exist within BTAA institutions, even when they are not currently discoverable or accessible. This pathway is especially important for older datasets that may no longer be available through current public portals. The main challenge is that legacy materials may lack complete metadata or clear provenance.
Data quality differences were real, but limited
A central assumption of the Foundation Phase was that direct provider deposits would consistently produce higher-quality data and metadata than public downloads. The results were more nuanced.
Provider-supplied File Geodatabases generally preserved more technical detail than portal-generated exports, especially Shapefiles. However, metadata differences were often less substantial. Titles, descriptions, and attribute-level documentation were frequently similar across provider-supplied and portal-derived datasets.
Several challenges appeared across all pathways:
- Minimal or inconsistent descriptive and lineage metadata
- Limited or absent attribute-level documentation
- Unclear temporal coverage
- Variability in coordinate reference systems
- Uncertainty about versioning and update history
These issues appear to be systemic in public geodata publishing. Provider engagement can clarify some issues, but it does not always resolve them.
7. Next Phase: Framework
Q2-Q4 2026
During the Framework Phase, we will refine the use of the four Collection Pathways, develop lightweight guidance for when and how each pathway should be used, and identify where additional documentation, rights review, provider communication, or platform support is needed. The phase will also support more proactive acquisition from open data sources, targeted provider engagement, and continued incorporation of institutional, historical, and legacy geospatial collections.
This work will align with ongoing development of the new geoportal platform and management tools, especially where system features can support ingest, publication, asset management, provenance tracking, and collection review.
Implementation Timeline
| Period | General Focus |
|---|---|
| April-June 2026 | Refine the collection framework, clarify pathway-based workflows, and identify priority sources and collection cycle needs. |
| July-September 2026 | Test and document routine acquisition workflows, continue targeted provider engagement, and coordinate with platform development. |
| October-December 2026 | Evaluate initial workflows, expand collection activity across multiple pathways, and prepare an updated curation plan for ongoing implementation. |