Local Controlled Values

This page provides a reference for some local controlled values.

Harvest Workflow Codes

The prefix indicates the primary strategy we use to process the records, even when it is a multistep workflow:

  • template : creating the records directly in the GeoBTAA metadata profile. Typically, we use a CSV or the GBL Admin form view.
  • tools : relying on existing metadata tools and apps to obtain and/or process the metadata. Right now, the tools are the OAI validator, OpenRefine, and MarcEdit.
  • py : Python modules and scripts do most of the work. Any additional steps should be described within the code or its accompanying documentation. (A minimal sketch of this style of harvest follows the workflow list below.)
Harvest workflow codes and their descriptions:

  • template_website : author a website-only record directly in the GeoBTAA profile
  • template_children : author a website record and its child records directly in the GeoBTAA profile
  • template_csv : receive a spreadsheet of metadata and do some minor cleanup
  • tools_oai_openrefine : access the OAI-PMH API with the OAI validator; export an XML file; load it into OpenRefine for cleanup and conversion to CSV
  • tools_marcedit : use the MarcEdit tool to convert MARC records into a CSV
  • py_arcgis_hub : ArcGIS Hubs
  • py_arcgis_online : ArcGIS Online data collections
  • py_socrata : Socrata portals
  • py_ckan : CKAN portals
  • py_hdx : Humanitarian Data Exchange CKAN portal
  • py_ogm_wisc : individual JSONs in UW-Madison’s GBL 1.0 metadata
  • py_ogm_gbl1 : individual JSONs in the OGM GBL 1.0 metadata
  • py_ogm_aardvark : individual JSONs in the OGM Aardvark metadata
  • py_pasda : PASDA custom data portal (HTML parser)
  • py_isgs : ISGS custom data portal (HTML parser)
  • py_umedia : UMN UMedia modified Blacklight portal
  • py_opex : OPEX / MODS files from Michigan
  • py_oaipmh : OAI-PMH endpoints (use Python instead of tools)
  • py_mods : MODS metadata files
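For the py_ codes, the general shape of the work is usually the same: page through a portal's API, pull out a handful of fields, and write them to a CSV for further cleanup. The sketch below illustrates that pattern for a CKAN portal (the py_ckan style of workflow); the portal URL and the output column names are illustrative assumptions, not part of any documented workflow.

```python
"""Minimal sketch of a "py_" style harvest, using a CKAN portal as the example.

The portal URL and output columns are hypothetical placeholders.
"""
import csv
import requests

PORTAL = "https://data.example.gov"              # hypothetical CKAN portal
API = f"{PORTAL}/api/3/action/package_search"

def harvest(rows_per_page: int = 100):
    """Page through CKAN's package_search action and yield dataset dicts."""
    start = 0
    while True:
        resp = requests.get(API, params={"rows": rows_per_page, "start": start}, timeout=30)
        resp.raise_for_status()
        results = resp.json()["result"]["results"]
        if not results:
            break
        yield from results
        start += rows_per_page

with open("ckan_harvest.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "Title", "Description", "Modified"])   # assumed columns
    for ds in harvest():
        writer.writerow([ds.get("id"), ds.get("title"), ds.get("notes"), ds.get("metadata_modified")])
```

The resulting CSV would then go through the usual cleanup and conversion into the GeoBTAA profile.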

Resource Types for Websites

Data portals

  • Purpose: to enable users to search, discover, and access datasets and web services for their own use.
  • Examples: A CKAN search portal or an ArcGIS Hub

Static web pages

  • Purpose: simple, direct distribution of files. The content does not change in response to user interaction.
  • Examples: A basic HTML page with a list of links to download ZIP files or an FTP site.

Digital repositories

  • Purpose: long-term, stable preservation and scholarly citation of digital objects, which may include datasets but also papers, images, and other media. The emphasis is on archival quality and persistent identification.
  • Examples: University digital conservancies, library collections on platforms like DSpace or Fedora, research institutes with data collections

Interactive resources

  • Purpose: to guide a user through a narrative or a curated data exploration. The user is consuming a finished information product.
  • Examples: Esri StoryMaps or web map applications.

Accrual Method Values

This is how we obtain the metadata (not how we process it).

Automated retrieval

Definition: Metadata was actively retrieved from a live, external source using an automated or semi-automated tool. This is a "pull" method; we get the metadata ourselves.

Examples:

  • Running a Python script to query an API
  • Using a tool to connect to an OAI-PMH endpoint
  • Scraping a web page

This is a passive harvest; we can redo it at any time.
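As an illustration of the "pull" pattern, the following sketch requests Dublin Core records from a hypothetical OAI-PMH endpoint. The endpoint URL is a placeholder, and a real harvest would also follow resumptionTokens to page through the complete result set.

```python
"""Sketch of an automated ("pull") retrieval from an OAI-PMH endpoint."""
import xml.etree.ElementTree as ET
import requests

ENDPOINT = "https://repository.example.edu/oai"   # hypothetical endpoint
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Request the first page of Dublin Core records.
resp = requests.get(ENDPOINT, params={"verb": "ListRecords", "metadataPrefix": "oai_dc"}, timeout=30)
resp.raise_for_status()
root = ET.fromstring(resp.content)

# Print the identifier and title of each returned record.
for record in root.findall(".//oai:record", NS):
    identifier = record.findtext(".//oai:identifier", namespaces=NS)
    title = record.findtext(".//dc:title", namespaces=NS)
    print(identifier, title)
```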

Mediated deposit

Definition: Metadata was sent as a discrete file or set of files from an external party. This is a "push" method; we are receiving a package.

Examples:

  • A data provider emails you a CSV spreadsheet.
  • You are given a link to download a ZIP file containing XML records.
  • You download a JSON file from a repository's export page.

This relies on someone submitting records; we need them to send us files. We might reprocess those files for metadata augmentation, but we cannot obtain new ones ourselves.
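Although we cannot re-run the harvest, a deposited file can still be reloaded and cleaned locally. The sketch below shows one way to reprocess a deposited CSV with pandas; the file name and column names are hypothetical.

```python
"""Sketch of reprocessing a deposited file (the "push" case)."""
import pandas as pd

# Load the file the provider sent us (hypothetical file name).
df = pd.read_csv("provider_deposit.csv")

# Example cleanup: trim whitespace and drop rows missing a title (hypothetical column).
df["Title"] = df["Title"].str.strip()
df = df.dropna(subset=["Title"])

# Example augmentation: record how the metadata was obtained.
df["Accrual Method"] = "Mediated deposit"

df.to_csv("provider_deposit_cleaned.csv", index=False)
```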

Manual curation

Definition: Metadata was created record-by-record through direct human input into a system or template, often by looking at a source website or document.

Examples:

  • Typing metadata into the GBL Admin interface.
  • Filling out a GeoBTAA spreadsheet template from scratch by examining a web map.

This is a passive harvest, but it is generally laborious to replicate. It should only be done for website-level records or resources we deem high priority and very stable.