Resource Lifecycle¶
The Resource Lifecycle describes how we manage resources from curation to ongoing maintenance.
1. Identify¶
BTAA-GIN Team Members and Product Manager
Team members seek out new content for the geoportal. See the page How to Submit Resources to the BTAA Geoportal for more information.
2. Obtain and Process Metadata¶
a. Harvest¶
Graduate Research Assistants and Product Manager
This stage involves obtaining the metadata for resources. At a minimum, this will include a title and and access link. However, it will ideally also include descriptions, dates, authors, rights, keywords, and more.
Here are the most common ways that we obtain the metadata:
- a BTAA-GIN Team Member sends us the metadata values as individual documents or as a combined spreadsheet
- we are provided with (or are able to find) an API that will automatically generate the metadata in a structured file, such as JSON or XML
- we develop a customized script to scrape directly from the HTML on a source's website
- we manually copy and paste the metadata into a spreadsheet
- a combination of one or more of the above
b. Crosswalk¶
This step involves using a crosswalk to convert the metadata into the schema needed for the Geoportal. Our goal is to end up with a spreadsheet containing columns matching our metadata template.
Why do we rely on CSV?
CSV (Comma Separated Values) files organize tabular data in plain text format, where each row of data is separated by a line break, and each column of data is separated by a delimiter.
We have found this tabular format to be the most human-readable way to batch create, edit, and troubleshoot metadata records. We can visually scan large numbers of records at once and normalize the values in ways that would be difficult with native nested formats, like JSON or XML. Therefore, many of our workflow processes involve transforming things to and from CSV.
c. Edit¶
Graduate Research Assistants and Product Manager
When working with metadata, it is common to come across missing or corrupted values, which require troubleshooting and manual editing in our spreadsheets. Refer to the Collections Project Board for examples of this work.
d. Validate¶
After compiling the metadata, we run a validation and cleaning script to ensure the records conform to the required elements of our schema.
3. Index Metadata¶
Product Manager
a. Ingest to GBL Admin¶
We upload the completed spreadsheet to GBL Admin, which serves as the administrative interface for the Geoportal. If GBL Admin detects any formatting errors, it will issue a warning and may reject the upload.
b. Publish new records to the Geoportal¶
Once the metadata is successfully uploaded to GBL Admin, we can publish the records to the Geoportal. The technology that actually stores the records and enables searching is called Solr.
c. Unpublish¶
Periodically, we need to remove records from the Geoportal. To do this, we use GBL Admin to either delete them or change their status to "unpublished."
4. Maintenance¶
BTAA-GIN Team Members, Graduate Research Assistants, and Product Manager
a. Monitor sources¶
We monitor our sources to check for new and retired contnet.
b. Monitor Geoportal¶
We regularly assess currentness of the content in the Geoportal and check for broken links.
c. Schedule re-harvests¶
We schedule re-harvests from sources based on how frequently they update their content. See the Collections Dashboard for this schedule.
Sequence diagram of processing workflow¶
This diagram illustrates the roles, tasks, and communication workflow for Team Members, the Product Mananger, and Research Assistants. The two technologies are GitHub for tracking work and GBL Admin for managing the metadata records.