Harvest via OAI-PMH
Using the Illinois Library Digital Collections as an example
Steps:
Part 1: Get the files via OAI-PMH
- Open the OAI-PMH validator tool at https://validator.oaipmh.com
- Go to the Download XML tab
- Enter the base URL (https://digital.library.illinois.edu/oai-pmh) and the set name (6ff64b00-072d-0130-c5bb-0019b9e633c5-2)
- Wait for the app to pull all the XML files, then download them. Ideally you can download them as a single ZIP; if that fails, click on each file to download it individually.
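If the validator tool is unavailable, the same files can be pulled directly with a short script. The following is a minimal sketch, not part of the original workflow: it assumes the repository answers standard `ListRecords` requests with the `oai_dc` metadata prefix and signals further pages with a `resumptionToken`. The function names (`build_url`, `fetch`, `harvest`) and the `page_0001.xml` file naming are illustrative choices.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_url(base_url, set_spec=None, token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords URL; a resumption token is sent without other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch(url):
    """Download one response page as bytes."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def harvest(base_url, set_spec, out_dir, fetch_page=fetch):
    """Save every ListRecords page to out_dir and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths, token = [], None
    while True:
        page = fetch_page(build_url(base_url, set_spec, token))
        path = out_dir / f"page_{len(paths) + 1:04d}.xml"
        path.write_bytes(page)
        paths.append(path)
        # An absent or empty resumptionToken marks the last page.
        token_el = ET.fromstring(page).find(f".//{OAI_NS}resumptionToken")
        token = (token_el.text or "").strip() if token_el is not None else ""
        if not token:
            return paths
```

With the values from the steps above, the call would be `harvest("https://digital.library.illinois.edu/oai-pmh", "6ff64b00-072d-0130-c5bb-0019b9e633c5-2", "oai_xml")`, where `oai_xml` is an arbitrary output folder name.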
Part 2: Turn the records into a CSV via OpenRefine
- Start OpenRefine
- Choose "Get Data from this Computer" and upload the XML files
- In the parsing options, select the "record" element so that each OAI-PMH record is loaded as one record
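For readers who want to see what selecting the "record" element amounts to, here is a rough Python equivalent. It is a sketch under assumptions: the harvested files use the standard OAI-PMH and Dublin Core namespaces, and the function name `parse_records` is hypothetical. Each record becomes a dictionary whose values are lists, which mirrors the multi-valued cells OpenRefine shows after import.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def parse_records(xml_bytes):
    """Return one dict per <record>, mapping each dc field name to a list of values."""
    records = []
    for record in ET.fromstring(xml_bytes).iter(f"{OAI_NS}record"):
        fields = defaultdict(list)
        for el in record.iter():
            # Keep only non-empty Dublin Core elements such as dc:title or dc:identifier.
            if el.tag.startswith(DC_NS) and el.text and el.text.strip():
                fields[el.tag[len(DC_NS):]].append(el.text.strip())
        if fields:  # records without Dublin Core metadata (e.g. deleted ones) are skipped
            records.append(dict(fields))
    return records
```

A record with two `dc:identifier` elements yields a two-item list under the `identifier` key, so repeated fields are preserved rather than overwritten.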
Part 3: Collapse multi-valued cells
- The multi-valued cells start out grouped by the XML file they came from. We don't want that, so remove the column called File.
- Now they are grouped by the value "http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd". Leave this for now.
- There are multiple identifiers (dc:identifier) per item, so select that column and choose Edit cells > Join multi-valued cells
- Move the Identifier column to the beginning so that items will be grouped by these unique values
- Collapse the remaining multi-valued cells using the same Join multi-valued cells function
- Export to CSV
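The collapse-and-export steps can also be sketched in Python. This is an illustration rather than what OpenRefine does internally: it assumes the records are already held as dictionaries that map a field name to a list of values (a hypothetical in-memory shape), and the `|` separator and the function name `records_to_csv` are arbitrary choices. The identifier column is moved to the front, matching the steps above.

```python
import csv
import io


def records_to_csv(records, separator="|", first_column="identifier"):
    """Join each multi-valued field into one cell and return the CSV text."""
    columns = sorted({name for record in records for name in record})
    if first_column in columns:
        columns.remove(first_column)
        columns.insert(0, first_column)
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Fields missing from a record are written as empty cells.
        writer.writerow({name: separator.join(values) for name, values in record.items()})
    return buffer.getvalue()


example = [
    {"identifier": ["a", "b"], "title": ["T1"]},
    {"identifier": ["c"], "subject": ["x", "y"]},
]
print(records_to_csv(example))
```

Pick a separator that does not occur in the data, so the joined cells can be split again later if needed.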