CKAN
Purpose
This recipe scans the Action API of CKAN data portals, retrieves metadata for new items, and returns a list of deleted items.
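As a rough illustration of what scanning the Action API involves, the sketch below builds the `package_list` request for a portal and reads the identifiers out of the JSON response. It is not taken from the R-03_ckan notebook: the function names are hypothetical, and it assumes the portal serves the standard CKAN API under `/api/3/action/`.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def package_list_url(portal_url: str) -> str:
    """Build the Action API URL that lists every dataset identifier in a portal."""
    # Assumption: the portal exposes the standard CKAN path /api/3/action/.
    return urljoin(portal_url.rstrip("/") + "/", "api/3/action/package_list")


def parse_package_list(payload: dict) -> list[str]:
    """Return the dataset identifiers from a package_list response body."""
    # CKAN wraps every Action API response as {"success": ..., "result": ...}.
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return list(payload["result"])


def fetch_package_list(portal_url: str, timeout: float = 30.0) -> list[str]:
    """Request the identifier list from a live portal (needs network access)."""
    with urlopen(package_list_url(portal_url), timeout=timeout) as response:
        return parse_package_list(json.load(response))
```

Keeping the URL building and response parsing separate from the network call makes both easy to check without contacting a portal.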
Warning
This batch CKAN recipe is being deprecated and replaced with recipes tailored to each site.
graph TB
A((STEP 1. <br>Set up directories)) --> B[STEP 2. <br>Run Jupyter Notebook script] ;
B --> C{Did the script run successfully?};
C --> |No| D[Troubleshoot];
D -->A;
C --> |No & I can't figure it out.| F[Refer issue back to Product Manager];
C --> |Yes| E[STEP 3. <br>Edit place names & titles];
E --> G[STEP 4. <br>Upload new records];
G --> H[STEP 5. <br>Unpublish deleted records];
classDef goCell fill:#99d594,stroke:#333,stroke-width:2px
class A,B,C,E,G goCell;
classDef troubleCell fill:#ffffbf,stroke:#333,stroke-width:2px;
class D troubleCell;
classDef endCell fill:#fc8d59,stroke:#333,stroke-width:2px
class F,H endCell;
classDef questionCell fill:#fff,stroke:#333,stroke-width:2px;
class C questionCell;
Step 1: Set up your directories
- Navigate to your local Recipes directory for R-03_ckan.
- Verify that there are two folders:
    - resource: contains a CSV for each portal per harvest that lists all of the dataset identifiers
    - reports: combined CSV metadata files for all new and deleted datasets per harvest
- Review the CKANportals.csv file. Each active portal should have values in the following fields:
    - portalName
    - URL
    - Provider
    - Publisher
    - Spatial Coverage
    - Bounding Box
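The review of CKANportals.csv can be partly automated. The sketch below reports which of the fields listed above are blank for each row. The function name is hypothetical, and because this page does not say how an active portal is marked, the sketch checks every row in the file.

```python
import csv

# Field names as listed in this recipe.
REQUIRED_FIELDS = [
    "portalName",
    "URL",
    "Provider",
    "Publisher",
    "Spatial Coverage",
    "Bounding Box",
]


def find_incomplete_portals(csv_file) -> dict[str, list[str]]:
    """Map each portal name to the required fields that are blank or missing."""
    problems = {}
    # Data rows start on line 2 because line 1 holds the column headers.
    for row_number, row in enumerate(csv.DictReader(csv_file), start=2):
        missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if missing:
            # Fall back to the row number when the portal name itself is blank.
            name = (row.get("portalName") or "").strip() or f"row {row_number}"
            problems[name] = missing
    return problems
```

To run it against the real file, open it with `open("CKANportals.csv", newline="", encoding="utf-8")` and pass the file object in. An empty dictionary means every row is complete.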
Step 2: Run the harvest script
- Start Jupyter Notebook
- Open your local copy of R-03_ckan.ipynb
Info
This script harvests from a set of CKAN data portals. It saves a list of the datasets found in each portal and compares that list with the one saved by the previous run. The result is two CSVs: one of new items and one of deleted items.
The script only harvests items that can be identified as shapefiles or imagery.
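The comparison between runs amounts to two set differences, which the sketch below illustrates. It is not the notebook's code. The format check is a guess at how shapefiles and imagery might be recognized, and the keyword list is hypothetical.

```python
def compare_runs(previous_ids, current_ids):
    """Return (new, deleted) identifier lists between two harvests."""
    previous, current = set(previous_ids), set(current_ids)
    # New: present now but not before. Deleted: present before but not now.
    return sorted(current - previous), sorted(previous - current)


# Hypothetical keywords; the notebook's real matching rules may differ.
TARGET_KEYWORDS = ("shp", "shapefile", "geotiff", "tiff")


def looks_harvestable(resource_formats):
    """True if any resource format string contains one of the target keywords."""
    return any(
        keyword in fmt.lower()
        for fmt in resource_formats
        for keyword in TARGET_KEYWORDS
    )
```

For example, `compare_runs(["a", "b", "c"], ["b", "c", "d"])` yields `(["d"], ["a"])`: item `d` is new and item `a` has been deleted.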
Step 3: Edit the metadata for new items
The new records can be found in reports/allNewItems_{today's date}.csv and will need some manual editing.
- Spatial Coverage: Add place names related to the datasets.
- Title: Concatenate values in the Alternative Title column with the Spatial Coverage of the dataset.
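The title concatenation could be scripted once the place names are filled in. The sketch below is one possible approach. This page does not specify the output format, so appending the place in square brackets is an assumption, and the function name is hypothetical.

```python
def build_title(alternative_title: str, spatial_coverage: str) -> str:
    """Combine an Alternative Title with a place name into a single title."""
    title = alternative_title.strip()
    place = spatial_coverage.strip()
    if not place:
        # Nothing to append until the Spatial Coverage has been filled in.
        return title
    # Assumed format: "Title [Place]". Adjust to match your metadata guidelines.
    return f"{title} [{place}]"
```

Called as `build_title("Parcels", "Minnesota")`, this returns `Parcels [Minnesota]`.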
Step 4: Upload metadata for new records
Open GBL Admin and upload the new items found in reports/allNewItems_{today's date}.csv.
Step 5: Delete metadata for retired records
Unpublish the records found in reports/allDeletedItems_{today's date}.csv. This can be done in GBL Admin manually (one by one) or with the GBL Admin documents update script.
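Before uploading or unpublishing, it can help to confirm which report file is the latest and which records it lists. The sketch below does this under two assumptions. First, since the date format in the file names is not given here, it picks the most recently modified file matching the prefix. Second, the identifier column name is whatever you pass in; the `ID` used in the usage note is hypothetical.

```python
import csv
from pathlib import Path


def latest_report(reports_dir, prefix):
    """Return the most recently modified '<prefix>_*.csv' file, or None."""
    candidates = sorted(
        Path(reports_dir).glob(f"{prefix}_*.csv"),
        key=lambda path: path.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


def read_column(csv_path, column):
    """Return the non-empty values of one column from a report CSV."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [row[column] for row in csv.DictReader(handle) if row.get(column)]
```

For instance, `read_column(latest_report("reports", "allDeletedItems"), "ID")` would list the identifiers to unpublish, provided a matching report exists and the column is actually named `ID`.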