koormath/dataverse

Harvesting the Metadata from Datasets in Dataverse Using a Python Script

If you are running this script on the server where Dataverse is installed, and the required Python packages are available there, you can run it directly from the terminal.

This document describes how to extract metadata from datasets using Python 3.14 on Windows 11. The metadata is harvested over the OAI protocol (OAI-PMH).

  1. Install Python 3.14 on a Windows system: https://www.python.org/downloads/windows/

  2. Add the Python folders to the Path variable under Environment Variables

E.g.
D:\Python
D:\Python\Scripts\

  3. Verify the installation

Open Command Prompt and run:
python --version

If Windows keeps redirecting Python to the Microsoft Store, disable the Microsoft Store Python alias:

Open Settings

Go to Apps → Advanced app settings

Click App execution aliases

Turn OFF:

python.exe

python3.exe

  4. Restart Command Prompt and test again:

python --version
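
As an extra check, the interpreter itself can report which executable is running and whether it is recent enough. This is a minimal sketch: the helper name `python_is_recent` and the minimum version of 3.8 are assumptions for illustration, not requirements stated by the script.

```python
import sys


def python_is_recent(minimum=(3, 8), version=None):
    """Return True if the interpreter version is at least `minimum`.

    `minimum` defaults to (3, 8), which is an assumed threshold.
    `version` can be passed explicitly; otherwise the running interpreter is used.
    """
    current = tuple(version or sys.version_info[:2])
    return current >= tuple(minimum)


# Shows which python.exe is actually being used (useful when the
# Microsoft Store alias was previously intercepting the command).
print(sys.executable)
print(python_is_recent())
```

If the printed path points to a WindowsApps folder rather than your installation folder, the Microsoft Store alias is probably still active.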

  5. Install the required Python package
    Now install the required library:
    pip install requests
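
To confirm that the package is visible to the same Python you verified earlier, a short check like the following can help. It is only a sketch; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be found by this interpreter."""
    return importlib.util.find_spec(package_name) is not None


# Prints True once `pip install requests` has succeeded for this interpreter.
print(is_installed("requests"))
```

If this prints False even though pip reported success, pip may belong to a different Python installation than the one on your Path.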

  6. Download and extract the zip file into a folder

  7. Edit the base URL under the heading # configuration
E.g. https://dataverse.harvard.edu/oai
Do not remove /oai from the base URL. Save the file as harvest_dataverse_oai_csv.py
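
To illustrate why the /oai suffix matters, here is a rough sketch of how an OAI-PMH request URL is typically built from a base URL. The variable `BASE_URL`, the function `build_oai_url`, and the `oai_dc` metadata format are assumptions for illustration; the actual script may name and structure these differently.

```python
from urllib.parse import urlencode

# Assumed configuration value; replace with your own installation's endpoint.
BASE_URL = "https://dataverse.harvard.edu/oai"


def build_oai_url(base_url, verb="ListRecords", metadata_prefix="oai_dc",
                  resumption_token=None):
    """Build an OAI-PMH request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix must not be sent alongside it.
    """
    if resumption_token:
        params = {"verb": verb, "resumptionToken": resumption_token}
    else:
        params = {"verb": verb, "metadataPrefix": metadata_prefix}
    return base_url.rstrip("/") + "?" + urlencode(params)


print(build_oai_url(BASE_URL))
```

The query string is appended directly to the endpoint, so a base URL without /oai would send the request to a page that does not speak the OAI protocol.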

  8. Open the command prompt

  9. Go to the folder where you saved the script
    E.g.
    cd C:\Users\koormath\Documents

If the file is somewhere else, you can search the current folder and its subfolders with:
dir /s harvest_dataverse_oai_csv.py

  10. Run the script
    python harvest_dataverse_oai_csv.py
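
For readers curious about what a harvest involves, the sketch below parses one OAI-PMH ListRecords response into rows and picks up the resumption token used to request the next page. It is not the script's actual code: the function name, the chosen fields, and the sample XML (a made-up record, not real Dataverse output) are all assumptions.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response for illustration only.
SAMPLE_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>doi:10.1234/EXAMPLE</identifier>
        <datestamp>2024-01-01T00:00:00Z</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return (rows, resumption_token) for one ListRecords response."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iterfind(".//oai:record", NS):
        header = record.find("oai:header", NS)
        rows.append({
            "identifier": header.findtext("oai:identifier", default="", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", default="", namespaces=NS),
            # Repeated Dublin Core fields are joined into one cell.
            "title": "; ".join(e.text or "" for e in record.iterfind(".//dc:title", NS)),
            "creator": "; ".join(e.text or "" for e in record.iterfind(".//dc:creator", NS)),
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    # An empty or missing token means there are no further pages.
    return rows, (token or "").strip() or None


rows, token = parse_list_records(SAMPLE_XML)
print(rows, token)
```

A full harvest would repeat the request with each returned token until none is left, which is why a large installation can take a while to finish.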

  11. Output file
    The script creates a single output file containing the metadata from all datasets:
    dataverse_oai_records.csv
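
Once the file exists, a quick way to inspect it without opening a spreadsheet is to list its column names and count its rows. This is a small sketch; the helper name `summarize_csv` and the UTF-8 encoding are assumptions, and the column names depend on what the script writes.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    # UTF-8 is assumed here; adjust if the file was written differently.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        count = sum(1 for _ in reader)
        return reader.fieldnames, count


output_file = Path("dataverse_oai_records.csv")
if output_file.exists():
    columns, count = summarize_csv(output_file)
    print(columns)
    print(count, "records")
```

Run it from the same folder as the script so that the relative file name resolves.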
