OAI-PMH

Overview

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is an XML-based web-service protocol that allows clients to fetch metadata about the contents of digital repositories.

For a complete description of the protocol please see the official pages and the specification document.

Usage

EHRI's OAI-PMH endpoint is located at https://portal.ehri-project-stage.eu/api/oaipmh.

Verbs

The protocol consists of six "verbs":

Identify
show information about the current repository (the default verb)
ListSets
list supported record sets (record groupings that can be independently harvested)
ListIdentifiers
list unique identifiers for records within this repository
ListRecords
return a page of records, plus a resumption token for fetching subsequent pages when the result exceeds the maximum page size (see Pagination & Resumption Tokens)
ListMetadataFormats
list the metadata formats supported by this repository
GetRecord
fetch metadata for a specific record given its unique identifier

Some of these verbs require additional parameters. For example, the ListIdentifiers, GetRecord and ListRecords verbs all require a metadataPrefix parameter.
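
As an illustration of how a verb and its parameters combine into a request, the following minimal Python sketch builds a request URL. The helper name build_request_url is hypothetical, and oai_dc is used because it is the Dublin Core prefix mandated by the OAI-PMH specification; the ListMetadataFormats verb reports the prefixes this endpoint actually offers.

```python
from urllib.parse import urlencode

# Endpoint taken from the Usage section above.
ENDPOINT = "https://portal.ehri-project-stage.eu/api/oaipmh"


def build_request_url(verb, **params):
    """Return a request URL for the given verb and optional parameters.

    Hypothetical helper for illustration; not part of any EHRI client.
    """
    query = {"verb": verb}
    # Skip parameters that were passed as None.
    query.update({key: value for key, value in params.items() if value is not None})
    return f"{ENDPOINT}?{urlencode(query)}"


print(build_request_url("ListRecords", metadataPrefix="oai_dc"))
```

The printed URL can be passed to curl or any HTTP client.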


Pagination & Resumption Tokens

The list-based verbs return only a partial result when the total number of items exceeds a fixed page size. In that case the response includes a resumptionToken value, which can be passed back as the resumptionToken parameter to retrieve the next page. Note: the token implicitly encodes the values of all parameters other than the verb, so those parameters must not be supplied alongside the token.
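
The paging behaviour described above can be sketched as a loop that keeps requesting pages until no resumption token is returned. This is a minimal sketch rather than a complete harvester: the function names and the fetch parameter are hypothetical, error responses are not handled, and the element names follow the standard OAI-PMH 2.0 response schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://portal.ehri-project-stage.eu/api/oaipmh"
# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_xml(params):
    """Fetch one response page and return its body as bytes."""
    with urlopen(f"{ENDPOINT}?{urlencode(params)}") as response:
        return response.read()


def iter_identifiers(metadata_prefix, fetch=fetch_xml):
    """Yield every record identifier, following resumption tokens.

    `fetch` is a hypothetical hook so the loop can be exercised offline.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(f"{OAI_NS}header"):
            yield header.findtext(f"{OAI_NS}identifier")
        token = root.findtext(f".//{OAI_NS}resumptionToken")
        if not token or not token.strip():
            break  # A missing or empty token marks the last page.
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.strip()}
```

Calling list(iter_identifiers("oai_dc")) would walk all pages; for large repositories it is usually better to consume the generator lazily.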

Metadata Formats

EHRI's OAI-PMH endpoint supports archival descriptions in both Dublin Core (DC) and Encoded Archival Description (EAD) 2002 formats. DC descriptions return only the top level of the archival hierarchy (e.g. the description of the fonds), whereas EAD descriptions also include the levels below the fonds, if present. EAD is also typically more extensive and specific than DC. As a result, an EAD description of a fonds, whilst technically a single document, can in practice contain a very large amount of information. Bear this in mind when using, for example, harvesting tools that may not expect large XML payloads.
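
One way to cope with large payloads is to parse the response incrementally instead of loading the whole document at once. The sketch below assumes the standard OAI-PMH 2.0 response layout (record elements containing a header and an identifier); the function name iter_records is hypothetical, and the element count stands in for whatever processing a real harvester would do.

```python
import xml.etree.ElementTree as ET

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def iter_records(stream):
    """Yield (identifier, element_count) for each record in a response.

    `stream` is any binary file-like object, such as an open HTTP response.
    """
    for _event, element in ET.iterparse(stream, events=("end",)):
        if element.tag == f"{OAI_NS}record":
            identifier = element.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
            # Count the record element and all of its descendants.
            element_count = sum(1 for _ in element.iter())
            yield identifier, element_count
            # Release the record's children once it has been processed.
            element.clear()
```

Because each record is cleared after it is yielded, memory use is bounded by the largest single record rather than by the whole response, although a single EAD record can itself still be large.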

Record Sets

Sets allow you to selectively harvest a portion of a repository's records. Since EHRI is a metadata aggregator, we support two types of set: country and repository. Country set identifiers consist of lower-case ISO 3166 alpha-2 (2-letter) codes. Repository set identifiers are compound, consisting of the country code, a colon, and the repository's EHRI ID (which also contains the country code), for example at:at-001890.
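
The two identifier forms can be sketched as small helpers. Both function names are hypothetical, and repository_set assumes that the EHRI ID begins with the country code followed by a hyphen, as in the example above; that layout is inferred from the single example rather than stated as a rule.

```python
def country_set(country_code):
    """Return a country set identifier, e.g. 'at' for Austria."""
    return country_code.lower()


def repository_set(repository_id):
    """Return a repository set identifier, e.g. 'at:at-001890'.

    Assumption: the EHRI ID starts with the country code and a hyphen.
    """
    country_code = repository_id.split("-", 1)[0].lower()
    return f"{country_code}:{repository_id}"


print(country_set("AT"))
print(repository_set("at-001890"))
```

The resulting value would be supplied as the standard OAI-PMH set parameter on a ListIdentifiers or ListRecords request.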

Example


The Identify verb can be requested with curl (the URL is quoted so that the shell does not interpret the ? character):

curl "https://portal.ehri-project-stage.eu/api/oaipmh?verb=Identify"

Additional Parameters

In addition to the standard parameters, the ListIdentifiers and ListRecords verbs support from and until parameters for selective harvesting by date. Both take UTC values in either YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ format.
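
A minimal sketch of producing values in the two accepted formats from Python date objects follows. The helper name oai_datestamp is hypothetical, and treating naive datetimes as already being in UTC is an assumption of this sketch.

```python
from datetime import date, datetime, timezone


def oai_datestamp(value):
    """Format a date or datetime for the from/until parameters."""
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive datetimes are already expressed in UTC.
            value = value.replace(tzinfo=timezone.utc)
        return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return value.strftime("%Y-%m-%d")


params = {
    "verb": "ListIdentifiers",
    "metadataPrefix": "oai_dc",
    "from": oai_datestamp(date(2020, 1, 1)),
    "until": oai_datestamp(datetime(2020, 6, 30, 23, 59, 59, tzinfo=timezone.utc)),
}
print(params)
```

Because from is a reserved word in Python, the parameters are held in a dictionary rather than passed as keyword arguments.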