OAI-PMH
Overview
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is an XML-based web-service protocol that allows clients to fetch metadata about the contents of digital repositories.
For a complete description of the protocol, see the official Open Archives Initiative pages and the OAI-PMH specification.
Usage
EHRI's OAI-PMH endpoint is located at https://portal.ehri-project-stage.eu/api/oaipmh.
Verbs
The protocol consists of six "verbs":
- Identify: show information about the current repository (the default verb if none is specified)
- ListSets: list the supported record sets (groups of records that can be harvested independently)
- ListIdentifiers: list the unique identifiers of records in this repository
- ListRecords: return a page of records, plus a resumption token for fetching the next page when the results exceed the maximum page size (see Pagination & Resumption Tokens)
- ListMetadataFormats: list the metadata formats supported by this repository
- GetRecord: fetch the metadata for a single record, given its unique identifier
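Every request is a plain HTTP GET with the verb and its parameters in the query string. The following is a minimal sketch of a URL builder; the helper name `oai_url` is hypothetical, and parameters are passed as a mapping because `from` (used for selective harvesting) is a reserved word in Python and cannot be a keyword argument.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode

# Endpoint given in the Usage section.
ENDPOINT = "https://portal.ehri-project-stage.eu/api/oaipmh"

# The six verbs defined by OAI-PMH.
VERBS = frozenset({
    "Identify", "ListSets", "ListIdentifiers",
    "ListRecords", "ListMetadataFormats", "GetRecord",
})


def oai_url(verb: str, params: Optional[Mapping[str, str]] = None) -> str:
    """Build a GET request URL for an OAI-PMH verb (hypothetical helper)."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    # urlencode percent-escapes reserved characters, such as the ':' in
    # repository set identifiers.
    query = urlencode({"verb": verb, **(params or {})})
    return f"{ENDPOINT}?{query}"
```

For example, `oai_url("ListRecords", {"metadataPrefix": "oai_dc", "set": "at:at-001890"})` produces a URL ending in `verb=ListRecords&metadataPrefix=oai_dc&set=at%3Aat-001890`.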
Some of these verbs require additional parameters. For example, the ListIdentifiers, GetRecord and ListRecords verbs all require a metadataPrefix parameter, and GetRecord also requires an identifier parameter.
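A client can check for missing arguments before sending a request. The sketch below encodes the required arguments per verb as given in the OAI-PMH 2.0 specification; the table and the `missing_arguments` helper are illustrative, not part of EHRI's API.

```python
from typing import Dict, FrozenSet, Mapping, Set

# Required arguments per verb in the OAI-PMH 2.0 specification.
REQUIRED: Dict[str, FrozenSet[str]] = {
    "Identify": frozenset(),
    "ListSets": frozenset(),
    "ListMetadataFormats": frozenset(),
    "ListIdentifiers": frozenset({"metadataPrefix"}),
    "ListRecords": frozenset({"metadataPrefix"}),
    "GetRecord": frozenset({"identifier", "metadataPrefix"}),
}


def missing_arguments(verb: str, params: Mapping[str, str]) -> Set[str]:
    """Return the required arguments absent from params (hypothetical helper)."""
    if "resumptionToken" in params:
        # A resumption token replaces all other arguments, because it
        # already encodes the state of the original request.
        return set()
    return set(REQUIRED[verb] - set(params))
```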
Pagination & Resumption Tokens
List-based verbs such as ListIdentifiers and ListRecords return only part of the result set when its total size exceeds a fixed page size. In that case the response includes a resumptionToken value, which can be supplied as the resumptionToken parameter to retrieve the next page of data. Note: the resumption token implicitly encodes the values of all parameters other than the verb, so those parameters must not be supplied alongside the token.
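A harvesting loop therefore sends the full parameter set once, then only the verb and the token on each follow-up request. This is a minimal sketch using the Python standard library, assuming responses use the standard OAI-PMH 2.0 XML namespace and treating a missing or empty resumptionToken element as the last page (as the OAI-PMH specification describes). Error responses such as badResumptionToken are not handled, and the function names are hypothetical. The page fetcher is injectable so the loop can be exercised without network access.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://portal.ehri-project-stage.eu/api/oaipmh"
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

Fetcher = Callable[[Dict[str, str]], bytes]


def http_fetch(params: Dict[str, str]) -> bytes:
    """Send one GET request to the endpoint and return the raw XML."""
    with urlopen(f"{ENDPOINT}?{urlencode(params)}") as response:
        return response.read()


def list_identifiers(metadata_prefix: str, fetch_page: Fetcher = http_fetch) -> Iterator[str]:
    """Yield every record identifier, following resumption tokens page by page."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch_page(params))
        for ident in root.iterfind(".//oai:header/oai:identifier", NS):
            yield ident.text
        token = root.find(".//oai:resumptionToken", NS)
        token_value = (token.text or "").strip() if token is not None else ""
        if not token_value:
            # No token, or an empty one, marks the last page.
            return
        # Send only the verb and the token: the token already carries
        # metadataPrefix and any other original parameters.
        params = {"verb": "ListIdentifiers", "resumptionToken": token_value}
```

Against the live endpoint, `list(list_identifiers("oai_dc"))` would collect all identifiers; for large repositories, iterating lazily avoids holding them all in memory.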
Metadata Formats
EHRI's OAI-PMH endpoint supports archival descriptions in both Dublin Core (DC) and Encoded Archival Description (EAD) 2002 formats. DC records cover only the top level of the archival hierarchy (e.g. the description of the fonds), whereas EAD records also include any levels below the fonds. So, besides typically carrying more extensive and specific information than DC, an EAD description of a fonds, while technically a single document, can in practice be very large. Bear this in mind when using tools, such as generic harvesters, that may not expect large XML payloads.
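One way to cope with large EAD payloads is to parse ListRecords responses incrementally rather than loading each page into memory at once. The sketch below uses the standard library's ElementTree.iterparse and clears each record after processing it, so memory use is bounded by roughly one record rather than one page. It assumes the standard OAI-PMH 2.0 namespace; the exact metadataPrefix for EAD is not stated here and should be taken from the ListMetadataFormats response (oai_dc is the protocol's standard Dublin Core prefix). Counting elements is just a placeholder for real processing.

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Optional, Tuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def iter_records(stream: BinaryIO) -> Iterator[Tuple[Optional[str], int]]:
    """Yield (identifier, element count) for each <record> in a ListRecords
    response, discarding each record's subtree once it has been processed."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != f"{OAI}record":
            continue
        identifier = elem.findtext(f"{OAI}header/{OAI}identifier")
        metadata = elem.find(f"{OAI}metadata")
        # Count elements inside <metadata> (not the wrapper itself); deleted
        # records carry no <metadata> at all.
        size = 0 if metadata is None else sum(1 for _ in metadata.iter()) - 1
        yield identifier, size
        elem.clear()  # drop the record's children so they can be freed
```

The response object returned by `urllib.request.urlopen` is file-like, so it can be passed to `iter_records` directly without reading the whole body first.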
Record Sets
Sets allow you to selectively harvest a portion of a repository's records. Since EHRI is a metadata aggregator, we support two types of set: country and repository. Country set identifiers consist of lower-case ISO 3166 alpha-2 (two-letter) codes. Repository set identifiers are compound, consisting of the country code, a colon, and the repository's EHRI ID (which itself contains the country code), for example at:at-001890.
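Because the two set types differ only by the presence of a colon, a client can tell them apart with a simple split. This is a minimal sketch with hypothetical names (`SetSpec`, `parse_set_spec`, `repository_set`); it checks only that the country part is two lower-case ASCII letters, not that it is an assigned ISO 3166 code.

```python
import re
from typing import NamedTuple, Optional

# Two lower-case letters, matching the shape of an ISO 3166 alpha-2 code.
COUNTRY_CODE = re.compile(r"[a-z]{2}")


class SetSpec(NamedTuple):
    country: str
    repository: Optional[str]  # None for a country-level set


def parse_set_spec(spec: str) -> SetSpec:
    """Split an EHRI set identifier into country and repository parts."""
    country, colon, repository = spec.partition(":")
    if not COUNTRY_CODE.fullmatch(country):
        raise ValueError(f"not a lower-case alpha-2 country code: {country!r}")
    if not colon:
        return SetSpec(country, None)  # country set, e.g. "at"
    if not repository:
        raise ValueError(f"missing repository ID in set {spec!r}")
    return SetSpec(country, repository)  # repository set, e.g. "at:at-001890"


def repository_set(country: str, repository_id: str) -> str:
    """Build a repository set identifier from its two parts."""
    return f"{country}:{repository_id}"
```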
Example
To try the endpoint, request its Identify response with curl:
curl "https://portal.ehri-project-stage.eu/api/oaipmh?verb=Identify"
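The same request can be made from Python. This sketch extracts the single-valued fields that OAI-PMH 2.0 defines for Identify responses; it assumes the standard OAI-PMH namespace, and the function names are hypothetical. `fetch_identify` needs network access, while `parse_identify` works on any saved response.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional
from urllib.request import urlopen

IDENTIFY_URL = "https://portal.ehri-project-stage.eu/api/oaipmh?verb=Identify"
OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Single-valued text fields defined for Identify responses by OAI-PMH 2.0.
FIELDS = ("repositoryName", "baseURL", "protocolVersion",
          "earliestDatestamp", "deletedRecord", "granularity")


def parse_identify(xml_bytes: bytes) -> Dict[str, Optional[str]]:
    """Extract the main text fields from an Identify response."""
    root = ET.fromstring(xml_bytes)
    identify = root.find(f"{OAI}Identify")
    if identify is None:
        raise ValueError("response has no <Identify> element")
    # findtext returns None for any field the response omits.
    return {name: identify.findtext(f"{OAI}{name}") for name in FIELDS}


def fetch_identify() -> Dict[str, Optional[str]]:
    """Request and parse the live Identify response (needs network access)."""
    with urlopen(IDENTIFY_URL) as response:
        return parse_identify(response.read())
```

The granularity field is worth checking before harvesting, since it indicates which date format the repository accepts for selective harvesting.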
Additional Parameters
In addition to their required parameters, the ListIdentifiers and ListRecords verbs accept optional from and until parameters for selective harvesting by date. Values are UTC dates in either YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ format.
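Producing these values correctly mostly means converting to UTC before formatting. A minimal sketch, with the hypothetical helper name `oai_datestamp`: plain dates use the day format, while timezone-aware datetimes are converted to UTC and use the seconds format. Naive datetimes are rejected rather than guessed at.

```python
from datetime import date, datetime, timezone
from typing import Union


def oai_datestamp(value: Union[date, datetime]) -> str:
    """Format a value for the from/until parameters (hypothetical helper).

    Dates become YYYY-MM-DD; timezone-aware datetimes are converted to UTC
    and become YYYY-MM-DDThh:mm:ssZ.
    """
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("naive datetime: attach a timezone first")
        return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return value.strftime("%Y-%m-%d")
```

The OAI-PMH specification requires from and until in the same request to use the same granularity, so format both bounds from the same type, e.g. `{"verb": "ListRecords", "metadataPrefix": "oai_dc", "from": oai_datestamp(date(2023, 1, 1)), "until": oai_datestamp(date(2023, 6, 30))}`.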