4.1.3 Recognition and parsing of SIPs


Status: Ready for review
Compliance Rating: Fully compliant

The repository shall have adequate specifications enabling recognition and parsing of the SIPs.

Supporting Text

This is necessary in order to be sure that the repository is able to extract information from the SIPs.

Examples for Meeting the Requirement

Packaging Information for the SIPs; Representation Information for the SIP Content Data, including documented file format specifications; published data standards; documentation of valid object construction.


The repository must be able to determine what the contents of a SIP are with regard to the technical construction of its components. For example, the repository needs to be able to recognize a TIFF file and confirm that it is not simply a file with a filename ending in ‘TIFF’. Another example is a website, for which the repository would need to recognize and test the validity of the various file types (e.g., HTML, images, audio, video, CSS) that make up the website. This is necessary in order to confirm that: 1) the SIP is what the repository expected; 2) the Content Information is correctly identified; and 3) the properties of the Content Information to be preserved have been appropriately selected.
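The TIFF example above can be illustrated with a minimal sketch that inspects a file's leading bytes rather than trusting its name. This is only an illustration of signature-based recognition, not a description of the repository's actual tooling; the names has_tiff_signature and file_is_tiff are hypothetical. A matching signature shows that a file is plausibly a TIFF, but it does not establish that the file is valid, which requires a separate format validation step.

```python
# Magic numbers that open a TIFF file: a byte-order mark ("II" or "MM")
# followed by version 42 (classic TIFF) or 43 (BigTIFF).
TIFF_SIGNATURES = (
    b"II*\x00",  # little-endian, classic TIFF
    b"MM\x00*",  # big-endian, classic TIFF
    b"II+\x00",  # little-endian, BigTIFF
    b"MM\x00+",  # big-endian, BigTIFF
)


def has_tiff_signature(header: bytes) -> bool:
    """Return True if the leading bytes match a known TIFF signature."""
    return header[:4] in TIFF_SIGNATURES


def file_is_tiff(path: str) -> bool:
    """Check a file by its content, ignoring whatever its filename claims."""
    with open(path, "rb") as f:
        return has_tiff_signature(f.read(4))
```

Under this check, an HTML document renamed to end in ‘TIFF’ is rejected, while a genuine TIFF is recognized regardless of its filename.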

Evidence Provided

The structure and content of SIPs are described in Definition of SIP. The process of file type recognition and validation is described in Ingest Timeline and Technical Documentation. Definition of AIP outlines the process of transforming SIPs into AIPs.