184.108.40.206 Mechanisms in place to ensure any/multiple copies of digital objects are synchronized
|Status||Ready for review|
|Compliance Rating||Fully compliant|
The ingest process ensures that digital objects are correctly stored in S3 and Glacier. Fixity validation is part of the process and runs before anything is stored in the repository.
The Pharos web application (APTrust's web interface for managing deposits and inspecting deposit outcomes) stores a list of work items that are queued for ingest, restore, or deletion. Errors that occur during any of these processes are logged there. It is the responsibility of the depositor to check for ingest errors.
Once items are ingested, fixity checks run regularly (every 90 days), and depositors are notified by email if errors occur.
See below for details.
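As a rough illustration of the 90-day fixity schedule, the sketch below selects files whose last check is at least 90 days old and compares a freshly computed digest with the stored one. The function names and the dictionary of last-check timestamps are hypothetical, and the choice of sha256 for the comparison is an assumption; this is not APTrust's actual implementation.

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Interval taken from the text above: fixity checks occur every 90 days.
FIXITY_INTERVAL = timedelta(days=90)


def files_due_for_fixity(last_checked, now):
    """Return identifiers whose last fixity check is at least 90 days old.

    last_checked maps a (hypothetical) file identifier to the datetime
    of its most recent fixity check.
    """
    return sorted(
        name for name, checked_at in last_checked.items()
        if now - checked_at >= FIXITY_INTERVAL
    )


def fixity_matches(data, expected_sha256):
    """Recompute the sha256 digest of data and compare it to the stored value."""
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()
```

A scheduler could call files_due_for_fixity with the current UTC time and send a notification for every file where fixity_matches returns False.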
The ingest process
The ingest process begins when a depositor uploads a bag (a tar file) to their receiving bucket. The receiving buckets follow the naming convention aptrust.receiving.<member identifier>. APTrust uses domain names as member identifiers, so UVA’s receiving bucket is aptrust.receiving.virginia.edu, UNC’s is aptrust.receiving.unc.edu, etc. The demo system has its own set of receiving buckets, whose names follow the pattern aptrust.receiving.test.<member identifier>.
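The naming convention can be expressed as a small helper. This is only a sketch: the function name and the demo flag are made up for illustration, while the two prefixes come from the paragraph above.

```python
def receiving_bucket(member_identifier, demo=False):
    """Build a receiving bucket name from a member's domain name.

    The prefixes follow the convention described above; the function
    itself is a hypothetical helper, not part of APTrust's code.
    """
    prefix = "aptrust.receiving.test." if demo else "aptrust.receiving."
    return prefix + member_identifier


print(receiving_bucket("virginia.edu"))        # aptrust.receiving.virginia.edu
print(receiving_bucket("unc.edu", demo=True))  # aptrust.receiving.test.unc.edu
```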
A cron job called apt_bucket_reader runs every hour or so on apt-prod-services. It scans all of the receiving buckets for new tar files. For each new file, it creates a WorkItem with action “Ingest” and pushes that WorkItem’s ID into NSQ’s apt_fetch topic.
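The scan step can be pictured roughly as follows. The sketch works on an in-memory listing instead of real S3 buckets, and the WorkItem fields, the seen set, and the publish callback are all assumptions made for illustration; only the “Ingest” action and the apt_fetch topic name come from the description above.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class WorkItem:
    # Hypothetical, simplified stand-in for the real WorkItem record.
    id: int
    bucket: str
    name: str
    action: str = "Ingest"


def scan_receiving_buckets(listing, seen, next_id, publish):
    """Create a WorkItem for each new tar file and queue its ID.

    listing  maps bucket name -> list of object keys (stand-in for S3).
    seen     is a set of (bucket, key) pairs that were already queued.
    next_id  is an iterator yielding fresh WorkItem IDs.
    publish  is a callable (topic, work_item_id) standing in for NSQ.
    """
    items = []
    for bucket in sorted(listing):
        for key in sorted(listing[bucket]):
            if not key.endswith(".tar") or (bucket, key) in seen:
                continue  # skip non-bags and files that are already queued
            seen.add((bucket, key))
            item = WorkItem(next(next_id), bucket, key)
            publish("apt_fetch", item.id)
            items.append(item)
    return items
```

Running the function twice over the same listing queues nothing the second time, because the seen set remembers which files already have a WorkItem.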
From there, the apt_fetch service on apt-prod-services downloads the file to a staging area, an Elastic Block Store (EBS) volume attached to apt-prod-services. It reads and validates the bag without untarring it, and if the bag is valid, it pushes the WorkItem ID into NSQ’s apt_store topic.
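Validating a bag without untarring it can be done by streaming each tar member through a hash function. The sketch below assumes the bag follows the BagIt layout (a top-level directory containing a manifest-sha256.txt file and a data/ payload directory) and checks only payload checksums. The text above does not say what APTrust's validator actually checks, so treat the function name and the set of checks as assumptions.

```python
import hashlib
import tarfile


def validate_tarred_bag(fileobj, algorithm="sha256"):
    """Check payload checksums against the manifest, reading from the tar.

    Returns a list of error strings; an empty list means every payload
    file matched its manifest entry. Nothing is extracted to disk.
    """
    manifest_name = f"manifest-{algorithm}.txt"
    expected, actual = {}, {}
    found_manifest = False
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Drop the top-level bag directory from the member path.
            _, _, rel_path = member.name.partition("/")
            stream = tar.extractfile(member)
            if rel_path == manifest_name:
                found_manifest = True
                for line in stream.read().decode("utf-8").splitlines():
                    if line.strip():
                        digest, path = line.split(None, 1)
                        expected[path.strip()] = digest.lower()
            elif rel_path.startswith("data/"):
                digest = hashlib.new(algorithm)
                # Hash the member in 1 MiB chunks straight from the archive.
                for chunk in iter(lambda: stream.read(1 << 20), b""):
                    digest.update(chunk)
                actual[rel_path] = digest.hexdigest()
    if not found_manifest:
        return [f"{manifest_name} not found"]
    errors = []
    for path, digest in sorted(expected.items()):
        if path not in actual:
            errors.append(f"missing payload file: {path}")
        elif actual[path] != digest:
            errors.append(f"checksum mismatch: {path}")
    for path in sorted(set(actual) - set(expected)):
        errors.append(f"payload file not in manifest: {path}")
    return errors
```

A caller would open the staged tar file in binary mode, pass the file object to validate_tarred_bag, and queue the WorkItem for storage only when the returned list is empty.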
apt_store copies individual files from the tarred bag into S3 in Northern Virginia and Glacier in Oregon. We generally do not copy files from the tar archive to disk before uploading them; we read them straight from the tar file into S3. However, due to a poor design choice in the official AWS S3 uploader for Go, we do have to copy files larger than 100 MB or so to disk before uploading them to S3 and Glacier. This makes the upload of large files quite slow.
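The streaming behaviour and the large-file exception can be sketched as below. The upload callable is a hypothetical stand-in for the real S3 and Glacier uploads, and the threshold constant only approximates the "100 MB or so" mentioned above. The real service is built around the AWS uploader for Go, so this shows the control flow only.

```python
import shutil
import tarfile
import tempfile

# Approximation of the "100 MB or so" limit mentioned above.
SPOOL_THRESHOLD = 100 * 1024 * 1024


def store_files(fileobj, upload, threshold=SPOOL_THRESHOLD):
    """Upload each file in a tarred bag, spooling only large files to disk.

    upload is a hypothetical callable (name, readable_stream, size).
    Returns a list of (member name, "streamed" or "spooled") pairs.
    """
    stored = []
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            stream = tar.extractfile(member)
            if member.size > threshold:
                # Large file: copy to a temporary file on disk first.
                with tempfile.TemporaryFile() as tmp:
                    shutil.copyfileobj(stream, tmp)
                    tmp.seek(0)
                    upload(member.name, tmp, member.size)
                stored.append((member.name, "spooled"))
            else:
                # Small file: hand the tar member stream to the uploader.
                upload(member.name, stream, member.size)
                stored.append((member.name, "streamed"))
    return stored
```

The extra disk write and read for every spooled file is what makes large uploads slower than streamed ones.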
After each file is copied into S3 and then Glacier, apt_store records where and when the file was stored in the in-memory GenericFile record. When all files have been stored to S3 and Glacier, apt_store pushes the WorkItem ID into the apt_record topic of NSQ.
apt_record records a new IntellectualObject record in Pharos, along with one new GenericFile record for each ingested file. It also records two checksums for each file (an md5 and a sha256), and it records a series of events for the object and each of its files. These events include creation, ingestion, identifier assignment, message digest calculation, access assignment (for the object), and replication.
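Both checksums can be computed in a single pass over a file's bytes, which avoids reading large files twice. The following is a minimal sketch with a hypothetical function name; it is not taken from APTrust's code.

```python
import hashlib


def compute_checksums(stream, chunk_size=1 << 20):
    """Compute md5 and sha256 digests of a binary stream in one pass."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    # Read fixed-size chunks until the stream returns an empty bytes object.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
        sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

The returned dictionary holds one hex digest per algorithm, matching the two checksums recorded for each GenericFile.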