5.1.1.1.2 Procedures in place to monitor and receive notifications when hardware technology changes are needed



Status: Ready for review
Compliance Rating: Fully compliant
Responsible

The repository shall have procedures in place to monitor and receive notifications when hardware technology changes are needed.

Supporting Text

This is necessary to ensure expected, contracted, secure, and persistent levels of service.

Examples of Meeting the Requirement

Audits of capacity versus actual usage; audits of observed error rates; audits of performance bottlenecks that limit ability to meet user community access requirements; documentation of technology watch assessments; documentation of technology updates from vendors.

Discussion

The repository should conduct or contract frequent environmental scans regarding hardware status, sources of failure, and interoperability among hardware components. The repository should also be in contact with its hardware vendors regarding technology updates, points of likely failure, and how new components may affect system integration and performance. The objective is to track when changes in service requirements by the designated communities require a corresponding change in the hardware technology, when changes in ingestion policies require expanded capabilities, and when changes in preservation policies require new preservation capabilities. This can be driven by changes in capacity requirements (the time needed to read all media is longer than the media lifetime), by changes in delivery mechanisms (new clients for displaying authentic records), and changes in the number and size of archived records.
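The capacity example in the discussion above (the time needed to read all media exceeding the media lifetime) can be checked with simple arithmetic. The following is a minimal sketch, not part of the repository's tooling; the function names, the sustained read rate, and the media lifetime are all hypothetical values chosen for illustration.

```python
SECONDS_PER_DAY = 86_400


def full_read_days(total_bytes: float, read_bytes_per_sec: float) -> float:
    """Days needed to read every archived byte once at a sustained rate."""
    if read_bytes_per_sec <= 0:
        raise ValueError("read rate must be positive")
    return total_bytes / read_bytes_per_sec / SECONDS_PER_DAY


def needs_hardware_change(total_bytes: float,
                          read_bytes_per_sec: float,
                          media_lifetime_days: float) -> bool:
    """True when a full read pass would take longer than the media lifetime."""
    return full_read_days(total_bytes, read_bytes_per_sec) > media_lifetime_days


# Hypothetical figures: 1 PB of holdings, 100 MB/s sustained read, 5-year media.
holdings = 1e15
rate = 1e8
lifetime = 5 * 365

print(round(full_read_days(holdings, rate), 1))          # about 115.7 days
print(needs_hardware_change(holdings, rate, lifetime))    # False
```

With these assumed figures a full read pass fits comfortably inside the media lifetime. Rerunning the check as holdings grow shows when the read time approaches the lifetime, which is the point at which a hardware change would need to be planned.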

Evidence Provided

APTrust’s infrastructure is monitored (see page: Monitoring) using the open source software Icinga2. It continuously monitors system resources; the collected metrics are stored in an InfluxDB time-series database and visualized using Grafana. Custom-defined usage thresholds trigger notifications to the operations team when a shortage of resources is impending. When spikes in load or drops in performance become more frequent, we can consider scaling up hardware. Grafana dashboards and the time-series data on resource utilization support these assessments.
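The threshold logic described above can be illustrated with a short sketch. This is not Icinga2's API or configuration syntax; it is a stand-alone Python illustration that assumes a two-level warning/critical scheme, and the `Threshold` class, the function names, and the percentage values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Threshold:
    """Usage levels (in percent) at which the operations team is notified."""
    warning: float
    critical: float


def classify(value: float, threshold: Threshold) -> str:
    """Map a single usage sample to a monitoring state."""
    if value >= threshold.critical:
        return "CRITICAL"
    if value >= threshold.warning:
        return "WARNING"
    return "OK"


def spike_count(samples: Iterable[float], threshold: Threshold) -> int:
    """Count separate excursions at or above the warning level.

    Consecutive samples above the level count as one spike.
    """
    count = 0
    above = False
    for value in samples:
        now_above = value >= threshold.warning
        if now_above and not above:
            count += 1
        above = now_above
    return count


# Hypothetical disk-usage threshold and samples.
disk = Threshold(warning=80.0, critical=90.0)
samples = [50, 85, 88, 60, 92, 70, 81]

print([classify(s, disk) for s in samples])
print(spike_count(samples, disk))
```

Comparing the spike count across successive periods, for instance month over month from the stored time-series data, gives a simple signal for when spikes are becoming more frequent and a hardware upgrade should be considered.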

Once the operations team is notified about performance issues, new systems can be provisioned to replace old ones. Our server configurations are managed by Ansible, so deploying a new, nearly identical system on larger hardware is largely automated and relatively efficient.