Support and Maintenance

Policies and Procedures

Software updates

System-level (operating system) updates are semi-automatic; security patches are applied automatically on a daily schedule.

Security patches for the operating system (Ubuntu Linux) and its libraries are installed on an ongoing basis. Checks for security updates are performed on each server (via cron job) every 24 hours, and updates are applied only from trusted sources. These updates typically do not require the administrator's intervention. Minor database updates are applied automatically during APTrust's maintenance window and are managed by our infrastructure provider, Amazon. Other updates are implemented on an as-needed basis and are treated as a high priority by APTrust's operations team. Non-security updates are usually applied at bi-weekly intervals.
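
As an illustration, such a daily check can be expressed as a simple cron-driven script using Ubuntu's unattended-upgrades tooling; the script below is a hedged sketch with assumed paths, not APTrust's actual cron job.

  #!/bin/sh
  # Illustrative sketch of a daily security-update job (e.g. /etc/cron.daily/security-updates);
  # not the production script. It refreshes package metadata from the configured (trusted)
  # repositories, then installs pending security updates without administrator intervention.
  apt-get update -qq
  unattended-upgrade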

Icinga2 monitors for required system restarts after kernel updates; these restarts are performed manually during the bi-weekly maintenance window.
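
On Ubuntu, a pending restart is signalled by the file /var/run/reboot-required, so a plugin along the following lines can feed Icinga2; this is a minimal sketch, not the exact check deployed in production.

  #!/bin/sh
  # check_reboot_required - illustrative Icinga2/Nagios-style plugin sketch
  if [ -f /var/run/reboot-required ]; then
      echo "WARNING: system restart required (pending kernel or library updates)"
      exit 1   # WARNING
  fi
  echo "OK: no restart required"
  exit 0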

Third-party libraries used in our code are checked for updates on a continuous basis using Code Climate and Go Report Card. The technical team is notified by email about outdated libraries and acts accordingly.

Software upgrades

The repository adheres to a software development life cycle and is under continuous improvement. The technical staff reviews updates to software components, implements minor upgrades on an ongoing basis, and evaluates major upgrades prior to implementation. Software upgrades are performed where deemed necessary or beneficial. The repository's performance and depositors' requirements are among the main drivers for evaluating newer versions of, or alternatives to, software components.

Responsibilities

The setup, instantiation, configuration, and availability of AWS services, including EC2 instances, S3/Glacier, and IAM accounts, are the primary responsibility of the Systems Engineer.

The installation and upgrades for base applications required to run locally developed code and services are the responsibility of the Systems Engineer.

Application design, coding, and implementation of feature requests, as well as troubleshooting of application bugs, are the responsibility of the Senior and Junior Software Engineers.

Installation of locally developed services and software are the joint responsibility of Systems and Software Engineers.

Backups

Repository metadata is stored in a PostgreSQL database using Amazon's RDS service. RDS executes automated daily backups (snapshots) of the production and development databases. Copies of the database are hosted in two availability zones, with a cold-standby copy that will fail over if the primary instance fails. In addition to the AWS RDS backups, APTrust runs a nightly backup via cron job; the dump is stored on an NFS share mounted on each server instance that accesses its own database.
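
A nightly dump of this kind boils down to a single cron entry; the host, user, database name, and paths below are placeholders for illustration, not the production configuration.

  # /etc/cron.d/db-backup -- illustrative sketch only; all names and paths are placeholders
  # Nightly logical dump of the application database to the mounted NFS share at 02:30.
  30 2 * * * backup pg_dump -h db-host.example -U backup_user -Fc pharos_production > /mnt/backup/pharos_$(date +\%Y\%m\%d).dump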

Fixity Checking

The APTrust system retrieves files for fixity checks on a 90-day cycle to ensure data is accurate and complete. At the time of deposit/ingest of new bags, the system generates an MD5 and a SHA256 hash of each individual file in the bag, compares them with the bag manifests, and stores them in the metadata if they match. This ensures that the bag was received correctly and that the files are in the exact same state as they were prior to submission and transfer. If a calculated hash doesn't match the one in the supplied bag manifest, the bag is not ingested and is marked as failed; depositors will need to submit a corrected bag. APTrust notifies depositors by email if an ingest or fixity process fails.
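
Conceptually, the manifest comparison amounts to re-hashing every payload file and matching the results against the manifests shipped with the bag. A rough command-line equivalent, assuming the standard BagIt manifest file names and the two-space checksum format that md5sum/sha256sum -c expect, would be:

  # Run inside an unpacked bag; manifest lines are "<checksum>  <relative file path>"
  cd /path/to/unpacked-bag
  md5sum -c manifest-md5.txt       || echo "MD5 mismatch: bag would be marked as failed"
  sha256sum -c manifest-sha256.txt || echo "SHA256 mismatch: bag would be marked as failed"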

Ansible

APTrust uses Ansible for server provisioning, configuration, and application deployment.

Using configuration management ensures that infrastructure (servers and environments) is in a known state at all times. Rebuilding a machine should be automatic and should yield the exact same state as the machine being replaced.

Ansible requires no server-side installation other than native Python libraries. It does everything via SSH and has fewer dependencies than other configuration management tools. It is widely adopted and has strong community support.
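
Because the control machine talks to the servers purely over SSH, verifying connectivity is a one-line ad-hoc command (inventory and SSH access assumed to be configured as described further below):

  # Ad-hoc connectivity check over SSH; no agent runs on the managed servers
  ansible all -m ping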

AWS configuration and deployment are partially handled by Ansible. We intend to add more AWS calls to our Ansible repository.

We keep Ansible roles and playbooks on Github: 

https://github.com/APTrust/ansible-playbooks

All roles are updated to support Ansible 2.x.x

Deployment

In-house applications such as the Exchange services and Pharos (APTrust's web interface for managing deposits and inspecting deposit outcomes) are deployed using Ansible.

Pharos Deployment

Deployment is done with Ansible using a Capistrano-like workflow. Details can be found in its Ansible role:

https://github.com/APTrust/ansible-playbooks/tree/develop/roles/aptrust.pharos

The Capistrano-style workflow is as follows.

  1. First, you need to install Ansible and check out the `ansible-playbooks` repo.
  2. Every time you are ready to deploy a new version of your application, you need to:
    • Commit and push all changes to your Git repository.
    • If it is an initial deploy, run the pharos playbook completely.
    • For subsequent (update) deploys, run the pharos playbook with the tags `deploy, pharos`.

Both of these steps are covered in this guide; example commands are sketched below.
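
The playbook file name below is an assumption for illustration; check the linked Ansible role and the ansible-playbooks repository for the authoritative playbook names.

  # Initial deploy: run the pharos playbook completely (playbook name assumed)
  ansible-playbook pharos.yml --diff -l apt-demo-repo

  # Subsequent (update) deploys: run only the tagged deployment tasks
  ansible-playbook pharos.yml --diff -t deploy,pharos -l apt-demo-repo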

Capistrano-style directory structure

In the deployment tutorial, we simply cloned our application code to /var/www/myapp/code and instructed Passenger to serve the app from there. But if you use Capistrano's capistrano/deploy recipe to clone/pull from Git, it will use a strictly defined directory structure on the server. Here is an example of what /var/www/myapp (an example application directory) looks like when Capistrano is used:

myapp
 ├── 1 releases
 │       ├── 20150080072500
 │       ├── 20150090083000
 │       ├── 20150100093500
 │       ├── 20150110104000
 │       └── 20150120114500
 │            ├── <checked out files from Git repository>
 │            └── config
 │                 ├── 5 database.yml -> /var/www/myapp/shared/config/database.yml
 │                 └── 6 secrets.yml  -> /var/www/myapp/shared/config/secrets.yml
 │
 ├── 2 current -> /var/www/myapp/releases/20150120114500/
 ├── 3 repo
 │       └── <VCS related data>
 └── 4 shared
         ├── <linked_files and linked_dirs>
         └── config
              ├── database.yml
              └── secrets.yml
  1. releases holds all deployments in a timestamped folder. Every time you instruct Capistrano to deploy, Capistrano clones the Git repository into a new subdirectory inside releases.
  2. current is a symlink pointing to the latest release inside the releases directory. This symlink is updated at the end of a successful deployment. If the deployment fails in any step the current symlink still points to the old release.
  3. repo holds a cached copy of the Git repository, for making subsequent Git pulls faster.
  4. shared is meant to contain any files that should persist across deployments and releases. Because a Capistrano deploy works by cloning the Git repository into a releases subdirectory, only files that are version controlled will survive deployments. But sometimes you want more files to survive, for example configuration files (e.g. Rails's config/database.yml and config/secrets.yml), log files, and persistent user storage handed over from one release to the next. You are supposed to put those kinds of files in shared, while instructing Capistrano to symlink them into a release directory on every deploy. This is done through the linked_files and linked_dirs configuration options, which we will cover in this guide. (5) and (6) show this mechanism in action, as sketched below.
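
In shell terms, the (5) and (6) links above are plain symlinks created inside each new release directory, using the paths from the example tree:

  # Link the shared configuration files into the new release (paths from the example above)
  ln -sfn /var/www/myapp/shared/config/database.yml /var/www/myapp/releases/20150120114500/config/database.yml
  ln -sfn /var/www/myapp/shared/config/secrets.yml  /var/www/myapp/releases/20150120114500/config/secrets.yml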

The advantage over the simple /var/www/myapp/code approach in the deployment tutorial is twofold:

  1. It makes deployments atomic. If a deployment fails, the currently running version of the application is not affected. Users also never get to see a state in which an update is half-deployed.
  2. It makes rolling back to previous releases dead-simple. Simply change the current symlink, tell Passenger to restart the app, and done.
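
A rollback therefore comes down to two commands; the release timestamp and application path are taken from the example tree above, and `passenger-config restart-app` is one common way to trigger the restart.

  # Point "current" back at the previous release (timestamp from the example above)
  ln -sfn /var/www/myapp/releases/20150110104000 /var/www/myapp/current
  # Ask Passenger to restart the application from the restored release
  passenger-config restart-app /var/www/myapp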

Exchange Deployment

Ansible is set up to deploy Exchange.

Detailed information can be found in its Ansible role:

https://github.com/APTrust/ansible-playbooks/tree/develop/roles/aptrust.exchange

Sensitive Data and Configuration

All common and non-sensitive data is kept unencrypted in our Ansible repository. 

Critical and sensitive data is encrypted using Ansible Vault. A password is required to decrypt and edit this data; it is stored securely in Passpack, to which only authorized APTrust engineers have access.
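
Day-to-day vault handling uses the standard ansible-vault subcommands; the file path below is the vault file referenced later on this page.

  # Edit the encrypted variables in place (prompts for the vault password)
  ansible-vault edit group_vars/vault.yml
  # View without editing, or re-key when the vault password is rotated
  ansible-vault view group_vars/vault.yml
  ansible-vault rekey group_vars/vault.yml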

Workflow and Setup

New setup

Follow the installation instructions for your platform: http://docs.ansible.com/ansible/latest/intro_installation.html

Prerequisites

  • Provide the APTrust admin with your public SSH key and a password hash (generated with `mkpasswd -m sha-512 yourpassword`).
  • Admin adds both credentials to the Ansible vault and commits them to the ansible-playbooks repo.
  • Admin runs the common playbook on all servers.

Local install:

- Install Ansible on your local machine: `brew install ansible` (macOS) or `apt-get install ansible` (Linux)

- Clone the ansible-playbooks repo

- Define ~/.ansible.cfg and update the paths accordingly, e.g.:

# This file should be copied in the users home directory where Ansible looks
# for its configuration defaults.
#
# Please adjust paths where necessary.
#
#force_color = 1
[defaults]
# Defines default inventory file.
inventory = ~/aptrust/ansible-playbooks/hosts
roles_path = ~/aptrust/ansible-playbooks/roles
# Ask for vault_pass at every Ansible execution if no
# vault_password_file is defined.
ask_vault_pass = True
# Defines vault password file to avoid password prompts and
# decrypt the vault at playbook runtime.
vault_password_file=~/aptrust/ansible-playbooks/.vault_password
# Callback plugins that are executed at runtime.
callback_plugins = ~/aptrust/ansible-playbooks/callback_plugins/
filter_plugins = ~/aptrust/ansible-playbooks/filter_plugins/
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_factcache
fact_caching_timeout = 31557600

host_key_checking = False
retry_files_enabled = False # Do not create them

[ssh_connection]
ssh_args = -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o ControlPath=~/.ssh/%h-%r
# Performance improvement and workaround for
# http://stackoverflow.com/questions/36646880/ansible-2-1-0-using-become-become-user-fails-to-set-permissions-on-temp-file
pipelining = True

[privilege_escalation]
become = true
allow_world_readable_tmpfiles = true

ansible_managed = modified on %Y-%m-%d %H:%M:%S by {uid} on {host}

- Note: Be careful when storing the vault password in a local file. Make sure your computer is adequately secured and not used by others.
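
At a minimum, restrict the file so that only your own account can read it (path taken from the example configuration above):

  chmod 600 ~/aptrust/ansible-playbooks/.vault_password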

Add a new system user

  1. Request an SSH public key from the user
  2. Request a password hash. Ask them to run `mkpasswd -m sha-512` in their terminal app, enter a desired password, and provide the resulting hash to you. It may be necessary to install mkpasswd first (`brew install mkpasswd` or `gem install mkpasswd`).
  3. Request a desired username from the user (ideally their local machine user)
  4. Add user to common role: https://github.com/APTrust/ansible-playbooks/blob/develop/roles/common/tasks/main.yml#L90
  5. Add user ssh key to common role: https://github.com/APTrust/ansible-playbooks/blob/develop/roles/common/tasks/main.yml#L104
  6. Add password hash and ssh public key to the vault file https://github.com/APTrust/ansible-playbooks/blob/develop/group_vars/vault.yml
  7. Map vault vars to unencrypted variable names: https://github.com/APTrust/ansible-playbooks/blob/develop/group_vars/all.yml#L98
  8. Add user to SSH AllowedUsers if necessary. https://github.com/APTrust/ansible-playbooks/blob/develop/group_vars/all.yml#L38
  9. Run the common role on the desired server(s). Generally I use the allservers.yml play for this. You can limit the roles to be run with `-t common` and limit the servers to run on with `-l servername`, for example: `ansible-playbook allservers.yml --diff -t common -l apt-demo-repo`