Encrypted Backup to Amazon S3 (2014)



The following is a guide on how to do encrypted backups to Amazon S3. It is an updated version of my older system, which used the Dt-S3-Backup script; this time I've decided to try a replacement available from the EPEL repository: duply. This guide has been updated for and tested on CentOS 6.

It's worth noting that the Dt-S3-Backup script my old system used is still a good solution - it is now maintained by somebody else and has a few extra features, such as restoring individual files.

Dependencies

You will require an Amazon S3 account, which requires a debit/credit card.

  • duplicity (for incremental backups and Amazon S3 integration)
  • duply (for simplifying use of Duplicity)
  • gpg (for encrypting the backup)
  • python-boto (the library that Duplicity uses to interface with AWS)

Assuming you have the EPEL repo enabled on your system, install the dependencies.

# yum install duplicity duply gpg python-boto

Process

Amazon S3 and GPG keys

Once you have created your Amazon AWS account you will need to make a bucket for your backups on S3 and then generate a user and group for your backups.

Creating the bucket through the AWS console should be self-explanatory, and the AWS users and groups can be created at https://console.aws.amazon.com/iam/home?#home.

When creating the group for the backup user, you will want the group to have access only to the bucket you created for your backups. Select the Custom Policy option and use the following policy, which gives the backup user full control within the designated bucket (e.g. "your-backup-bucket-name").

{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:GetBucketAcl"
      ],
      "Resource": "arn:aws:s3:::your-backup-bucket-name"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": "arn:aws:s3:::your-backup-bucket-name/*"
    }
  ]
}

Create a user and add them to the group, and note the Access Key ID and the Secret Access Key - these act as your backup user's username and password, respectively.

Next you should make a GPG key to encrypt your backup. Create a dedicated GPG key for backups, because the private key needs to reside on your server - which isn't a good idea for your main GPG key. Duplicity needs the private key to decrypt the backup manifest so that it can do incremental backups.

To create a GPG key, run the following. It's easiest to do this on the server, because that way you don't have to worry about importing or trusting the key. If you wear a tinfoil hat, generating the key on the server is undesirable because the server's RNG may be compromised or the key weakened by eavesdropping during generation.

# gpg --gen-key
Please select what kind of key you want:
   (1) DSA and Elgamal (default)
   (2) DSA (sign only)
   (5) RSA (sign only)
Your selection? 1
DSA keypair will have 1024 bits.
ELG keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 4096
Requested keysize is 4096 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0)
Key does not expire at all
Is this correct? (y/N) y

    ...The rest is up to you...

Make sure you note in the comment that the key is for your backups, so that you do not get it confused with your personal key.

After the key has finished generating (4096 bits worth of entropy takes a while!) you need to export the private key and keep it in a safe place.

# gpg --list-keys
/home/Daniel/.gnupg/pubring.gpg
-------------------------------
pub   1024D/6BBDC0C6 2010-06-01
uid                  Daniel Devine (just doing this as a test) <devine@ddevnet.net>
sub   4096g/424CF9A7 2010-06-01

# gpg --output DanielDevine_Backup_public.gpg --armor --export 6BBDC0C6
# gpg --output DanielDevine_Backup_private.gpg --armor --export-secret-key 6BBDC0C6

The key ID after the --export option will be different for you, so substitute your own by observing the output of "gpg --list-keys".

Now that you have exported your keys into a format you can transfer, run an MD5 sum on the private key before you transfer it off the server.

# md5sum DanielDevine_Backup_private.gpg
4343afa0bccb0878e123209766687614  DanielDevine_Backup_private.gpg

Then you need to transfer it off your server. I prefer SFTP for this because it is the easiest and most secure option; if you have an SSH server running on the remote machine then you (most likely) have SFTP.

# sftp [YourUsername]@[your IP address]
> put DanielDevine_Backup_private.gpg
> exit

On the computer that you transferred the file to, run md5sum on the file and check that the output matches what it was on your server. If it doesn't, transfer the file again - the first copy is corrupt and you can't use it to decrypt your backup!

Then delete the exported private key from the server - just don't leave it lying around!
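A plain delete leaves the key's bytes recoverable on disk. One option is GNU shred, which overwrites the file before unlinking it. This is a sketch - shred is only effective on filesystems that overwrite data in place, and the first line just creates a stand-in file so the example is self-contained:

```shell
# Stand-in for the exported key; on the real server the file already exists.
echo "demo key material" > DanielDevine_Backup_private.gpg

# Overwrite the file's contents several times, then (-u) unlink it.
shred -u DanielDevine_Backup_private.gpg
```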

Configuring Duply

Run duply mybackupprofile create to generate the configuration for a profile called mybackupprofile. This should create a configuration file at /etc/duply/mybackupprofile/conf.

Open the configuration file and fill in the following, replacing placeholders where appropriate:

# Enter the ID of the key you want to use - you can view the id by running "gpg --list-keys"
GPG_KEY='4C837387'
# Password for your GPG key.
GPG_PW='[GPG key password]'

# The 'ap-southeast-2' can be replaced with a region closer to your server, listed here: http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
TARGET='s3://s3-ap-southeast-2.amazonaws.com/[bucket]/[backup folder]'
TARGET_USER='[AWS Access Key Id]'
TARGET_PASS='[AWS Secret Key]'

# Back up from the root of the filesystem (exclusions come later).
SOURCE='/'
# Only keep backups for 3 months.
MAX_AGE=3M

# Number of full backups to keep. Combined with MAX_AGE=3M this should give you 6 months of backups: incrementals run for 3 months until a full backup is taken, then incrementals start again on top of the new full backup.
MAX_FULL_BACKUPS=2

# Use Reduced Redundancy Storage. The Standard storage class is 99.999999999% (eleven nines) durable, whereas Reduced Redundancy Storage is 99.99% durable - meaning it is only stored at two Amazon facilities. Feel free to omit this. You save 20% per GB on the first tier (1TB/month) by using RRS. Keep the trailing space!
DUPL_PARAMS="$DUPL_PARAMS --s3-use-rrs "

Read the comments and fill in the variables. Not all of the configuration options have to be filled in, because the defaults are pretty good. If you want more information, the Duplicity man page ("man duplicity") has a better description of most options.

It's worth noting that if you specify just GPG_PW and comment out GPG_KEY, you are able to use a symmetric encryption scheme. Obviously you should generate a long and strong password (e.g. openssl rand -base64 20). The advantage of this method is that it is extremely simple - there is no messing with GPG keys - and it also makes it easier to write the password on paper (or to a simple text file) and physically secure it, which may be appropriate if you are working with less savvy colleagues.
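As a sketch, the symmetric variant of the conf file would look like this (the password value is a placeholder - generate your own):

```shell
# Symmetric mode: no key ID, just a strong passphrase.
#GPG_KEY='[GPG key ID]'
# Generated with: openssl rand -base64 20
GPG_PW='[long random password]'
```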

Because we set SOURCE to be the root of the filesystem, we will need to exclude things from our backups. Edit /etc/duply/mybackupprofile/exclude and specify all the directories that you don't want to be backed up. Although the file is called "exclude", you can also explicitly include by prefixing lines with "+" instead of "-" (see the Duplicity man page for more info).

Before you start editing, you may want to assess what is important to you. I only want to back up configuration, user data and logs. Run ls / and observe the output - it is a useful starting point in deciding what you don't want. Excluding /proc and /dev is important.

- /dev
- /home/*/.cache
- /root/.cache
- /lost+found
- /media
- /mnt
- /proc
- /run
- /selinux
- /sys
- /tmp
- /var/cache/*/*
- /var/run
- /var/tmp
- /bin
- /sbin
- /cgroup
- /lib
- /lib64
- /usr
- /boot
- /srv
- /opt
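Duplicity applies these rules in order and the first match wins, so an explicit "+" include must appear before the "-" exclude that would otherwise cover it. A hypothetical example that keeps /opt/myapp while still dropping the rest of /opt:

```
+ /opt/myapp
- /opt
```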

You should back up your databases by dumping them. To make sure the database dumps run before the file backup, you can create a small script that Duply will automatically run before backing up. Here's my simple /etc/duply/mybackupprofile/pre example.

#!/bin/bash
echo "Running pre-backup routine..."

# Do MySQL dumping here.
mkdir -p /var/backup   # make sure the dump directory exists

echo "Backing up ddevnetowncloud DB."
mysqldump -u root -pMYSQLPASSWORD ddevnetowncloud > /var/backup/ddevnetowncloud.sql

# Leave none behind... Let's take a complete dump to make sure we have all the DBs.
echo "Doing complete MySQL server backup."
mysqldump -u root --all-databases -pMYSQLPASSWORD > /var/backup/all_databases.sql

# Restore DBs with: mysql -u root -p[root_password] [database_name] < dumpfilename.sql
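Duply will likewise run an executable post script from the profile directory after the backup finishes. A minimal sketch of /etc/duply/mybackupprofile/post that removes the plaintext dumps once they are safely inside the encrypted backup (the paths match the pre script above):

```shell
#!/bin/bash
echo "Running post-backup routine..."

# The SQL dumps are now inside the encrypted backup; remove the plaintext copies.
rm -f /var/backup/ddevnetowncloud.sql /var/backup/all_databases.sql
echo "Removed plaintext SQL dumps."
```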

Next you should add an entry to crontab. The following is a good snippet you can add to the top of the file to guide you through scheduling tasks. Run crontab -e to edit the crontab.

# *     *     *     *     *     command to be executed
# -     -     -     -     -
# |     |     |     |     |
# |     |     |     |     +---- day of week (0 - 6) (Sunday=0)
# |     |     |     +---------- month (1 - 12)
# |     |     +---------------- day of month (1 - 31)
# |     +---------------------- hour (0 - 23)
# +---------------------------- min (0 - 59)
#
# * means all; pin the finer-grained fields to a value (e.g. 0) rather than * when a job should only run once per larger unit. You can use a comma to specify multiple values and / to specify step values. The @hourly, @daily and @weekly macros can also be used.
# All stdout/stderr output will be emailed to the root user automatically - if you don't want this, terminate a command with "&> /dev/null".

#@daily duply mybackupprofile backup
# OR for twice-daily backups.
0 */12 * * * duply mybackupprofile backup

I suggest backing up at least daily - incremental backups don't chew up much space, so twice daily is fairly affordable. Check root's email regularly to make sure your backup system is taking backups without issue.

Fin. Go verify your backup!
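duply also has a verify command, which runs Duplicity's verify against the live filesystem and reports differences. It downloads and compares data, so it can take a while, but it's a decent sanity check:

```
# duply mybackupprofile verify
```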

Restoring Files

# To restore a full copy of your backup to a directory run the following:
$ duply mybackupprofile restore /root/mybackuprestore

# To restore a specific file or directory do the following (restore the /etc directory):
$ duply mybackupprofile fetch etc /root/iscrewedup

# To restore a backup from a specific date first list out your backups and note the time of the one you want.
$ duply mybackupprofile status
...
# Make sure you specify the time in one of the accepted formats: http://duplicity.nongnu.org/duplicity.1.html#sect9 - copying and pasting the date from the status command doesn't work!
...
$ duply mybackupprofile restore /root/restorefordate '2002-01-25T07:00:00+02:00'
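The fetch command also accepts a time as an optional third argument, so you can combine both features and restore a single path as it looked on a given date (hypothetical date and target directory shown):

```
$ duply mybackupprofile fetch etc /root/etcfromthepast '2002-01-25T07:00:00+02:00'
```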

Troubleshooting

If you imported an existing key and you get an error to the effect of "There is no assurance this key belongs to the named user", then you probably need to set the trust of the key by doing the following.

# gpg --edit-key [Key ID - e.g. DS8F978D]
> trust
> 5 [5 = "I trust ultimately"]
> save

Check out the old version of this article if you get errors about system time.
