Seamlessly dump, encrypt and compress your database to the cloud

07.07.2019 - AYB - Reading time ~1 Minute


There are a lot of environments where a cloud-hosted database isn’t a production option, for various reasons. We’re not going to discuss that here; instead we’ll back up a Postgres database to the cloud and encrypt it on the fly. It makes no difference what type of DB you dump; we use Postgres just as an example.

There are several common ways to back up a database:

  • make a complete database dump to a file
  • back up the database’s transaction log between full dumps

For really large databases the second option may be the only way to avoid going broke on storage bills, but for databases under 30 GB in size a daily full dump is the simplest solution.
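For completeness, the second option usually means enabling WAL archiving on the Postgres side. A minimal sketch of the relevant postgresql.conf settings, assuming a hypothetical WAL bucket (this is not the approach used in the rest of this post):

# postgresql.conf: minimal WAL archiving sketch; the bucket name is an example
wal_level = replica
archive_mode = on
archive_command = '/usr/local/bin/aws s3 cp %p s3://pg-backup.wal.prod.domain.xyz/%f'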

Enough small talk, let’s write a Bash script that the cron daemon will execute at the least busy time of day.

Personally, I like colored output: it lets me quickly scan the log for red lines and see exactly where something went wrong. I use coloring even in small scripts, just for the sake of consistency.

#!/bin/bash
# set -x
RED='\033[0;31m'
GREEN='\033[0;32m'
BLUE='\033[0;34m'
NC='\033[0m'

Then we’re going to declare some variables. Note that we build the list of databases whose names start with “prod” by running a simple CLI query against the database server. Look up the equivalent command for your own DB engine to have the same feature in your script; a MySQL example is sketched after the snippet below.

TIME=`date +%H:%M:%S`

S3COMMONPATH="s3://pg-backup.prod.domain.xyz/"
S3MONTHLYPATH="s3://pg-backup.monthly.prod.domain.xyz/"
FILENAME=""
MONTHLY="False"

DATABASES=$(sudo -u postgres psql -qAtX -c "SELECT datname FROM pg_database WHERE datname LIKE 'prod%';")
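The same idea works for other engines. For example, on MySQL/MariaDB the equivalent one-liner might look roughly like this (a sketch, assuming a client configured for passwordless local access):

# hypothetical MySQL/MariaDB equivalent of the Postgres query above
DATABASES=$(mysql -N -B -e "SHOW DATABASES LIKE 'prod%';")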

Here the action starts. First of all we echo the time the script started and print a banner, which makes it easy to tell the stages apart if you ever extend this script. Then we check whether it is time to make a monthly copy to a separate bucket and store the result in a variable.

echo -e "${GREEN}\n\n[ INFO ] Script started at `date +%Y%m%d-%H:%M`\n"

echo -e "${GREEN}
========================================================
||                                                    ||
||                 DUMPING DATABASES                  ||
||                                                    ||
========================================================
${NC}"

if [ `date +%d` -eq 01 ]; then
  MONTHLY="True"
fi

Then comes the function trick, which gives us a separate background process for each database on the server. Be careful, though: multiple parallel dumps may bring the server to its knees (a way to cap the concurrency is sketched after the main loop below).
This function is launched as a background process for every database found on the server.
The main trick is the extensive use of pipes, which avoids storing and processing backups locally and so avoids errors and server crashes caused by running out of disk space.
We also wrap the whole pipeline in an if/else, because we want to know when things go wrong.
Let’s walk through the first if block:

  • we call pg_dump for the given database name and pipe its output to
  • gpg, which encrypts everything piped in by pg_dump and compresses it on the go (I’ve found that compressing inside pg_dump costs a lot more CPU than letting gpg do it), and then pipes its output directly to
  • aws-cli, which streams the input to the desired S3 bucket under the key set in the $FILENAME variable.
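One prerequisite the pipeline silently assumes is the passphrase file read by gpg. A minimal one-off setup, using the path from the script (how you generate and store the passphrase is up to you):

# create the passphrase file the script expects and restrict its permissions
mkdir -p /home/backupuser/.conf
head -c 32 /dev/urandom | base64 > /home/backupuser/.conf/gpg_passphrase
chmod 600 /home/backupuser/.conf/gpg_passphrase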

In the second part of the function we check the state of the $MONTHLY variable and, if it is true, copy the resulting file from one bucket to the other using aws-cli.

function backup {
    echo -e "${GREEN}[ INFO ] `date +%H:%M:%S` Backing up: $dbname.${NC}"
    FILENAME="$dbname/$dbname"_`date +%Y%m%d-%H%M`.pg_dump.gpg
    if sudo -u postgres pg_dump -b -p 5432 -Fc "$dbname" | 
    	/usr/bin/gpg --batch --yes --no-use-agent -c --cipher-algo AES256 --passphrase-file "/home/backupuser/.conf/gpg_passphrase" -z 5 | 
    	/usr/local/bin/aws s3 cp - ${S3COMMONPATH}${FILENAME}
    then
      echo -e "${GREEN}[ INFO ] `date +%H:%M:%S` $dbname dumped successfully${NC}"
    else
      echo -e "${RED}[ FAIL ] `date +%H:%M:%S` $dbname pgdump failed!${NC}"
    fi

    if [ ${MONTHLY} = True ]; then
      if /usr/local/bin/aws s3 cp ${S3COMMONPATH}${FILENAME} ${S3MONTHLYPATH}$dbname/; then
        echo -e "${GREEN}[ INFO ] `date +%H:%M:%S` $dbname copied successfully to monthly storage${NC}"
      else
        echo -e "${RED}[ FAIL ] `date +%H:%M:%S` $dbname copy to monthly storage failed!${NC}"
      fi
    fi

}
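Since the dump never touches the local disk, restoring works the same way in reverse. A minimal sketch with a hypothetical object name, assuming the target database already exists (with GnuPG 2.1+ you may additionally need --pinentry-mode loopback for a non-interactive passphrase):

# stream the encrypted dump back from S3, decrypt it and feed it to pg_restore
/usr/local/bin/aws s3 cp s3://pg-backup.prod.domain.xyz/proddb/proddb_20190707-0000.pg_dump.gpg - |
	/usr/bin/gpg --batch --yes --passphrase-file "/home/backupuser/.conf/gpg_passphrase" -d |
	sudo -u postgres pg_restore -d proddb --clean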

This is the final part of the script, a kind of “main” function. Here we iterate over the $DATABASES variable and launch a background process for each entry. A command is sent to the background when & is placed at its end, and the wait after the loop tells the script to wait until all background processes have finished before exiting. At the end, we echo the script’s finish time.

while read -r dbname; do
    backup &
done <<< "$DATABASES"
wait

echo -e "${GREEN}\n\n[ INFO ] Script done at `date +%Y%m%d-%H:%M`\n"
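If parallel dumps turn out to be too heavy for your server, one way to cap the concurrency is to count the running background jobs before spawning a new one. A sketch of an alternative main loop, with MAX_JOBS as a hypothetical knob that is not part of the original script:

MAX_JOBS=2
while read -r dbname; do
    # wait until fewer than MAX_JOBS dumps are running
    while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
        sleep 5
    done
    backup &
done <<< "$DATABASES"
wait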

The complete script:

#!/bin/bash
# set -x
RED='\033[0;31m'
GREEN='\033[0;32m'
BLUE='\033[0;34m'
NC='\033[0m'
 
TIME=`date +%H:%M:%S`

S3COMMONPATH="s3://pg-backup.prod.domain.xyz/"
S3MONTHLYPATH="s3://pg-backup.monthly.prod.domain.xyz/"
FILENAME=""
MONTHLY="False"

DATABASES=$(sudo -u postgres psql -qAtX -c "SELECT datname FROM pg_database WHERE datname LIKE 'prod%';")

echo -e "${GREEN}\n\n[ INFO ] Script started at `date +%Y%m%d-%H:%M`\n"

echo -e "${GREEN}
========================================================
||                                                    ||
||                 DUMPING DATABASES                  ||
||                                                    ||
========================================================
${NC}"

if [ `date +%d` -eq 01 ]; then
  MONTHLY="True"
fi

function backup {
    echo -e "${GREEN}[ INFO ] `date +%H:%M:%S` Backing up: $dbname.${NC}"
    FILENAME="$dbname/$dbname"_`date +%Y%m%d-%H%M`.pg_dump.gpg
    if sudo -u postgres pg_dump -b -p 5432 -Fc "$dbname" | 
    	/usr/bin/gpg --batch --yes --no-use-agent -c --cipher-algo AES256 --passphrase-file "/home/backupuser/.conf/gpg_passphrase" -z 5 | 
    	/usr/local/bin/aws s3 cp - ${S3COMMONPATH}${FILENAME}
    then
      echo -e "${GREEN}[ INFO ] `date +%H:%M:%S` $dbname dumped successfully${NC}"
    else
      echo -e "${RED}[ FAIL ] `date +%H:%M:%S` $dbname pgdump failed!${NC}"
    fi

    if [ ${MONTHLY} = True ]; then
      if /usr/local/bin/aws s3 cp ${S3COMMONPATH}${FILENAME} ${S3MONTHLYPATH}$dbname/; then
        echo -e "${GREEN}[ INFO ] `date +%H:%M:%S` $dbname copied successfully to monthly storage${NC}"
      else
        echo -e "${RED}[ FAIL ] `date +%H:%M:%S` $dbname copy to monthly storage failed!${NC}"
      fi
    fi
}

while read -r dbname; do
    backup &
done <<< "$DATABASES"
wait

echo -e "${GREEN}\n\n[ INFO ] Script done at `date +%Y%m%d-%H:%M`\n"

Now place the script somewhere and make it executable with chmod 755 /path/to/your/script.sh

Now edit the crontab using sudo -u %USERNAME% crontab -e, where %USERNAME% is your backup user. That user should have aws-cli configured with keys that only allow it to put backups into the buckets, because we’d like to avoid a complete disaster if those credentials get compromised, right? (A sketch of such a restricted policy follows the cron line below.) We also lower the process priority with nice, just in case someone is using the server: we don’t want our backup process to freeze database responses.

0 0 * * * bash -H -c 'nice -10 /path/to/your/script.sh >> /var/log/backupdb.log'
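As for the restricted keys, here is a rough sketch of an IAM policy for the backup user: s3:PutObject covers the uploads, and s3:GetObject on the daily bucket is what the server-side monthly copy needs. The user name and policy name are assumptions; adjust everything to your own buckets.

# sketch: attach a put-mostly policy to the (hypothetical) backupuser IAM user
cat > /tmp/pg-backup-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": [
        "arn:aws:s3:::pg-backup.prod.domain.xyz/*",
        "arn:aws:s3:::pg-backup.monthly.prod.domain.xyz/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::pg-backup.prod.domain.xyz/*"
    }
  ]
}
EOF
aws iam put-user-policy --user-name backupuser --policy-name pg-backup-put-only --policy-document file:///tmp/pg-backup-policy.json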

And the last thing: don’t forget to install gpg, check the path to it in the script and change it if necessary.