Seamlessly dump, encrypt and compress your database to the cloud
07.07.2019 - AYB - Reading time ~1 Minute
There are plenty of environments where a managed cloud database isn't a viable production option, for various reasons. We're not going to discuss that here; instead, we'll back up a Postgres database to the cloud and encrypt it on the fly. The type of database you dump makes no difference: Postgres is used here purely as an example.
There are two common approaches to backing up a database:
- take a complete dump of the database to a file
- archive the database's transaction log (WAL) between full dumps
For really large databases the second option may be the only way to keep the storage bill under control, but for databases under roughly 30 GB a daily full dump is the best solution.
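For reference, here is a minimal sketch of how the second option is usually switched on in PostgreSQL (9.6+). The pg-wal-archive bucket name is hypothetical, and these settings only take effect after a server restart:
# Ship every completed WAL segment to S3 for point-in-time recovery
sudo -u postgres psql -c "ALTER SYSTEM SET wal_level = 'replica';"
sudo -u postgres psql -c "ALTER SYSTEM SET archive_mode = 'on';"
sudo -u postgres psql -c "ALTER SYSTEM SET archive_command = 'aws s3 cp %p s3://pg-wal-archive.prod.domain.xyz/%f';"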
Enough small talk. Let's write a Bash script that the cron daemon will execute at whatever time the server is least busy.
Personally, I like colored output: it lets me quickly scan the log for red lines and see exactly where something went wrong. I use coloring even in small scripts, just for the sake of consistency.
#!/bin/bash
# set -x
RED='\033[0;31m'
GREEN='\033[0;32m'
BLUE='\033[0;34m'
NC='\033[0m'
Then we declare some variables. Note the variable that holds the list of databases whose names begin with "prod", populated by a simple CLI query against the database server. Look up the equivalent command for your own DB engine if you want the same behavior in your script; a MySQL example follows right after the block.
TIME=`date +%H:%M:%S`
S3COMMONPATH="s3://pg-backup.prod.domain.xyz/"
S3MONTHLYPATH="s3://pg-backup.monthly.prod.domain.xyz/"
FILENAME=""
MONTHLY="False"
DATABASES=$(sudo -u postgres psql -qAtX -c "SELECT datname FROM pg_database WHERE datname LIKE 'prod%';")
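If you're on MySQL or MariaDB instead, the equivalent listing (a hypothetical example, not part of the original script) would look something like this:
# List databases whose names start with "prod", one name per line
DATABASES=$(mysql -N -B -e "SHOW DATABASES LIKE 'prod%';")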
Here the action starts. First of all, we echo the time the script started and print a banner that makes it easy to separate stages if you ever extend this script. Then we check whether it's time to make a monthly copy to a separate bucket and store the result in a variable.
echo -e "${GREEN}\n\n[ INFO ] Script started at `date +%Y%m%d-%H:%M`\n"
echo -e "${GREEN}
========================================================
|| ||
|| DUMPING DATABASES ||
|| ||
========================================================
${NC}"
if [ `date +%d` -eq 01 ]; then
    MONTHLY="True"
fi
Then comes the function trick, which gives us a separate background process for each database on the server. Be careful, though: multiple parallel dumps can bring the server to its knees.
This function is launched as a background process for every database found on the server.
The main trick is the extensive use of pipes, which avoids storing and processing backups locally; keeping them on local disk could otherwise lead to errors and server crashes from running out of space.
First of all, we wrap the whole pipeline in an if/else, because we want to know when things go wrong. Bash has no real try/catch, so we simply branch on the pipeline's exit status.
Let's go over the first if block:
- we call pg_dump for the given database name and pipe it into gpg, which encrypts everything piped out of pg_dump and compresses it on the go. I've found that compressing with pg_dump costs a lot more CPU than letting gpg do it. gpg then pipes its output directly to aws-cli, which stores the stream in the desired S3 bucket under the $FILENAME we set.
In the second part of the function we check the state of the $MONTHLY variable and, if it is True, copy the resulting file from one bucket to the other using aws-cli.
function backup {
    echo -e "${GREEN}[ INFO ] `date +%H:%M:%S` Backing up: $dbname.${NC}"
    FILENAME="$dbname/$dbname"_`date +%Y%m%d-%H%M`.pg_dump.gpg
    if sudo -u postgres pg_dump -b -p 5432 -Fc "$dbname" |
        /usr/bin/gpg --batch --yes --no-use-agent -c --cipher-algo AES256 --passphrase-file "/home/backupuser/.conf/gpg_passphrase" -z 5 |
        /usr/local/bin/aws s3 cp - ${S3COMMONPATH}${FILENAME}
    then
        echo -e "${GREEN}[ INFO ] `date +%H:%M:%S` $dbname dumped successfully${NC}"
    else
        echo -e "${RED}[ FAIL ] `date +%H:%M:%S` $dbname pgdump failed!${NC}"
    fi
    if [ ${MONTHLY} = True ]; then
        if /usr/local/bin/aws s3 cp ${S3COMMONPATH}${FILENAME} ${S3MONTHLYPATH}$dbname/; then
            echo -e "${GREEN}[ INFO ] `date +%H:%M:%S` $dbname copied successfully to monthly storage${NC}"
        else
            echo -e "${RED}[ FAIL ] `date +%H:%M:%S` $dbname copy to monthly storage failed!${NC}"
        fi
    fi
}
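One caveat, not in the original script: by default Bash returns the exit status of the last command in a pipeline, so the if above only catches a failed aws s3 cp, not a failed pg_dump or gpg. If you want any failing stage to trigger the [ FAIL ] branch, you can enable pipefail near the top of the script:
# Optional hardening: make the whole pipeline fail if any stage fails
set -o pipefail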
This is the final part of the script, a kind of “main” function. Here we iterate over the $DATABASES variable and launch a background process for each database. A command is sent to the background when & is placed at the end of it, and the wait after the loop tells the script to hold on until all background processes have finished before exiting.
And at the end, we echo the time the script finished.
while read -r dbname; do
    backup &
done <<< "$DATABASES"
wait
echo -e "${GREEN}\n\n[ INFO ] Script done at `date +%Y%m%d-%H:%M`\n"
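To restore (or simply verify) one of these dumps, you reverse the pipeline. A rough sketch, assuming the same passphrase file, an existing target database, and a placeholder database name and timestamp:
# Stream the encrypted dump from S3, decrypt it, and feed the custom-format
# archive straight into pg_restore without touching the local disk
/usr/local/bin/aws s3 cp s3://pg-backup.prod.domain.xyz/proddb/proddb_20190707-0000.pg_dump.gpg - |
/usr/bin/gpg --batch --passphrase-file "/home/backupuser/.conf/gpg_passphrase" -d |
sudo -u postgres pg_restore -d proddb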
Putting it all together, here is the full script:
#!/bin/bash
# set -x
RED='\033[0;31m'
GREEN='\033[0;32m'
BLUE='\033[0;34m'
NC='\033[0m'
TIME=`date +%H:%M:%S`
S3COMMONPATH="s3://pg-backup.prod.domain.xyz/"
S3MONTHLYPATH="s3://pg-backup.monthly.prod.domain.xyz/"
FILENAME=""
MONTHLY="False"
DATABASES=$(sudo -u postgres psql -qAtX -c "SELECT datname FROM pg_database WHERE datname LIKE 'prod%';")
echo -e "${GREEN}\n\n[ INFO ] Script started at `date +%Y%m%d-%H:%M`\n"
echo -e "${GREEN}
========================================================
|| ||
|| DUMPING DATABASES ||
|| ||
========================================================
${NC}"
if [ `date +%d` -eq 01 ]; then
    MONTHLY="True"
fi
function backup {
    echo -e "${GREEN}[ INFO ] `date +%H:%M:%S` Backing up: $dbname.${NC}"
    FILENAME="$dbname/$dbname"_`date +%Y%m%d-%H%M`.pg_dump.gpg
    if sudo -u postgres pg_dump -b -p 5432 -Fc "$dbname" |
        /usr/bin/gpg --batch --yes --no-use-agent -c --cipher-algo AES256 --passphrase-file "/home/backupuser/.conf/gpg_passphrase" -z 5 |
        /usr/local/bin/aws s3 cp - ${S3COMMONPATH}${FILENAME}
    then
        echo -e "${GREEN}[ INFO ] `date +%H:%M:%S` $dbname dumped successfully${NC}"
    else
        echo -e "${RED}[ FAIL ] `date +%H:%M:%S` $dbname pgdump failed!${NC}"
    fi
    if [ ${MONTHLY} = True ]; then
        if /usr/local/bin/aws s3 cp ${S3COMMONPATH}${FILENAME} ${S3MONTHLYPATH}$dbname/; then
            echo -e "${GREEN}[ INFO ] `date +%H:%M:%S` $dbname copied successfully to monthly storage${NC}"
        else
            echo -e "${RED}[ FAIL ] `date +%H:%M:%S` $dbname copy to monthly storage failed!${NC}"
        fi
    fi
}
while read -r dbname; do
    backup &
done <<< "$DATABASES"
wait
echo -e "${GREEN}\n\n[ INFO ] Script done at `date +%Y%m%d-%H:%M`\n"
Now place the script somewhere sensible and make it executable with chmod 755 /path/to/your/script.sh
Next, edit the crontab with sudo -u %USERNAME% crontab -e, where %USERNAME% is your backup user. That user should have aws-cli configured with keys that only allow putting backups into the buckets, because we'd like to avoid a complete disaster if those credentials get compromised, right? A sketch of such a policy follows below.
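Here is a minimal sketch of what that could look like, assuming an IAM user named backupuser and the bucket names from the script; the policy name and temp file path are made up. Note that the monthly copy is a server-side CopyObject, so the user also needs read access on the daily bucket:
# Hypothetical write-mostly policy for the backup user
cat > /tmp/pg-backup-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::pg-backup.prod.domain.xyz/*",
        "arn:aws:s3:::pg-backup.monthly.prod.domain.xyz/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::pg-backup.prod.domain.xyz/*"]
    }
  ]
}
EOF
aws iam put-user-policy --user-name backupuser --policy-name pg-backup-put-only --policy-document file:///tmp/pg-backup-policy.json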
We also lower the process priority with nice, just in case someone is using the server: we don't want the backup process to freeze database responses.
0 0 * * * bash -H -c 'nice -10 /path/to/your/script.sh >> /var/log/backupdb.log'
And the last thing: don't forget to install gpg, and check that the path to it in the script matches your system, changing it if necessary.
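On Debian or Ubuntu, for example, that boils down to something like:
# Install GnuPG and confirm where the binary lives (the script expects /usr/bin/gpg)
sudo apt-get install -y gnupg
command -v gpg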