Hurtling through Space

Backing up Jira and Confluence Servers

30 September 2018


Skip down to Actual Strategy if you just want the technical info about setting up crontab scripts. I’m going to talk about Jira and work philosophy first.

I run my own personal Jira and Confluence instances on a home server. I came to rely on Jira at Udacity, and I actually think it’s fantastic. With lots of projects always going on at work and at home, it helps me wade through the chaos and figure out exactly what I should be working on.

The Jira + Confluence combo is kind of overkill for a personal setup, given its enterprise focus and the insane level of customization available. But I enjoy the level of control it affords. And price-wise it works out great: with the server editions, it’s $10 for lifetime access to Jira Software and an additional $10 for Confluence. That isn’t bad, given that most hosted options charge at least a few dollars a month for similar capabilities. Of course, it also helps that I already have a capable server running 24/7, which makes the hardware a sunk cost in this equation.

However, the one thing missing is cloud storage and backup. I back Jira and Confluence with a database running on the same machine (just a PostgreSQL container). If something were to happen to my server, all would be lost. When I first spun up my Jira and Confluence instances, I was aware of this fact and accepted it.

I’m making an effort to make more intelligent (lazy) decisions as a software engineer. In my earlier days, I loved premature optimizations. I would jump at any opportunity to design and build what I would have considered a robust and thorough system as early in a project as possible. I wasted time building and planning for features I imagined I might need two or three iterations later. I recently poked through code I wrote a few years ago and saw evidence of this everywhere. Modules and classes that exist because “I may want this later.” Logic split between packages in half-assed attempts to make code DRYer and more modular. Logging systems that never needed to exist and that actually make it harder to find errors.

Nowadays, I do as little as possible. I was fine leaving my Jira and Confluence data in a precarious situation because it would take effort to back them up and I didn’t know if that effort was worth it at the outset. I easily could have abandoned Jira, in which case, why bother backing it up?

I decided this week that backing up the data was worth the effort. I found myself reaching for a Confluence page to jot down notes I most definitely did not want to risk losing. And I have almost 100 actively tracked Jira issues across four projects (all assigned to me!). Now that I not only keep important information there but am also becoming reliant on it, I need to make sure my life won’t spiral out of control if, say, my adorable husky finds a way to knock my server off the bureau in my office. Here’s what I came up with.

Actual Strategy

If you follow the same path as me, you’ll get a quick intro to the awscli tool and crontab, both of which are super useful.

Repo here.
Official Jira and Confluence backup advice.

I want to back up everything that I would need to restart Jira or Confluence from scratch. This includes the database as well as data / attachment folders. I’m using AWS S3 as my storage mechanism.

Once a day, a cronjob runs on my server. See the repo for the full scripts and installation instructions. The gist of it is:

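#!/bin/bash

# sync_jira.sh

# Abort on errors, unset variables, and failures anywhere in a pipeline
# (without pipefail, a failed pg_dump is masked by aws's exit status)
set -euo pipefail

DB_DUMP=$(date +db_%Y%m%d.gz)
DATA_DIR_ZIP=$(date +data_%Y%m%d.tar.gz)

# Pipe the database dump straight into gzip and awscli ("-" makes aws read stdin).
# cron can't answer a password prompt, so pg_dump needs credentials from e.g. ~/.pgpass
pg_dump -h localhost -U jira jira | gzip | ~/.local/bin/aws s3 cp - "s3://[BUCKET]/jira/temp/$DB_DUMP"

# tar and gzip Jira's data directory to stdout (-f -) and pipe it to awscli
tar -czf - /var/atlassian/application-data/jira/data | ~/.local/bin/aws s3 cp - "s3://[BUCKET]/jira/temp/$DATA_DIR_ZIP"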

Credit to loige.co for the tip about using - as the source argument to aws s3 cp, which makes awscli read from stdin so you can pipe another command’s output straight to S3. I really appreciate that there’s no need to deal with temporary files here. Obviously, replace [BUCKET] with the name of your S3 bucket. Here’s a handy dandy tip for running find and replace across multiple files simultaneously.
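For the find and replace itself, here’s a minimal sketch with sed. It assumes GNU sed (BSD/macOS sed wants -i '' instead of a bare -i), and replace_bucket is a hypothetical helper name. Standard S3 bucket names only contain lowercase letters, digits, dots, and hyphens, so they’re safe to drop into a sed replacement as-is.

```shell
#!/usr/bin/env bash
set -euo pipefail

# replace_bucket BUCKET FILE...
# Hypothetical helper: swap the literal [BUCKET] placeholder for a real bucket
# name, editing every given file in place. Assumes GNU sed's bare -i flag.
replace_bucket() {
  local bucket="$1"
  shift
  # \[ and \] escape the brackets so sed matches the text "[BUCKET]" literally
  sed -i "s/\[BUCKET\]/${bucket}/g" "$@"
}

# Example: replace_bucket my-backup-bucket ./*.sh
```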

I run the scripts around 1am every day via crontab, like so:

00 01 * * * /etc/cron.d/sync_jira.sh
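If you’d rather not open an editor with crontab -e, the usual trick is to pipe the current crontab plus a new line back into crontab -. Here’s a minimal sketch of that; add_cron_entry is a hypothetical helper that skips the entry if an identical line is already installed. crontab -l exits non-zero when no crontab exists yet, which the || true absorbs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# add_cron_entry LINE
# Hypothetical helper: append LINE to the current user's crontab unless an
# identical line is already present.
add_cron_entry() {
  local entry="$1" current
  # crontab -l fails (and complains on stderr) when no crontab exists yet
  current="$(crontab -l 2>/dev/null || true)"
  # -x: whole-line match, -F: fixed string, so the * fields aren't treated as regex
  if grep -qxF -- "$entry" <<<"$current"; then
    return 0
  fi
  # Re-emit the old crontab (if any) plus the new line and install it from stdin
  {
    if [ -n "$current" ]; then printf '%s\n' "$current"; fi
    printf '%s\n' "$entry"
  } | crontab -
}

# Example: add_cron_entry '00 01 * * * /etc/cron.d/sync_jira.sh'
```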

If you’re writing a cronjob, you’ll want to run it manually to test it. Check out this great SO post for an easy strategy to run cronjobs manually with the same environment cron will give them.
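The gist of that kind of approach is to strip your interactive environment down to roughly what cron provides. Here’s a minimal sketch using env -i; run_like_cron is a hypothetical name, and the variables it keeps (PATH=/usr/bin:/bin, SHELL=/bin/sh, plus HOME and LOGNAME) are what Vixie-derived crons such as Debian’s set by default, so check your own cron’s man page to be sure. That stripped-down PATH is presumably also why the scripts call awscli as ~/.local/bin/aws rather than plain aws.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_like_cron COMMAND
# Hypothetical helper: run COMMAND with an environment close to cron's default.
# env -i discards everything inherited (PATH tweaks, AWS_* variables, ...) and
# then sets only the handful of variables listed here.
run_like_cron() {
  env -i \
    HOME="$HOME" \
    LOGNAME="${LOGNAME:-$(id -un)}" \
    SHELL=/bin/sh \
    PATH=/usr/bin:/bin \
    /bin/sh -c "$1"
}

# Example: run_like_cron /etc/cron.d/sync_jira.sh
```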

I don’t want old copies of my backups to build up ad infinitum, so I’m using S3 object expiration to automatically remove objects after 90 days. awscli documents an --expires flag, but in my 5 minutes of testing I was unable to apply it successfully. That’s unsurprising in hindsight: the flag only sets the object’s Expires HTTP header (a caching hint) and never deletes anything. Expiration is handled by bucket lifecycle rules instead, and the S3 console lets you create them by matching object prefixes. I match on the jira/temp/ and confluence/temp/ prefixes and set these objects to expire after 90 days. (To be clear, AWS doesn’t let you use regexes in prefixes AFAIK.) This means that I would lose any chance of retrieving my data if my server went 90+ days without uploading new backups to S3. To make sure that never happens, I run a separate cron job afterwards to overwrite a non-expiring “latest” backup with the newest copy.
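If you’d rather script the rules than click through the console, awscli can install a lifecycle configuration with aws s3api put-bucket-lifecycle-configuration. Below is a minimal sketch; lifecycle_json is a hypothetical helper that prints one rule per prefix, since each rule’s filter takes a single literal prefix rather than a pattern. Be aware that put-bucket-lifecycle-configuration replaces the bucket’s entire existing lifecycle configuration rather than adding to it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# lifecycle_json DAYS PREFIX...
# Hypothetical helper: print an S3 lifecycle configuration (JSON) with one
# rule per PREFIX that expires matching objects DAYS days after creation.
lifecycle_json() {
  local days="$1"
  shift
  local sep="" prefix id
  printf '{"Rules": ['
  for prefix in "$@"; do
    id="${prefix%/}"        # jira/temp/ -> jira/temp
    id="expire-${id//\//-}" # jira/temp  -> expire-jira-temp
    printf '%s{"ID": "%s", "Filter": {"Prefix": "%s"}, "Status": "Enabled", "Expiration": {"Days": %d}}' \
      "$sep" "$id" "$prefix" "$days"
    sep=", "
  done
  printf ']}\n'
}

# Example (replace [BUCKET] as before):
#   lifecycle_json 90 jira/temp/ confluence/temp/ > lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration --bucket [BUCKET] \
#     --lifecycle-configuration file://lifecycle.json
```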

#!/bin/bash

set -euo pipefail

DB_DUMP=$(date +db_%Y%m%d.gz)
DATA_DIR_ZIP=$(date +data_%Y%m%d.tar.gz)

# Copy today's uploads to the "latest" keys, which the temp/ expiration rule doesn't match
~/.local/bin/aws s3 cp "s3://[BUCKET]/jira/temp/$DB_DUMP" "s3://[BUCKET]/jira/latest_db.gz"
~/.local/bin/aws s3 cp "s3://[BUCKET]/jira/temp/$DATA_DIR_ZIP" "s3://[BUCKET]/jira/latest_data.tar.gz"

This job runs separately because it takes S3 a few moments to register the newest uploads. If you attempt to copy to “latest” too quickly, S3 will throw an error about unknown files. I’m sure you could throw a sleep in there after the initial upload in order to copy everything in the same script, but that feels overcomplicated. I just run these copy commands as a follow-up script an hour later to avoid any problems.
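If you do want everything in one script without guessing at a sleep duration, awscli also ships waiters: aws s3api wait object-exists polls the object (every 5 seconds, up to 20 times by default) and returns once it shows up. Here’s a minimal sketch of the copy step built on it; copy_to_latest is a hypothetical helper, and the AWS variable is an assumption that defaults to plain aws on the PATH (you’d point it at ~/.local/bin/aws under cron).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: awscli is on PATH unless AWS is set (e.g. AWS=~/.local/bin/aws under cron)
AWS="${AWS:-aws}"

# copy_to_latest BUCKET SRC_KEY DEST_KEY
# Hypothetical helper: wait until SRC_KEY is visible, then copy it to DEST_KEY.
copy_to_latest() {
  local bucket="$1" src="$2" dest="$3"
  # Polls HeadObject until the object exists; exits non-zero if it never appears
  "$AWS" s3api wait object-exists --bucket "$bucket" --key "$src"
  # Copy within S3 to the non-expiring key, overwriting the previous copy
  "$AWS" s3 cp "s3://${bucket}/${src}" "s3://${bucket}/${dest}"
}

# Example, appended to sync_jira.sh (replace [BUCKET] as before):
#   copy_to_latest [BUCKET] "jira/temp/$DB_DUMP" jira/latest_db.gz
#   copy_to_latest [BUCKET] "jira/temp/$DATA_DIR_ZIP" jira/latest_data.tar.gz
```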

This last bit about dealing with expirations might be overkill given what I was saying earlier about premature optimizations. However, I never want to touch this again and I don’t want to worry about an ever-growing AWS bill :)
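Since the whole point is being able to restart Jira from scratch, it’s worth rehearsing a restore at least once. Here’s a minimal sketch that streams the “latest” objects back down; restore_jira is a hypothetical name, and it assumes Jira is stopped, the jira database already exists and is empty, and pg_dump’s default plain-SQL output (which psql simply replays). GNU tar strips the leading / from member names when creating an archive, so extracting with -C / puts the data directory back where it came from.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: awscli is on PATH unless AWS is set (e.g. AWS=~/.local/bin/aws)
AWS="${AWS:-aws}"

# restore_jira BUCKET
# Hypothetical helper: rebuild Jira's database and data directory from the
# non-expiring "latest" backups. Stop Jira first; the jira database must exist
# and be empty, since psql just replays the SQL that pg_dump produced.
restore_jira() {
  local bucket="$1"
  # "-" as the destination makes awscli write the object to stdout
  "$AWS" s3 cp "s3://${bucket}/jira/latest_db.gz" - | gunzip | psql -h localhost -U jira jira
  # tar stored paths without the leading /, so extract relative to /
  "$AWS" s3 cp "s3://${bucket}/jira/latest_data.tar.gz" - | tar -xzf - -C /
}

# Example: restore_jira [BUCKET]
```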

And that’s it! Now I’m off to go check off the Jira task I made for this blog post.

How to cite this blog post:

@misc{Pittman20189B,
    author = {Pittman, Cameron},
    title = {Backing up jira and confluence servers},
    journal = {Hurtling through Space},
    url = {},
    year = {2018},
    month = {September},
    accessed = {Oct 17, 2022}
}

The source code of this project is covered by the MIT license.
The content of this project (eg. blog posts) is covered by the CC BY-SA 4.0 license.