Getting started with boto and Glacier

Amazon recently released Glacier, a new web service designed for storing rarely accessed data. Thanks to boto, a Python interface to Amazon Web Services, it's very easy to store and retrieve archives from Glacier.

If you have never heard of Amazon Glacier, you should read the Amazon Glacier FAQ and the Amazon Glacier developer guide.

The basics

With Glacier, a backed-up file is an archive stored in a vault. To make an analogy with Amazon S3, an archive is like a key and a vault is like a bucket.

To download an archive, and even to get a vault's inventory, you must first initiate a job, which typically completes within 3-5 hours; you can optionally get notified via the Amazon Simple Notification Service (SNS), and then you can download the result.
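Since a job takes hours to complete, client code that doesn't use SNS usually ends up polling the job status in a loop. Here is a minimal, generic polling sketch; the names `wait_for_completion` and `check_fn` are illustrative (not part of boto), and the defaults assume a job on the order of a few hours:

```python
import time


def wait_for_completion(check_fn, interval=600, max_checks=48):
    """Call check_fn() every `interval` seconds until it returns True.

    Returns True as soon as check_fn() succeeds, or False once
    max_checks attempts have been made (8 hours with the defaults).
    """
    for _ in range(max_checks):
        if check_fn():
            return True
        time.sleep(interval)
    return False
```

With boto you could then write something like `wait_for_completion(lambda: vault.get_job(job_id).completed)`, since each `get_job` call refreshes the job's status.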

Also, Amazon specifies that you should maintain your own inventory.

Getting started with boto

Here is the bare minimum needed to store and retrieve an archive. You should also check the API Reference.

import boto


# boto.connect_glacier is a shortcut that returns a Layer2 instance
glacier_connection = boto.connect_glacier(aws_access_key_id=ACCESS_KEY_ID,
                                          aws_secret_access_key=SECRET_ACCESS_KEY)

vault = glacier_connection.create_vault("myvault")

# Uploading an archive
# ====================

# You must keep track of the archive_id
archive_id = vault.upload_archive("mybackup.tgz")

# Retrieving an archive
# =====================

# You must initiate a job to retrieve the archive
retrieve_job = vault.retrieve_archive(archive_id)

# or, if the job is already pending (with job_id = retrieve_job.id):
# retrieve_job = vault.get_job(job_id)

# You can check whether the job is completed either manually or via Amazon SNS
if retrieve_job.completed:
    retrieve_job.download_to_file("mybackup.tgz")

That's it!

Keeping track of the inventory

I chose to use shelve to store both the inventory and waiting jobs.

Here is a simple class that can help you get started:

(gist available here)

# encoding: utf-8
import os
import shelve
import boto.glacier
import boto
from boto.glacier.exceptions import UnexpectedHTTPResponseError

SHELVE_FILE = os.path.expanduser("~/.glaciervault.db")

class glacier_shelve(object):
    """
    Context manager for shelve
    """
    def __enter__(self):
        self.shelve = shelve.open(SHELVE_FILE)
        return self.shelve

    def __exit__(self, exc_type, exc_value, traceback):
        self.shelve.close()

class GlacierVault:
    """
    Wrapper for uploading/downloading archives to/from an Amazon Glacier vault.
    Makes use of shelve to store the archive id corresponding to a filename,
    along with waiting jobs.

    >>> GlacierVault("myvault").upload("myfile")
    >>> GlacierVault("myvault").retrieve("myfile")

    or to wait until the job is ready:
    >>> GlacierVault("myvault").retrieve("myfile", True)
    """
    def __init__(self, vault_name):
        """
        Initialize the vault
        """
        layer2 = boto.connect_glacier(aws_access_key_id=ACCESS_KEY_ID,
                                      aws_secret_access_key=SECRET_ACCESS_KEY)

        self.vault = layer2.get_vault(vault_name)

    def upload(self, filename):
        """
        Upload filename and store the archive id for future retrieval
        """
        archive_id = self.vault.create_archive_from_file(filename, description=filename)

        # Storing the filename => archive_id data.
        with glacier_shelve() as d:
            if not d.has_key("archives"):
                d["archives"] = dict()

            archives = d["archives"]
            archives[filename] = archive_id
            d["archives"] = archives

    def get_archive_id(self, filename):
        """
        Get the archive_id corresponding to the filename
        """
        with glacier_shelve() as d:
            if not d.has_key("archives"):
                d["archives"] = dict()

            archives = d["archives"]

            if filename in archives:
                return archives[filename]

        return None

    def retrieve(self, filename, wait_mode=False):
        """
        Initiate a job, check its status, and download the archive once it's completed.
        """
        archive_id = self.get_archive_id(filename)
        if not archive_id:
            return
        with glacier_shelve() as d:
            if not d.has_key("jobs"):
                d["jobs"] = dict()

            jobs = d["jobs"]
            job = None

            if filename in jobs:
                # The job is already in shelve
                job_id = jobs[filename]
                try:
                    job = self.vault.get_job(job_id)
                except UnexpectedHTTPResponseError:
                    # get_job returns a 404 if the job is no longer available
                    pass

            if not job:
                # Job initialization
                job = self.vault.retrieve_archive(archive_id)
                jobs[filename] = job.id
                job_id = job.id

            # Committing changes in shelve
            d["jobs"] = jobs

        print "Job {action}: {status_code} ({creation_date}/{completion_date})".format(**job.__dict__)

        # Checking manually every 10 seconds whether the job is completed,
        # instead of using Amazon SNS
        if wait_mode:
            import time
            while 1:
                job = self.vault.get_job(job_id)
                if not job.completed:
                    time.sleep(10)
                else:
                    break

        if job.completed:
            print "Downloading..."
            job.download_to_file(filename)
        else:
            print "Not completed yet"


You may also want to check out bakthat, a Python tool I wrote that lets you compress, encrypt (symmetric encryption), and upload files directly to Amazon S3/Glacier; you can use it either from the command line or as a Python module.

Your feedback

Don't hesitate to ask if you have any questions!

You should follow me on Twitter




© Thomas Sileo. Powered by Pelican and hosted by DigitalOcean.