If, like me, you want to back up everything you ever publish, it makes sense to back up your tweets too. Camlistore, the "personal storage system for life", seems a perfect fit for the job.
I chose Python since I recently created Camlipy, a Python client for Camlistore, and I have already played with the Twitter API using requests (see my earlier post on using the Twitter API v1.1 with Python).
Requirements
$ sudo pip install requests requests_oauthlib camlipy
Twitter API
Here is the Twitter API OAuth part from my previous article. You must set your API keys and run the script once to generate your OAuth tokens.
import json
from requests_oauthlib import OAuth1
from urlparse import parse_qs
import sys
import datetime
import locale
import requests
REQUEST_TOKEN_URL = "https://api.twitter.com/oauth/request_token"
AUTHORIZE_URL = "https://api.twitter.com/oauth/authorize?oauth_token="
ACCESS_TOKEN_URL = "https://api.twitter.com/oauth/access_token"
CONSUMER_KEY = "XXXXXXXXX"
CONSUMER_SECRET = "XXXXXXXXX"
OAUTH_TOKEN = ""
OAUTH_TOKEN_SECRET = ""
TWITTER_API_TIMELINE = 'https://api.twitter.com/1.1/statuses/user_timeline.json'
def setup_oauth():
    """Authorize your app via identifier."""
    # Request token
    oauth = OAuth1(CONSUMER_KEY, client_secret=CONSUMER_SECRET)
    r = requests.post(url=REQUEST_TOKEN_URL, auth=oauth)
    credentials = parse_qs(r.content)
    resource_owner_key = credentials.get('oauth_token')[0]
    resource_owner_secret = credentials.get('oauth_token_secret')[0]

    # Authorize
    authorize_url = AUTHORIZE_URL + resource_owner_key
    print 'Please go here and authorize: ' + authorize_url

    verifier = raw_input('Please input the verifier: ')
    oauth = OAuth1(CONSUMER_KEY,
                   client_secret=CONSUMER_SECRET,
                   resource_owner_key=resource_owner_key,
                   resource_owner_secret=resource_owner_secret,
                   verifier=verifier)

    # Finally, obtain the access token
    r = requests.post(url=ACCESS_TOKEN_URL, auth=oauth)
    credentials = parse_qs(r.content)
    token = credentials.get('oauth_token')[0]
    secret = credentials.get('oauth_token_secret')[0]

    return token, secret
def get_oauth():
    oauth = OAuth1(CONSUMER_KEY,
                   client_secret=CONSUMER_SECRET,
                   resource_owner_key=OAUTH_TOKEN,
                   resource_owner_secret=OAUTH_TOKEN_SECRET)
    return oauth
if not OAUTH_TOKEN:
    token, secret = setup_oauth()
    print "OAUTH_TOKEN: " + token
    print "OAUTH_TOKEN_SECRET: " + secret
    print
    sys.exit()
Now we can call the Twitter API. We'll hit the statuses/user_timeline endpoint (check out the "working with timelines" documentation if needed).
def fetch_tweets(since_id=None, max_id=None, count=200):
    """Fetch a single batch of tweets from the user timeline."""
    params = {'count': count}
    if since_id:
        params['since_id'] = since_id
    if max_id:
        params['max_id'] = max_id
    oauth = get_oauth()
    r = requests.get(url=TWITTER_API_TIMELINE,
                     params=params, auth=oauth)
    return r.json()
Here is the backup_timeline function, which automatically handles the max_id parameter, since we can only retrieve 200 tweets per API call.
def backup_timeline(since_id=None, max_id=None, count=200):
    tweets = []
    while True:
        batch = fetch_tweets(since_id=since_id, max_id=max_id, count=count)
        tweets.extend(batch)
        if batch:
            # max_id is inclusive, so decrement it to avoid
            # fetching the last tweet of this batch again
            max_id = batch[-1]['id'] - 1
        if len(batch) < count:
            break
    return tweets
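The pagination logic can be exercised without calling Twitter at all. Here is a minimal sketch against a stubbed timeline that illustrates why max_id must be decremented between calls (it is an inclusive bound); everything here (FAKE_TIMELINE, fake_fetch, paginate) is hypothetical illustration code, not part of the Twitter API.

```python
# Fake timeline: 500 tweets, newest (highest id) first, like Twitter returns.
FAKE_TIMELINE = [{'id': i} for i in range(500, 0, -1)]

def fake_fetch(since_id=None, max_id=None, count=200):
    """Return up to `count` tweets with id <= max_id and id > since_id."""
    batch = [t for t in FAKE_TIMELINE
             if (max_id is None or t['id'] <= max_id)
             and (since_id is None or t['id'] > since_id)]
    return batch[:count]

def paginate(since_id=None, count=200):
    tweets, max_id = [], None
    while True:
        batch = fake_fetch(since_id=since_id, max_id=max_id, count=count)
        tweets.extend(batch)
        if len(batch) < count:
            break
        # max_id is inclusive: subtract 1 so the oldest tweet of this
        # batch is not returned again in the next one.
        max_id = batch[-1]['id'] - 1
    return tweets

print(len(paginate()))              # 500, no duplicates
print(len(paginate(since_id=400)))  # 100, only tweets newer than id 400
```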
Now we can fetch our timeline (it will only return the most recent 3200 tweets, which is the API limit).
tweets = backup_timeline()
And we need to keep the most recent id for the next time we back up.
since_id = tweets[0]['id']
Check that backup_timeline returns an empty result when called with this since_id.
backup_timeline(since_id=since_id)
Getting started with Camlistore and Camlipy
If you have never heard about Camlistore before, you should read the website, and the project overview. You should also check out Camlipy documentation.
I am assuming you have at least a little knowledge about how Camlistore works.
Each blob is identified by its unique blobref. A blobref looks like sha1-25f2b42fae7398bc8857ed17d56d7d1e072c9832.
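A blobref is simply the name of the hash function plus the hex digest of the blob's bytes, so identical content always maps to the same blobref. Here is a minimal sketch (not using camlipy; the blobref helper is hypothetical) of how a sha1 blobref is derived:

```python
import hashlib

def blobref(data):
    # Camlistore-style blobref: hash name, a dash, then the hex digest
    # of the raw blob bytes.
    return 'sha1-' + hashlib.sha1(data).hexdigest()

print(blobref(b'Hello, Camlistore!'))
```

Storing the same bytes twice therefore yields the same blobref, which is what makes the store content-addressed and deduplicated.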
from camlipy import Camlistore
c = Camlistore('http://localhost:3179')
First, we need to create a permanode. It will hold the blobref of the set of static sets, since we will create a new static set each time we perform a backup.
p = c.permanode_by_title('twitter_backups', create=True)
p
Next, with the add_to_static_set helper, we create a new static set containing each tweet's blobref (most recent first), and we wrap this set inside another static set that becomes the permanode's content.
s = c.add_to_static_set([c.put_blob(json.dumps(tweet)) for tweet in tweets])
p.set_camli_content(c.add_to_static_set([s]))
Finally, we store the newest id, since_id, in a permanode attribute for the next run (permanode attributes must be str), which creates a new claim.
p.set_attr('since_id', str(since_id))
p.get_attr('since_id')
The final script
Here is everything put together, without the Twitter API code above.
# Fetch the permanode
p = c.permanode_by_title('twitter_backups')

# Retrieve since_id if it already exists (None on the first run)
since_id_attr = p.get_attr('since_id')
since_id = int(since_id_attr) if since_id_attr else None

# Try to retrieve the existing static set
static_set_blobref = p.get_camli_content()
if static_set_blobref:
    # A static set already exists, so we load it
    static_set = c.static_set(static_set_blobref)
else:
    # We create a new static set
    static_set = c.static_set()

# Store the new tweets and update the set of static sets
new_tweets = backup_timeline(since_id=since_id)
tweets_blobrefs = [c.put_blob(json.dumps(tweet)) for tweet in new_tweets]
if tweets_blobrefs:
    new_static_set_blobref = static_set.update([c.add_to_static_set(tweets_blobrefs)])
    p.set_camli_content(new_static_set_blobref)
    # Remember the newest id for the next backup
    p.set_attr('since_id', str(new_tweets[0]['id']))
Accessing tweets
p2 = c.permanode_by_title('twitter_backups', create=True)
s2 = c.static_set(p2.get_camli_content())
To retrieve the tweets starting from the most recent, we iterate over the list of static sets in reverse order.
tweets = []
for static_set_blobref in s2.members[::-1]:
    static_set = c.static_set(static_set_blobref)
    for tweet_data in static_set.members:
        tweet = json.loads(c.get_blob(tweet_data).read())
        tweets.append(tweet)
tweets[0]
Conclusion
Don't hesitate to let me know if you have any questions or suggestions!