Creating a URL shortening service with Django

View it live here or get the code here

The first URL shortening site I saw was several years ago and was called TinyURL. Soon after Twitter gained popularity a whole slew of them popped up (bitly, tiny.cc, is.gd) to cater for the masses constrained by Twitter's 140-character limit, but a lot shut down because they are fairly hard to monetize, and Twitter shortens URLs itself now, making them a bit pointless.

At its heart a URL shortening service is simply a database that maps a short string to a URL, which is not exactly rocket science to create. Below I will give a basic walk-through for creating one yourself. Knowledge of Django and Python is assumed.

Start by making a project and creating our models:

#models.py
from django.db import models
import string

# 62-character alphabet: a-z, A-Z, 0-9
_char_map = string.ascii_letters+string.digits

def index_to_char(sequence):
    return "".join([_char_map[x] for x in sequence])


class Link(models.Model):
    link = models.URLField()
    hits = models.IntegerField(default=0)

    def __repr__(self):
        return "<Link (Hits %s): %s>"%(self.hits, self.link)

    def get_short_id(self):
        # Convert the numeric ID to base 62, least significant digit first
        _id = self.id
        digits = []
        while _id > 0:
            rem = _id % 62
            digits.append(rem)
            _id //= 62  # integer division; "/" would yield a float on Python 3
        digits.reverse()
        return index_to_char(digits)

    @staticmethod
    def decode_id(short_id):
        # Inverse of get_short_id(), so it must use the same base: 62
        i = 0
        for c in short_id:
            i = i * 62 + _char_map.index(c)
        return i

This model is fairly basic - it stores a URL and an integer representing the number of times the link has been clicked (we will write the view for this later). The get_short_id() method returns the character representation of the ID - we have 62 possible characters (a-z A-Z 0-9) so we convert the number to base 62 and map the digits to characters in our alphabet. This means we can give visitors URLs like http://mylinksite/abcde and the abcde portion of the URL will hold the link ID. This looks a lot nicer than just using the numeric ID in the URL.

Now create a simple template. The {% if %} statements are there so we can display the generated URL to the user:
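To make the base-62 conversion concrete, here is a minimal standalone sketch of the same idea, without Django. The function names encode and decode are made up for this example; the alphabet is the same one the model uses.

```python
import string

# Same 62-character alphabet as the model: a-z, A-Z, 0-9.
ALPHABET = string.ascii_letters + string.digits
BASE = len(ALPHABET)  # 62


def encode(number):
    """Convert a non-negative integer to its base-62 string."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, rem = divmod(number, BASE)
        digits.append(ALPHABET[rem])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))


def decode(text):
    """Convert a base-62 string back to the integer it encodes."""
    number = 0
    for char in text:
        number = number * BASE + ALPHABET.index(char)
    return number


if __name__ == "__main__":
    for link_id in (1, 61, 62, 125, 1000000):
        short = encode(link_id)
        print(link_id, "->", short, "->", decode(short))
```

Encoding and decoding have to agree on the base, otherwise any ID longer than one character decodes to the wrong row. Note that "a" is the zero digit here, so leading "a" characters do not change the decoded value.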

<html>
    <body>
        <h1>Enter a URL to shorten:</h1>
        <form method="POST" action="/">
            <input type="text" name="url" placeholder="Enter a URL">
            <input type="submit">
        </form>
        {% if short_url %}
           <b>URL shortened:</b> {{ short_url }}
        {% endif %}
    </body>
</html>

Now we need to make the view. This is pretty simple: we take the URL provided by the user in the "url" POST field and store it in the database and return a URL to be displayed to the user. Code that ensures the input is valid and present is not included for brevity.
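For reference, the omitted validation could look something like the sketch below. It is plain Python using only the standard library, and clean_url is a hypothetical helper name, not part of the project. It only checks that the value is present and looks like an http(s) URL; Django also ships its own URLValidator in django.core.validators, which would be the more thorough choice.

```python
from urllib.parse import urlsplit


def clean_url(raw):
    """Return a stripped http(s) URL, or None if the input is unusable."""
    if not raw:
        return None
    candidate = raw.strip()
    try:
        parts = urlsplit(candidate)
    except ValueError:
        # urlsplit raises ValueError on some malformed inputs.
        return None
    # Only accept web URLs that actually name a host.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return candidate
```

In the view you would call something like this on request.POST.get("url") and only create the Link when the result is not None.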

#views.py
from tiny_link import models
from django.shortcuts import render_to_response
from django.template import RequestContext

def home(request):
   short_url = None
   if request.method == "POST":
      link_db = models.Link()
      link_db.link = request.POST.get("url")
      link_db.save()
      short_url = request.build_absolute_uri(link_db.get_short_id())
   return render_to_response("index.html", 
                             {"short_url":short_url}, 
                             context_instance=RequestContext(request))

Simple! Now we will make a quick view to redirect the user:

#views.py
from django.shortcuts import redirect, get_object_or_404
from django.db.models import F

def link(request, id):
   db_id = models.Link.decode_id(id)
   link_db = get_object_or_404(models.Link, id=db_id)
   models.Link.objects.filter(id=db_id).update(hits=F('hits')+1)
   return redirect(link_db.link)

This view is pretty simple: we decode the given string (e.g. abcd) into an integer and use that to get the link from the database. After that we issue an UPDATE statement that increments the hits by 1 (this should be done at the database level, otherwise increments might get lost when concurrent requests save the model with stale data), and then we send the user on their merry way.
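The lost-update problem is easy to reproduce outside Django. The sketch below uses the standard library's sqlite3 module and a made-up link table to contrast a read-modify-write increment with an increment done inside the UPDATE itself, which is roughly the kind of SQL an F('hits')+1 update produces. The two "requests" are simulated sequentially rather than with real concurrency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE link (id INTEGER PRIMARY KEY, hits INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO link (id, hits) VALUES (1, 0)")


def read_hits():
    return conn.execute("SELECT hits FROM link WHERE id = 1").fetchone()[0]


# Read-modify-write: two requests both read hits = 0 ...
stale_a = read_hits()
stale_b = read_hits()
# ... and each writes back its own stale value plus one.
conn.execute("UPDATE link SET hits = ? WHERE id = 1", (stale_a + 1,))
conn.execute("UPDATE link SET hits = ? WHERE id = 1", (stale_b + 1,))
lost_update_hits = read_hits()  # two clicks happened, only one was counted

# Database-level increment: the new value is computed from the current row.
conn.execute("UPDATE link SET hits = 0 WHERE id = 1")
conn.execute("UPDATE link SET hits = hits + 1 WHERE id = 1")
conn.execute("UPDATE link SET hits = hits + 1 WHERE id = 1")
atomic_hits = read_hits()  # both clicks counted

print(lost_update_hits, atomic_hits)
```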

Now edit your URL file to serve the views:

from django.conf.urls import patterns, include, url

urlpatterns = patterns('',
    url(r'^$', 'tiny_link.views.home'),
    url(r'^(?P<id>[a-zA-Z0-9]+)$', 'tiny_link.views.link'),
    url(r'^(?P<id>[a-zA-Z0-9]+)/stats$', 'tiny_link.views.stats'),
)
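One detail worth checking in the patterns is the quantifier: short IDs grow past one character as soon as the numeric ID reaches 62, so the character class needs a + after it. A quick sanity check with Python's re module (Django matches these patterns against the path without its leading slash; the variable names are just for this example):

```python
import re

# Without a quantifier the class matches exactly one character.
SINGLE = re.compile(r'^(?P<id>[a-zA-Z0-9])$')
# With "+" it matches one or more characters.
MULTI = re.compile(r'^(?P<id>[a-zA-Z0-9]+)$')

for path in ("b", "abcde", "abcde/stats", ""):
    single = SINGLE.match(path)
    multi = MULTI.match(path)
    print(repr(path), bool(single), bool(multi))
```

Only the second pattern accepts abcde, and neither accepts the empty path or abcde/stats, which are left for the home and stats patterns.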

And that's a wrap. The code linked at the top of the page has a few more bells and whistles, including per-day visitor tracking and graphs (soon), but the core is the same. In my opinion creating a URL shortener is simpler than the canonical "create a blog" introduction project.

Using a custom SQLAlchemy Users model with Django

I really dislike Django's ORM. For my job I have written (and continue to maintain) a large internal project that uses Django's ORM, templating language and MVC framework to serve requests, and I made the unfortunate mistake of sticking with Django's ORM instead of using the much more powerful SQLAlchemy.

The one nice thing about Django's ORM is that it is easy, but that comes at the price of efficiency and power. For example, the ability to add more than one record at a time to the database was only just added in Django 1.4. Before that, if you wanted to insert, say, 100 models, Django would execute 100 INSERT queries, each followed by a checkpoint if you were inside a transaction, the result being ~200 queries when 2 would have sufficed. This isn't to say that Django's ORM is bad, it's just not right for me.

Anyway, I recently started a new project for my company which is based on Django. I wasn't going to make the same mistake twice, so I used SQLAlchemy instead of Django's ORM, but I ran into a few problems. Django's ORM is tightly integrated into Django's users framework (Django ships with a default User class that can't be edited - you can expand it, but that requires a one-to-one join on another table), and I needed a way to tie my SQLAlchemy User model into Django's authentication system. Thankfully this was a lot easier than I thought, thanks to Django's modular design and easy-to-read codebase.

Django's User class has a few methods that we need to implement in our new User class to be 100% compatible: is_authenticated, is_anonymous, check_password and set_password. For the password methods we can use Django's excellent make_password and check_password functions, and for the authentication methods we simply return True and False respectively. We also need to disconnect the update_last_login handler because it is incompatible with SQLAlchemy. You could rewrite it if you wanted, though.

So, let's jump right into it. Define yourself a User class in your models.py (imports excluded for brevity):

from django.contrib.auth.models import update_last_login, user_logged_in
user_logged_in.disconnect(update_last_login)

class User(Base):
    __tablename__ = "users"
    id       = Column(Integer, primary_key=True)
    username = Column(String, unique=True)
    salt     = Column(String(10))
    password = Column(String(128))

    def is_authenticated(self):
        return True

    def is_anonymous(self):
        return False

    def check_password(self, raw_password):
        #TODO: Make this auto update using 
        # check_passwords setter argument
        return check_password(raw_password, self.password)

    def set_password(self, password):
        if not self.salt:
            self.salt = random_characters(10)
        self.password = make_password(password,salt=self.salt)
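Because Django only cares that these four methods exist, the same interface can be tried out in plain Python. The sketch below is a stand-in, not the real model: SketchUser is a made-up class, and hashlib.pbkdf2_hmac and secrets are swapped in for Django's make_password and the random_characters helper so that it runs without Django or SQLAlchemy installed.

```python
import hashlib
import hmac
import secrets


class SketchUser:
    """Plain-Python stand-in exposing the four methods named above."""

    def __init__(self, username):
        self.username = username
        self.salt = None
        self.password = None

    def is_authenticated(self):
        return True

    def is_anonymous(self):
        return False

    def _hash(self, raw_password):
        # Stand-in for Django's make_password: salted PBKDF2, hex encoded.
        digest = hashlib.pbkdf2_hmac(
            "sha256",
            raw_password.encode("utf-8"),
            self.salt.encode("utf-8"),
            100000,
        )
        return digest.hex()

    def set_password(self, raw_password):
        if not self.salt:
            self.salt = secrets.token_hex(5)  # 10 hex characters
        self.password = self._hash(raw_password)

    def check_password(self, raw_password):
        if not self.password:
            return False
        # Constant-time comparison of the stored and recomputed hashes.
        return hmac.compare_digest(self.password, self._hash(raw_password))
```

Anything that quacks like this, whatever ORM it comes from, can be handed back by an authentication backend.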

Now create a new authentication backend and call it SQLAlchemyAuthenticationBackend.py:

from sqlalchemy.orm.exc import NoResultFound
from Overseer import models

class SQLAlchemyUserBackend(object):
    supports_anonymous_user = True
    supports_inactive_user = True

    def __init__(self):
        self.session = models.Session()

    def authenticate(self, username=None, password=None):
        try:
            user = self.session.query(models.User).filter_by(username=username).one()
            if user.check_password(password):
                return user
        except NoResultFound:
            return None

    def get_user(self, user_id):
        try:
            user = self.session.query(models.User).filter_by(id=user_id).one()
        except NoResultFound:
            return None

        return user

And edit your settings.py to include this backend:

AUTHENTICATION_BACKENDS = ('path.to.SQLAlchemyAuthenticationBackend.SQLAlchemyUserBackend',)

And you are done. When you reference request.user it should now be your custom User class and not Django's. This also works nicely with the login decorators and even the default contrib.auth.login/logout views. It doesn't currently support user permissions simply because I don't need them, but they could be coded in fairly easily - although they might be a bit too ingrained into Django's ORM to work with SQLAlchemy.

There might be some issues I haven't found with this, and if I do find any I will update this post, but for now it seems to be working fine. God, I love Duck Typing.

Draconian internet filters

My university's student network is pretty restricted. I just finished coding a few changes to Simple and realised I couldn't push any changes to GitHub due to port restrictions. It appears that they block almost all ports bar 80 and 445 via TCP, which is fine for most users but is quite annoying for me - I often need to SSH into one of my servers or use non-standard ports.

After some investigating I discovered that I could SSH to the student-run Linux cluster Freeside, so I signed up for an account and got myself a shell. At first I figured that I could just set up an SSH tunnel from my PC to Freeside and then proxy my traffic through Freeside, allowing me unfiltered access to the internet. However, the Freeside servers appear to have some form of filtering on them as well, although not as much - I can SSH out (so I can deploy to GitHub from there) but I can't forward OpenVPN traffic through it (which is critical for my work). So I need to set up an SSH tunnel from Freeside to one of my servers in Germany, which is fully unrestricted, allowing me to run OpenVPN through the tunnel.

Feeding traffic through multiple SSH tunnels using PuTTY

I run Windows on my laptop to develop and game on. Because of this I have no native SSH client, so I use the fantastic PuTTY - I suggest you go download it if you do not already have it.

Open up PuTTY and enter freeside.co.uk as the hostname and 22 as the port, then navigate to the Tunnels settings page (under Connection->SSH). Enter 1000 as the source port and localhost:10000 as the destination, then click Add. This will forward any traffic going through port 1000 on our machine to port 10000 on the connected host. Navigate back to the Session page and press Save (very important, I always forget to do this and lose any changes I have made). Click Open to connect.

Once you have connected to Freeside and logged in using the username and password that you signed up with, run the following command in the terminal:

echo 'ssh -D 10000 username@yourserver.com -p 22' > tunnel.sh

Replace yourserver.com with your remote server's IP or hostname - I suggest you go buy a VPS from somewhere and use that. Replace username with your username on the remote host. This will create a new file called tunnel.sh which, when executed, will create another SSH tunnel listening on port 10000 and forwarding all traffic through yourserver.com. Make sure you can execute it by running the following command:

chmod +x tunnel.sh

And voila, you now have a simple SSH chain set up. Whenever you want to proxy any traffic, open up PuTTY and SSH into Freeside, then execute the tunnel script by running sh tunnel.sh and log in with the correct details when prompted. Now simply point OpenVPN or TortoiseGit at the SOCKS proxy running locally on your machine, listening on port 1000, and any traffic you send will be bounced through Freeside, then your server, then out to the rest of the world.
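If something in the chain isn't working, a quick first check is whether anything is listening on the local end at all. Here is a small sketch in Python (port_is_listening is a made-up helper; 1000 is the source port configured in PuTTY above). It only tells you that PuTTY has opened the local port, not that the second hop to your server is up.

```python
import socket


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 1000 is the local source port set up on PuTTY's Tunnels page.
    print("local tunnel port open:", port_is_listening("127.0.0.1", 1000))
```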

Simple.

I like things to be simple. So I wrote my own blog software to replace the rather un-simple WordPress. It's not that WordPress is hard to use or install, far from it, it's just got a lot of bloat in my opinion, so I replaced it with Simple.

Simple uses Markdown to format posts, and aims to be as simple as possible. It consists of a single Python file with a few external resources (css, js and templates), and it has very few dependencies. The footprint on the server is incredibly low and the response time is better than that of WordPress running on Apache. Best of all it doesn't require some big database server like MySQL or PostgreSQL, it runs off a simple SQLite database file.

When you type a post you appear to type directly on the page, there is no annoying WYSIWYG editor to get in the way.

Posts are arranged into two groups: drafts and non-drafts. You can slowly start work on several ideas at once, and once they form into proper posts publish them to the frontpage.
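To give a feel for how little a drafts/published split needs on top of SQLite, here is a rough sketch using the standard library's sqlite3 module. This is not Simple's actual schema; the posts table, its columns and the helper functions are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE posts (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        body  TEXT NOT NULL DEFAULT '',
        draft INTEGER NOT NULL DEFAULT 1  -- every new post starts as a draft
    )
    """
)


def add_draft(title, body=""):
    cur = conn.execute(
        "INSERT INTO posts (title, body) VALUES (?, ?)", (title, body)
    )
    return cur.lastrowid


def publish(post_id):
    conn.execute("UPDATE posts SET draft = 0 WHERE id = ?", (post_id,))


def titles(draft):
    rows = conn.execute(
        "SELECT title FROM posts WHERE draft = ? ORDER BY id", (int(draft),)
    )
    return [row[0] for row in rows]
```

The front page would then list titles(draft=False), while the drafts view lists titles(draft=True).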

It's got some nice code highlighting as well, which was one thing that was annoying me with WordPress:

import math
def test(x,y):
    return (x+y) + math.sqrt(x)

So yeah. This is one of my new projects.