Hack things, acquire clothing.

I discovered an XSS flaw in a website a month ago and reported it to the owners. As a thank you they sent me a hat, a rather large american sized t-shirt and a pair of "DeFeet" socks (guaranteed to stay cooler and drier than any other brand). I didn't expect them to ship something like that overseas for a fairly simple issue, but it was nice that they did.

If you want something more bankable for finding an issue with a website then there are hundreds of sites that offer cash for vulnerabilities, called "Bug bounty programs". I found a list of such sites and picked one at random to see if I could find anything and after half a days work I discovered a high importance issue which would have netted me $500 if it were not for the fact that the bug bounty program ended late last year. The issue I reported did however get its own CVE number (CVE-2013-4429) which is pretty cool.

Bug bounty programs can be a nice bit of side income if you have some spare time, but always make sure that they are currently running a program.

The loot:

Restricting Thrift clients to specific IP addresses with Twisted

Apache Thrift is pretty awesome - you can build Twisted bindings for your Thrift interface file that work fantastically. There is one thing that took me a while to figure out: I want to restrict clients connecting to the service to a specific set of IP addresses stored in a database.

There were three ways I could see to do this:

  1. Automate UFW or iptables to restrict access to the port when the list of IP addresses is changed
  2. Make Twisted connect to the database and query it
  3. Make Twisted make a HTTP request to a service which then queries the database

I opted for #3, as #1 is hardly portable and #2 isn't maintainable (for various reasons). Unfortunately Twisted doesn't have built in support for this, so the way I managed it is to check it inside the connectionMade method of the protocol:

class AuthenticatingThriftProtocol(TTwisted.ThriftServerProtocol):
    @defer.inlineCallbacks
    def connectionMade(self):
        self.transport.pauseProducing()
        log.msg("Authenticating host")
        result = yield checkHostAgainstWhitelist( self.transport.getHost() )
        if result == False:
                self.transport.loseConnection()
        else:
                self.transport.resumeProducing()

The transport has to be paused while authenticating to stop clients making requests while the authentication process is in progress (Twisted is event driven, so while checkHostAgainstWhitelist is in progress other data could be sent and processed). Once the IP address has been checked then the connection is either dropped if the IP address is not allowed or the transport is resumed, which will process any buffered data sent by the client while it was being authenticated.

The next step would be to tie it into the cred framework, but that can wait until another time.

Adding tail-call optimization to Python

Tail-call optimization is a trick many languages and compilers use to avoid creating excess stack frames when dealing with recursive code like this:

def call_1000_times(count=0):
    if count == 1000:
        return True
    else:
        return call_1000_times(count + 1)

This function simply calls itself with modified arguments until a condition is met (the count is 1000) at which point it returns True. Because all the function mostly does is return the result of another function call the stack frame does not need to be kept around in memory and can be disposed of or the parents frame could be re-used.

The reference Python implementation (CPython) does not implement tail-call optimization, so running the above code will hit the recursion limit and throw an exception.

Pure python tail-call optimization?

I hacked around for a bit and came up with a pure-python function decorator that will automagically optimize recursive functions like the one below. It appears to be calling itself recursively but its actually doing no such thing else it would hit the recursion limit and explode with an exception:

sys.setrecursionlimit(10)
@tail_call()
def test_tail_count(count=0):
   return count if count == 7000 else test_tail_count(count + 1)
>>> print test_tail_count()
7000

How?

If you are feeling brave you can view the code here on GitHub: https://gist.github.com/orf/41746c53b8eda5b988c5

I developed two different ways of implementing tail-call recursion, both with advantages/disadvantages.

Functools.partial

In the code above the tail_call() wrapper replaces the reference to the test function with a functools.partial object while the function is being evaluated. This allows the wrapper to execute each function sequentially rather than recursively, thus avoiding the recursion limit.

This method well and theoretically requires no code changes to the function itself, but it does have some downsides:

  • You can't use the result of calling test function while inside the test function as its not actually called (yet).
  • It's about 20% slower than the normal recursive method
  • It won't work with multithreaded code, as the reference to the function is replaced while it is executing
Return tuples

This method is actually slightly faster than normal recursive code according to the benchmarks in the gist above. The test_tail_count function would have to be changed to return a tuple of arguments to be passed to the next call rather than pretending to call itself recursively:

@tail_call(tuple_return=True)
def test_tail_count(count=0):
   return count if count == 7000 else (count + 1, )

This has very little overhead as tuples are pretty cheap to create in Python. However it's not as readable as the first method and I haven't worked out how to support keyword arguments yet.

Benchmarks

I ran some quick and dirty benchmarks to see how these performed. Each test calculates the 1700th number of the Fibonacci sequence recursively 1000 times and the total time taken is displayed below. I've found that the greater the fibonacci number the faster the optimized versions are compared to the standard recursive method - I used 1700 but your mileage may vary.

  • test_fib_optimize 2.6015851634
  • test_fib_tuple_optimized 1.83400634784
  • test_fib_no_optimize 1.97867649889

As you can see the tuple return method is slightly faster than the standard recursive method, and the functools.partial method is the slowest.

My Uni's timetable system sucks, so I built a better one.

tl;dr The timetable system sucks, so I made one that works

Getting your timetable sorted at Uni has never been fun. In years 1 and 2 of my study the department posted a timetable for each year showing all modules and students were expected to remove the classes they did not take, which while not the best system it did seem to work fine. However this year the Uni has a timetabling website that you can use to create a timetable for the modules that you take, and in theory this is a good idea - students can connect to the website, enter the modules they take and get a personalized up-to-date timetable that they can use.

The implementation of the timetable website is horrendous for the following reasons:

  1. It's so damn slow - every time you select from the list of modules the page refreshes, which takes between 6 and 8 seconds. If you select anything else in that time it is discarded, and the scrollbar position on the select box is also lost when the page finally finishes loading which is very infuriating.
  2. Filtering modules is broken - If you use it your entire selection is lost and you have to start all over again.
  3. No mobile interface - Most students have a smartphone, and it would be reasonable to assume that they might want to view their timetables while they are mobile and not at a computer. Nope.
  4. No hard links - Once you have generated a timetable you can't bookmark it or link to it in the future, you have to go through the whole annoying process of selecting your modules and making a cup of tea while it loads.
  5. One timetable per module - Instead of displaying the modules you selected in a single timetable (which would make sense) you are instead presented with one for each module, which means you have to manually combine all of them into one (which can be error prone).
  6. It uses sessions - if you leave the page open for too long your session times out. Why does it have sessions? Are they really needed to take input from a from and display it in another page? No.
  7. You can't export your timetable - this means you can't import it into other calendar applications, you would have to do it manually.

I thought systems like this were supposed to make things easier? It seems like nobody who designed the timetabling system put any thought into the user interface.

An alternative

Annoyed by all these issues (and more) I wondered how hard it would be to write an improved version that fixes all of those issues. It turns out not hard at all - you can view it live here: timetables.tomforb.es.

The system I have come up with fixes all of the issues described above:

  1. It's fast: Timetables are created in milliseconds, not seconds.
  2. Filtering modules is awesome: It uses the awesome Select2 library to make filtering modules quick and easy. Module titles are also included in the list.
  3. Mobile: The site is built responsively so it looks good on both desktops and mobiles. If you view a timetable in a mobile interface you get an accordion of days and times, while in a desktop you get a full single timetable containing all of the classes. Try it by re-sizing your browser.
  4. Hard links: You can bookmark your timetable to easily refer to it later.
  5. One big timetable: Lectures are displayed in a single timetable rather than by module.
  6. No sessions: No timeouts.
  7. Data freedom: You can export your timetable in an iCal format which can be imported into most other calendar applications (outlook, thunderbird, google cal etc)

It doesn't contain all the functionality of the existing system, but it seems to be a lot easier to use than the current application. In the future, if people like it I could add functions to hide lectures that have finished, make week numbers relative to the start of freshers week etc.

The point is it's not hard to build a usable interface for any simple system. If an undergrad can hack this together in an hour then the people who designed the Uni's system (presumably for lots of money) have no excuse.

The code can be found here on github

Screenshots

Selecting modules is easy, you can browse or filter by typing

Lectures are displayed in a table, each module having its own colours

Works with mobile devices as well

Purchasing a £30,000 numberplate for the price of a bus ticket

Regtransfers.co.uk is a website that allows you to purchase customized numberplates for your car or motorbike. They boast a large number of famous clients and short numberplates are often on sale for upwards of £20,000 (the plate ABC 4 is up for £30,000). While playing with their site I discovered a flaw that would allow anybody to purchase that ABC 4 plate for an arbitrary price. As always I waited until the issue was fixed on the site before publishing this post.

The order process

The order process is simple and painless. You search for a numberplate and browse ones available for purchase. Once you have found one that you like you select "Buy" and you are taken to a page where you can enter your personal details:

Within the order page there is a form element with some interesting names:

Surely the site would validate the input to ensure no negative values could be added?

Or not

And finally the payment page:

I would have certainly been able to complete the payment but whether or not the order would have succeeded depends on how automated regtransfers.co.uk is. I doubt it's completely automated so I expect someone along the line would have flagged this up and stopped the transfer, but who knows? This goes to show that while using a framework can protect you against common flaws like SQL injection or XSS business logic still needs to be tested and inputs still need to be validated.