Breaking out of secured Python environments

A week or so ago I was browsing /r/Python and saw a link to rise4fun.com, a Microsoft Research site with a lot of cool demos and tools that run in your browser. The demo I was linked to was a restricted Python shell for experimenting with Z3, a "high performance theorem prover". Python is a highly dynamic language, so it is hard to secure correctly, and I found numerous ways around the restrictions, which I have detailed in this post. Because of these issues the z3py section was removed shortly after I made contact, so you can't see it for yourself; however, the plain z3 section is still there, so take a look here for reference (just imagine it took Python code as input).

The restrictions

The first thing I did was to explore what restrictions they had in place to prevent malicious activity. I found they had the following restrictions on code being executed:

  • Any use of the import statement
  • Use of any attribute prefixed with a double underscore (which rules out all Python special methods like __getattr__)
  • Use of any attribute name in a blacklist (open, getattr, setattr, locals, globals etc)

They implemented these restrictions by parsing the Python code into an AST representation and looking for any attribute access prefixed with a double underscore, any use of the import statement or any name in a blacklist.
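A checker along these lines can be sketched in a few lines with Python 3's ast module. This is only my guess at the general shape, not rise4fun's code: the RestrictionChecker and check names are hypothetical, and BLACKLIST holds just the names listed above (the real list was longer).

```python
import ast

# Hypothetical blacklist: only the names mentioned above; the real one was longer.
BLACKLIST = {"open", "getattr", "setattr", "locals", "globals"}


class RestrictionChecker(ast.NodeVisitor):
    """Records every construct the restrictions above forbid."""

    def __init__(self):
        self.violations = []

    def visit_Import(self, node):
        self.violations.append("import statement")

    visit_ImportFrom = visit_Import  # "from x import y" is an import too

    def visit_Attribute(self, node):
        if node.attr.startswith("__") or node.attr in BLACKLIST:
            self.violations.append("attribute %r" % node.attr)
        self.generic_visit(node)  # keep checking the object being accessed

    def visit_Name(self, node):
        if node.id in BLACKLIST:
            self.violations.append("name %r" % node.id)


def check(source):
    """Return a list of violations; an empty list means the code is allowed."""
    checker = RestrictionChecker()
    checker.visit(ast.parse(source))
    return checker.violations
```

Note that check("raise BaseException()") and check("f.func_globals") both come back empty, which is exactly the kind of gap a blacklist leaves open.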

Breaking out

In Python, exceptions have a defined hierarchy. The following code is a common way to catch all exceptions while executing some_func():

try:
   some_func()
except Exception:
   pass

This works in most cases because almost all exceptions have Exception as an ancestor. However, a few (SystemExit, KeyboardInterrupt and GeneratorExit) inherit only from BaseException and not from Exception, so that code does not catch them.

Raising a BaseException() escaped the try/except block they had in place and gave me a nice traceback that included the path to the executing script.
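Here is a small Python 3 reproduction of the escape. run_untrusted is a hypothetical stand-in for the sandbox's wrapper; it catches only Exception, so the BaseException sails straight past it to the next handler up the stack.

```python
def some_func():
    # Hypothetical untrusted code: BaseException is not a subclass of Exception
    raise BaseException("escaped the handler")


def run_untrusted(func):
    """Mimics a sandbox wrapper that only catches Exception."""
    try:
        func()
    except Exception as exc:
        return "caught: %s" % exc
    return "finished"


try:
    result = run_untrusted(some_func)
except BaseException as exc:
    # The exception skipped run_untrusted's handler and reached this one instead
    result = "escaped: %s" % exc

print(result)  # prints: escaped: escaped the handler

# Built-in exceptions that an "except Exception" block will not catch
print([cls.__name__ for cls in (SystemExit, KeyboardInterrupt, GeneratorExit, MemoryError)
       if not issubclass(cls, Exception)])
# prints: ['SystemExit', 'KeyboardInterrupt', 'GeneratorExit']
```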

Enumerating attributes

There are a few ways to enumerate attributes in Python. You could use the __dict__ attribute of an object, but that was obviously restricted. The other common way is the dir() function, which was not restricted. I found that "print dir(x)" produced no output, but "print exit(dir(x))" worked (str was restricted, but exit() turns its argument into a string).
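The exit() trick works because exit(value) raises SystemExit(value), and whatever reports that exception renders its argument as a string. A rough Python 3 illustration follows; Target is a hypothetical stand-in for a sandboxed object, and the sketch raises SystemExit directly because the site-provided exit() also tries to close sys.stdin.

```python
class Target:
    """Hypothetical stand-in for an object inside the sandbox."""

    def __init__(self):
        self.ctx = None


def leak_via_exit(obj):
    # SystemExit(value) stringifies to str(value), so the dir() listing
    # comes out intact without the code ever calling str() itself.
    try:
        raise SystemExit(dir(obj))
    except SystemExit as exc:
        return str(exc)


def public_attributes(obj):
    """dir() minus the double-underscore names the sandbox blocked anyway."""
    return [name for name in dir(obj) if not name.startswith("__")]


print(public_attributes(Target()))  # prints: ['ctx']
```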

Accessing restricted functions

By enumerating the attributes I found one called "ctx". It wasn't useful by itself, but it had an attribute called "lib" that was a reference to the z3 module, which had some interesting-sounding functions.

Writing files

The ones that stood out in the z3 module were the log file functions, Z3_open_log and Z3_append_log. They gave me a simple way to write files, helped in part by the path I obtained with the BaseException trick above.

And the resulting file:

Getting a reference to the sys module

The sys module in Python is a goldmine: if you can get a reference to it in a restricted environment, you can reach any imported module through the sys.modules dictionary. After a while of getting nowhere I kicked myself: I had forgotten about func_globals. Every Python function has an attribute called func_globals (__globals__ in Python 3), which according to the Python docs is a "reference to the dictionary that holds the function’s global variables — the global namespace of the module in which the function was defined". Using this we could easily get a reference to the sys module, and therefore to any imported module.

And then we could use the io module to read and write arbitrary files:
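The whole chain can be sketched in Python 3 as follows. It assumes, as in the sandbox, that the function we hold was defined in a module that had imported sys; harmless and module_via_globals are hypothetical names used only for illustration.

```python
import os
import sys  # assumption: the function's defining module imported sys
import tempfile


def harmless():
    """Stand-in for any function defined in a module that imported sys."""
    return None


def module_via_globals(func, name):
    # func.__globals__ (func_globals in Python 2) is the defining module's namespace
    module_sys = func.__globals__["sys"]
    # sys.modules maps names to every module imported so far
    return module_sys.modules[name]


# io is always loaded at interpreter startup, so it is already in sys.modules
io_module = module_via_globals(harmless, "io")
path = os.path.join(tempfile.mkdtemp(), "proof.txt")
with io_module.open(path, "w") as fh:
    fh.write("written via sys.modules")
with io_module.open(path) as fh:
    print(fh.read())  # prints: written via sys.modules
```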

Bonus marks

None of the output was escaped. That isn't too serious, as there are no user accounts (as far as I know) and so nothing to steal, but it's still a bit funny.
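The fix is to escape interpreter output before putting it into the page. Whatever the real server was written in, the idea looks like this in Python; render_output is a hypothetical helper built on the standard html.escape function.

```python
import html


def render_output(text):
    """Escape untrusted interpreter output before embedding it in HTML."""
    return "<pre>%s</pre>" % html.escape(text)


print(render_output("<script>alert(1)</script>"))
# prints: <pre>&lt;script&gt;alert(1)&lt;/script&gt;</pre>
```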

Securing Python

Python is hard to secure. The best approach is to execute it in a throwaway environment (like a temporary Docker container) so that if someone does manage to escape, they can't wreak havoc on anything important. That being said, the blacklist AST-parsing method rise4fun used was clever and worked to a degree, and with some improvements it could be made more viable.
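One possible improvement (my suggestion, not something rise4fun did) is to flip the blacklist into an allow-list: reject any AST node type that isn't explicitly permitted, and only allow calls to a short list of names. ALLOWED_NODES and ALLOWED_CALLS below are hypothetical and deliberately tiny. Even so, this narrows the attack surface rather than making a sandbox, so it should still run inside an isolated environment.

```python
import ast

# Hypothetical allow-list of syntax: anything not listed is rejected.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.Compare, ast.Eq, ast.Lt, ast.Gt, ast.Call, ast.List, ast.Tuple,
)
# Hypothetical allow-list of callable names.
ALLOWED_CALLS = {"print", "len", "range"}


def is_allowed(source):
    """True only if every node in the parsed source is explicitly permitted."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ALLOWED_NODES):
            return False  # e.g. Attribute, Import, Raise, Lambda
        if isinstance(node, ast.Call):
            func = node.func
            if not (isinstance(func, ast.Name) and func.id in ALLOWED_CALLS):
                return False
    return True
```

With this approach, attribute access, raise statements and imports are all refused by default, instead of relying on the blacklist having anticipated them.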

Inspecting .NET applications with ILSpy

Every once in a while I come across an application that is so comically insecure that I feel the urge to blog about it. The application in question is a .NET application for managing care homes that provides a Medical Administration Record for residents. Staff log in to the app with a username and password associated with an organization and use it to view everything about a care home: the residents, their schedules, the rota, etc. Everything is synchronized with a remote server via a collection of RPC methods, and the application even works offline, pushing any modified data to the remote server when it can next connect.

The application is essentially a thick client: it fetches data from the remote database and displays it to the user in various ways, while also taking input from the user and submitting it to the server after validating it. That sounds fine until you look deeper. The application performed all its validation on the client, and the remote server performed none at all; any user (even an unauthenticated one) could request a complete list of patients, including medical records, and the server would send it, no questions asked. The developers violated one key security principle: never trust user input. They trusted that the thick client was the only way to communicate with the remote server (and thus that all user input came from an authenticated source), which is a bit silly, since the application is a .NET binary (easy to decompile), not obfuscated in any way, and uses standard .NET serialization to send and receive objects from the server.

The analogy that can be drawn is one of a bank full of customers. The bank vault is your database, the teller is your server and the customers are people using your application. Normally customers in the bank are not malicious and only make valid requests: “can I transfer my money to this account” or “can I check my balance”. However I found that the teller (the backend service) answers requests indiscriminately, so if a malicious customer were to ask “can I check the balance of someone else's account” or “can I transfer all money from Mr Gate's account into mine” the teller would not check if the customer is allowed to make those requests before processing them.
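In code, a teller that checks before answering looks roughly like the sketch below. I can't show the real backend, so this is written in Python with entirely hypothetical names (Session, PermissionDenied, PATIENTS, fetch_patient). The key point is that the session is established by the server at login and every handler checks it, rather than trusting that requests come from the official client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Created by the server at login, never taken on the client's word."""
    user: str
    organisation: str


class PermissionDenied(Exception):
    pass


# Hypothetical records: patient id -> (owning organisation, record)
PATIENTS = {
    1: ("org-a", {"name": "resident-1"}),
    2: ("org-b", {"name": "resident-2"}),
}


def fetch_patient(session, patient_id):
    """Server-side handler: check who is asking before touching the data."""
    if session is None:
        raise PermissionDenied("not logged in")
    organisation, record = PATIENTS[patient_id]
    if organisation != session.organisation:
        raise PermissionDenied("patient belongs to another organisation")
    return record
```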

Hacking the application

I can't really write about the backend, but I can write about one interesting issue we found. This snippet says it all. Remember, this runs on the client:

internal static User UserLogin(String username, String password)
{       
    // Notice how it compares the password locally after fetching the User object
    var user = DataPortal.Fetch<User>(username);
    if (user.PasswordHash != GetPasswordHash(username,password)) return null;
    return user;
}
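The safe version of this flow sends the credentials to the server and lets the server do the comparison, so the password hash never leaves it. Here is a minimal Python sketch under stated assumptions: the USERS table and login function are hypothetical, and PBKDF2 via hashlib.pbkdf2_hmac is my choice of hash (I don't know what GetPasswordHash actually does).

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # hypothetical PBKDF2 work factor


def hash_password(password, salt):
    """Derive a slow, salted hash so stolen hashes are expensive to crack."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


# Hypothetical server-side user table: username -> (salt, password hash).
# None of this is ever sent to the client.
_admin_salt = os.urandom(16)
USERS = {"admin": (_admin_salt, hash_password("correct horse", _admin_salt))}


def login(username, password):
    """Server-side check; the client only learns whether login succeeded."""
    entry = USERS.get(username)
    if entry is None:
        return False
    salt, expected = entry
    # compare_digest avoids short-circuiting on the first differing byte
    return hmac.compare_digest(hash_password(password, salt), expected)
```

In a real service a successful login would hand back a session token that the server checks on every later request, rather than the User object itself.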

So, let's hack it. ILSpy is a fantastic tool for inspecting and debugging .NET applications. One of its best features is that you can set breakpoints in arbitrary assemblies (it seems you have to compile ILSpy from source in debug mode to enable that). Because the app fetches the user and then compares the password locally, we can set a breakpoint after it fetches the object but before it performs any checks. First you have to execute the assembly through ILSpy.

Once you have selected the executable ILSpy will execute it and decompile the sources. This allows you to navigate through the source (which may differ from the real source in some ways) and set breakpoints to be triggered. You can use the sidebar on the left to navigate the various namespaces in the assembly and view the classes contained within. Below I have located the actual code segment where the user is fetched and I have set a breakpoint on the statement after it.

After setting the breakpoint I simply need to attempt to log in as the "admin" user (with any password), and I can view all of the admin user's attributes (public and private), including his password hash, by hovering over the reference in the source code.

Isn't ILSpy awesome?

Automatically inline Python function calls

Edit: Code is here on GitHub

Calling functions in Python can be expensive. Consider this example, which times two statements: the first calls a function that returns an integer, while the second calls a function that returns the result of a second function call, which returns an integer.

tom@toms ~$ python -m timeit -n 10000000  -s "def get_n(): return 1" "get_n()"
10000000 loops, best of 3: 0.145 usec per loop
tom@toms ~$ python -m timeit -n 10000000 -s "get_n = lambda: 1; get_r_n = lambda: get_n()" "get_r_n()"
10000000 loops, best of 3: 0.335 usec per loop

The additional function call more than doubled the execution time, despite not affecting the output of the function in any way. This got me thinking: how hard would it be to create a Python module that inlines functions, removing the calling overhead from certain functions you specify?

As it turns out, not that hard. Note: This is simply an experiment to see what's possible, don't even think about using this in real Python code (there are some serious limitations explained at the end). Check this out:

from inliner import inline

@inline
def add_stuff(x, y):
    return x + y

def call_func_args(num):
    return add_stuff(1, num)

import dis
dis.dis(call_func_args)
# Prints:
# 0 LOAD_CONST               1 (1)
# 3 LOAD_FAST                0 (num)
# 6 BINARY_ADD          
# 7 RETURN_VALUE        

The dis function prints the bytecode operations for a Python function. The output shows that call_func_args has been modified so that the add_stuff() call never takes place; instead, the body of add_stuff has been inlined inside call_func_args. I've put the code on GitHub if you'd like to have a look. Below I explain how it works, for those interested.
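You can check the effect the inliner is after with the standard dis module, without the inliner itself. This sketch compares the original function with a hand-inlined equivalent (inlined_by_hand and makes_call are hypothetical names) and looks for call instructions; exact opcode names vary between Python versions, but they all contain "CALL".

```python
import dis


def add_stuff(x, y):
    return x + y


def call_func_args(num):
    return add_stuff(1, num)


def inlined_by_hand(num):
    # What the inliner aims to produce: add_stuff's body with x -> 1, y -> num
    return 1 + num


def makes_call(func):
    """True if the function's bytecode contains any call instruction."""
    return any("CALL" in ins.opname for ins in dis.get_instructions(func))


print(makes_call(call_func_args))   # prints: True
print(makes_call(inlined_by_hand))  # prints: False
```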

Diving in: Import hooks and the AST module

Python is an interpreted language: when you run a Python program, the source code is parsed into an Abstract Syntax Tree, which is then 'compiled' into bytecode. We need a way of modifying the AST of an imported module before it gets compiled, and as luck would have it Python provides powerful hooks into the import mechanism, which let you write importers that fetch code from the internet or prevent packages from being imported. Getting our claws into the import mechanism is as simple as this:

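import sys, imp

def make_module(fullname, pathname, source):
    # Compile the (possibly modified) source into a fresh module object
    module = imp.new_module(fullname)
    module.__file__ = pathname
    exec(compile(source, pathname, "exec"), module.__dict__)
    return module

class Loader(object):
    def __init__(self, module):
        self.module = module

    def load_module(self, fullname):
        # PEP 302 loaders are expected to register the module in sys.modules
        sys.modules[fullname] = self.module
        return self.module

class Importer(object):
    def find_module(self, fullname, path=None):
        try:
            file, pathname, description = imp.find_module(
                fullname.split(".")[-1], path)
        except ImportError:
            return None  # not found here: let the normal import machinery try
        if description[2] != imp.PY_SOURCE:
            if file:
                file.close()
            return None  # only handle plain .py files, not packages or extensions
        with file:
            module_contents = file.read()
        # We can now mess around with module_contents
        # and produce a module object
        return Loader(make_module(fullname, pathname, module_contents))

sys.meta_path.append(Importer())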

Now whenever anything is imported, our find_module() method will be called. It should return either None, to let the normal import machinery handle the module, or an object with a load_module() method that returns the final module.
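The imp module used above is deprecated in Python 3 and was removed in 3.12. As a rough Python 3 sketch of the same hook (my own, not the inliner project's code), a finder can reuse importlib's PathFinder to locate a module and then swap in a loader that parses, transforms and compiles the AST itself. TransformingFinder, TransformingLoader, the transform callable and the should_transform predicate are all hypothetical names. The finder should be inserted at the front of sys.meta_path, because in Python 3 the default PathFinder already sits in that list and would otherwise handle the import first.

```python
import ast
import importlib.abc
import importlib.machinery


class TransformingLoader(importlib.machinery.SourceFileLoader):
    """Compiles a module from source after passing its AST through a transform."""

    def __init__(self, fullname, path, transform):
        super().__init__(fullname, path)
        self.transform = transform

    def get_code(self, fullname):
        # Always compile from source so a cached .pyc cannot bypass the transform
        path = self.get_filename(fullname)
        tree = ast.parse(self.get_data(path), filename=path)
        tree = ast.fix_missing_locations(self.transform(tree))
        return compile(tree, path, "exec")


class TransformingFinder(importlib.abc.MetaPathFinder):
    """Finds modules with the normal path machinery, then swaps in our loader."""

    def __init__(self, transform, should_transform):
        self.transform = transform
        self.should_transform = should_transform  # predicate on the module name

    def find_spec(self, fullname, path, target=None):
        if not self.should_transform(fullname):
            return None
        spec = importlib.machinery.PathFinder.find_spec(fullname, path)
        if spec is None or not isinstance(spec.loader, importlib.machinery.SourceFileLoader):
            return None  # not plain source: leave it to the normal import system
        spec.loader = TransformingLoader(fullname, spec.origin, self.transform)
        return spec
```

Usage would be along the lines of sys.meta_path.insert(0, TransformingFinder(my_transform, lambda name: name.startswith("myproject"))), where my_transform takes and returns an ast.Module.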

Modifying the AST

Python's ast module lets you inspect and modify syntax trees. So inside our find_module method we can get the source code of the module being imported, parse it into an AST, and modify that tree before compiling it. You can see this in action here.

First we need to find all functions that are wrapped by our inline decorator, which is pretty simple. The ast module provides NodeVisitor and NodeTransformer classes you can subclass. For each type of AST node, a visit_<NodeType> method (such as visit_FunctionDef or visit_Call) is called if you define one; in a NodeTransformer it can return a modified node or pass the original along untouched. The InlineMethodLocator walks through all the function definitions in a tree and stores any that are wrapped by our inline decorator:

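import ast

class InlineMethodLocator(ast.NodeVisitor):
    """Collects every function definition decorated with a bare @inline."""
    def __init__(self):
        self.functions = {}

    def visit_FunctionDef(self, node):
        # Decorators can be Name, Attribute or Call nodes; only a bare
        # Name has an .id, so check the type before comparing
        if any(isinstance(d, ast.Name) and d.id == "inline"
               for d in node.decorator_list):
            self.functions[node.name] = node
        self.generic_visit(node)  # keep walking into nested functions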

Once we have identified the functions we want to inline, the next step is to find where they are called and inline them. To do this we need to look for all Call nodes in our module's AST:

import ast
# utils and transformers are helper modules from the inliner project on GitHub (not shown)

class FunctionInliner(ast.NodeTransformer):
    def __init__(self, functions_to_inline):
        self.inline_funcs = functions_to_inline

    def visit_Call(self, node):
        # Transform the arguments first so nested calls such as f(g(x)) are inlined too
        self.generic_visit(node)
        func_name = utils.getFunctionName(node.func)
        if func_name in self.inline_funcs:
            func_to_inline = self.inline_funcs[func_name]
            transformer = transformers.getFunctionHandler(func_to_inline)
            if transformer is not None:
                node = transformer.inline(node, func_to_inline)

        return node

This visits every Call node, and if the call is to a function we want to inline, it fetches a transformer object that is responsible for the actual inlining. I've only written one transformer so far, which works on simple functions (functions with one statement), but more can be added fairly easily. The simple-function transformer returns the body of the function with its parameters replaced by the arguments from the call site:

class SimpleFunctionHandler(BaseFunctionHandler):
    def inline(self, node, func_to_inline):
        # It's a simple function: a single statement, so we can replace the
        # call with the inlined function's body
        body = func_to_inline.body[0]
        if isinstance(body, ast.Return):
            body = body.value

        return self.replace_params_with_objects(body, func_to_inline, node)
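replace_params_with_objects itself isn't shown here. A self-contained sketch of the same idea, assuming only positional arguments and a single return statement, might look like this; ParamReplacer and inline_simple_call are hypothetical names, not the project's code.

```python
import ast
import copy


class ParamReplacer(ast.NodeTransformer):
    """Swaps each parameter name for the expression passed at the call site."""

    def __init__(self, mapping):
        self.mapping = mapping  # parameter name -> argument AST node

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id in self.mapping:
            # Copy so one argument node is never shared between two places
            return copy.deepcopy(self.mapping[node.id])
        return node


def inline_simple_call(call, func_def):
    """Build an expression equivalent to `call` from func_def's body.

    Only handles `return <expr>` functions called with exactly matching
    positional arguments.
    """
    statement = func_def.body[0]
    if (len(func_def.body) != 1 or not isinstance(statement, ast.Return)
            or statement.value is None):
        raise ValueError("not a simple one-statement function")
    if call.keywords or len(call.args) != len(func_def.args.args):
        raise ValueError("only exact positional calls are supported")
    params = [arg.arg for arg in func_def.args.args]
    mapping = dict(zip(params, call.args))
    body = copy.deepcopy(statement.value)
    new_node = ParamReplacer(mapping).visit(body)
    return ast.copy_location(new_node, call)


func_def = ast.parse("def add_stuff(x, y):\n    return x + y\n").body[0]
call = ast.parse("add_stuff(1, num)", mode="eval").body
print(ast.unparse(inline_simple_call(call, func_def)))  # prints: 1 + num
```

One caveat of plain substitution: a parameter used twice means its argument expression is evaluated twice, so inlining def sq(x): return x * x at sq(f()) would call f() twice.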

Limitations

There are some serious limitations with this code:

  1. Inlined functions must have a unique name: the AST gives us no type information (Python is dynamically typed), only the name of the function being called. Unless you write code that attempts to deduce the type of a class instance (no mean feat), every inlined function must have a unique name.
  2. Only inlines functions in the same module: To keep things simple only calls in the same module are inlined.
  3. Inlined class functions can't reference any double underscore attributes: accessing self.__attr is about as 'private' as you can get in Python. The compiler mangles the name to self._ClassName__attr based on the class the code is written in, so once the body is inlined elsewhere the lookup is mangled against the wrong class (or not at all).
  4. Everything will break: Python is very dynamic, and you may wish to replace functions at runtime. Obviously, if the functions have been inlined, this won't have any effect.
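Limitation 3 is easy to demonstrate in plain Python 3. The Account, Report and show_outside_class names are hypothetical; the same line of code, return account.__balance, resolves differently depending on which class body (if any) it is compiled in.

```python
class Account:
    def __init__(self):
        self.__balance = 100  # stored as _Account__balance

    def balance(self):
        return self.__balance  # compiled as self._Account__balance


class Report:
    def show(self, account):
        # Account.balance's body pasted here: inside class Report the
        # compiler mangles this to account._Report__balance instead
        return account.__balance


def show_outside_class(account):
    # Outside any class body no mangling happens, so this looks up "__balance"
    return account.__balance


acc = Account()
print(acc.balance())  # prints: 100
print(vars(acc))      # prints: {'_Account__balance': 100}
```

Both Report().show(acc) and show_outside_class(acc) raise AttributeError.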

SSDs are awesome, buy one.

I recently bought a Samsung 840 Series Pro 256GB 2.5 inch SATA Solid State Drive and it's easily the best PC hardware purchase I have ever made. Before I bought it I was thinking about replacing my laptop, as it was getting pretty sluggish despite still having decent specs two years after I purchased it. After I installed the SSD alongside one of the original hard disks, the laptop became blazing fast: it boots to the Windows 8 home screen in under 4 seconds, which is faster than my Chromebook (a product designed from the ground up to boot as fast as possible).

Benchmarks

The drive came with some software from Samsung to do various things with the drive, including benchmarking it. At first I thought the software was just another bit of bloatware, but it's actually quite useful and well designed. I used it to benchmark the SSD (my OS drive) and the other 500GB spinning drive:

My old spinning disk barely managed to scrape 250 random IO operations a second, whereas the SSD blasted ahead with nearly 65,000. That's 260x more random reads! The net result is that everything is pretty much instant: the boot, Visual Studio, Chrome, etc. It's like having a new laptop again.

The upsides

Price - while £180 might seem like a lot for a 256GB disk, it's well worth it for the speed alone. SSD prices will continue to drop; you can already pick up a SanDisk 256GB disk for only £124. Hopefully sub-£100 256GB SSDs will be out soon, which would really drive adoption.

Speed - if we multiply the disk read times by a billion, reading 1MB sequentially from a spinning disk would take 7.8 months (up to a whole year if you include a disk seek). Reading the same from an SSD would take 11.6 days.

Power - SSDs don't have spinning parts such as a motor, so they draw much less power. This can mean a big boost in battery life for laptops such as mine (up to half an hour extra).

The downsides

Capacity - 256GB doesn't seem like much, and it isn't if you have a lot of movies, music and pictures. I would recommend using this SSD along with a higher-capacity spinning drive to store all your large stuff, leaving the SSD for the operating system and applications.

The software

The software deserves its own section. It lets you benchmark your disk (and any other attached disk), optimize your OS for several different use cases (performance, capacity or resilience) and upgrade the firmware. It's genuinely quite cool and I wish all pre-bundled software was as good as this. The screenshot below is a random one I found on the internet:

Displaying a process's output on a web page with WebSockets and Python

A few days ago a colleague of mine asked me how you would pipe the standard output of a process into a browser. I hacked around for a few hours and came up with a WebSocket-based solution (using Twisted and Autobahn.ws) that you can see below (your browser needs to support WebSockets; sorry, IE9 and lower).

This is a live, instantly updating tail of this site's web logs (tail -F access_log) with IP addresses omitted:

Edit: Offline for now :(


The code is very simple and can be found below or here on GitHub. It works like so:

When the file is executed by Python, a WebSocketProcessOutputterThingFactory is created, which in turn creates a ProcessProtocol. The ProcessProtocol runs a command of your choosing (specified on the command line) and buffers the last 10 chunks of output in memory. While this is chugging along, a WebSocket client can connect on port 9000 and is added to a list of connected clients, which is managed by the WebSocketProcessOutputterThingFactory. Whenever the ProcessProtocol receives output, it passes it to the WebSocketProcessOutputterThingFactory, which then blasts that message to all the connected clients via their WebSocket connections. A bit of JavaScript can then display the data any way it likes.
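To show the buffer-and-broadcast design on its own, here is a rough sketch that uses only asyncio from the standard library in place of Twisted. It is a swapped-in technique, not the code running this page. Broadcaster and pump are hypothetical names, and the WebSocket side is left out: a server built with any library would call register() when a client connects, forward items from the returned queue to the socket, and call unregister() on disconnect.

```python
import asyncio
from collections import deque


class Broadcaster:
    """Keeps the last few chunks of output and fans new ones out to every client."""

    def __init__(self, history=10):
        self.buffer = deque(maxlen=history)  # oldest chunks drop off automatically
        self.clients = set()

    def register(self):
        queue = asyncio.Queue()
        for chunk in self.buffer:  # replay recent output to the new client
            queue.put_nowait(chunk)
        self.clients.add(queue)
        return queue

    def unregister(self, queue):
        self.clients.discard(queue)

    def broadcast(self, chunk):
        self.buffer.append(chunk)
        for queue in self.clients:
            queue.put_nowait(chunk)


async def pump(command, hub):
    """Run a shell command and feed each line of its stdout to the hub."""
    proc = await asyncio.create_subprocess_shell(
        command, stdout=asyncio.subprocess.PIPE)
    async for line in proc.stdout:
        hub.broadcast(line.decode(errors="replace"))
    await proc.wait()
```

Using a per-client queue means a slow client only delays itself, not the process reader or the other clients.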

All of this happens inside Twisted's event loop, which is pretty cool because its event-driven nature lets you mix and match protocols (in this case a ProcessProtocol and WebSockets); you could send the output over any protocol (IRC, an HTTP stream, whatever) if you wanted.

Overall I'm pretty impressed with Autobahn, even though the docs are a bit crap.

How to use:

Grab the code from the GitHub repo. You need to install Twisted and Autobahn, and if you are running this on Windows you also need PyWin32. Once those are installed you can run the script like so:

python runner.py [shell command to run]

e.g:

python runner.py tail -F /var/log/nginx/access_log

or:

python runner.py /bin/sh -c "tail -F /var/log/nginx/access.log -n 150 | grep -v static --line-buffered | awk '{\$1=\"\"; print}'"

This should start a websocket server on port 9000, and the supplied index.html should connect to this and display the output. The .html file attempts to connect to localhost:9000, so you may need to change this if your .py file is running somewhere else or on a different port.

The code:

from twisted.internet import reactor, protocol
from autobahn.websocket import WebSocketServerFactory, \
                               WebSocketServerProtocol, \
                               listenWS
from twisted.python.log import startLogging, msg
import sys
startLogging(sys.stdout)

# Examples:
# runner.py /bin/sh -c "tail -f /var/log/nginx/access.log | grep -v secret_admin_page --line-buffered | awk '{\$1=\"\"; print}'"
# runner.py tail -F /var/log/nginx/access.log

COMMAND_NAME = sys.argv[1]
# spawnProcess expects args[0] to be the program name, so argv[1] stays in the list
COMMAND_ARGS = sys.argv[1:]
LOCAL_ONLY = False
DEBUG = True


class ProcessProtocol(protocol.ProcessProtocol):
    """ I handle a child process launched via reactor.spawnProcess.
    I just buffer the output into a list and call WebSocketProcessOutputterThingFactory.broadcast when
    any new output is read
    """
    def __init__(self, websocket_factory):
        self.ws = websocket_factory
        self.buffer = []

    def outReceived(self, message):
        self.ws.broadcast(message)
        self.buffer.append(message)
        self.buffer = self.buffer[-10:] # Last 10 messages please

    def errReceived(self, data):
        print "Error: %s" % data


# http://autobahn.ws/python
class WebSocketProcessOutputterThing(WebSocketServerProtocol):
    """ I handle a single connected client. We don't need to do much here, simply call the register and un-register
    functions when needed.
    """
    def onOpen(self):
        self.factory.register(self)
        for line in self.factory.process.buffer:
            self.sendMessage(line)

    def connectionLost(self, reason):
        WebSocketServerProtocol.connectionLost(self, reason)
        #super(WebSocketProcessOutputterThing, self).connectionLost(self, reason)
        self.factory.unregister(self)


class WebSocketProcessOutputterThingFactory(WebSocketServerFactory):
    """ I maintain a list of connected clients and provide a method for pushing a single message to all of them.
    """
    protocol = WebSocketProcessOutputterThing

    def __init__(self, *args, **kwargs):
        WebSocketServerFactory.__init__(self, *args, **kwargs)
        #super(WebSocketProcessOutputterThingFactory, self).__init__(self, *args, **kwargs)
        self.clients = []
        self.process = ProcessProtocol(self)
        reactor.spawnProcess(self.process,COMMAND_NAME, COMMAND_ARGS, {}, usePTY=True)

    def register(self, client):
        msg("Registered client %s" % client)
        if not client in self.clients:
            self.clients.append(client)

    def unregister(self, client):
        msg("Unregistered client %s" % client)
        if client in self.clients:
            self.clients.remove(client)

    def broadcast(self, message):
        for client in self.clients:
            client.sendMessage(message)


if __name__ == "__main__":
    print "Running process %s with args %s" % (COMMAND_NAME, COMMAND_ARGS)
    factory = WebSocketProcessOutputterThingFactory("ws://%s:9000" % ("localhost" if LOCAL_ONLY else "0.0.0.0"), debug=False)
    listenWS(factory)
    reactor.run()