Tom's corner of the internet


February 18, 2013

You can find the code here on GitHub and the package here on PyPI.

I wrote, and continue to maintain, a reporting system for a group of pentesters. During and after a test, the results and details are entered into a web application through a WYSIWYG editor called Redactor (which is pretty awesome!), and the system generates a Word document from this input, which is then sent to the client. There doesn’t seem to be a reliable way of inserting HTML into a Word document via COM (apart from simulating a paste of the HTML, which is too hacky and offers too little control), so I ended up writing this little library to do it for me, and I think it could be useful to someone else.

HtmlToWord is a Python library that takes HTML input (such as the output of a WYSIWYG editor) and converts it into a stream of instructions that render the HTML into a Word document. It supports most common HTML tags (full list here) but doesn’t support any form of line styles (yet?).
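To make the "stream of instructions" idea a bit more concrete, here is a minimal sketch of how HTML could be flattened into render instructions using only Python's standard `html.parser`. This is not HtmlToWord's actual implementation: the names `InstructionBuilder`, `html_to_instructions`, `BLOCK_STYLES` and `INLINE_FORMATS`, and the instruction names themselves, are all hypothetical and exist only for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical tag tables for this sketch only; not HtmlToWord's real mapping.
BLOCK_STYLES = {"h1": "Heading 1", "p": "Normal", "li": "List Bullet"}
INLINE_FORMATS = {"b": "bold", "strong": "bold", "i": "italic", "em": "italic"}


class InstructionBuilder(HTMLParser):
    """Flatten HTML into a list of (operation, argument) tuples."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_STYLES:
            # A block-level tag selects the paragraph style to use.
            self.instructions.append(("set_style", BLOCK_STYLES[tag]))
        elif tag in INLINE_FORMATS:
            self.instructions.append(("format_on", INLINE_FORMATS[tag]))

    def handle_endtag(self, tag):
        if tag in BLOCK_STYLES:
            # Closing a block-level tag ends the current paragraph.
            self.instructions.append(("end_paragraph", None))
        elif tag in INLINE_FORMATS:
            self.instructions.append(("format_off", INLINE_FORMATS[tag]))

    def handle_data(self, data):
        # Skip whitespace-only runs between tags; collapse the rest.
        if data.strip():
            self.instructions.append(("type_text", re.sub(r"\s+", " ", data)))


def html_to_instructions(html):
    """Parse an HTML string and return the resulting instruction list."""
    builder = InstructionBuilder()
    builder.feed(html)
    builder.close()  # flush any text still buffered by the parser
    return builder.instructions


if __name__ == "__main__":
    for instruction in html_to_instructions("<h1>Title</h1><p>Some <b>bold</b> text</p>"):
        print(instruction)
```

A renderer could then walk this list and replay each instruction against Word over COM, for example by typing text or starting a new paragraph at the current selection. Keeping parsing and rendering separate like this is one plausible design; the library itself may be organised differently.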


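import win32com.client
import HtmlToWord

# Start Word over COM and create a new, empty document
word = win32com.client.Dispatch("Word.Application")
word.Visible = True
document = word.Documents.Add()

parser = HtmlToWord.Parser()
Html = '''
<h1>This is a title</h1>
<img src="..." alt="I go below the image as a caption">
<p>This is some text in a paragraph</p>
<ul>
  <li>Boo! I am a list</li>
</ul>
'''
# Render the HTML at the current selection in the new document
parser.ParseAndRender(Html, word, document.ActiveWindow.Selection)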

This code will create a new Word document and fill it like so:


It’s pretty neat, I think. I can’t be the only one with this kind of issue, so I hope this library helps someone.


Written by Tom Forbes who lives and works in London building useful things with Python and Django. I usually blog about security, my projects and random experiments