I wrote and maintain a reporting system for a group of pentesters. During and after a test, the results and details are entered into a web application using a WYSIWYG editor called Redactor (which is pretty awesome!), and the system generates a Word document from this input, which is then sent to the client. There doesn’t seem to be a reliable way of inserting HTML into a Word document via COM (apart from simulating pasting HTML, which is too hacky and offers too little control), so I ended up writing this little library to do it for me, and I think it could be useful to someone else.
HtmlToWord is a Python library that takes HTML input (like the output of a WYSIWYG editor) and converts it to a stream of instructions that render the HTML into a Word document. It supports most common HTML tags (full list here) but doesn’t support any form of line styles (yet?).
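To give a feel for what "a stream of instructions" can look like, here is a minimal sketch using Python's standard `html.parser`. This is not HtmlToWord's actual code: the `InstructionParser` class, the `STYLE_FOR_TAG` table, and the opcode names (`set_style`, `type_text`, `bold`, `end_paragraph`) are all made up for illustration, and it only handles a handful of tags.

```python
import re
from html.parser import HTMLParser

# Hypothetical tag-to-style table; HtmlToWord's real mapping may differ.
STYLE_FOR_TAG = {"h3": "Heading 3", "p": "Normal", "li": "List Bullet"}


class InstructionParser(HTMLParser):
    """Turns HTML into a flat list of (opcode, argument) tuples."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def handle_starttag(self, tag, attrs):
        if tag in STYLE_FOR_TAG:
            self.instructions.append(("set_style", STYLE_FOR_TAG[tag]))
        elif tag == "b":
            self.instructions.append(("bold", True))

    def handle_endtag(self, tag):
        if tag in STYLE_FOR_TAG:
            self.instructions.append(("end_paragraph", None))
        elif tag == "b":
            self.instructions.append(("bold", False))

    def handle_data(self, data):
        # Skip whitespace-only chunks (e.g. newlines between tags),
        # otherwise collapse runs of whitespace to a single space.
        if data.strip():
            self.instructions.append(("type_text", re.sub(r"\s+", " ", data)))


def html_to_instructions(html):
    """Parse an HTML string and return the list of render instructions."""
    parser = InstructionParser()
    parser.feed(html)
    parser.close()
    return parser.instructions


if __name__ == "__main__":
    for instruction in html_to_instructions(
        "<h3>This is a title</h3><p>Some <b>bold</b> text</p>"
    ):
        print(instruction)
```

A renderer would then walk this list and turn each instruction into a call on Word's Selection object over COM (for example, typing text or starting a new paragraph). Keeping the parsing step separate from the COM calls means the parsing can be checked without Word installed.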
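    import HtmlToWord, win32com.client

    # Assumed setup (not shown in the original snippet): start Word over COM
    # and add a blank document.
    word = win32com.client.Dispatch("Word.Application")
    document = word.Documents.Add()

    parser = HtmlToWord.Parser()
    Html = '''
    <h3>This is a title</h3>
    <p>This is some text in a paragraph</p>
    <ul><li>Boo! I am a list</li></ul>
    '''
    parser.ParseAndRender(Html, word, document.ActiveWindow.Selection)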
This code will create a new Word document and fill it with the rendered title, paragraph, and list.
It’s pretty neat, I think. I can’t be the only one with this kind of issue, so I hope this library helps someone.