Automating Microsoft Word Documents With Python-DOCX


Automating simple repetitive tasks is just awesome. It saves a lot of time, and time is one of the most precious things we have. We all perform different kinds of repetitive work, in our professional or personal lives. Thanks to technology and programming tools, we are able to let computers do the heavy lifting. I have written in the past about automating stock and crypto market data, extracting data from PDF files, building Excel spreadsheets, etc. Until recently, I hadn't tried automating Microsoft Word documents and didn't have a need to. I was sure there were easy-to-use solutions available for the top programming languages, especially for Python. When I received a request to automate MS Word forms, I didn't have to think twice and was glad for an opportunity to experiment with automating documents.

Python-docx is one of the Python libraries that allow us to create and edit Microsoft Word files. Its documentation lets us get started and experiment with docx super quickly. However, it lacks detailed explanations for solving more complex problems. The Python community is big, and there are plenty of resources available that help with finding the right solutions.

The script/app I was writing had a very simple goal: create a template of an existing document, let the user enter some of the data within an app, and create a final document with proper naming. Since most of the data entered would come from fixed lists with limited options, and only some of the items would have to change on a daily basis, it made sense to automate this process. This would cut the time spent from hours to seconds. It is interesting how many companies and organizations have so much bureaucracy involved and don't offer more efficient solutions to manage such processes.

Getting started with python-docx is super simple. The documentation provides the following template code, which has been copied and used in many tutorials on the library. I too will share the code, just to demonstrate how easy it is to automate Microsoft Word documents. To run it as is, an image file named monty-truth.png has to be in the folder the script is run from.


from docx import Document
from docx.shared import Inches

document = Document()

document.add_heading('Document Title', 0)

p = document.add_paragraph('A plain paragraph having some ')
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

document.add_heading('Heading, level 1', level=1)
document.add_paragraph('Intense quote', style='Intense Quote')

document.add_paragraph(
    'first item in unordered list', style='List Bullet'
)
document.add_paragraph(
    'first item in ordered list', style='List Number'
)

document.add_picture('monty-truth.png', width=Inches(1.25))

records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam')
)

table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
for qty, id, desc in records:
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = id
    row_cells[2].text = desc

document.add_page_break()

document.save('demo.docx')

The code is self-explanatory, and once the document is created, viewing it shows which line of code creates which paragraph or part of the document. I normally prefer to share my own code in posts like this. However, since the script I created had to do with a specific task and had to be run as an app, it wouldn't be able to show how easy it is to use docx. The sample code above creates a document and starts adding parts to it, like a heading, paragraphs, runs within a paragraph, unordered and ordered lists, an image, and a table, applying styles along the way.

When it comes to styles, fonts, colors, and layouts, docx has multiple ways of achieving them. It is not clear right away what the standard or best practices are. In my situation, I had to experiment with different solutions to figure out a better way. While some actions are self-explanatory and answers are available in the documentation, solutions for more specific situations are not easy to find. For example, in my document I had to create borders around some paragraphs. I wasn't able to find a solution in the documentation. I found one elsewhere, shared by someone who had had a similar issue. But it didn't involve using simple methods and properties. Rather, it involved lower-level manipulation of the document formatting. That part would perhaps require more studying and experimenting to take advantage of. But it also shows how powerful this tool is, and that solutions to very complex document tasks can be found with enough time and effort.

I prefer integrating Python scripts with Streamlit. This way scripts can be turned into apps and can easily be used by non-programmers. Streamlit apps are great for sharing automation solutions with teams, colleagues, friends, and clients. It gets the job done and doesn't require much in the way of web development skills.

Have you used python-docx or other automation tools? Feel free to share your thoughts, experiences, and tools in the comments.
