OpenPyXL - Using Python to Automate Spreadsheets

OpenPyXL - Using Python to Automate Spreadsheets

Part 1

ยท

4 min read

Python is a general-purpose language that is capable of working on many different project types. People create web sites, games, data science, utilities and so much more.

A common application for Python is to process spreadsheets and CSV files. Many businesses will have loads of Excel spreadsheets that need to be processed and formatted into reports. Reports that have to be redone monthly, weekly or even daily.

There are a few options to automate processes with Excel. The biggest advantage that Python has is it's a fairly easy language to learn and reads almost like English. Excel does not need to be installed on the computer running an OpenPyXL script.

This series will cover using OpenPyXL to handle the heavy lifting of working with Excel files. This is a Python library that makes creating, editing, styling and updating Excel workbooks easy.

Installation

Installing openpyxl can be done with pip with the following command.

pip install openpyxl

Optionally, you may need to install the pillow library to add images to the spreadsheets.

pip install pillow

Importing OpenPyXL

The library is imported with a straightforward import statement. For this series I'll be using this method to display which methods are coming from OpenPyXL.

import openpyxl

To keep from typing long commands, you can also declare the parts you need.

from openpyxl import Workbook

Parts of a Spreadsheet

If you've worked with spreadsheets, you are most likely already familiar with these terms.

NameDescription
Spreadsheet, WorkbookThe file that is being worked on or is created
Worksheet, sheet, tabThere can be 1 or more inside a workbook
CellA location for data, identified by the Row and Column the cell is at, such as A1
RowHorizontal line of cells, identified by uppercase letters such as A
ColumnVertical line of cells, identified by a number such as 1

Creating a Workbook

The Workbook() method will create a brand-new workbook in memory.

wb = openpyxl.Workbook()

We need to select which sheet we want to work with. By default, the first sheet is set to active. So, let's grab that one to use.

ws = wb.active

Please note that for OpenPyXL a new workbook only contains a single sheet, unlike Excel that starts with 3.

Assigning data

Data can be assigned to a specific cell by index the row and column label. For the cell the resides at row 1 and column A it would be A1.

ws['A1'] = 'Billing Cycle'

The append() method of the sheet can be used to add rows to the sheet.

ws.append(['Department', 'Jan', 'Feb', 'Mar', 'Apr'])
ws.append(['Electronics',100, 150, 434, 23])
ws.append(['Auto', 200, 12, 321, 34])

Adding a second sheet

Using the create_sheet() method of the workbook we can create a new sheet on wb. The new sheet will be adding at index 1. As with most things computers, indexes starts at 0 so this will be in the 2nd position. ๐Ÿ˜€

wb.create_sheet(index=1, title="Lakes")

To grab this new sheet, we can select the sheet by the title.

lake_sheet = wb['Lakes']

In the Data Sample section lower in this post is a data set that is a list of lists. It contains the names, area and lengths of some of the largest lakes in the world. We can loop through the list and append the inner lists into the sheet.

for row in data_sample:
  lake_sheet.append(row)

Saving the workbook

Before the script ends, we need to save the workbook so not to lose the changes.

wb.save("first_example.xlsx")

Up Next

We're off to a great start. In this series we'll be covering styling the spreadsheets, adding Excel charts, and adding formulas directly into the cells.

Any Excel 2010 xlsx/xlsm/xltx/xltm can all be used. Any Excel files prior (xls) are not able to be loaded. Excel is copyrighted by Microsoft.

Data Sample

# This is the data to load onto the Lake sheet
data_sample = [
  ['Name', 'Area km^2', 'Length km'], 
  ['Caspian Sea', 371000, 1199], 
  ['Superior', 82100, 616],
  ['Victoria', 68870, 322],
  ['Huron', 59600, 332],
  ['Michigan', 58000, 494],
  ['Tanganyika', 32600, 676],
  ['Baikal', 31500, 676],
  ['Great Bear Lake', 31000, 373],
  ['Malawi', 29500, 579],
  ['Great Slave Lake', 27000, 480]
]

Resouces

ReadTheDocs : Openpyxl

pypi : Openpyxl

RealPython : A Guide to Excel Spreadsheets in Python With openpyxl

Did you find this article valuable?

Support Russ Eby by becoming a sponsor. Any amount is appreciated!