Welcome to the second part in this series. This isn't covering features of OpenPyXL
directly. But since a lot of the data brought into Excel is from CSV files. I thought we should get this out of the way. OpenPyXL
doesn't load CSV files directly. What we can do is use the Python csv
module to pull in the csv file, iterate over the data and create a workbook out of it. We can do a reverse process to go from Excel into a CSV file.
It's possible to load the CSV files manually into Excel and save as XLSX file first, but since we are trying to automate our work... ๐
These are the packages we'll need for either direction we're using.
import csv
import openpyxl
CSV to Excel
wb
will be used to store a newly created OpenPyXL Workbook. ws
will point to the active sheet in the new workbook.
wb = openpyxl.Workbook()
ws = wb.active
In this first version, we're adding each csv row as a row in the worksheet. The advantage of this version is it's a bit more straight forward and only a single loop which will make this a bit quicker.
The delimiter=','
option on the csv.reader()
can be changed to whatever delimiter the csv file is using. The default is ,
, so in this case, it could have been left out.
with open('classics.csv') as f:
reader = csv.reader(f, delimiter=',')
for row in reader:
ws.append(row)
If we need a bit more control over the data going into each cell, we can loop over each row and place the values into each cell.
The outer for
loop goes over each line of the CSV file. Using enumerate
an index will be provided for each row. Most things in programming starts at zero, but a workbook starts at 1 for each index, row and column. We'll use the start=1
option so we don't need to use a +1
. ๐
with open('classics.csv') as f:
reader = csv.reader(f, delimiter=',')
for row_index, row in enumerate(reader, start=1):
for column_index, cell_value in enumerate(row, start=1):
ws.cell(row=row_index, column=column_index).value=cell_value
The inner for
loop will iterate row providing a cell value. Again, we're using enumerate
to have an index.
The ws.cell()
method will put the data from the CSV into each cell. We have the option here to modify the value before placing it in the cell.
At this point we have the CSV data in a OpenPyXL workbook. This workbook can be styled as needed before we finally save the workbook as an XLSX file.
wb.save('file.xlsx')
Excel to CSV
Keep in mind CSV doesn't support any formatting and only has a single sheet
. If the Excel file has multiple sheets, you'll need to decide whether to combine them into a single csv file, use only 1 sheet from the workbook or create multiple CSV files.
wb
will store the existing Excel workbook. ws
will be the sheet we'll be putting in the CSV file. We're selecting the sheet that is the default active sheet. This is the sheet that opens when Excel opens the file. Normally this is the first one but doesn't have to be.
wb = openpyxl.load_workbook(filename='data.xlsx')
ws = workbook.active
Using a List Comprehension, we're creating a list based on each row on the sheet. The csv_data
will be a list of the rows.
csv_data = [
list(value)
for value in ws.iter_rows(values_only=True)
]
ws.iter_rows(values_only=True)
provides an iterable that provides a row object. Passing it to list()
will create a list.
Using an open file, we'll use the csv.writer()
to load the csv data row by row.
with open('data.csv', 'w') as file:
writer = csv.writer(file, delimiter=',')
for line in csv_data:
writer.writerow(line)
Up Next
We'll look at loading an Excel file and do some basic things with a workbook.
Besides CSV and Excel files, are there other data sources you use when creating Excel reports? Let me know in the comments, I might do another post about pulling that in using Python.