How to Read and Write Excel Files with the Python openpyxl Package

14:27 read·1879 words·2024-6-28 published

To run the examples in this article, you need to install the Python package openpyxl, which you can check the section Install the openpyxl Python package for more information.

Excel File Formats Supported by the Python openpyxl package

Regardless of which function or class in the package is used, the Python openpyxl package can only read and write the following Microsoft Excel file formats, .xlsx, .xlsm (Excel files with macros enabled), .xltx (Excel template files), .xltm (Excel template files with macros enabled).

For the old Excel file format .xls, you can try to manipulate it using the Python packages xlrd and xlwt, or convert .xls to an Excel file in .xlsx format.

Read Excel Files with the Python openpyxl Package

You can read data from an Excel file using the load_workbook function in the Python openpyxl package or its alias open (not Python’s built-in function open), which returns a workbook object Workbook that manipulates Excel data.

load_workbook(filename, read_only=False, keep_vba=KEEP_VBA, data_only=False, keep_links=True, rich_text=False)

filename parameter: The filename parameter is the path (absolute or relative) to the Excel file to be read, or a file-like object that is associated with the Excel file and is in binary mode. For example, a file-like object created by calling the Python built-in function open (not the alias of load_workbook) and using rb mode.
read_only parameter: The read_only parameter defaults to False and is used to indicate whether the openpyxl workbook object Workbook returned by the load_workbook function is read-only. If it is set to True, then you will not be able to save changes to Excel data through the Workbook object, which may improve performance a bit, as logic associated with writing and some complex content (e.g., hyperlinks in Excel cells) will be ignored.
keep_vba parameter: In a non-test environment, the default value of the keep_vba parameter is False, which means that the contents of the Excel file relating to VBA (Visual Basic for Applications) will not be read. If it is set to True, then the contents of the Excel file relating to VBA will be read into the Workbook object by openpyxl, which, of course, does not mean that you can execute the code in VBA or use the functionality implemented by VBA through the Python openpyxl package.
data_only parameter: The data_only parameter (default value False) is used to indicate whether to read the results of formulas in Excel cells, or to read formulas in Excel cells when the Excel file that openpyxl is reading contains cells with formulas. If set to True, then openpyxl will read the results of formulas, which means that the related Excel cells in the Workbook object no longer have a formula.
keep_links parameter: The keep_links parameter, which defaults to True, is used to indicate whether or not to read a cache of Excel externally linked data, such as the value of a cell in a worksheet of another Excel file, when the Python openpyxl package reads an Excel file. If set to False, then the cache of externally linked data will not be read into the Workbook object, which will be effective when the cache is too large.
rich_text parameter: The rich_text parameter is used to indicate whether rich text formatting (RFT) in cells is preserved when the Python openpyxl package reads an Excel file, and its default value is False (rich text formatting is not preserved), which means that the rich text formatting of the related Excel cells in the Workbook object will be lost. When the rich_text parameter is True, saving the Workbook object may fail.

Next, we have prepared the Excel file with the name Hello.xlsx and the Python script file read.py that reads the Excel file, and they are located in the same directory.

read.py

# Please jump the command line to the directory where Hello.xlsx is located and then run this script file
# Import function load_workbook
from openpyxl import load_workbook


# Read the Excel file
workbook = load_workbook('Hello.xlsx')


# Get cells A1, B1, C1, B4, C4 in Worksheet 1.1 and display them
worksheet = workbook['Class 1.1']
name = worksheet['A1'].value
age = worksheet['B1'].value
score = worksheet['C1'].value
print(f'The first student {name} {age} {score}')
avg_age = worksheet['B4'].value
avg_score = worksheet['C4'].value
print(f'Average formula {avg_age} {avg_score}')

The first student Jack 13 100
Average formula =AVERAGE(B1:B3) =AVERAGE(C1:C3)

The Python openpyxl package uses the read attribute to determine whether the filename parameter is a file-like object

During the process of reading an Excel file, openpyxl will check whether the load_workbook function’s parameter filename has the attribute read, and if it does, then the filename parameter will be treated as a Python file-like object, and the read attribute will be called as a method to get data in binary format from the Excel file.

In the following code, we open the Excel file Hello.xlsx using the open function and use the returned object as an argument to the load_workbook function.

data_only.py

# Please jump the command line to the directory where Hello.xlsx is located and then run this script file
# Import function load_workbook
from openpyxl import load_workbook


# Use the open function to open the Excel file
xlsx = open('Hello.xlsx', 'rb')
workbook = load_workbook(xlsx, data_only=True)


# Read and display the results of the formulas in cells B4 and C4
worksheet = workbook['Class 1.1']
avg_age = worksheet['B4'].value
avg_score = worksheet['C4'].value
print(f'Average {avg_age} {avg_score}')

Average 13 96.3333333333333

When the read_only parameter is set to True, the Workbook and its related objects can still be modified in some ways

Although setting the read_only parameter of the load_workbook function to True causes an exception to be thrown when executing the save method of the Excel workbook object Workbook or writing to an Excel cell, not all modifications are infeasible (for example, the title of a worksheet can be changed), except that the changes cannot be saved.

Next, let’s open the Excel file Hello.xlsx in read-only mode and then change the title of the worksheet.

read_only.py

# Please jump the command line to the directory where Hello.xlsx is located and then run this script file
# Import package openpyxl
import openpyxl


# Open an Excel file in read-only mode and modify the worksheet title
workbook = openpyxl.open('Hello.xlsx', True)
worksheet = workbook['Class 1.1']
worksheet.title = 'Class 1.2'


# ERROR Read-only Workbook objects can't modify cell values
worksheet['A1'].value = 'A good man'

AttributeError: Cell is read only

A rich_text parameter of False does not mean that all styles will be lost for the related Excel cells in the Workbook object

The fact that the load_workbook function has a rich_text parameter of False does not mean that all styles will be lost for the cells in the openpyxl Workbook object, because some styles may not be rich text formatting and are not specific to the text blocks in the Excel cell, e.g., the text color set for the entire cell.

Write (Save) Excel Files with the Python openpyxl Package

You can save a Workbook object as a Microsoft Excel file using the save_workbook function of the Python module openpyxl.writer.excel, or the save method of the Workbook object.

save_workbook(workbook, filename) save(filename)

workbook parameter: The workbook parameter is the Workbook object to be saved.
filename parameter: The filename parameter is the absolute or relative path to the Excel file, if the path points to a file that already exists, then it will be overwritten.

Next, we try to save the Workbook object as file Overwrite.xlsx again after modifying the Excel cell A1 and the original Overwrite.xlsx will be overwritten.

save.py

# Overwrite.xlsx will be saved to the current working directory of the command line
# Import package openpyxl
import openpyxl


# Create a workbook and save it twice
workbook = openpyxl.Workbook()
workbook.save('Overwrite.xlsx')
workbook['Sheet']['A1'].value = 'Hello!'
workbook.save('Overwrite.xlsx')

A Workbook object with a write_only property of True can be saved only once

When you create a Workbook object directly from the Python openpyxl package and set its constructor’s write_only parameter to True, the created Workbook object will be write-only (the write_only property is True), and for write-only Workbook objects, you can only call their save method or use the save_workbook function to save once; multiple saves will result in an exception.

In the following example, we try to save the write-only Workbook object twice.

write_only.py

# Import function save_workbook and class Workbook
from openpyxl.writer.excel import save_workbook
from openpyxl import Workbook


# Create the Workbook object and save it
workbook = Workbook(True)
workbook.save('New.xlsx')
# ERROR Can only be saved once
save_workbook(workbook, 'New.xlsx')

Workbook objects saved as Excel files need to have at least one visible worksheet

Whether using the save_workbook function of the openpyxl.writer.excel module or the save method of the Workbook object, the Python openpyxl package requires that the Excel workbook object has at least one visible worksheet, i.e. at least one Worksheet object whose sheet_state property is 'visible', otherwise saving as an Excel file will not work and the exception IndexError: At least one sheet must be visible will be thrown.

In the following code, the Excel workbook object will have two worksheets, and since they are both invisible, the save will fail.

no_sheet.py

# Import Workbook class
from openpyxl import Workbook


# Create a Workbook object and set the worksheets in it to be hidden
workbook = Workbook()
workbook['Sheet'].sheet_state = 'hidden'
workbook.create_sheet().sheet_state = 'hidden'
# ERROR Unable to save Workbook objects without visible worksheets
workbook.save('nosheet.xlsx')

IndexError: At least one sheet must be visible

Close an Excel File Opened by the Python openpyxl Package

If you get a read-only Workbook object by setting the read_only parameter to True when using the load_workbook function of the Python openpyxl package, then calling the close method of the Workbook object can close the ZipFile object related to the Excel workbook, after that, reading the cells in the Excel worksheet will not be feasible, but some information related to the worksheet itself can still be accessed, such as the name (title) of the worksheet.

The close method has no significant effect on a Workbook object with a write_only property of True

Although in the official description the close method affects write-only Workbook objects, this effect is not obvious (it seems to be non-existent), and you can write a row of data to a Workbook object that has called the close method and save it via the save method.

There is no problem writing data before or after calling the close method on the Workbook object, and the save method can be called to save the Workbook object as an Excel file.

close.py

# Please jump the command line to the directory where Hello.xlsx is located and then run this script file
from openpyxl import Workbook, load_workbook


# Create a write-only workbook
w_workbook = Workbook(True)
w_workbook.create_sheet()
w_workbook['Sheet'].append(['Hello', 'World'])
# After calling the close method, write another row of data and save it
w_workbook.close()
w_workbook['Sheet'].append(['Hello', 'World'])
w_workbook.save('Close.xlsx')


# Create a read-only workbook
r_workbook = load_workbook('Hello.xlsx', True)
print(r_workbook['Class 1.1']['A1'].value)
# After the close method is called, cell A1 is read
r_workbook.close()
# ERROR Unable to read cell A1
print(r_workbook['Class 1.1']['A1'])

Jack
…
ValueError: Attempt to use ZIP archive that was already closed

Source Code

src/en/openpyxl/excel_files·codebeatme/office-programming·GitHub