How to Read and Write Excel Files with the Python openpyxl Package
To run the examples in this article, you need to install the Python package openpyxl
, which you can check the section Install the openpyxl Python package for more information.
Excel File Formats Supported by the Python openpyxl package
Regardless of which function or class in the package is used, the Python openpyxl
package can only read and write the following Microsoft Excel file formats, .xlsx
, .xlsm
(Excel files with macros enabled), .xltx
(Excel template files), .xltm
(Excel template files with macros enabled).
For the old Excel file format .xls
, you can try to manipulate it using the Python packages xlrd
and xlwt
, or convert .xls
to an Excel file in .xlsx
format.
Read Excel Files with the Python openpyxl Package
You can read data from an Excel file using the load_workbook
function in the Python openpyxl
package or its alias open
(not Python’s built-in function open
), which returns a workbook object Workbook
that manipulates Excel data.
load_workbook(filename, read_only=False, keep_vba=KEEP_VBA, data_only=False, keep_links=True, rich_text=False)
- filename parameter
The
filename
parameter is the path (absolute or relative) to the Excel file to be read, or afile-like
object that is associated with the Excel file and is in binary mode. For example, afile-like
object created by calling the Python built-in functionopen
(not the alias ofload_workbook
) and usingrb
mode.- read_only parameter
The
read_only
parameter defaults toFalse
and is used to indicate whether theopenpyxl
workbook objectWorkbook
returned by theload_workbook
function is read-only. If it is set toTrue
, then you will not be able to save changes to Excel data through theWorkbook
object, which may improve performance a bit, as logic associated with writing and some complex content (e.g., hyperlinks in Excel cells) will be ignored.- keep_vba parameter
In a non-test environment, the default value of the
keep_vba
parameter isFalse
, which means that the contents of the Excel file relating to VBA (Visual Basic for Applications) will not be read. If it is set toTrue
, then the contents of the Excel file relating to VBA will be read into theWorkbook
object byopenpyxl
, which, of course, does not mean that you can execute the code in VBA or use the functionality implemented by VBA through the Pythonopenpyxl
package.- data_only parameter
The
data_only
parameter (default valueFalse
) is used to indicate whether to read the results of formulas in Excel cells, or to read formulas in Excel cells when the Excel file thatopenpyxl
is reading contains cells with formulas. If set toTrue
, thenopenpyxl
will read the results of formulas, which means that the related Excel cells in theWorkbook
object no longer have a formula.- keep_links parameter
The
keep_links
parameter, which defaults toTrue
, is used to indicate whether or not to read a cache of Excel externally linked data, such as the value of a cell in a worksheet of another Excel file, when the Pythonopenpyxl
package reads an Excel file. If set toFalse
, then the cache of externally linked data will not be read into theWorkbook
object, which will be effective when the cache is too large.- rich_text parameter
The
rich_text
parameter is used to indicate whether rich text formatting (RFT) in cells is preserved when the Pythonopenpyxl
package reads an Excel file, and its default value isFalse
(rich text formatting is not preserved), which means that the rich text formatting of the related Excel cells in theWorkbook
object will be lost. When therich_text
parameter isTrue
, saving theWorkbook
object may fail.
Next, we have prepared the Excel file with the name Hello.xlsx
and the Python script file read.py
that reads the Excel file, and they are located in the same directory.
# Please jump the command line to the directory where Hello.xlsx is located and then run this script file
# Import function load_workbook
from openpyxl import load_workbook
# Read the Excel file
workbook = load_workbook('Hello.xlsx')
# Get cells A1, B1, C1, B4, C4 in Worksheet 1.1 and display them
worksheet = workbook['Class 1.1']
name = worksheet['A1'].value
age = worksheet['B1'].value
score = worksheet['C1'].value
print(f'The first student {name} {age} {score}')
avg_age = worksheet['B4'].value
avg_score = worksheet['C4'].value
print(f'Average formula {avg_age} {avg_score}')
The first student Jack 13 100
Average formula =AVERAGE(B1:B3) =AVERAGE(C1:C3)
The Python openpyxl package uses the read attribute to determine whether the filename parameter is a file-like object
During the process of reading an Excel file, openpyxl
will check whether the load_workbook
function’s parameter filename
has the attribute read
, and if it does, then the filename
parameter will be treated as a Python file-like
object, and the read
attribute will be called as a method to get data in binary format from the Excel file.
In the following code, we open the Excel file Hello.xlsx
using the open
function and use the returned object as an argument to the load_workbook
function.
# Please jump the command line to the directory where Hello.xlsx is located and then run this script file
# Import function load_workbook
from openpyxl import load_workbook
# Use the open function to open the Excel file
xlsx = open('Hello.xlsx', 'rb')
workbook = load_workbook(xlsx, data_only=True)
# Read and display the results of the formulas in cells B4 and C4
worksheet = workbook['Class 1.1']
avg_age = worksheet['B4'].value
avg_score = worksheet['C4'].value
print(f'Average {avg_age} {avg_score}')
Average 13 96.3333333333333
When the read_only parameter is set to True, the Workbook and its related objects can still be modified in some ways
Although setting the read_only
parameter of the load_workbook
function to True
causes an exception to be thrown when executing the save
method of the Excel workbook object Workbook
or writing to an Excel cell, not all modifications are infeasible (for example, the title of a worksheet can be changed), except that the changes cannot be saved.
Next, let’s open the Excel file Hello.xlsx
in read-only mode and then change the title of the worksheet.
# Please jump the command line to the directory where Hello.xlsx is located and then run this script file
# Import package openpyxl
import openpyxl
# Open an Excel file in read-only mode and modify the worksheet title
workbook = openpyxl.open('Hello.xlsx', True)
worksheet = workbook['Class 1.1']
worksheet.title = 'Class 1.2'
# ERROR Read-only Workbook objects can't modify cell values
worksheet['A1'].value = 'A good man'
AttributeError: Cell is read only
A rich_text parameter of False does not mean that all styles will be lost for the related Excel cells in the Workbook object
The fact that the load_workbook
function has a rich_text
parameter of False
does not mean that all styles will be lost for the cells in the openpyxl
Workbook
object, because some styles may not be rich text formatting and are not specific to the text blocks in the Excel cell, e.g., the text color set for the entire cell.
Write (Save) Excel Files with the Python openpyxl Package
You can save a Workbook
object as a Microsoft Excel file using the save_workbook
function of the Python module openpyxl.writer.excel
, or the save
method of the Workbook
object.
save_workbook(workbook, filename)
save(filename)
- workbook parameter
The
workbook
parameter is theWorkbook
object to be saved.- filename parameter
The
filename
parameter is the absolute or relative path to the Excel file, if the path points to a file that already exists, then it will be overwritten.
Next, we try to save the Workbook
object as file Overwrite.xlsx
again after modifying the Excel cell A1
and the original Overwrite.xlsx
will be overwritten.
# Overwrite.xlsx will be saved to the current working directory of the command line
# Import package openpyxl
import openpyxl
# Create a workbook and save it twice
workbook = openpyxl.Workbook()
workbook.save('Overwrite.xlsx')
workbook['Sheet']['A1'].value = 'Hello!'
workbook.save('Overwrite.xlsx')
A Workbook object with a write_only property of True can be saved only once
When you create a Workbook
object directly from the Python openpyxl
package and set its constructor’s write_only
parameter to True
, the created Workbook
object will be write-only (the write_only
property is True
), and for write-only Workbook
objects, you can only call their save
method or use the save_workbook
function to save once; multiple saves will result in an exception.
In the following example, we try to save the write-only Workbook
object twice.
# Import function save_workbook and class Workbook
from openpyxl.writer.excel import save_workbook
from openpyxl import Workbook
# Create the Workbook object and save it
workbook = Workbook(True)
workbook.save('New.xlsx')
# ERROR Can only be saved once
save_workbook(workbook, 'New.xlsx')
Workbook objects saved as Excel files need to have at least one visible worksheet
Whether using the save_workbook
function of the openpyxl.writer.excel
module or the save
method of the Workbook
object, the Python openpyxl
package requires that the Excel workbook object has at least one visible worksheet, i.e. at least one Worksheet
object whose sheet_state
property is 'visible'
, otherwise saving as an Excel file will not work and the exception IndexError: At least one sheet must be visible
will be thrown.
In the following code, the Excel workbook object will have two worksheets, and since they are both invisible, the save will fail.
# Import Workbook class
from openpyxl import Workbook
# Create a Workbook object and set the worksheets in it to be hidden
workbook = Workbook()
workbook['Sheet'].sheet_state = 'hidden'
workbook.create_sheet().sheet_state = 'hidden'
# ERROR Unable to save Workbook objects without visible worksheets
workbook.save('nosheet.xlsx')
IndexError: At least one sheet must be visible
Close an Excel File Opened by the Python openpyxl Package
If you get a read-only Workbook
object by setting the read_only
parameter to True
when using the load_workbook
function of the Python openpyxl
package, then calling the close
method of the Workbook
object can close the ZipFile
object related to the Excel workbook, after that, reading the cells in the Excel worksheet will not be feasible, but some information related to the worksheet itself can still be accessed, such as the name (title) of the worksheet.
The close method has no significant effect on a Workbook object with a write_only property of True
Although in the official description the close
method affects write-only Workbook
objects, this effect is not obvious (it seems to be non-existent), and you can write a row of data to a Workbook
object that has called the close
method and save it via the save
method.
There is no problem writing data before or after calling the close
method on the Workbook
object, and the save
method can be called to save the Workbook
object as an Excel file.
# Please jump the command line to the directory where Hello.xlsx is located and then run this script file
from openpyxl import Workbook, load_workbook
# Create a write-only workbook
w_workbook = Workbook(True)
w_workbook.create_sheet()
w_workbook['Sheet'].append(['Hello', 'World'])
# After calling the close method, write another row of data and save it
w_workbook.close()
w_workbook['Sheet'].append(['Hello', 'World'])
w_workbook.save('Close.xlsx')
# Create a read-only workbook
r_workbook = load_workbook('Hello.xlsx', True)
print(r_workbook['Class 1.1']['A1'].value)
# After the close method is called, cell A1 is read
r_workbook.close()
# ERROR Unable to read cell A1
print(r_workbook['Class 1.1']['A1'])
Jack
…
ValueError: Attempt to use ZIP archive that was already closed
Source Code
src/en/openpyxl/excel_files·codebeatme/office-programming·GitHub