数据结构教程¶
本教程会简单过一遍xlwings读写数据的最常用的场景和默认的方式。如果需要深入了解如何通过 options
方法控制更多的细节,请查阅 转换器及选项 。
后面所有的示例代码都依赖下面的模块导入语句:
>>> import xlwings as xw
单个单元格¶
根据单元格里面存储的是数字、字符串、空白还是日期,返回的python对象类型分别是 float
, unicode
, None
或 datetime
:
>>> import datetime as dt
>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = 1
>>> sheet['A1'].value
1.0
>>> sheet['A2'].value = 'Hello'
>>> sheet['A2'].value
'Hello'
>>> sheet['A3'].value is None
True
>>> sheet['A4'].value = dt.datetime(2000, 1, 1)
>>> sheet['A4'].value
datetime.datetime(2000, 1, 1, 0, 0)
列表¶
一维列表:在Excel中代表行或者列的区域,在Python中返回的都是一个列表。 所以一旦把他们读入Python中,就是丢失行、列的方向信息。 如果这个的确是个问题的话,下面一个知识点会说明如何保留这些信息:
>>> sheet = xw.Book().sheets[0] >>> sheet['A1'].value = [[1],[2],[3],[4],[5]] # Column orientation (nested list) >>> sheet['A1:A5'].value [1.0, 2.0, 3.0, 4.0, 5.0] >>> sheet['A1'].value = [1, 2, 3, 4, 5] >>> sheet['A1:E1'].value [1.0, 2.0, 3.0, 4.0, 5.0]
要把单个的单元格强制转换为列表,可以这样:
>>> sheet['A1'].options(ndim=1).value [1.0]
备注
To write a list in column orientation to Excel, use
transpose
:sheet.range('A1').options(transpose=True).value = [1,2,3,4]
二维列表:如果必须保留行列的方向信息,可以在区域操作中设置ndim 。这样会把区域信息返回为一个嵌套的列表(二维列表):
>>> sheet['A1:A5'].options(ndim=2).value [[1.0], [2.0], [3.0], [4.0], [5.0]] >>> sheet['A1:E1'].options(ndim=2).value [[1.0, 2.0, 3.0, 4.0, 5.0]]
二维区域会自动返回为嵌套列表。当把一个嵌套列表赋值给Excel区域的时候,只要明确目标区域的左上角单元格地址就行了。下面的例子也使用了索引方式把区域的值读会Python:
>>> sheet['A10'].value = [['Foo 1', 'Foo 2', 'Foo 3'], [10, 20, 30]] >>> sheet.range((10,1),(11,3)).value [['Foo 1', 'Foo 2', 'Foo 3'], [10.0, 20.0, 30.0]]
备注
Try to minimize the number of interactions with Excel. It is always more efficient to do
sheet.range('A1').value = [[1,2],[3,4]]
than sheet.range('A1').value = [1, 2]
and sheet.range('A2').value = [3, 4]
.
区域扩展¶
通过 expand
或者 options
方法中的 expand
关键词可以动态地获得区域范围。 expand
会直接给出扩展后的区域对象,而 options
方法在调用区域的是才计算区域扩展。下面例子说明了两种方法的不同之处:
>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = [[1,2], [3,4]]
>>> range1 = sheet['A1'].expand('table') # or just .expand()
>>> range2 = sheet['A1'].options(expand='table')
>>> range1.value
[[1.0, 2.0], [3.0, 4.0]]
>>> range2.value
[[1.0, 2.0], [3.0, 4.0]]
>>> sheet['A3'].value = [5, 6]
>>> range1.value
[[1.0, 2.0], [3.0, 4.0]]
>>> range2.value
[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
上面例子中的 'table'
是向 'down'
(下方)和 'right'
(右边)扩展,其他可用选项可以分别只在行或者列方向上扩展。
备注
Using expand()
together with a named Range as top left cell gives you a flexible setup in
Excel: You can move around the table and change its size without having to adjust your code, e.g. by using
something like sheet.range('NamedRange').expand().value
.
NumPy数组¶
NumPy数组和嵌套列表类似,只是表示空元素时用 nan
而不是 None
。如果要把一个区域读到数组中,需要在 options
中设置 convert=np.array
:
>>> import numpy as np
>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = np.eye(3)
>>> sheet['A1'].options(np.array, expand='table').value
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
Pandas数据表(DataFrame)¶
>>> sheet = xw.Book().sheets[0]
>>> df = pd.DataFrame([[1.1, 2.2], [3.3, None]], columns=['one', 'two'])
>>> df
one two
0 1.1 2.2
1 3.3 NaN
>>> sheet['A1'].value = df
>>> sheet['A1:C3'].options(pd.DataFrame).value
one two
0 1.1 2.2
1 3.3 NaN
# options: work for reading and writing
>>> sheet['A5'].options(index=False).value = df
>>> sheet['A9'].options(index=False, header=False).value = df
Pandas的序列(Serie)¶
>>> import pandas as pd
>>> import numpy as np
>>> sheet = xw.Book().sheets[0]
>>> s = pd.Series([1.1, 3.3, 5., np.nan, 6., 8.], name='myseries')
>>> s
0 1.1
1 3.3
2 5.0
3 NaN
4 6.0
5 8.0
Name: myseries, dtype: float64
>>> sheet['A1'].value = s
>>> sheet['A1:B7'].options(pd.Series).value
0 1.1
1 3.3
2 5.0
3 NaN
4 6.0
5 8.0
Name: myseries, dtype: float64
备注
You only need to specify the top left cell when writing a list, a NumPy array or a Pandas
DataFrame to Excel, e.g.: sheet['A1'].value = np.eye(10)
Chunking: Read/Write big DataFrames etc.¶
When you read and write from or to big ranges, you may have to chunk them or you will hit a timeout or a memory error. The ideal chunksize
will depend on your system and size of the array, so you will have to try out a few different chunksizes to find one that works well:
import pandas as pd
import numpy as np
sheet = xw.Book().sheets[0]
data = np.arange(75_000 * 20).reshape(75_000, 20)
df = pd.DataFrame(data=data)
sheet['A1'].options(chunksize=10_000).value = df
And the same for reading:
# As DataFrame
df = sheet['A1'].expand().options(pd.DataFrame, chunksize=10_000).value
# As list of list
df = sheet['A1'].expand().options(chunksize=10_000).value