数据结构教程

本教程会简单过一遍xlwings读写数据的最常用的场景和默认的方式。如果需要深入了解如何通过 options 方法控制更多的细节,请查阅 转换器及选项

后面所有的示例代码都依赖下面的模块导入语句:

>>> import xlwings as xw

单个单元格

根据单元格里面存储的是数字、字符串、空白还是日期,返回的python对象类型分别是 float, unicode, Nonedatetime :

>>> import datetime as dt
>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = 1
>>> sheet['A1'].value
1.0
>>> sheet['A2'].value = 'Hello'
>>> sheet['A2'].value
'Hello'
>>> sheet['A3'].value is None
True
>>> sheet['A4'].value = dt.datetime(2000, 1, 1)
>>> sheet['A4'].value
datetime.datetime(2000, 1, 1, 0, 0)

列表

  • 一维列表:在Excel中代表行或者列的区域,在Python中返回的都是一个列表。 所以一旦把他们读入Python中,就是丢失行、列的方向信息。 如果这个的确是个问题的话,下面一个知识点会说明如何保留这些信息:

    >>> sheet = xw.Book().sheets[0]
    >>> sheet['A1'].value = [[1],[2],[3],[4],[5]]  # Column orientation (nested list)
    >>> sheet['A1:A5'].value
    [1.0, 2.0, 3.0, 4.0, 5.0]
    >>> sheet['A1'].value = [1, 2, 3, 4, 5]
    >>> sheet['A1:E1'].value
    [1.0, 2.0, 3.0, 4.0, 5.0]
    

    要把单个的单元格强制转换为列表,可以这样:

    >>> sheet['A1'].options(ndim=1).value
    [1.0]
    

    备注

    To write a list in column orientation to Excel, use transpose: sheet.range('A1').options(transpose=True).value = [1,2,3,4]

  • 二维列表:如果必须保留行列的方向信息,可以在区域操作中设置ndim 。这样会把区域信息返回为一个嵌套的列表(二维列表):

    >>> sheet['A1:A5'].options(ndim=2).value
    [[1.0], [2.0], [3.0], [4.0], [5.0]]
    >>> sheet['A1:E1'].options(ndim=2).value
    [[1.0, 2.0, 3.0, 4.0, 5.0]]
    
  • 二维区域会自动返回为嵌套列表。当把一个嵌套列表赋值给Excel区域的时候,只要明确目标区域的左上角单元格地址就行了。下面的例子也使用了索引方式把区域的值读会Python:

    >>> sheet['A10'].value = [['Foo 1', 'Foo 2', 'Foo 3'], [10, 20, 30]]
    >>> sheet.range((10,1),(11,3)).value
    [['Foo 1', 'Foo 2', 'Foo 3'], [10.0, 20.0, 30.0]]
    

备注

Try to minimize the number of interactions with Excel. It is always more efficient to do sheet.range('A1').value = [[1,2],[3,4]] than sheet.range('A1').value = [1, 2] and sheet.range('A2').value = [3, 4].

区域扩展

通过 expand 或者 options 方法中的 expand 关键词可以动态地获得区域范围。 expand 会直接给出扩展后的区域对象,而 options 方法在调用区域的是才计算区域扩展。下面例子说明了两种方法的不同之处:

>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = [[1,2], [3,4]]
>>> range1 = sheet['A1'].expand('table')  # or just .expand()
>>> range2 = sheet['A1'].options(expand='table')
>>> range1.value
[[1.0, 2.0], [3.0, 4.0]]
>>> range2.value
[[1.0, 2.0], [3.0, 4.0]]
>>> sheet['A3'].value = [5, 6]
>>> range1.value
[[1.0, 2.0], [3.0, 4.0]]
>>> range2.value
[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

上面例子中的 'table' 是向 'down' (下方)和 'right' (右边)扩展,其他可用选项可以分别只在行或者列方向上扩展。

备注

Using expand() together with a named Range as top left cell gives you a flexible setup in Excel: You can move around the table and change its size without having to adjust your code, e.g. by using something like sheet.range('NamedRange').expand().value.

NumPy数组

NumPy数组和嵌套列表类似,只是表示空元素时用 nan 而不是 None 。如果要把一个区域读到数组中,需要在 options 中设置 convert=np.array

>>> import numpy as np
>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = np.eye(3)
>>> sheet['A1'].options(np.array, expand='table').value
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

Pandas数据表(DataFrame)

>>> sheet = xw.Book().sheets[0]
>>> df = pd.DataFrame([[1.1, 2.2], [3.3, None]], columns=['one', 'two'])
>>> df
   one  two
0  1.1  2.2
1  3.3  NaN
>>> sheet['A1'].value = df
>>> sheet['A1:C3'].options(pd.DataFrame).value
   one  two
0  1.1  2.2
1  3.3  NaN
# options: work for reading and writing
>>> sheet['A5'].options(index=False).value = df
>>> sheet['A9'].options(index=False, header=False).value = df

Pandas的序列(Serie)

>>> import pandas as pd
>>> import numpy as np
>>> sheet = xw.Book().sheets[0]
>>> s = pd.Series([1.1, 3.3, 5., np.nan, 6., 8.], name='myseries')
>>> s
0    1.1
1    3.3
2    5.0
3    NaN
4    6.0
5    8.0
Name: myseries, dtype: float64
>>> sheet['A1'].value = s
>>> sheet['A1:B7'].options(pd.Series).value
0    1.1
1    3.3
2    5.0
3    NaN
4    6.0
5    8.0
Name: myseries, dtype: float64

备注

You only need to specify the top left cell when writing a list, a NumPy array or a Pandas DataFrame to Excel, e.g.: sheet['A1'].value = np.eye(10)

Chunking: Read/Write big DataFrames etc.

When you read and write from or to big ranges, you may have to chunk them or you will hit a timeout or a memory error. The ideal chunksize will depend on your system and size of the array, so you will have to try out a few different chunksizes to find one that works well:

import pandas as pd
import numpy as np
sheet = xw.Book().sheets[0]
data = np.arange(75_000 * 20).reshape(75_000, 20)
df = pd.DataFrame(data=data)
sheet['A1'].options(chunksize=10_000).value = df

And the same for reading:

# As DataFrame
df = sheet['A1'].expand().options(pd.DataFrame, chunksize=10_000).value
# As list of list
df = sheet['A1'].expand().options(chunksize=10_000).value