数据结构教程#

本教程会简单过一遍xlwings读写数据的最常用的场景和默认的方式。如果需要深入了解如何通过 options 方法控制更多的细节,请查阅 转换器及选项

后面所有的示例代码都依赖下面的模块导入语句:

>>> import xlwings as xw

单个单元格#

根据单元格里面存储的是数字、字符串、空白还是日期,返回的python对象类型分别是 float, unicode, Nonedatetime :

>>> import datetime as dt
>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = 1
>>> sheet['A1'].value
1.0
>>> sheet['A2'].value = 'Hello'
>>> sheet['A2'].value
'Hello'
>>> sheet['A3'].value is None
True
>>> sheet['A4'].value = dt.datetime(2000, 1, 1)
>>> sheet['A4'].value
datetime.datetime(2000, 1, 1, 0, 0)

列表#

  • 一维列表:在Excel中代表行或者列的区域,在Python中返回的都是一个列表。 所以一旦把他们读入Python中,就是丢失行、列的方向信息。 如果这个的确是个问题的话,下面一个知识点会说明如何保留这些信息:

    >>> sheet = xw.Book().sheets[0]
    >>> sheet['A1'].value = [[1],[2],[3],[4],[5]]  # Column orientation (nested list)
    >>> sheet['A1:A5'].value
    [1.0, 2.0, 3.0, 4.0, 5.0]
    >>> sheet['A1'].value = [1, 2, 3, 4, 5]
    >>> sheet['A1:E1'].value
    [1.0, 2.0, 3.0, 4.0, 5.0]
    

    要把单个的单元格强制转换为列表,可以这样:

    >>> sheet['A1'].options(ndim=1).value
    [1.0]
    

    备注

    To write a list in column orientation to Excel, use transpose: sheet.range('A1').options(transpose=True).value = [1,2,3,4]

  • 二维列表:如果必须保留行列的方向信息,可以在区域操作中设置ndim 。这样会把区域信息返回为一个嵌套的列表(二维列表):

    >>> sheet['A1:A5'].options(ndim=2).value
    [[1.0], [2.0], [3.0], [4.0], [5.0]]
    >>> sheet['A1:E1'].options(ndim=2).value
    [[1.0, 2.0, 3.0, 4.0, 5.0]]
    
  • 二维区域会自动返回为嵌套列表。当把一个嵌套列表赋值给Excel区域的时候,只要明确目标区域的左上角单元格地址就行了。下面的例子也使用了索引方式把区域的值读会Python:

    >>> sheet['A10'].value = [['Foo 1', 'Foo 2', 'Foo 3'], [10, 20, 30]]
    >>> sheet.range((10,1),(11,3)).value
    [['Foo 1', 'Foo 2', 'Foo 3'], [10.0, 20.0, 30.0]]
    

备注

Try to minimize the number of interactions with Excel. It is always more efficient to do sheet.range('A1').value = [[1,2],[3,4]] than sheet.range('A1').value = [1, 2] and sheet.range('A2').value = [3, 4].

区域扩展#

通过 expand 或者 options 方法中的 expand 关键词可以动态地获得区域范围。 expand 会直接给出扩展后的区域对象,而 options 方法在调用区域的是才计算区域扩展。下面例子说明了两种方法的不同之处:

>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = [[1,2], [3,4]]
>>> range1 = sheet['A1'].expand('table')  # or just .expand()
>>> range2 = sheet['A1'].options(expand='table')
>>> range1.value
[[1.0, 2.0], [3.0, 4.0]]
>>> range2.value
[[1.0, 2.0], [3.0, 4.0]]
>>> sheet['A3'].value = [5, 6]
>>> range1.value
[[1.0, 2.0], [3.0, 4.0]]
>>> range2.value
[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

上面例子中的 'table' 是向 'down' (下方)和 'right' (右边)扩展,其他可用选项可以分别只在行或者列方向上扩展。

备注

Using expand() together with a named Range as top left cell gives you a flexible setup in Excel: You can move around the table and change its size without having to adjust your code, e.g. by using something like sheet.range('NamedRange').expand().value.

NumPy数组#

NumPy数组和嵌套列表类似,只是表示空元素时用 nan 而不是 None 。如果要把一个区域读到数组中,需要在 options 中设置 convert=np.array

>>> import numpy as np
>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = np.eye(3)
>>> sheet['A1'].options(np.array, expand='table').value
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

Pandas数据表(DataFrame)#

>>> sheet = xw.Book().sheets[0]
>>> df = pd.DataFrame([[1.1, 2.2], [3.3, None]], columns=['one', 'two'])
>>> df
   one  two
0  1.1  2.2
1  3.3  NaN
>>> sheet['A1'].value = df
>>> sheet['A1:C3'].options(pd.DataFrame).value
   one  two
0  1.1  2.2
1  3.3  NaN
# options: work for reading and writing
>>> sheet['A5'].options(index=False).value = df
>>> sheet['A9'].options(index=False, header=False).value = df

Pandas的序列(Serie)#

>>> import pandas as pd
>>> import numpy as np
>>> sheet = xw.Book().sheets[0]
>>> s = pd.Series([1.1, 3.3, 5., np.nan, 6., 8.], name='myseries')
>>> s
0    1.1
1    3.3
2    5.0
3    NaN
4    6.0
5    8.0
Name: myseries, dtype: float64
>>> sheet['A1'].value = s
>>> sheet['A1:B7'].options(pd.Series).value
0    1.1
1    3.3
2    5.0
3    NaN
4    6.0
5    8.0
Name: myseries, dtype: float64

备注

You only need to specify the top left cell when writing a list, a NumPy array or a Pandas DataFrame to Excel, e.g.: sheet['A1'].value = np.eye(10)

Chunking: Read/Write big DataFrames etc.#

When you read and write from or to big ranges, you may have to chunk them or you will hit a timeout or a memory error. The ideal chunksize will depend on your system and size of the array, so you will have to try out a few different chunksizes to find one that works well:

import pandas as pd
import numpy as np
sheet = xw.Book().sheets[0]
data = np.arange(75_000 * 20).reshape(75_000, 20)
df = pd.DataFrame(data=data)
sheet['A1'].options(chunksize=10_000).value = df

And the same for reading:

# As DataFrame
df = sheet['A1'].expand().options(pd.DataFrame, chunksize=10_000).value
# As list of list
df = sheet['A1'].expand().options(chunksize=10_000).value