转换器及选项

转换器是从v0.7.0开始引入的,它定义了在 读出写入 操作中,Excel的区域及其取值是如何被转换的。同时在 xlwings.Range 对象和 User Defined Functions (UDFs) 之间保持了一致性。

Converters are explicitly set in the options method when manipulating Range objects or in the @xw.arg and @xw.ret decorators when using UDFs. If no converter is specified, the default converter is applied when reading. When writing, xlwings will automatically apply the correct converter (if available) according to the object's type that is being written to Excel. If no converter is found for that type, it falls back to the default converter.

后面所有例子依赖下面的导入:

>>> import xlwings as xw

语法:

Action

Range objects

UDFs

myrange.options(convert=None, **kwargs).value

@arg('x', convert=None, **kwargs)

writing

myrange.options(convert=None, **kwargs).value = myvalue

@ret(convert=None, **kwargs)

备注

关键词参数(kwargs)可以是特定转换器的参数,也可以是缺省转换器的参数。例如,为了在缺省转换器中设置 numbers 选项并在 DataFrame 转换器中设置 index 选项,可以这样写:

myrange.options(pd.DataFrame, index=False, numbers=int).value

缺省转换器

如果没有设定选项,执行下面的转换器:

  • 在读单元格时,如果里面有数字,转换为 floats ,如果里面是文字,转换为 unicode 字符串,如果里面是日期,转换为 datetime ,如果是空的,返回 None

  • 行/列被作为列表读入,例如: [None, 1.0, 'a string']

  • 2维区域作为嵌套列表读入,例如: [[None, 1.0, 'a string'], [None, 2.0, 'another string']]

下面选项可以设定:

ndim

不考虑区域的形状,强制返回值为1维或2维列表:

>>> import xlwings as xw
>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = [[1, 2], [3, 4]]
>>> sheet['A1'].value
1.0
>>> sheet['A1'].options(ndim=1).value
[1.0]
>>> sheet['A1'].options(ndim=2).value
[[1.0]]
>>> sheet['A1:A2'].value
[1.0 3.0]
>>> sheet['A1:A2'].options(ndim=2).value
[[1.0], [3.0]]

numbers

单元格里的数字,缺省作为 float 类型,不过也可以转换为 int 类型:

>>> sheet['A1'].value = 1
>>> sheet['A1'].value
1.0
>>> sheet['A1'].options(numbers=int).value
1

或者,可以指定为其他函数或类型,只要它们关键字参数和float类型的相同即可。

用在UDF中时是这样的:

@xw.func
@xw.arg('x', numbers=int)
def myfunction(x):
    # all numbers in x arrive as int
    return x

备注

Excel delivers all numbers as floats in the interactive mode, which is the reason why the int converter rounds numbers first before turning them into integers. Otherwise it could happen that e.g., 5 might be returned as 4 in case it is represented as a floating point number that is slightly smaller than 5. Should you require Python's original int in your converter, use raw int` instead.

dates

单元格内的日期缺省被读作 datetime.datetime 类型,不过也可以改为 datetime.date 类型:

  • Range:

    >>> import datetime as dt
    >>> sheet['A1'].options(dates=dt.date).value
    
  • UDFs (decorator):

    @xw.arg('x', dates=dt.date)
    

或者,可以指定为其他函数或类型,只要它们的关键字参数和 datetime.datetime 类型的相同即可。例如:

>>> my_date_handler = lambda year, month, day, **kwargs: "%04i-%02i-%02i" % (year, month, day)
>>> sheet['A1'].options(dates=my_date_handler).value
'2017-02-20'

empty

每个空白的单元格缺省会被转换成 None ,可以像下面一样改变:

  • Range:

>>> sheet['A1'].options(empty='NA').value
  • UDFs (decorator):

    @xw.arg('x', empty='NA')
    

transpose

在读写时都可以转置,比如,我们可以把一个列表在Excel中写为一列:

  • Range: sheet['A1'].options(transpose=True).value = [1, 2, 3]

  • UDFs:

    @xw.arg('x', transpose=True)
    @xw.ret(transpose=True)
    def myfunction(x):
        # x will be returned unchanged as transposed both when reading and writing
        return x
    

expand

它的功能和区域的 table , verticalhorizontal 属性一样,只是在区域取值的时候才去求值:

>>> import xlwings as xw
>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = [[1,2], [3,4]]
>>> range1 = sheet['A1'].expand()
>>> range2 = sheet['A1'].options(expand='table')
>>> range1.value
[[1.0, 2.0], [3.0, 4.0]]
>>> range2.value
[[1.0, 2.0], [3.0, 4.0]]
>>> sheet['A3'].value = [5, 6]
>>> range1.value
[[1.0, 2.0], [3.0, 4.0]]
>>> range2.value
[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

备注

expand 方法仅用于 Range 对象,UDF只允许对调用的单元格操作。

chunksize

When you read and write from or to big ranges, you may have to chunk them or you will hit a timeout or a memory error. The ideal chunksize will depend on your system and size of the array, so you will have to try out a few different chunksizes to find one that works well:

import pandas as pd
import numpy as np
sheet = xw.Book().sheets[0]
data = np.arange(75_000 * 20).reshape(75_000, 20)
df = pd.DataFrame(data=data)
sheet['A1'].options(chunksize=10_000).value = df

And the same for reading:

# As DataFrame
df = sheet['A1'].expand().options(pd.DataFrame, chunksize=10_000).value
# As list of list
df = sheet['A1'].expand().options(chunksize=10_000).value

err_to_str

Added in version 0.28.0.

If True, will include cell errors such as #N/A as strings. By default, they will be converted to None.

formatter

Added in version 0.28.1.

备注

You can't use formatters with Excel tables.

The formatter option accepts the name of a function. The function will be called after writing the values to Excel and allows you to easily style the range in a very flexible way. How it works is best shown with a little example:

import pandas as pd
import xlwings as xw

sheet = xw.Book().sheets[0]

def table(rng: xw.Range, df: pd.DataFrame):
    """This is the formatter function"""
    # Header
    rng[0, :].color = "#A9D08E"

    # Rows
    for ix, row in enumerate(rng.rows[1:]):
        if ix % 2 == 0:
            row.color = "#D0CECE"  # Even rows

    # Columns
    for ix, col in enumerate(df.columns):
        if "two" in col:
            rng[1:, ix].number_format = "0.0%"


df = pd.DataFrame(data={"one": [1, 2, 3, 4], "two": [5, 6, 7, 8]})
sheet["A1"].options(formatter=table, index=False).value = df

Running this code will format the DataFrame like this:

_images/formatter.png

The formatter's signature is: def myformatter(myrange, myvalues) where myrange corresponds to the range where myvalues are written to. myvalues is simply what you assign to the value property in the last line of the example. Since we're using this with a DataFrame, it makes sense to name the argument accordingly and using type hints will help your editor with auto-completion. If you would use a nested list instead of a DataFrame, you would write something like this instead:

def table(rng: xw.Range, values: list[list]):

内置转换器

xlwings提供了几种内置转换器,用于 dictionaries (字典), NumPy arrays**(NumPy数组), **Pandas Series**(Pandas序列)和 **DataFrames 的转换。它们都是基于缺省服务器的,所以在大多数情况下,上面提到的选项也同样可以用(除非有些选项在某种情况下没有意义,比如ndim对字典来说是没有意义的)。

It is also possible to write and register a custom converter for additional types, see below.

下面的例子适用于 xlwings.Range 对象和UDF,虽然有的只给出了一种例子

字典转换器

字典转换器把Excel中的两列转换成一个字典。如果数据是按行排的,请使用 transpose 选项:

_images/dict_converter.png
>>> sheet = xw.sheets.active
>>> sheet['A1:B2'].options(dict).value
{'a': 1.0, 'b': 2.0}
>>> sheet['A4:B5'].options(dict, transpose=True).value
{'a': 1.0, 'b': 2.0}

注意:除了 dict ,还可以用 collections 中的 OrderedDict

Numpy数组转换器

options: dtype=None, copy=True, order=None, ndim=None

前3个选项的作用和直接使用 np.array() 时一致。 ndim 的作用和上面用在列表上(默认转换器的情况下)的作用一致,返回标量、1维数组或2维数组。

Example

>>> import numpy as np
>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].options(transpose=True).value = np.array([1, 2, 3])
>>> sheet['A1:A3'].options(np.array, ndim=2).value
array([[ 1.],
       [ 2.],
       [ 3.]])

Pandas序列转换器

options: dtype=None, copy=False, index=1, header=True

前2个选项的作用和直接使用 pd.Series() 时一致。 Pandas 序列的返回值一直是一列,所以 ndim 不起作用。

index: 整数值或布尔值
读的时候,认为它是Excel种的索引列的列数。
写的时候,如果需要包含索引,就设为 True ,否则就设为 False
header: 布尔型
读的时候,如果Excel种没有索引或序列名,就设为 False
写的时候,如果需要包含索引和序列名,就设为 True ,否则设为 False

indexheader , 1True 是等价的。

示例:

_images/series_conv.png
>>> sheet = xw.Book().sheets[0]
>>> s = sheet['A1'].options(pd.Series, expand='table').value
>>> s
date
2001-01-01    1
2001-01-02    2
2001-01-03    3
2001-01-04    4
2001-01-05    5
2001-01-06    6
Name: series name, dtype: float64

Pandas DataFrame转换器

options: dtype=None, copy=False, index=1, header=1

前2个选项的作用和直接使用 pd.DataFrame() 时一致。 ndim 不起作用,因为把数据读入到Pandas DataFrame时自动设定 ndim=2

index: 整数值或布尔值
读的时候,认为它是Excel种的索引列的列数。
写的时候,如果需要包含索引,就设为 True ,否则就设为 False
header: 整数型或布尔型
读数据时,这个值应该是Excel表中表头的行数。
写的时候,如果需要包含索引和序列名,就设为 True ,否则设为 False

indexheader , 1True 是等价的。

示例:

_images/df_converter.png
>>> sheet = xw.Book().sheets[0]
>>> df = sheet['A1:D5'].options(pd.DataFrame, header=2).value
>>> df
    a     b
    c  d  e
ix
10  1  2  3
20  4  5  6
30  7  8  9

# Writing back using the defaults:
>>> sheet['A1'].value = df

# Writing back and changing some of the options, e.g. getting rid of the index:
>>> sheet['B7'].options(index=False).value = df

The same sample for UDF (starting in cell A13 on screenshot) looks like this:

@xw.func
@xw.arg('x', pd.DataFrame, header=2)
@xw.ret(index=False)
def myfunction(x):
   # x is a DataFrame, do something with it
   return x

xw.Range和‘裸’转换器

从技术层面上说,就是"没有转换"。

  • 如果需要直接得到 xlwings.Range ,可以这样做:

    @xw.func
    @xw.arg('x', 'range')
    def myfunction(x):
       return x.formula
    

    上面函数把 xlwings.Range 对象作为x返回,也就是说没有做任何转换或设置。

  • 这个 转换器从底层库里 (Windows系统里是 pywin32 ,Mac系统里是 appscript )把值原封不动地输出来,也就是说,没有做任何的处理或者跨平台兼容。 这在一些要求效率的场合中很有用。例如:

    >>> sheet['A1:B2'].value
    [[1.0, 'text'], [datetime.datetime(2016, 2, 1, 0, 0), None]]
    
    >>> sheet['A1:B2'].options('raw').value  # or sheet['A1:B2'].raw_value
    ((1.0, 'text'), (pywintypes.datetime(2016, 2, 1, 0, 0, tzinfo=TimeZoneInfo('GMT Standard Time', True)), None))
    

自定义转换器

实现自定义转换器的步骤如下:

  • 继承 xlwings.conversion.Converter

  • 以静态方法或者类方法实现 read_valuewrite_value 方法:

    • read_value 中, value 就是基转换器(Base converter)返回的值,因此如果没有指定 base 转换器,返回的就是经过缺省转换器转换出来的值。

    • write_value 中, value 是需要往Excel里面写的原始对象。需要返回基转换器所需格式。如果没有指定 base 转换器,还是会使用缺省转换器。

    The options dictionary will contain all keyword arguments specified in the options method, e.g. when calling myrange.options(myoption='some value') or as specified in the @arg and @ret decorator when using UDFs. Here is the basic structure:

    from xlwings.conversion import Converter
    
    class MyConverter(Converter):
    
        @staticmethod
        def read_value(value, options):
            myoption = options.get('myoption', default_value)
            return_value = value  # Implement your conversion here
            return return_value
    
        @staticmethod
        def write_value(value, options):
            myoption = options.get('myoption', default_value)
            return_value = value  # Implement your conversion here
            return return_value
    
  • Optional: set a base converter (base expects a class name) to build on top of an existing converter, e.g. for the built-in ones: DictConverter, NumpyArrayConverter, PandasDataFrameConverter, PandasSeriesConverter

  • 可选项:注册转换器:可以 (a) 把自己的转换器注册为一个类型在写入操作时的默认转换器;以及/或者 (b) 可以为自定义的转换器起一个别名,以便在调用转化器的时候可以用别名替代类名。

下面例子比较容易模仿,它定义了一个DataFrame转换器,扩展了内置转换器的功能,可以支持删除空值:

from xlwings.conversion import Converter, PandasDataFrameConverter

class DataFrameDropna(Converter):

    base = PandasDataFrameConverter

    @staticmethod
    def read_value(builtin_df, options):
        dropna = options.get('dropna', False)  # set default to False
        if dropna:
            converted_df = builtin_df.dropna()
        else:
            converted_df = builtin_df
        # This will arrive in Python when using the DataFrameDropna converter for reading
        return converted_df

    @staticmethod
    def write_value(df, options):
        dropna = options.get('dropna', False)
        if dropna:
            converted_df = df.dropna()
        else:
            converted_df = df
        # This will be passed to the built-in PandasDataFrameConverter when writing
        return converted_df

现在来看看如何应用这些不同的转换器:

# Fire up a Workbook and create a sample DataFrame
sheet = xw.Book().sheets[0]
df = pd.DataFrame([[1.,10.],[2.,np.nan], [3., 30.]])
  • DataFrames缺省转换器:

    # Write
    sheet['A1'].value = df
    
    # Read
    sheet['A1:C4'].options(pd.DataFrame).value
    
  • DataFrameDropna转换器:

    # Write
    sheet['A7'].options(DataFrameDropna, dropna=True).value = df
    
    # Read
    sheet['A1:C4'].options(DataFrameDropna, dropna=True).value
    
  • 注册别名(可选):

    DataFrameDropna.register('df_dropna')
    
    # Write
    sheet['A12'].options('df_dropna', dropna=True).value = df
    
    # Read
    sheet['A1:C4'].options('df_dropna', dropna=True).value
    
  • 把DataFrameDropna注册为DataFrames的缺省转换器(可选):

    DataFrameDropna.register(pd.DataFrame)
    
    # Write
    sheet['A13'].options(dropna=True).value = df
    
    # Read
    sheet['A1:C4'].options(pd.DataFrame, dropna=True).value
    

上面示例同样适用于UDF,例如:

@xw.func
@arg('x', DataFrameDropna, dropna=True)
@ret(DataFrameDropna, dropna=True)
def myfunction(x):
    # ...
    return x

备注

Python对象被写入Excel的时候,会在转换流水线中经过多个不同的阶段。Excel/COM对象被读进Python的时候也同样如此。

Pipelines are internally defined by Accessor classes. A Converter is just a special Accessor which converts to/from a particular type by adding an extra stage to the pipeline of the default Accessor. For example, the PandasDataFrameConverter defines how a list of lists (as delivered by the default Accessor) should be turned into a Pandas DataFrame.

Converter (转换器)类提供了一套基础框架使得写新的转换器变得容易。如果需要控制更多的东西,可以直接编写 Accessor 的子类,不过这部分内容需要做更多的工作并且目前没有什么文档。