转换器及选项¶
转换器是从v0.7.0开始引入的,它定义了在 读出 和 写入 操作中,Excel的区域及其取值是如何被转换的。同时在 xlwings.Range 对象和 User Defined Functions (UDFs) 之间保持了一致性。
Converters are explicitly set in the options
method when manipulating Range
objects
or in the @xw.arg
and @xw.ret
decorators when using UDFs. If no converter is specified, the default converter
is applied when reading. When writing, xlwings will automatically apply the correct converter (if available) according to the
object's type that is being written to Excel. If no converter is found for that type, it falls back to the default converter.
后面所有例子依赖下面的导入:
>>> import xlwings as xw
语法:
Action |
Range objects |
UDFs |
---|---|---|
读 |
|
|
writing |
|
|
备注
关键词参数(kwargs
)可以是特定转换器的参数,也可以是缺省转换器的参数。例如,为了在缺省转换器中设置 numbers
选项并在 DataFrame
转换器中设置 index
选项,可以这样写:
myrange.options(pd.DataFrame, index=False, numbers=int).value
缺省转换器¶
如果没有设定选项,执行下面的转换器:
在读单元格时,如果里面有数字,转换为
floats
,如果里面是文字,转换为unicode
字符串,如果里面是日期,转换为datetime
,如果是空的,返回None
。行/列被作为列表读入,例如:
[None, 1.0, 'a string']
2维区域作为嵌套列表读入,例如:
[[None, 1.0, 'a string'], [None, 2.0, 'another string']]
下面选项可以设定:
ndim¶
不考虑区域的形状,强制返回值为1维或2维列表:
>>> import xlwings as xw
>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = [[1, 2], [3, 4]]
>>> sheet['A1'].value
1.0
>>> sheet['A1'].options(ndim=1).value
[1.0]
>>> sheet['A1'].options(ndim=2).value
[[1.0]]
>>> sheet['A1:A2'].value
[1.0 3.0]
>>> sheet['A1:A2'].options(ndim=2).value
[[1.0], [3.0]]
numbers¶
单元格里的数字,缺省作为 float
类型,不过也可以转换为 int
类型:
>>> sheet['A1'].value = 1
>>> sheet['A1'].value
1.0
>>> sheet['A1'].options(numbers=int).value
1
或者,可以指定为其他函数或类型,只要它们关键字参数和float类型的相同即可。
用在UDF中时是这样的:
@xw.func
@xw.arg('x', numbers=int)
def myfunction(x):
# all numbers in x arrive as int
return x
备注
Excel delivers all numbers as floats in the interactive mode, which is the reason why the int
converter rounds numbers first before turning them into integers. Otherwise it could happen that e.g., 5 might be returned as 4 in case it is represented as a floating point number that is slightly smaller than 5. Should you require Python's original int
in your converter, use raw int` instead.
dates¶
单元格内的日期缺省被读作 datetime.datetime
类型,不过也可以改为 datetime.date
类型:
Range:
>>> import datetime as dt >>> sheet['A1'].options(dates=dt.date).value
UDFs (decorator):
@xw.arg('x', dates=dt.date)
或者,可以指定为其他函数或类型,只要它们的关键字参数和 datetime.datetime
类型的相同即可。例如:
>>> my_date_handler = lambda year, month, day, **kwargs: "%04i-%02i-%02i" % (year, month, day)
>>> sheet['A1'].options(dates=my_date_handler).value
'2017-02-20'
empty¶
每个空白的单元格缺省会被转换成 None
,可以像下面一样改变:
Range:
>>> sheet['A1'].options(empty='NA').value
UDFs (decorator):
@xw.arg('x', empty='NA')
transpose¶
在读写时都可以转置,比如,我们可以把一个列表在Excel中写为一列:
Range:
sheet['A1'].options(transpose=True).value = [1, 2, 3]
UDFs:
@xw.arg('x', transpose=True) @xw.ret(transpose=True) def myfunction(x): # x will be returned unchanged as transposed both when reading and writing return x
expand¶
它的功能和区域的 table
, vertical
及 horizontal
属性一样,只是在区域取值的时候才去求值:
>>> import xlwings as xw
>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = [[1,2], [3,4]]
>>> range1 = sheet['A1'].expand()
>>> range2 = sheet['A1'].options(expand='table')
>>> range1.value
[[1.0, 2.0], [3.0, 4.0]]
>>> range2.value
[[1.0, 2.0], [3.0, 4.0]]
>>> sheet['A3'].value = [5, 6]
>>> range1.value
[[1.0, 2.0], [3.0, 4.0]]
>>> range2.value
[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
备注
expand
方法仅用于 Range
对象,UDF只允许对调用的单元格操作。
chunksize¶
When you read and write from or to big ranges, you may have to chunk them or you will hit a timeout or a memory error. The ideal chunksize
will depend on your system and size of the array, so you will have to try out a few different chunksizes to find one that works well:
import pandas as pd
import numpy as np
sheet = xw.Book().sheets[0]
data = np.arange(75_000 * 20).reshape(75_000, 20)
df = pd.DataFrame(data=data)
sheet['A1'].options(chunksize=10_000).value = df
And the same for reading:
# As DataFrame
df = sheet['A1'].expand().options(pd.DataFrame, chunksize=10_000).value
# As list of list
df = sheet['A1'].expand().options(chunksize=10_000).value
err_to_str¶
Added in version 0.28.0.
If True
, will include cell errors such as #N/A
as strings. By default, they
will be converted to None
.
formatter¶
Added in version 0.28.1.
备注
You can't use formatters with Excel tables.
The formatter
option accepts the name of a function. The function will be called after writing the values to Excel and allows you to easily style the range in a very flexible way. How it works is best shown with a little example:
import pandas as pd
import xlwings as xw
sheet = xw.Book().sheets[0]
def table(rng: xw.Range, df: pd.DataFrame):
"""This is the formatter function"""
# Header
rng[0, :].color = "#A9D08E"
# Rows
for ix, row in enumerate(rng.rows[1:]):
if ix % 2 == 0:
row.color = "#D0CECE" # Even rows
# Columns
for ix, col in enumerate(df.columns):
if "two" in col:
rng[1:, ix].number_format = "0.0%"
df = pd.DataFrame(data={"one": [1, 2, 3, 4], "two": [5, 6, 7, 8]})
sheet["A1"].options(formatter=table, index=False).value = df
Running this code will format the DataFrame like this:
The formatter's signature is: def myformatter(myrange, myvalues)
where myrange
corresponds to the range where myvalues
are written to. myvalues
is simply what you assign to the value
property in the last line of the example. Since we're using this with a DataFrame, it makes sense to name the argument accordingly and using type hints will help your editor with auto-completion. If you would use a nested list instead of a DataFrame, you would write something like this instead:
def table(rng: xw.Range, values: list[list]):
内置转换器¶
xlwings提供了几种内置转换器,用于 dictionaries (字典), NumPy arrays**(NumPy数组), **Pandas Series**(Pandas序列)和 **DataFrames 的转换。它们都是基于缺省服务器的,所以在大多数情况下,上面提到的选项也同样可以用(除非有些选项在某种情况下没有意义,比如ndim对字典来说是没有意义的)。
It is also possible to write and register a custom converter for additional types, see below.
下面的例子适用于 xlwings.Range
对象和UDF,虽然有的只给出了一种例子
字典转换器¶
字典转换器把Excel中的两列转换成一个字典。如果数据是按行排的,请使用 transpose
选项:
>>> sheet = xw.sheets.active
>>> sheet['A1:B2'].options(dict).value
{'a': 1.0, 'b': 2.0}
>>> sheet['A4:B5'].options(dict, transpose=True).value
{'a': 1.0, 'b': 2.0}
注意:除了 dict
,还可以用 collections
中的 OrderedDict
。
Numpy数组转换器¶
options: dtype=None, copy=True, order=None, ndim=None
前3个选项的作用和直接使用 np.array()
时一致。 ndim
的作用和上面用在列表上(默认转换器的情况下)的作用一致,返回标量、1维数组或2维数组。
Example
>>> import numpy as np
>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].options(transpose=True).value = np.array([1, 2, 3])
>>> sheet['A1:A3'].options(np.array, ndim=2).value
array([[ 1.],
[ 2.],
[ 3.]])
Pandas序列转换器¶
options: dtype=None, copy=False, index=1, header=True
前2个选项的作用和直接使用 pd.Series()
时一致。 Pandas
序列的返回值一直是一列,所以 ndim
不起作用。
index
: 整数值或布尔值- 读的时候,认为它是Excel种的索引列的列数。写的时候,如果需要包含索引,就设为
True
,否则就设为False
。 header
: 布尔型- 读的时候,如果Excel种没有索引或序列名,就设为
False
。写的时候,如果需要包含索引和序列名,就设为True
,否则设为False
。
对 index
和 header
, 1
和 True
是等价的。
示例:
>>> sheet = xw.Book().sheets[0]
>>> s = sheet['A1'].options(pd.Series, expand='table').value
>>> s
date
2001-01-01 1
2001-01-02 2
2001-01-03 3
2001-01-04 4
2001-01-05 5
2001-01-06 6
Name: series name, dtype: float64
Pandas DataFrame转换器¶
options: dtype=None, copy=False, index=1, header=1
前2个选项的作用和直接使用 pd.DataFrame()
时一致。 ndim
不起作用,因为把数据读入到Pandas DataFrame时自动设定 ndim=2
。
index
: 整数值或布尔值- 读的时候,认为它是Excel种的索引列的列数。写的时候,如果需要包含索引,就设为
True
,否则就设为False
。 header
: 整数型或布尔型- 读数据时,这个值应该是Excel表中表头的行数。写的时候,如果需要包含索引和序列名,就设为
True
,否则设为False
。
对 index
和 header
, 1
和 True
是等价的。
示例:
>>> sheet = xw.Book().sheets[0]
>>> df = sheet['A1:D5'].options(pd.DataFrame, header=2).value
>>> df
a b
c d e
ix
10 1 2 3
20 4 5 6
30 7 8 9
# Writing back using the defaults:
>>> sheet['A1'].value = df
# Writing back and changing some of the options, e.g. getting rid of the index:
>>> sheet['B7'].options(index=False).value = df
The same sample for UDF (starting in cell A13
on screenshot) looks like this:
@xw.func
@xw.arg('x', pd.DataFrame, header=2)
@xw.ret(index=False)
def myfunction(x):
# x is a DataFrame, do something with it
return x
xw.Range和‘裸’转换器¶
从技术层面上说,就是"没有转换"。
如果需要直接得到
xlwings.Range
,可以这样做:@xw.func @xw.arg('x', 'range') def myfunction(x): return x.formula
上面函数把
xlwings.Range
对象作为x返回,也就是说没有做任何转换或设置。这个
裸
转换器从底层库里 (Windows系统里是pywin32
,Mac系统里是appscript
)把值原封不动地输出来,也就是说,没有做任何的处理或者跨平台兼容。 这在一些要求效率的场合中很有用。例如:>>> sheet['A1:B2'].value [[1.0, 'text'], [datetime.datetime(2016, 2, 1, 0, 0), None]] >>> sheet['A1:B2'].options('raw').value # or sheet['A1:B2'].raw_value ((1.0, 'text'), (pywintypes.datetime(2016, 2, 1, 0, 0, tzinfo=TimeZoneInfo('GMT Standard Time', True)), None))
自定义转换器¶
实现自定义转换器的步骤如下:
继承
xlwings.conversion.Converter
类以静态方法或者类方法实现
read_value
和write_value
方法:在
read_value
中,value
就是基转换器(Base converter)返回的值,因此如果没有指定base
转换器,返回的就是经过缺省转换器转换出来的值。在
write_value
中,value
是需要往Excel里面写的原始对象。需要返回基转换器所需格式。如果没有指定base
转换器,还是会使用缺省转换器。
The
options
dictionary will contain all keyword arguments specified in theoptions
method, e.g. when callingmyrange.options(myoption='some value')
or as specified in the@arg
and@ret
decorator when using UDFs. Here is the basic structure:from xlwings.conversion import Converter class MyConverter(Converter): @staticmethod def read_value(value, options): myoption = options.get('myoption', default_value) return_value = value # Implement your conversion here return return_value @staticmethod def write_value(value, options): myoption = options.get('myoption', default_value) return_value = value # Implement your conversion here return return_value
Optional: set a
base
converter (base
expects a class name) to build on top of an existing converter, e.g. for the built-in ones:DictConverter
,NumpyArrayConverter
,PandasDataFrameConverter
,PandasSeriesConverter
可选项:注册转换器:可以 (a) 把自己的转换器注册为一个类型在写入操作时的默认转换器;以及/或者 (b) 可以为自定义的转换器起一个别名,以便在调用转化器的时候可以用别名替代类名。
下面例子比较容易模仿,它定义了一个DataFrame转换器,扩展了内置转换器的功能,可以支持删除空值:
from xlwings.conversion import Converter, PandasDataFrameConverter
class DataFrameDropna(Converter):
base = PandasDataFrameConverter
@staticmethod
def read_value(builtin_df, options):
dropna = options.get('dropna', False) # set default to False
if dropna:
converted_df = builtin_df.dropna()
else:
converted_df = builtin_df
# This will arrive in Python when using the DataFrameDropna converter for reading
return converted_df
@staticmethod
def write_value(df, options):
dropna = options.get('dropna', False)
if dropna:
converted_df = df.dropna()
else:
converted_df = df
# This will be passed to the built-in PandasDataFrameConverter when writing
return converted_df
现在来看看如何应用这些不同的转换器:
# Fire up a Workbook and create a sample DataFrame
sheet = xw.Book().sheets[0]
df = pd.DataFrame([[1.,10.],[2.,np.nan], [3., 30.]])
DataFrames缺省转换器:
# Write sheet['A1'].value = df # Read sheet['A1:C4'].options(pd.DataFrame).value
DataFrameDropna转换器:
# Write sheet['A7'].options(DataFrameDropna, dropna=True).value = df # Read sheet['A1:C4'].options(DataFrameDropna, dropna=True).value
注册别名(可选):
DataFrameDropna.register('df_dropna') # Write sheet['A12'].options('df_dropna', dropna=True).value = df # Read sheet['A1:C4'].options('df_dropna', dropna=True).value
把DataFrameDropna注册为DataFrames的缺省转换器(可选):
DataFrameDropna.register(pd.DataFrame) # Write sheet['A13'].options(dropna=True).value = df # Read sheet['A1:C4'].options(pd.DataFrame, dropna=True).value
上面示例同样适用于UDF,例如:
@xw.func
@arg('x', DataFrameDropna, dropna=True)
@ret(DataFrameDropna, dropna=True)
def myfunction(x):
# ...
return x
备注
Python对象被写入Excel的时候,会在转换流水线中经过多个不同的阶段。Excel/COM对象被读进Python的时候也同样如此。
Pipelines are internally defined by Accessor
classes. A Converter is just a special Accessor which
converts to/from a particular type by adding an extra stage to the pipeline of the default Accessor. For example, the
PandasDataFrameConverter
defines how a list of lists (as delivered by the default Accessor) should be turned
into a Pandas DataFrame.
Converter
(转换器)类提供了一套基础框架使得写新的转换器变得容易。如果需要控制更多的东西,可以直接编写 Accessor
的子类,不过这部分内容需要做更多的工作并且目前没有什么文档。