資料結構教學

這篇教學將會介紹 xlwings 在讀取與寫入資料時預設的行爲,以及一些常見的應用。若想深入理解内容提到的 options 的使用方法,可以參考 Converters and Options

在執行以下範例程式碼之前,先確保有匯入 xlwings 套件:

>>> import xlwings as xw

單一儲存格

單一儲存格被讀取時,儲存格内的數字、文字、空值、或是日期這幾種資料,在預設上會被 xlwings 讀取時,會相對應的被轉換成 float, unicode, None 或是 datetime 等資料型別。

>>> import datetime as dt
>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = 1
>>> sheet['A1'].value
1.0
>>> sheet['A2'].value = 'Hello'
>>> sheet['A2'].value
'Hello'
>>> sheet['A3'].value is None
True
>>> sheet['A4'].value = dt.datetime(2000, 1, 1)
>>> sheet['A4'].value
datetime.datetime(2000, 1, 1, 0, 0)

串列

  • 一維串列:通常試算表上橫向與縱向的儲存格範圍(n x 1 或是 1 x n 大小的範圍),像是行與列,會以一個 Python 的一維串列表示,這樣也意味著該範圍資料方向的維度也會隨之消失。若想保留維度的訊息,請繼續閲讀下面的章節:

    >>> sheet = xw.Book().sheets[0]
    >>> sheet['A1'].value = [[1],[2],[3],[4],[5]]  # Column orientation (nested list)
    >>> sheet['A1:A5'].value
    [1.0, 2.0, 3.0, 4.0, 5.0]
    >>> sheet['A1'].value = [1, 2, 3, 4, 5]
    >>> sheet['A1:E1'].value
    [1.0, 2.0, 3.0, 4.0, 5.0]
    

    要强制將單一儲存格以串列的形式回傳,可以將 ndim 參數設定成 1:

    >>> sheet['A1'].options(ndim=1).value
    [1.0]
    

    備註

    To write a list in column orientation to Excel, use transpose: sheet.range('A1').options(transpose=True).value = [1,2,3,4]

  • 二維串列:若想保留單一行或列(n x 1 或是 1 x n 大小儲存格範圍)的維度資訊時,請將 range 方法的 ndim 參數設定成 2,這樣可以確保回傳的值會被封裝在一個巢狀串列内,保留了資料原本在工作表上的維度與方向資訊。

    >>> sheet['A1:A5'].options(ndim=2).value
    [[1.0], [2.0], [3.0], [4.0], [5.0]]
    >>> sheet['A1:E1'].options(ndim=2).value
    [[1.0, 2.0, 3.0, 4.0, 5.0]]
    
  • 二維的儲存格範圍都會以巢狀串列的形式回傳。當你將一個巢狀串列賦值/指定給一個儲存格範圍時。只需要指定被寫入儲存格範圍的左上角。另外,你也可以使用指定範圍起點與結束點的索引值來將將二維的資料讀取出來。

    >>> sheet['A10'].value = [['Foo 1', 'Foo 2', 'Foo 3'], [10, 20, 30]]
    >>> sheet.range((10,1),(11,3)).value
    [['Foo 1', 'Foo 2', 'Foo 3'], [10.0, 20.0, 30.0]]
    

備註

Try to minimize the number of interactions with Excel. It is always more efficient to do sheet.range('A1').value = [[1,2],[3,4]] than sheet.range('A1').value = [1, 2] and sheet.range('A2').value = [3, 4].

範圍擴展

你可以透過 expand() 方法,或是 options 方法的 expand 參數動態偵測並且讀取一個連續、有值的儲存格範圍的資料。這邊需要注意的是,expand() 方法會回傳一個 range() 物件,而 options(expand='table') 則會在選定一個範圍之後才會被執行,以下範例會解釋兩者的差異:

>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = [[1,2], [3,4]]
>>> range1 = sheet['A1'].expand('table')  # or just .expand()
>>> range2 = sheet['A1'].options(expand='table')
>>> range1.value
[[1.0, 2.0], [3.0, 4.0]]
>>> range2.value
[[1.0, 2.0], [3.0, 4.0]]
>>> sheet['A3'].value = [5, 6]
>>> range1.value
[[1.0, 2.0], [3.0, 4.0]]
>>> range2.value
[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

'table' 會同時從起點開始,往下並且往右擴展,簡單來説,就是搜尋一個二維範圍的右下角。其他選項,像是 down,則會從指定的起點開始往下擴展,將所有連續的、有值的儲存格資料讀取出來。

備註

Using expand() together with a named Range as top left cell gives you a flexible setup in Excel: You can move around the table and change its size without having to adjust your code, e.g. by using something like sheet.range('NamedRange').expand().value.

NumPy arrays

NumPy 陣列的運作方式與巢狀串列類似,不同的地方在於空的儲存格是用 nan 而非 None 表示。若你希望將一個範圍的值以 Numpy 陣列的形式回傳,請在 options 方法内設定 convert=np.array

>>> import numpy as np
>>> sheet = xw.Book().sheets[0]
>>> sheet['A1'].value = np.eye(3)
>>> sheet['A1'].options(np.array, expand='table').value
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

Pandas DataFrames

>>> sheet = xw.Book().sheets[0]
>>> df = pd.DataFrame([[1.1, 2.2], [3.3, None]], columns=['one', 'two'])
>>> df
   one  two
0  1.1  2.2
1  3.3  NaN
>>> sheet['A1'].value = df
>>> sheet['A1:C3'].options(pd.DataFrame).value
   one  two
0  1.1  2.2
1  3.3  NaN
# options: work for reading and writing
>>> sheet['A5'].options(index=False).value = df
>>> sheet['A9'].options(index=False, header=False).value = df

Pandas Series

>>> import pandas as pd
>>> import numpy as np
>>> sheet = xw.Book().sheets[0]
>>> s = pd.Series([1.1, 3.3, 5., np.nan, 6., 8.], name='myseries')
>>> s
0    1.1
1    3.3
2    5.0
3    NaN
4    6.0
5    8.0
Name: myseries, dtype: float64
>>> sheet['A1'].value = s
>>> sheet['A1:B7'].options(pd.Series).value
0    1.1
1    3.3
2    5.0
3    NaN
4    6.0
5    8.0
Name: myseries, dtype: float64

備註

You only need to specify the top left cell when writing a list, a NumPy array or a Pandas DataFrame to Excel, e.g.: sheet['A1'].value = np.eye(10)

Chunking: Read/Write big DataFrames etc.

When you read and write from or to big ranges, you may have to chunk them or you will hit a timeout or a memory error. The ideal chunksize will depend on your system and size of the array, so you will have to try out a few different chunksizes to find one that works well:

import pandas as pd
import numpy as np
sheet = xw.Book().sheets[0]
data = np.arange(75_000 * 20).reshape(75_000, 20)
df = pd.DataFrame(data=data)
sheet['A1'].options(chunksize=10_000).value = df

And the same for reading:

# As DataFrame
df = sheet['A1'].expand().options(pd.DataFrame, chunksize=10_000).value
# As list of list
df = sheet['A1'].expand().options(chunksize=10_000).value