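What does the sub operator in pandas do?

This is straight from the tutorial, and I still can't understand it even after reading the documentation. What does the sub operator in pandas do?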

In [14]: df = DataFrame({'one' : Series(randn(3), index=['a', 'b', 'c']), 
    ....:     'two' : Series(randn(4), index=['a', 'b', 'c', 'd']), 
    ....:     'three' : Series(randn(3), index=['b', 'c', 'd'])}) 
    ....: 

In [15]: df 
Out[15]: 
        one     three       two
a -0.626544       NaN -0.351587
b -0.138894 -0.177289  1.136249
c  0.011617  0.462215 -0.448789
d       NaN  1.124472 -1.101558

In [16]: row = df.ix[1] 

In [17]: column = df['two'] 

In [18]: df.sub(row, axis='columns') 
Out[18]: 
        one     three       two
a -0.487650       NaN -1.487837
b  0.000000  0.000000  0.000000
c  0.150512  0.639504 -1.585038
d       NaN  1.301762 -2.237808

Why does the second row turn into 0? Is it `sub`-stituting with 0?

Also, when I use row = df.ix[0], the whole second column turns into NaN. Why?

2015-05-09 Heisenberg

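Aside: `.iloc` is now preferred when you want to refer to a row or column by position rather than by index; it has simpler semantics. – DSM

Not clear why this has been downvoted; it seems a perfectly well-formed question to me. – EdChum

In fact, I opened an issue so that the docstring will be improved: https://github.com/pydata/pandas/issues/10093 – joris

sub means subtract, so let's walk through this: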

In [44]: 
# create some data 
df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']), 
        'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']), 
        'three' : pd.Series(np.random.randn(3), index=['b', 'c', 'd'])}) 
df 
Out[44]: 
        one     three       two
a -1.536737       NaN  1.537104
b  1.486947 -0.429089 -0.227643
c  0.219609 -0.178037 -1.118345
d       NaN  1.254126 -0.380208
In [45]:
# take a copy of the 2nd row (.ix was removed in pandas 1.0; .iloc selects by position)
row = df.iloc[1]
row
Out[45]:
one      1.486947
three   -0.429089
two     -0.227643
Name: b, dtype: float64
In [46]: 
# now subtract the 2nd row row-wise 
df.sub(row, axis='columns') 
Out[46]: 
        one     three       two
a -3.023684       NaN  1.764747
b  0.000000  0.000000  0.000000
c -1.267338  0.251052 -0.890702
d       NaN  1.683215 -0.152565

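So what is probably confusing you is what happens when you specify 'columns' as the axis to operate on. We subtracted the values of the second row from each row, which explains why the second row has now become all 0s. The data you are passing is a Series, and we are aligning it on the columns. In effect we are aligning on the column names, which is why the operation is performed row-wise.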
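To make the alignment concrete, here is a minimal sketch that rebuilds the frame with fixed values (copied from Out[44] above instead of random draws, so it is reproducible) and compares the two axis choices. The names `by_columns` and `by_index` are only illustrative.

```python
import pandas as pd

# Fixed values copied from Out[44] so the result is reproducible.
df = pd.DataFrame(
    {
        "one": pd.Series([-1.536737, 1.486947, 0.219609], index=["a", "b", "c"]),
        "two": pd.Series([1.537104, -0.227643, -1.118345, -0.380208],
                         index=["a", "b", "c", "d"]),
        "three": pd.Series([-0.429089, -0.178037, 1.254126], index=["b", "c", "d"]),
    }
)

row = df.iloc[1]     # second row by position; a Series indexed by column name
column = df["two"]   # a Series indexed by row label

# axis="columns": match row's index against df's column names, then subtract
# the matched value from every row. Plain `df - row` aligns the same way.
by_columns = df.sub(row, axis="columns")

# axis="index": match column's index against df's row labels, then subtract
# the matched value from every column.
by_index = df.sub(column, axis="index")

print(by_columns)
print(by_index)
```

With `axis="columns"` row "b" ends up all zeros (it is subtracted from itself); with `axis="index"` column "two" ends up all zeros for the same reason.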

In [47]:
# now take a copy of the first row
row = df.iloc[0]
row
Out[47]:
one     -1.536737
three         NaN
two      1.537104
Name: a, dtype: float64
In [48]: 
# perform the same op 
df.sub(row, axis='columns') 
Out[48]: 
        one  three       two
a  0.000000    NaN  0.000000
b  3.023684    NaN -1.764747
c  1.756346    NaN -2.655449
d       NaN    NaN -1.917312

So why do we now have a column with all NaN values? This is because when you perform any arithmetic operation with a NaN, the result is a NaN:

In [55]: 

print(1 + np.nan)
print(1 * np.nan)
print(1/np.nan)
print(1 - np.nan)
nan 
nan 
nan 
nan
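The same propagation can be seen inside a DataFrame. The sketch below uses a small made-up frame (not the data above). The `fillna(0)` step is only one possible workaround and assumes you want a missing entry in the subtracted row to count as 0, which the question does not say.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the values are invented for this example.
df = pd.DataFrame(
    {"one": [1.0, 2.0], "three": [np.nan, 5.0]},
    index=["a", "b"],
)

row = df.iloc[0]                       # row "a" holds NaN under "three"
result = df.sub(row, axis="columns")   # every x - NaN in "three" is NaN

# Assumed workaround: treat the missing entry in `row` as 0 before subtracting.
filled = df.sub(row.fillna(0), axis="columns")

print(result)
print(filled)
```

Note that `filled` still has NaN at ("a", "three"), because that cell is missing in `df` itself; only the NaN coming from `row` is neutralised.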

2015-05-09 19:40:15 EdChum

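Nice explanation of the `axis` option. The index of `row` is `one two three`, and we match this index against the column index of `df`. – Heisenberg

Yes, that is because we passed a Series: `In [57]: type(row) Out[57]: pandas.core.series.Series`. So we are now telling the function which index to align against. In this case it is the columns, so alignment is performed against the columns rather than the index values. – EdChum

What this does is subtract each value in the second row from all the values in the corresponding column. That is, it takes the value at position ("b", "one") and subtracts it from all the values in column "one"; it takes the value at ("b", "two") and subtracts it from all the values in column "two"; it takes the value at ("b", "three") and subtracts it from all the values in column "three". So, for example, the result at ("c", "one") is 0.011617 - (-0.138894) ≈ 0.150512 (the displayed values are rounded). All the values in row "b" are zero because that is the row you are subtracting, so in that row you are subtracting it from itself, giving zero.

As for the second part of your question, if you select the first row, it contains a NaN. So the subtraction subtracts NaN from all the values in the second column, which turns them all into NaN as well (since anything minus NaN is NaN).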
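As a quick check of the arithmetic, here is a sketch that rebuilds the question's frame from the rounded values printed in Out[15] and recomputes both cases. The names `result`, `manual` and `first` are illustrative only.

```python
import numpy as np
import pandas as pd

# Values copied from the question's Out[15]; NaN marks the missing entries.
df = pd.DataFrame(
    {
        "one": [-0.626544, -0.138894, 0.011617, np.nan],
        "three": [np.nan, -0.177289, 0.462215, 1.124472],
        "two": [-0.351587, 1.136249, -0.448789, -1.101558],
    },
    index=["a", "b", "c", "d"],
)

# Subtract row "b" from every row, aligning on column names.
result = df.sub(df.loc["b"], axis="columns")

# ("c", "one") is the value at ("c", "one") minus the value at ("b", "one").
manual = df.loc["c", "one"] - df.loc["b", "one"]

# Subtracting row "a" instead puts NaN under "three" into every subtraction.
first = df.sub(df.loc["a"], axis="columns")

print(result)
print(manual)
print(first)
```

From these rounded inputs `manual` comes out as about 0.150511, while the tutorial output shows 0.150512, presumably because the original was computed from the unrounded values.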

2015-05-09 19:39:55 BrenBarn

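Knowing that "sub" means subtraction makes it immediately clear. – Heisenberg

@Heisenberg: On the page where you got your example, it says "DataFrame has the methods add, sub, mul, div". If that doesn't immediately tell you that "sub" means "subtract", I don't know what to say. – BrenBarn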
