Why doesn't this function "take" after I iterrows over a pandas DataFrame?

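I have a DataFrame with timestamped temperature and wind speed values, and a function to convert those into a "wind chill". I'm using iterrows to run the function on each row, hoping to get a DataFrame out with a nifty "Wind Chill" column.

However, while it seems to work as it goes through, and has actually "worked" at least once, I can't seem to replicate it consistently. I feel like I'm missing something about the structure of DataFrames in general, but I'm hoping someone can help.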

In [28]: bigdf.head()
Out[28]:
                           Day  Temperature  Wind Speed  Year
2003-03-01 06:00:00-05:00    1        30.27        5.27  2003
2003-03-01 07:00:00-05:00    1        30.21        4.83  2003
2003-03-01 08:00:00-05:00    1        31.81        6.09  2003
2003-03-01 09:00:00-05:00    1        34.04        6.61  2003
2003-03-01 10:00:00-05:00    1        35.31        6.97  2003

So I add a "Wind Chill" column to bigdf and prepopulate it with NaN.

In [29]: bigdf['Wind Chill'] = NaN

Then I try to iterate over the rows to add the actual wind chills.

In [30]: for row_index, row in bigdf[:5].iterrows(): 
    ...:  row['Wind Chill'] = windchill(row['Temperature'], row['Wind Speed']) 
    ...:  print row['Wind Chill'] 
    ...: 
24.7945889994 
25.1365267133 
25.934114012 
28.2194307516 
29.5051046953

As you can see, the new values appear to be applied to the "Wind Chill" column. Here's the windchill function, just in case it helps:

def windchill(temp, wind):
    if temp>50 or wind<=3:
        return temp
    else:
        return 35.74 + 0.6215*temp - 35.75*wind**0.16 + 0.4275*temp*wind**0.16

But when I look at the DataFrame again, the NaNs are still there:

In [31]: bigdf.head()
Out[31]:
                           Day  Temperature  Wind Speed  Year  Wind Chill
2003-03-01 06:00:00-05:00    1        30.27        5.27  2003         NaN
2003-03-01 07:00:00-05:00    1        30.21        4.83  2003         NaN
2003-03-01 08:00:00-05:00    1        31.81        6.09  2003         NaN
2003-03-01 09:00:00-05:00    1        34.04        6.61  2003         NaN
2003-03-01 10:00:00-05:00    1        35.31        6.97  2003         NaN

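What's even more bizarre is that it has worked once or twice, and I can't tell what I did differently.

I must admit I'm not especially familiar with the inner workings of pandas, and I get confused by indexing and so on, so I feel like I'm probably missing something very basic here (or doing this the hard way).

Thanks!

Source

2013-04-12 wimsy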

You can use apply to do this:

In [11]: df.apply(lambda row: windchill(row['Temperature'], row['Wind Speed']), 
       axis=1) 
Out[11]: 
2003-03-01 06:00:00-05:00 24.794589 
2003-03-01 07:00:00-05:00 25.136527 
2003-03-01 08:00:00-05:00 25.934114 
2003-03-01 09:00:00-05:00 28.219431 
2003-03-01 10:00:00-05:00 29.505105 

In [12]: df['Wind Chill'] = df.apply(lambda row: windchill(row['Temperature'], row['Wind Speed']), 
            axis=1) 

In [13]: df
Out[13]:
                           Day  Temperature  Wind Speed  Year  Wind Chill
2003-03-01 06:00:00-05:00    1        30.27        5.27  2003   24.794589
2003-03-01 07:00:00-05:00    1        30.21        4.83  2003   25.136527
2003-03-01 08:00:00-05:00    1        31.81        6.09  2003   25.934114
2003-03-01 09:00:00-05:00    1        34.04        6.61  2003   28.219431
2003-03-01 10:00:00-05:00    1        35.31        6.97  2003   29.505105


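To expand on the reason for your confusion, I think it stems from the fact that the row variables are copies rather than views of the df, so changes to them don't propagate: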

In [21]: for _, row in df.iterrows(): row['Day'] = 2

We see that it is successfully changing the copy, i.e. the row variable(s):

In [22]: row
Out[22]:
Day               2.00
Temperature      35.31
Wind Speed        6.97
Year           2003.00
Name: 2003-03-01 10:00:00-05:00

But the changes are not written back to the DataFrame:

In [23]: df
Out[23]:
                           Day  Temperature  Wind Speed  Year
2003-03-01 06:00:00-05:00    1        30.27        5.27  2003
2003-03-01 07:00:00-05:00    1        30.21        4.83  2003
2003-03-01 08:00:00-05:00    1        31.81        6.09  2003
2003-03-01 09:00:00-05:00    1        34.04        6.61  2003
2003-03-01 10:00:00-05:00    1        35.31        6.97  2003

The following also leaves df unchanged:

In [24]: row = df.ix[0] # also a copy

In [25]: row['Day'] = 2

Whereas if we do take a view, we will see a change in df:

In [26]: row = df.ix[2:3] # this one's a view

In [27]: row['Day'] = 3

See Returning a view versus a copy (in the docs).
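The copy behaviour described above can be reproduced with a small self-contained sketch. The three-row frame below is made up for illustration, and the sketch assumes a recent pandas, where .ix is no longer available and a single cell is written with .at instead. The frame deliberately mixes integer and float columns, as in the question, which means each row yielded by iterrows has to be built as a new Series. The pandas documentation warns that whether a row is a copy or a view depends on the dtypes, which may be why the original loop appeared to work occasionally.

```python
import numpy as np
import pandas as pd


def windchill(temp, wind):
    # Same scalar formula as in the question.
    if temp > 50 or wind <= 3:
        return temp
    return 35.74 + 0.6215*temp - 35.75*wind**0.16 + 0.4275*temp*wind**0.16


# Hypothetical sample data: mixed dtypes (int Year, float measurements).
df = pd.DataFrame({
    "Year": [2003, 2003, 2003],
    "Temperature": [30.27, 30.21, 31.81],
    "Wind Speed": [5.27, 4.83, 6.09],
})
df["Wind Chill"] = np.nan

# Assigning to the row from iterrows only changes a temporary Series.
for idx, row in df.iterrows():
    row["Wind Chill"] = windchill(row["Temperature"], row["Wind Speed"])
still_nan = df["Wind Chill"].isna().all()  # the frame itself is untouched

# Assigning through the frame, addressed by index label, does stick.
for idx, row in df.iterrows():
    df.at[idx, "Wind Chill"] = windchill(row["Temperature"], row["Wind Speed"])
```

If a loop is really needed, writing through df.at (or df.loc) is one way to make the assignment land in the frame; for a whole column, apply as shown earlier is usually simpler.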

Source

2013-04-12 13:24:57

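I suspected it had to do with copies vs. views, but I had it the wrong way around, which really confused me. Thanks for the detailed answer! – wimsy

I had a similar problem, with a similar solution, but here's the odd part: it somehow worked on an older installation, but not with a newer version of pandas on a different machine. It was really driving me crazy. So if anyone else starts pulling their hair out over a similar problem, I thought I'd pass this along. – ViennaMike

@ViennaMike Are you saying the above works on the newer or the older pandas? A few edge cases in pandas' apply have been tweaked over the last few releases, so this could be one of them! –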

Try with:

bigdf['Wind Chill'] = bigdf.apply(lambda x: windchill(x['Temperature'], x['Wind Speed']), axis=1)

to apply the simple windchill function to the whole DataFrame at once.

Source

2013-04-12 13:23:46 eumiro

I would say you don't need any explicit loop. Hopefully the following does what you want:

import pandas as pd

bigdf = pd.DataFrame({'Temperature': [30.27, 30.21, 31.81], 'Wind Speed': [5.27, 4.83, 6.09]})

def windchill(temp, wind):
    "compute the wind chill given two pandas series temp and wind"
    tomodify = (temp<=50) & (wind>3) #check which values need to be modified
    t = temp.copy() #create a new series
    # change only the values that need modification
    # (the parentheses let the expression continue onto the next line)
    t[tomodify] = (35.74 + 0.6215*temp[tomodify] - 35.75*wind[tomodify]**0.16 +
                   0.4275*temp[tomodify]*wind[tomodify]**0.16)
    return t

bigdf['Wind Chill'] = windchill(bigdf['Temperature'], bigdf['Wind Speed'])

bigdf

   Temperature  Wind Speed  Wind Chill
0        30.27        5.27   24.794589
1        30.21        4.83   25.136527
2        31.81        6.09   25.934114

PS: this implementation of windchill also works with numpy arrays.

Source
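As a variation on the same vectorized idea, the masking step could also be written with numpy.where. This is only a sketch, not the answerer's code: the name windchill_where and the sample inputs are made up for illustration, and it assumes non-negative wind speeds, since the formula is evaluated for every element before the mask is applied.

```python
import numpy as np


def windchill_where(temp, wind):
    # Hypothetical helper; same formula and thresholds as windchill above.
    temp = np.asarray(temp, dtype=float)
    wind = np.asarray(wind, dtype=float)
    chilled = (35.74 + 0.6215*temp - 35.75*wind**0.16
               + 0.4275*temp*wind**0.16)
    # Keep the raw temperature where it is warm (> 50) or calm (wind <= 3).
    return np.where((temp <= 50) & (wind > 3), chilled, temp)


# Made-up inputs: one cold windy reading, one warm reading, one calm reading.
result = windchill_where([30.27, 60.0, 31.81], [5.27, 10.0, 2.0])
```

Only the first reading is adjusted here; the warm and calm readings come back as the original temperatures.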

2013-04-12 13:24:02

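Thanks. My googling suggested that reworking windchill was another option, but I was really trying to figure out what I was doing wrong. :) – wimsy

Gotcha. Glad you found the explanation –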

