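I'm fairly new to pandas, and I have a pandas DataFrame of about 500,000 rows filled with numbers. I'm using Python 2.x and am currently defining and calling the method shown below. It sets the predicted value equal to the corresponding value in series 'B' if two adjacent values in series 'A' are the same. However, it runs extremely slowly, processing about 5 rows per second, and I'd like to find a faster way to get the same result. How can I speed up an iterative function on my large pandas DataFrame?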

import numpy as np

def myModel(df):

    A_series = df['A']
    B_series = df['B']
    seriesLength = A_series.size

    # Make a new empty column in the dataframe to hold the predicted values
    df['predicted_series'] = np.nan

    # Make a new empty column to store whether or not
    # the prediction matches B
    df['wrong_prediction'] = np.nan
    prev_B = B_series[0]
    for x in range(1, seriesLength):

        prev_A = A_series[x-1]
        prev_B = B_series[x-1]
        # Set the predicted value to equal B if A has two equal values in a row
        if A_series[x] == prev_A:
            if df['predicted_series'][x] > 0:
                # .loc writes to the frame itself instead of a possible copy
                df.loc[x, 'predicted_series'] = df['predicted_series'][x-1]
            else:
                df.loc[x, 'predicted_series'] = prev_B

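Is there a way to vectorize this or otherwise make it run faster? As it stands, it is projected to take many hours. Should it really take this long? It doesn't seem like 500,000 rows should give my program that much trouble.

Source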

2016-05-13 Richard

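Are you using Python 3.x? If not, at least try switching from 'range' to 'xrange'. – quapka

I'm using Python 2.x. I'll edit my question to include that. – Richard

Is this text or numeric data? – Grr

Something like this should work for what you describe: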

df['predicted_series'] = np.where(A_series.shift() == A_series, B_series, df['predicted_series'])
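As a minimal sketch of how this behaves, assume a small made-up DataFrame with columns A and B (the sample values are illustrative and not from the question). Comparing A against `shift()` marks the rows whose A equals the previous row's A, and `np.where` copies B into those rows:

```python
import numpy as np
import pandas as pd

# Illustrative data only; the real frame has about 500,000 rows.
df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 3],
                   'B': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]})
df['predicted_series'] = np.nan

# True where A equals the value in the previous row (the first row is always False).
same_as_prev = df['A'].shift() == df['A']

# Take B where the condition holds, otherwise keep the existing value.
df['predicted_series'] = np.where(same_as_prev, df['B'], df['predicted_series'])
print(df)
```

The whole column is computed in one pass over NumPy arrays, which is where the speedup over a Python-level loop comes from.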

Source

2016-05-13 16:56:05 ayhan

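Thanks for the help. Do you know if there is a way to add the value of predicted_series to the condition? Also, does this run faster? – Richard

Much faster than the loop? Most likely. Much faster than Grr's answer? I'm not sure. For 100 random rows it is 3 times faster than .loc, but you should try it on a bigger DataFrame. You can also put a condition on the predicted series. What exactly is the condition? – ayhan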
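One way to check this on a bigger frame is a small timing harness like the sketch below. The frame size, the random data, and the helper names `with_loc` and `with_where` are assumptions made for illustration; the timings will depend on the machine and the pandas version, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
n = 100000  # hypothetical size; raise it toward 500,000 to match the question
df = pd.DataFrame({'A': rng.randint(0, 3, n), 'B': rng.rand(n)})

def with_loc(frame):
    # Boolean-mask assignment through .loc
    out = frame.copy()
    out['predicted_series'] = np.nan
    out.loc[out['A'].diff() == 0, 'predicted_series'] = out['B']
    return out

def with_where(frame):
    # Same selection expressed with np.where
    out = frame.copy()
    out['predicted_series'] = np.where(out['A'].shift() == out['A'],
                                       out['B'], np.nan)
    return out

t_loc = timeit.timeit(lambda: with_loc(df), number=20)
t_where = timeit.timeit(lambda: with_where(df), number=20)
print('loc:   %.4f s for 20 runs' % t_loc)
print('where: %.4f s for 20 runs' % t_where)
```

Both helpers copy the frame first so that repeated runs start from the same state.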

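Thanks. I'll try both. The condition is that predicted_series is only set to a value when A and the previous A are equal. If the previous predicted_series is NaN, then predicted_series equals df.B. Otherwise, if the previous predicted_series is some number > 0, then predicted_series = the previous predicted_series – Richard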

df.loc[df.A.diff() == 0, 'predicted_series'] = df.B

This gets rid of the for loop and sets predicted_series to the value of B when A equals the previous A.
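A small sketch of the mask this builds, on made-up data. Since `diff()` relies on subtraction, it is meant for numeric columns; for numeric A it selects the same rows as comparing A with `shift()`:

```python
import numpy as np
import pandas as pd

# Illustrative values, not taken from the question.
df = pd.DataFrame({'A': [5, 5, 7, 7, 7, 2],
                   'B': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
df['predicted_series'] = np.nan

# A difference of zero means A did not change from the previous row.
mask_diff = df['A'].diff() == 0
# Equivalent mask written as a direct comparison with the previous row.
mask_shift = df['A'].shift() == df['A']

# Assign B only on the selected rows; pandas aligns df['B'] on the index.
df.loc[mask_diff, 'predicted_series'] = df['B']
print(df)
```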

Edit:

Based on your comment, change the initialization of your predicted_series to all NaN and then forward-fill the values:

df['predicted_series'] = np.nan
df.loc[df.A.diff() == 0, 'predicted_series'] = df.B
df['predicted_series'] = df['predicted_series'].ffill()
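To see what the forward fill changes, here is a sketch on made-up data, using `Series.ffill()` as the shorthand for forward filling. Note that the fill also reaches rows where A differs from the previous A, which may or may not be what the rule in the comments calls for:

```python
import numpy as np
import pandas as pd

# Illustrative values, not taken from the question.
df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 4],
                   'B': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]})

df['predicted_series'] = np.nan
df.loc[df['A'].diff() == 0, 'predicted_series'] = df['B']
before_fill = df['predicted_series'].copy()

# Propagate the last non-NaN prediction down into the following NaN rows.
df['predicted_series'] = df['predicted_series'].ffill()

print(pd.DataFrame({'before': before_fill, 'after': df['predicted_series']}))
```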

For the fastest speed, a slight modification of ayhan's answer will perform best:

df['predicted_series'] = np.where(df.A.shift() == df.A, df.B, df['predicted_series'].shift())

This gives you the forward-filled values and runs faster than my original suggestion.

Source

2016-05-13 16:56:13 Grr

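Thank you very much! However, can the value of predicted_series also be part of the condition? In some cases I want the predicted value to equal the previous predicted value, and to use 'B' only if the previous predicted value is NaN, since there are a lot of missing values. – Richard

So let me know if I have this right. When A and the previous A are equal, you want predicted_series to equal df.B, and otherwise you want predicted_series to equal the previous predicted_series value? – Grr

Not quite. predicted_series is only set to a value when A and the previous A are equal. If the previous predicted_series is NaN, then predicted_series equals df.B. Otherwise, if the previous predicted_series is some number > 0, then predicted_series = the previous predicted_series. – Richard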
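The rule as stated in this last comment can be sketched without a Python-level loop by grouping consecutive equal values of A into runs. This is one possible reading, not a confirmed solution from the thread: it assumes the fallback is B on the current row (the posted loop uses `B_series[x-1]` instead), and the function names `predict_loop` and `predict_vectorized` are hypothetical. The slow reference loop is included only to check the vectorized version against it.

```python
import numpy as np
import pandas as pd

def predict_loop(df):
    """Reference implementation of the stated rule, row by row (slow)."""
    a = df['A'].values
    b = df['B'].values
    pred = np.full(len(df), np.nan)
    for i in range(1, len(df)):
        if a[i] == a[i - 1]:
            # Keep the previous prediction if it is a positive number,
            # otherwise fall back to B on the current row.
            pred[i] = pred[i - 1] if pred[i - 1] > 0 else b[i]
    return pd.Series(pred, index=df.index)

def predict_vectorized(df):
    """Same rule expressed with shift, cumsum and groupby."""
    same_as_prev = df['A'].eq(df['A'].shift())
    # Label each run of consecutive equal A values.
    run_id = (~same_as_prev).cumsum()
    # B values that can be carried forward: matched rows where B is positive.
    lockable = df['B'].where(same_as_prev & (df['B'] > 0))
    # First such value in each run, broadcast to every row of the run.
    first_locked = lockable.groupby(run_id).transform('first')
    # True once a carried value has appeared at or before the row in its run.
    seen = lockable.groupby(run_id).ffill().notna()
    # Use B until a value is locked in, then the locked value.
    pred = df['B'].where(~seen, first_locked)
    # Rows where A changed get no prediction.
    return pred.where(same_as_prev)

# Illustrative data with a missing and a non-positive B value.
sample = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3, 3, 3, 3],
                       'B': [5.0, np.nan, 7.0, 8.0, -1.0, 1.0, 0.0, 4.0, 9.0]})
print(pd.DataFrame({'loop': predict_loop(sample),
                    'vectorized': predict_vectorized(sample)}))
```

If the previous row's B is wanted instead, replacing `df['B']` with `df['B'].shift()` in both functions would be the corresponding change.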

Solution

df.loc[df.A == df.A.shift(), 'predicted_series'] = df.B.shift()
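A short sketch of what the `shift()` on B changes, assuming the assignment is meant to target the predicted_series column and using made-up data. With `df.B.shift()`, each matched row receives B from the previous row, which mirrors the `B_series[x-1]` used in the question's loop:

```python
import numpy as np
import pandas as pd

# Illustrative values, not taken from the question.
df = pd.DataFrame({'A': [1, 1, 2, 2, 2],
                   'B': [10.0, 20.0, 30.0, 40.0, 50.0]})
df['predicted_series'] = np.nan

# Rows whose A equals the previous row's A.
mask = df['A'] == df['A'].shift()

# On those rows, use B from the previous row rather than the current one.
df.loc[mask, 'predicted_series'] = df['B'].shift()
print(df)
```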

Source

2016-05-13 17:07:37 piRSquared
