How to improve the performance of the following Python code

I have a dataset column of the following form:

'< 1 year' 
'10+ years' 
'6 years' 
etc

I need to convert it to integer format, i.e. '< 1 year' -> 0, '10+ years' -> 10, '6 years' -> 6, and so on. There are 500,000 entries. I wrote the following script to clean it:

import math
import six

# is_nan is a helper defined elsewhere; data.X11 is the column to clean
temp = data.X11
for i in range(len(temp)):
    if not is_nan(temp[i]):
        if isinstance(temp[i], six.string_types):
            b = temp[i].split(" ")
            if len(b) == 3 and b[0] == '<':
                temp[i] = 0
            elif len(b) == 2:
                if b[0] == '10+':
                    temp[i] = 10
                else:
                    temp[i] = int(b[0])
        else:
            if isinstance(temp[i], float):
                temp[i] = math.floor(temp[i])
            if isinstance(temp[i], int):
                if temp[i] >= 10:
                    temp[i] = 10
                elif temp[i] < 1 and temp[i] >= 0:
                    temp[i] = 0
                elif temp[i] < 0:
                    temp[i] = -10
                else:
                    pass
    else:
        # missing values get the sentinel -10
        temp[i] = -10

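It works. The downside is that it is very slow (it takes hours to complete). My question is how to improve the performance of this code.

Any comments or helpful code snippets would be greatly appreciated.

Thanks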

2015-08-31 user62198

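I'm not sure there is much you can do here. You could try iterating over the values directly to avoid the temp[i] access. You could also append the new values to the end of another list (fast) instead of modifying values in place (not as fast).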

new_temp = list()
for temp_i in data.X11:
    if not is_nan(temp_i):
        if isinstance(temp_i, six.string_types):
            b = temp_i.split(" ")
            if len(b) == 3 and b[0] == '<':
                new_temp.append(0)
            elif len(b) == 2:
                if b[0] == '10+':
                    new_temp.append(10)
                else:
                    new_temp.append(int(b[0]))
            else:
                # unrecognized string: keep it so the output stays aligned with the input
                new_temp.append(temp_i)
        else:
            if isinstance(temp_i, float):
                new_temp.append(math.floor(temp_i))
            if isinstance(temp_i, int):
                if temp_i >= 10:
                    new_temp.append(10)
                elif temp_i < 1 and temp_i >= 0:
                    new_temp.append(0)
                elif temp_i < 0:
                    new_temp.append(-10)
                else:
                    # values 1..9 are already in the target format
                    new_temp.append(temp_i)
    else:
        new_temp.append(-10)

string.split is likely to be slow as well.
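One way to limit the cost of the split call mentioned above is to cache the parsed result per distinct string, so each label is split only once. This is a rough sketch, assuming the column holds only a handful of distinct labels. The name parse_label, the sample list, and the choice of -10 for unrecognized labels are illustrative assumptions, not part of the original post.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_label(label):
    """Convert a label such as '< 1 year' or '10+ years' to an int (hypothetical helper)."""
    parts = label.split(" ")
    if len(parts) == 3 and parts[0] == "<":
        return 0
    if len(parts) == 2:
        if parts[0] == "10+":
            return 10
        return int(parts[0])
    # Assumption: unrecognized labels get the same sentinel as missing values.
    return -10


# Repeated labels are served from the cache instead of being split again.
sample = ["< 1 year", "10+ years", "6 years", "6 years", "10+ years"]
converted = [parse_label(s) for s in sample]
```

Whether this helps in practice depends on how much of the total time is actually spent in split rather than in the element-by-element access.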

If possible, you could also try running your code with PyPy, or rewriting it so that it is compatible with Cython.

2015-08-31 13:23:14

Thanks for your input @QuentinRoy – user62198

I will give it a try. The dictionary solution (see below) also takes a long time. – user62198

With pandas
You can create a dictionary and then use it to map your DataFrame column

dico = {'< 1 year': 0, '10+ years': 10, '6 years': 6}  # '< 1 year' maps to 0, as the question requires
df['New_var'] = df.var1.map(dico)

It should only take a few seconds.
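The dictionary above lists only three labels. A minimal sketch of building the full table follows, assuming the remaining labels follow the pattern '1 year', '2 years', ..., '9 years' (the question shows only three examples, so the other keys are a guess). The names lookup and convert are hypothetical, and the plain-Python convert function stands in for the pandas call so the idea can be tried without a DataFrame.

```python
# Build the complete label -> integer table once.
# Assumption: labels other than the three shown in the question follow this pattern.
lookup = {"< 1 year": 0, "1 year": 1, "10+ years": 10}
lookup.update({"%d years" % n: n for n in range(2, 10)})


def convert(values, table=lookup, missing=-10):
    """Map each value through the table; anything not in the table becomes `missing`."""
    return [table.get(v, missing) for v in values]


result = convert(["< 1 year", "10+ years", "6 years", float("nan")])

# With pandas, the same table could be passed to map:
#     df['New_var'] = df.var1.map(lookup)
```

Note that Series.map returns NaN for values that are not keys of the dictionary, so missing or unexpected entries would still need separate handling in the pandas version.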

2015-08-31 13:21:03 steboc

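Thank you very much for your input. I'm not sure about 'var1' in the second line of the code, though. Could you briefly explain it? Thanks again – user62198

Never mind, I think I get it. Thanks. – user62198

I am trying it now, but unfortunately it is taking a long time. – user62198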

I think the culprit is this line:

math.floor(temp[i])

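In Python 2 it returns a float, which uses a few more bits than a standard integer. Converting the result of that operation to an integer may improve performance.

Another solution is to upgrade to Python 3.x.x, because in those versions floor and ceil both return integers.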
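As a small illustration of the conversion suggested above (a sketch only; the thread does not establish whether this accounts for the slowdown), the result of math.floor can be wrapped in int(). The variable names are illustrative.

```python
import math

value = 6.7

# Explicit int() gives an integer on both Python 2 and Python 3.
floored = int(math.floor(value))

# On Python 3, math.floor already returns an int for a float argument.
floored_py3 = math.floor(value)
```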

2015-08-31 13:22:26 DrewB

Thanks for your input @DrewB – user62198
