2017-03-24 102 views
2

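Pandas: a more efficient way to update a column in a pandas DataFrame without a for loop

I have a pandas DataFrame in which I want to update the values of one column based on another column in the same DataFrame. Until now I have been doing the update with the following code: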

# dfMod is an existing DataFrame with a 'day' column
# (.loc replaces .ix, which was removed in pandas 1.0)
for i1, col1 in dfMod.iterrows():
    if col1['day'] == "MONDAY":
        dfMod.loc[i1, 'weekIndex'] = 1
    elif col1['day'] == "TUESDAY":
        dfMod.loc[i1, 'weekIndex'] = 2
    elif col1['day'] == "WEDNESDAY":
        dfMod.loc[i1, 'weekIndex'] = 3
    elif col1['day'] == "THURSDAY":
        dfMod.loc[i1, 'weekIndex'] = 4
    elif col1['day'] == "FRIDAY":
        dfMod.loc[i1, 'weekIndex'] = 5
    elif col1['day'] == "SATURDAY":
        dfMod.loc[i1, 'weekIndex'] = 6
    else:
        dfMod.loc[i1, 'weekIndex'] = 7

However, the DataFrame has 300,000 rows and this takes forever to run. Is there a better way to update the column?

+1

Look at the Series 'map' method. – BrenBarn

+0

I just asked about this too. My question may be useful to you: http://stackoverflow.com/questions/42972081/updating-columns-in-dataframe-using-a-series – sdasdadas

Answers

3

You need map with a dict:

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, 
    "FRIDAY":5, "SATURDAY":6, "SUNDAY":7} 

dfMod["weekIndex"] = dfMod["day"].map(d) 

Sample:

import pandas as pd

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, 
    "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7} 

dfMod["weekIndex"] = dfMod["day"].map(d) 
print (dfMod) 
        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
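One behavioural difference from the loop in the question is worth knowing: Series.map with a dict returns NaN for any value that is not a key, whereas the original else branch assigned 7 to everything unmatched. A minimal sketch, assuming you want to keep that fallback, fills the gaps afterwards; the lowercase 'monday' row is made-up input for illustration only.

```python
import pandas as pd

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# 'monday' (lowercase) is hypothetical input that is not a key in d
dfMod = pd.DataFrame({'day': ['TUESDAY', 'monday', 'SUNDAY']})

# map leaves unmatched values as NaN, which also turns the column into float
dfMod["weekIndex"] = dfMod["day"].map(d)
print(dfMod["weekIndex"].isna().sum())  # 1

# mimic the original else branch: anything unmatched becomes 7, then back to int
dfMod["weekIndex"] = dfMod["day"].map(d).fillna(7).astype(int)
print(dfMod["weekIndex"].tolist())  # [2, 7, 7]
```

This keeps the speed of map while reproducing the loop's catch-all behaviour.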

Timings on 300k rows: map is about 6 times faster than the apply solution:

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']}) 
#300k rows 
dfMod = pd.concat([dfMod]*50000).reset_index(drop=True) 

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, 
    "FRIDAY":5, "SATURDAY":6, "SUNDAY":7} 

In [92]: %timeit dfMod["weekIndex"] = dfMod["day"].map(d) 
10 loops, best of 3: 22.7 ms per loop 

In [93]: %timeit dfMod["weekIndex1"] = dfMod["day"].apply(lambda x: d[x]) 
10 loops, best of 3: 141 ms per loop 
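The %timeit lines above need IPython. A rough equivalent with the standard-library timeit module is sketched below; it reuses the 300k-row frame construction from the answer, the repetition counts are an arbitrary choice, and absolute timings will depend on the machine and pandas version, so no numbers are claimed here.

```python
import timeit

import pandas as pd

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

base = pd.DataFrame({'day': ['TUESDAY', 'THURSDAY', 'FRIDAY',
                             'SATURDAY', 'MONDAY', 'SUNDAY']})
# 6 rows * 50000 = 300k rows, as in the answer
dfMod = pd.concat([base] * 50000).reset_index(drop=True)

def with_map():
    return dfMod["day"].map(d)

def with_apply():
    return dfMod["day"].apply(lambda x: d[x])

# both approaches must agree before their speed is compared
assert with_map().equals(with_apply())

# best of 3 repeats, 3 calls each, similar in spirit to %timeit
for name, fn in [("map", with_map), ("apply", with_apply)]:
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f"{name}: {best * 1000:.1f} ms per call")
```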
+0

Thanks a lot, it works perfectly and, as you said, it is much faster than apply. – ayush

+0

I have tested your original solution on 300k rows: '1 loop, best of 3: 21min 47s per loop' – jezrael

1

Try the apply method:

daysOfWeek = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7} 

dfMod["weekIndex"] = dfMod["day"].apply(lambda x: daysOfWeek[x]) 
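Note that daysOfWeek[x] raises KeyError as soon as a value is missing from the dict, unlike the question's loop, whose else branch assigned 7. If that fallback matters, dict.get with a default is one way to keep it; the mixed-case 'Sunday' value below is hypothetical input.

```python
import pandas as pd

daysOfWeek = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
              "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# 'Sunday' (mixed case) is hypothetical input that is not a key in daysOfWeek
dfMod = pd.DataFrame({'day': ['MONDAY', 'Sunday']})

try:
    # plain indexing fails on the unknown key
    dfMod["day"].apply(lambda x: daysOfWeek[x])
except KeyError as err:
    print("KeyError:", err)

# dict.get returns the default 7 for unknown keys, like the original else branch
dfMod["weekIndex"] = dfMod["day"].apply(lambda x: daysOfWeek.get(x, 7))
print(dfMod["weekIndex"].tolist())  # [1, 7]
```

This is still a Python-level call per row, so map (with fillna for the default) remains the faster choice.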
+0

This works, thanks! – ayush

1

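Please use @jezrael's answer, since it is the idiomatic one.
This is purely for demonstration, to give useful information about other pandas tools that could be used.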

Setup
Using @jezrael's example

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']}) 

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, 
    "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7} 

Alternative solution

dfMod.join(pd.Series(d, name='weekIndex'), on='day') 

        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
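As a quick sanity check on the join alternative, the short sketch below (using the same dfMod and d as the setup) confirms that it keeps the original row order and produces the same values as map. DataFrame.join defaults to a left join, so a day missing from d would come out as NaN here too.

```python
import pandas as pd

dfMod = pd.DataFrame({'day': ['TUESDAY', 'THURSDAY', 'FRIDAY',
                              'SATURDAY', 'MONDAY', 'SUNDAY']})
d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3,
     "THURSDAY": 4, "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# the named Series supplies the new column; on='day' matches against its index
joined = dfMod.join(pd.Series(d, name='weekIndex'), on='day')

# same index, values and dtype as the idiomatic map solution
print(joined['weekIndex'].equals(dfMod['day'].map(d)))  # True
print(joined['weekIndex'].tolist())  # [2, 4, 5, 6, 1, 7]
```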