2017-04-15 195 views
1

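Python for loop vs. list comprehension

Can someone explain to me why these two statements (a for loop and a list comprehension) return two different answers? I thought they were the same thing, just written differently.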

Data used:

Top152['% Renewable'] 
Country 
China     19.754910 
United States   11.570980 
Japan     10.232820 
United Kingdom  10.600470 
Russian Federation 17.288680 
Canada    61.945430 
Germany    17.901530 
India     14.969080 
France    17.020280 
South Korea   2.279353 
Italy     33.667230 
Spain     37.968590 
Iran     5.707721 
Australia    11.810810 
Brazil    69.648030 

For loop:

def answer_ten():
    Top15 = answer_one()
    Top152 = Top15.copy()

    for x in Top152['% Renewable']:
        if x >= Top152['% Renewable'].median():
            Top152['HighRenew'] = 1
        else:
            Top152['HighRenew'] = 0
    return Top152['HighRenew']

answer_ten()

Output:

Country 
    China     1 
    United States   1 
    Japan     1 
    United Kingdom  1 
    Russian Federation 1 
    Canada    1 
    Germany    1 
    India     1 
    France    1 
    South Korea   1 
    Italy     1 
    Spain     1 
    Iran     1 
    Australia    1 
    Brazil    1  

List comprehension:

def answer_ten():
    Top15 = answer_one()
    Top152 = Top15.copy()

    Top152['HighRenew'] = [1 if x >= Top152['% Renewable'].median() else 0 for x in Top152['% Renewable']]

    return Top152['HighRenew']

answer_ten()

Output:

Country 
China     1 
United States   0 
Japan     0 
United Kingdom  0 
Russian Federation 1 
Canada    1 
Germany    1 
India     0 
France    1 
South Korea   0 
Italy     1 
Spain     1 
Iran     0 
Australia    0 
Brazil    1 

Answers

0

It is better to convert a boolean mask to int, because pandas works fastest with vectorized functions:

print (Top152['% Renewable'] >= Top152['% Renewable'].median()) 
China     True 
United States   False 
Japan     False 
United Kingdom  False 
Russian Federation  True 
Canada     True 
Germany    True 
India     False 
France     True 
South Korea   False 
Italy     True 
Spain     True 
Iran     False 
Australia    False 
Brazil     True 
Name: % Renewable, dtype: bool 

def answer_ten():
    # Parentheses around the whole expression allow the line break
    return ((Top152['% Renewable'] >= Top152['% Renewable'].median())
            .astype(int).rename('HighRenew'))


print (answer_ten()) 
China     1 
United States   0 
Japan     0 
United Kingdom  0 
Russian Federation 1 
Canada    1 
Germany    1 
India     0 
France    1 
South Korea   0 
Italy     1 
Spain     1 
Iran     0 
Australia    0 
Brazil    1 
Name: HighRenew, dtype: int32 
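
As a rough check, the sketch below rebuilds the question's `% Renewable` column by hand and compares the vectorized result with the list comprehension. The `renewable` Series is an assumption standing in for `Top152['% Renewable']`, and the comparison uses `>=`, as in the question.

```python
import pandas as pd

# Values copied from the data shown in the question; `renewable` is a
# stand-in name for Top152['% Renewable'].
renewable = pd.Series(
    {
        'China': 19.754910,
        'United States': 11.570980,
        'Japan': 10.232820,
        'United Kingdom': 10.600470,
        'Russian Federation': 17.288680,
        'Canada': 61.945430,
        'Germany': 17.901530,
        'India': 14.969080,
        'France': 17.020280,
        'South Korea': 2.279353,
        'Italy': 33.667230,
        'Spain': 37.968590,
        'Iran': 5.707721,
        'Australia': 11.810810,
        'Brazil': 69.648030,
    },
    name='% Renewable',
)
renewable.index.name = 'Country'

# Compute the median once, then compare every element against it.
median = renewable.median()

# Vectorized: boolean mask converted to 0/1.
vectorized = (renewable >= median).astype(int).rename('HighRenew')

# Pure-Python equivalent, one element at a time.
comprehension = [1 if x >= median else 0 for x in renewable]

print(vectorized)
print(vectorized.tolist() == comprehension)
```

With 15 rows the median is France's own value, so France is flagged only when the comparison is `>=` rather than `>`.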

A for-loop solution is possible with iterrows, but it is very slow; the first solution is faster:

def answer_ten():
    for idx, x in Top152.iterrows():
        if Top152.loc[idx, '% Renewable'] >= Top152['% Renewable'].median():
            Top152.loc[idx, 'HighRenew'] = 1
        else:
            Top152.loc[idx, 'HighRenew'] = 0
    return Top152['HighRenew'].astype(int)

print (answer_ten()) 
China     1 
United States   0 
Japan     0 
United Kingdom  0 
Russian Federation 1 
Canada    1 
Germany    1 
India     0 
France    1 
South Korea   0 
Italy     1 
Spain     1 
Iran     0 
Australia    0 
Brazil    1 
Name: HighRenew, dtype: int32 

Timings:

import pandas as pd

# Repeat the 15 rows 1000 times: [15000 rows x 1 columns]
Top152 = pd.concat([Top152]*1000).reset_index(drop=True)

def answer_ten1():
    return (Top152['% Renewable'] >= Top152['% Renewable'].median()).astype(int).rename('HighRenew')

def answer_ten2():
    for idx, x in Top152.iterrows():
        if Top152.loc[idx, '% Renewable'] >= Top152['% Renewable'].median():
            Top152.loc[idx, 'HighRenew'] = 1
        else:
            Top152.loc[idx, 'HighRenew'] = 0
    return Top152['HighRenew'].astype(int)


def answer_ten3(): 
    Top152['HighRenew'] = [1 if x >= Top152['% Renewable'].median() else 0 for x in Top152['% Renewable']] 
    return Top152['HighRenew'] 

print (answer_ten1()) 
print (answer_ten2()) 
print (answer_ten3()) 

In [169]: %timeit (answer_ten1()) 
1000 loops, best of 3: 528 µs per loop 

In [170]: %timeit answer_ten2() 
1 loop, best of 3: 16 s per loop 

In [171]: %timeit (answer_ten3()) 
1 loop, best of 3: 2.67 s per loop 
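
One likely reason the comprehension is so much slower than the vectorized version is that `median()` is re-evaluated for every element. The sketch below is only an illustration under assumptions: `values` is hypothetical random data standing in for `Top152['% Renewable']`, and the function names are made up. No timings are quoted because they depend on the machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in data, not the data from the question.
rng = np.random.default_rng(0)
values = pd.Series(rng.uniform(0, 100, size=1500))


def per_element_median():
    # median() is recomputed once per element, as in the question's code.
    return [1 if x >= values.median() else 0 for x in values]


def hoisted_median():
    # median() is computed a single time before the loop.
    median = values.median()
    return [1 if x >= median else 0 for x in values]


def vectorized():
    # The whole comparison is done by pandas in one step.
    return (values >= values.median()).astype(int).tolist()


for func in (per_element_median, hoisted_median, vectorized):
    seconds = timeit.timeit(func, number=3)
    print(f'{func.__name__}: {seconds:.4f} s for 3 runs')
```

All three functions return the same flags; only the amount of repeated work differs.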
0

At each iteration step you are setting the whole column (vector):

Top152['HighRenew'] = 1 
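
A minimal sketch of why this happens, assuming a small made-up DataFrame (the name `df` and its values are illustrative, not from the question): assigning a scalar to a column broadcasts it to every row, so after the loop the column reflects only the comparison for the last row.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({'value': [1.0, 5.0, 9.0]}, index=['a', 'b', 'c'])
median = df['value'].median()

for x in df['value']:
    # A scalar assignment overwrites the entire column on every pass.
    df['flag'] = 1 if x >= median else 0

# Only the result of the last comparison survives, copied to every row.
loop_result = df['flag'].tolist()

# An element-wise comparison yields one value per row instead.
vectorized_result = (df['value'] >= median).astype(int).tolist()

print(loop_result)
print(vectorized_result)
```

In the question's data the last row is Brazil, whose value is above the median, which is why every row ends up as 1.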

Try this vectorized approach instead:

Top152['HighRenew'] = (Top152['% Renewable'] >= Top152['% Renewable'].median()).astype(int) 

So your function can be implemented as follows:

def answer_ten(): 
    return (Top15['% Renewable'] >= Top15['% Renewable'].median()).astype(int) 
0

In the second approach you are editing the vector itself, whereas the for loop keeps it (behind the scenes) to avoid unwanted edits.