2017-02-21

Pandas conditional statement problem

Hi, I'm working on my data.

I want to extract data with conditional statements.

This is my code.

# -*- coding: utf-8 -*- 
import pandas as pd 
import numpy as np 
import os 

join_file = r'D:\handling data\complete data\조인\after_join.csv' 
pwd = os.getcwd() 
os.chdir(os.path.dirname(join_file)) 
join_data = pd.read_csv(os.path.basename(join_file), sep=',', encoding='utf-8') 

print(join_data.head()) 


join_data['cluster_z'] = 4 # both declining
join_data['cluster_z'][((join_data['cluster_x'] == 3 | join_data['cluster_x'] == 2 | join_data['cluster_x'] == 4)
        & (join_data['cluster_y'] == 3 | join_data['cluster_y'] == 1))] = 1 # both rising

join_data['cluster_z'][((join_data['cluster_x'] == 1 | join_data['cluster_x'] == 5)
        & (join_data['cluster_y'] == 3 | join_data['cluster_y'] == 1))] = 2 # overall declining, per-store rising

join_data['cluster_z'][((join_data['cluster_x'] == 3 | join_data['cluster_x'] == 2 | join_data['cluster_x'] == 4)
        & (join_data['cluster_y'] == 2 | join_data['cluster_y'] == 4))] = 3 # overall rising, per-store declining

print(join_data.head()) 

After the second print(join_data.head()) runs, I get an error like the one in the picture.

[image: screenshot of the error]

How can I fix this error? Thanks in advance.

Answer


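It looks like you left out a lot of parentheses in the conditions. In Python, | and & bind more tightly than ==, so each comparison has to be wrapped in its own parentheses. It is also better to use loc for the assignment.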

Original:

join_data['cluster_z'][((join_data['cluster_x'] == 3 |
    join_data['cluster_x'] == 2 |
    join_data['cluster_x'] == 4) &
    (join_data['cluster_y'] == 3 |
    join_data['cluster_y'] == 1))] = 1

Change to:

join_data.loc[
    ((join_data['cluster_x'] == 3) |
     (join_data['cluster_x'] == 2) |
     (join_data['cluster_x'] == 4)) &
    ((join_data['cluster_y'] == 3) |
     (join_data['cluster_y'] == 1)), 'cluster_z'] = 1
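
To see why the extra parentheses matter, here is a minimal sketch on a made-up three-row frame (the name `df` and its values are assumptions for illustration, not the asker's data). Without parentheses, Python reads the condition as a chained comparison, which needs the truth value of a whole Series, and pandas raises an error instead of guessing.

```python
import pandas as pd

# Hypothetical toy frame, used only for illustration.
df = pd.DataFrame({'cluster_x': [3, 2, 5]})

# Without parentheses this is parsed as
#   df['cluster_x'] == (3 | df['cluster_x']) == 2
# i.e. a chained comparison that asks for the truth value of a Series.
try:
    bad = df['cluster_x'] == 3 | df['cluster_x'] == 2
    error_name = None
except ValueError as exc:
    error_name = type(exc).__name__

# With each comparison parenthesized, | combines two boolean Series.
good = (df['cluster_x'] == 3) | (df['cluster_x'] == 2)

print(error_name)     # ValueError
print(good.tolist())  # [True, True, False]
```

The error in the question's screenshot may differ in wording, but the cause sketched here is the same operator-precedence issue.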

Or better, use isin:

join_data.loc[ 
(join_data['cluster_x'].isin([3,2,4])) & 
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 1 
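
As a quick sanity check, the sketch below compares the chained `==` form with `isin` on a made-up column (the Series `s` is an assumption for illustration only). Both should produce the same boolean mask.

```python
import pandas as pd

# Hypothetical toy column, used only for illustration.
s = pd.Series([3, 2, 5, 4, 1], name='cluster_x')

# Chained comparisons, each wrapped in parentheses.
chained = (s == 3) | (s == 2) | (s == 4)

# The same membership test written with isin.
with_isin = s.isin([3, 2, 4])

print(chained.equals(with_isin))  # True
```

Besides being shorter, isin avoids the precedence problem entirely because there is only one expression per column.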

All together:

join_data = pd.DataFrame({'cluster_x':[3,2,5,3], 
         'cluster_y':[3,0,1,2]}) 

print(join_data)
   cluster_x  cluster_y
0          3          3
1          2          0
2          5          1
3          3          2

join_data['cluster_z'] = 4 

join_data.loc[ 
(join_data['cluster_x'].isin([3,2,4])) & 
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 1 

join_data.loc[ 
(join_data['cluster_x'].isin([1,5])) & 
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 2 

join_data.loc[ 
(join_data['cluster_x'].isin([3,2,4])) & 
(join_data['cluster_y'].isin([2,4])), 'cluster_z'] = 3 

print(join_data)
   cluster_x  cluster_y  cluster_z
0          3          3          1
1          2          0          4
2          5          1          2
3          3          2          3

Or, more readable:

mask1 = join_data['cluster_x'].isin([3,2,4]) 
mask2 = join_data['cluster_y'].isin([3,1]) 
mask3 = join_data['cluster_x'].isin([1,5]) 
mask4 = join_data['cluster_y'].isin([2,4]) 

join_data['cluster_z'] = 4 
join_data.loc[mask1 & mask2 , 'cluster_z'] = 1 
join_data.loc[mask3 & mask2 , 'cluster_z'] = 2 
join_data.loc[mask1 & mask4 , 'cluster_z'] = 3 

print(join_data)
   cluster_x  cluster_y  cluster_z
0          3          3          1
1          2          0          4
2          5          1          2
3          3          2          3

Solution with multiple numpy.where:

mask1 = join_data['cluster_x'].isin([3,2,4]) 
mask2 = join_data['cluster_y'].isin([3,1]) 
mask3 = join_data['cluster_x'].isin([1,5]) 
mask4 = join_data['cluster_y'].isin([2,4]) 

join_data['cluster_z'] = np.where(mask1 & mask2, 1, 
         np.where(mask3 & mask2, 2, 
         np.where(mask1 & mask4, 3, 4)))   

print(join_data)
   cluster_x  cluster_y  cluster_z
0          3          3          1
1          2          0          4
2          5          1          2
3          3          2          3
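
If the nested numpy.where calls get hard to read, one possible alternative (not part of the original answer) is numpy.select, which takes a list of conditions and a list of values and uses the first matching condition for each row. A minimal sketch on the same sample frame; the names `conditions` and `choices` are just illustrative:

```python
import numpy as np
import pandas as pd

# Same sample frame as in the examples above.
join_data = pd.DataFrame({'cluster_x': [3, 2, 5, 3],
                          'cluster_y': [3, 0, 1, 2]})

mask1 = join_data['cluster_x'].isin([3, 2, 4])
mask2 = join_data['cluster_y'].isin([3, 1])
mask3 = join_data['cluster_x'].isin([1, 5])
mask4 = join_data['cluster_y'].isin([2, 4])

# Conditions are checked in order; rows matching none get the default.
conditions = [mask1 & mask2, mask3 & mask2, mask1 & mask4]
choices = [1, 2, 3]
join_data['cluster_z'] = np.select(conditions, choices, default=4)

print(join_data['cluster_z'].tolist())  # [1, 4, 2, 3]
```

The result matches the nested numpy.where version, with the default value 4 playing the role of the innermost else branch.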

Thank you~~ you are a great guy! There are so many ways to handle it, haha. How do you know so many ways? Thanks~~ Have a nice day~~