我有一個pandas.DataFrame
,有幾列,其中一些具有連續數據,另一些具有分類。我一直試圖按類別先組合,然後在每個類別內根據條件(即兩個數字之間的值)拆分爲數組。根據條件對數據框的行進行排序,並根據其他條件將數據塊分割爲數組
這是我寫的一個蠻力hackjob,完成這項工作,但我想知道如果有更優雅的方式。
import pandas as pd
df = pd.DataFrame({'Category1' : [ 0.3, 3.0, 12.4, 7.4,
20.3, 15.0, 10.9, 17.4],
'Category2' : [ 0, 0, 1, 0,
1, 1, 0, 0],
'Category3' : [ 1, 2, 3, 4,
5, 6, 7, 8],
'Category4' : ['foo','bar','fizz','buzz',
'spam','nii','blah','lol'],
etc., })
group_0_5 = df['Category1']<=5.0
group_5_10 = (df['Category1']>5.0) & (df['Category1']<=10.0)
group_10_15 = (df['Category1']>10.0) & (df['Category1']<=15.0)
group_15_20 = (df['Category1']>15.0) & df['Category1']<=20.0)
group_20_25 = (df['Category1']>20.0) & (df['Category1']<=25.0)
state1 = (df['Category2']==1)
state2 = (df['Category2']==0)
count1_state1 = df.loc[group_0_5 & state1]['Category3'].count()
count2_state1 = df.loc[group_5_10 & state1]['Category3'].count()
count3_state1 = df.loc[group_10_15 & state1]['Category3'].count()
count4_state1 = df.loc[group_15_20 & state1]['Category3'].count()
count5_state1 = df.loc[group_20_25 & state1]['Category3'].count()
count1_state2 = df.loc[group_0_5 & state2]['Category3'].count()
count2_state2 = df.loc[group_5_10 & state2]['Category3'].count()
count3_state2 = df.loc[group_10_15 & state2]['Category3'].count()
count4_state2 = df.loc[group_15_20 & state2]['Category3'].count()
count5_state2 = df.loc[group_20_25 & state2]['Category3'].count()
count_array1=[count1_state1, count2_state1, count3_state1, count4_state1, count5_state1]
count_array2=[count1_state2, count2_state2, count3_state2, count4_state2, count5_state2]
print (count_array1)
print (count_array2)
Out [2]:
[nan, nan, 2, 1, 1]
[ 2, 1, 1, 1, nan]
謝謝!切割方法正是我期待的細化這個代碼。 –
很高興能爲您提供幫助。如果我的回答很有幫助,請不要忘記[接受](http://meta.stackexchange.com/a/5235/295067)它。謝謝。 – jezrael