2017-09-12 57 views

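Python - Remove rows and columns in a matrix where all values are 0

I have a matrix that I created from a dataframe, and I want to remove all columns in which every value is 0.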

I have seen examples using dropna and df2.loc[:, (df2 != 0).any(axis=0)], but it doesn't do anything to my dataframe.

This is how I built my matrix:

import numpy as np
import pandas as pd
from collections import Counter

a = ['Psychology', 'Education', 'Social policy', 'Sociology',
     'Pol. sci. & internat. studies', 'Development studies',
     'Social anthropology', 'Area Studies', 'Science and Technology Studies',
     'Law & legal studies', 'Economics', 'Management & business studies',
     'Human Geography', 'Environmental planning', 'Demography', 'Social work',
     'Tools, technologies & methods', 'Linguistics', 'History']
# new_df is the original dataframe (not shown here).
final_df = new_df[new_df['Subject'].isin(a)]

ctrs = {location: Counter(gp.GrantRefNumber) for location, gp in final_df.groupby('Subject')} 

ctrs = list(ctrs.items()) 
overlaps = [(loc1, loc2, sum(min(ctr1[k], ctr2[k]) for k in ctr1)) 
    for i, (loc1, ctr1) in enumerate(ctrs, start=1) 
    for (loc2, ctr2) in ctrs[i:] if loc1 != loc2] 
overlaps += [(l2, l1, c) for l1, l2, c in overlaps] 

df22 = pd.DataFrame(overlaps, columns=['Loc1', 'Loc2', 'Count']) 
df22 = df22.set_index(['Loc1', 'Loc2']) 
df22 = df22.unstack().fillna(0).astype(int) 

# The [-20:] slice at the end of the next line keeps the 20 largest distinct values.
b = np.sort(np.unique(df22.values.ravel()))[-20:] 
df2 = df22.where(df22.isin(b),0.0) 

Interestingly (or not), when I type df2.columns, I get:

MultiIndex(levels=[[u'Count'], [u'Area Studies', u'Demography', u'Development studies', u'Economics', u'Education', u'Environmental planning', u'History', u'Human Geography', u'Law & legal studies', u'Linguistics', u'Management & business studies', u'Pol. sci. & internat. studies', u'Psychology', u'Science and Technology Studies', u'Social anthropology', u'Social policy', u'Social work', u'Sociology', u'Tools, technologies & methods']], 
      labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]], 
      names=[None, u'Loc2']) 

This may be why I am struggling.

Answers


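You need all to find the rows and columns that contain only 0 (these come back as True), together with ~ to invert the boolean mask: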

df = pd.DataFrame({'B':[0,0,0,0,0,0], 
        'C':[0,8,9,4,2,3], 
        'D':[0,3,5,7,1,0], 
        'E':[0,3,6,9,2,4]}) 

print(df)
   B  C  D  E
0  0  0  0  0
1  0  8  3  3
2  0  9  5  6
3  0  4  7  9
4  0  2  1  2
5  0  3  0  4

df = df.loc[~df.eq(0).all(axis=1), ~df.eq(0).all()] 
print(df)
   C  D  E
1  8  3  3
2  9  5  6
3  4  7  9
4  2  1  2
5  3  0  4
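
For a frame shaped like the one in the question, where unstack() leaves MultiIndex columns, the same mask should still apply. Below is a minimal sketch under stated assumptions: the subject names and counts are made up, the names is_zero and trimmed are only for illustration, and the final droplevel step is optional.

```python
import pandas as pd

# Made-up overlap counts; subject names and numbers are illustrative only.
overlaps = [
    ("Economics", "History", 3),
    ("Economics", "Law", 0),
    ("History", "Law", 0),
]
# Mirror each pair, as in the question.
overlaps += [(l2, l1, c) for l1, l2, c in overlaps]

df22 = pd.DataFrame(overlaps, columns=["Loc1", "Loc2", "Count"])
df22 = df22.set_index(["Loc1", "Loc2"]).unstack().fillna(0).astype(int)

# Boolean frame: True wherever a cell is 0.
is_zero = df22.eq(0)

# Keep rows and columns that are NOT entirely zero.
# .loc returns a new frame, so the result has to be assigned.
trimmed = df22.loc[~is_zero.all(axis=1), ~is_zero.all(axis=0)]

# Optional: drop the outer 'Count' level so columns are plain subject names.
trimmed.columns = trimmed.columns.droplevel(0)

print(trimmed)
```

Here the all-zero "Law" row and column are dropped while the other two subjects are kept. If nothing seems to change on real data, it may be worth checking that the result of .loc is assigned back to a variable.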

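Hey Jezrael, I just tried your example and it removes everything from the dataframe, leaving only "Loc1" and a list (i.e. it removes all the numbers and column headers?). I do have numbers in some columns, so it shouldn't remove all of them – ScoutEU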


I am reading the requirement as removing the columns in which all values are 0 - this will remove those in which any value is 0 ... –


Yes, I meant columns in which all the values are 0 :) – ScoutEU
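
Since the comments turn on "all" versus "any", here is a small sketch of the difference, reusing the example frame from the answer. The variable names (all_zero, any_zero, keep_all, keep_any, same_mask) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'B': [0, 0, 0, 0, 0, 0],
                   'C': [0, 8, 9, 4, 2, 3],
                   'D': [0, 3, 5, 7, 1, 0],
                   'E': [0, 3, 6, 9, 2, 4]})

# True for columns where EVERY value is 0 (only B here).
all_zero = df.eq(0).all()
# True for columns where AT LEAST ONE value is 0 (every column here,
# because row 0 is entirely zero).
any_zero = df.eq(0).any()

keep_all = df.loc[:, ~all_zero]   # drops only the all-zero column B
keep_any = df.loc[:, ~any_zero]   # drops every column that contains a 0

# The mask from the question is the same mask written the other way round.
same_mask = (df != 0).any(axis=0).equals(~all_zero)

print(list(keep_all.columns))
print(list(keep_any.columns))
print(same_mask)
```

So ~df.eq(0).all() removes only the columns that are zero throughout, whereas swapping in any would remove every column that contains a single 0.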