2016-04-20 62 views
1

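I have a list of students in a CSV file. Using Python, I want to display four columns showing the male students who have the highest marks in maths, computer, and physics. How can I combine more than two columns?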

I tried using the pandas library.

marks = pd.concat([data['name'], 
       data.loc[data['students']==1, 'maths'].nlargest(n=10)], 'computer'].nlargest(n=10)], 'physics'].nlargest(n=10)]) 

I use 1 for male students and 0 for female students. It gives me an error saying: invalid syntax.

+0

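Perhaps you should break your problem up. Before concatenating, first try assigning each result to a dataframe. Also, it helps to provide sample data [ask] – Alexander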
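
A minimal sketch of that step-by-step suggestion, assuming the column names from the question (`name`, `students`, `maths`, `computer`, `physics`). The sample rows are invented for illustration, and `n=2` stands in for the question's `n=10` so the result stays small:

```python
import pandas as pd

# Hypothetical sample data; 1 = male, 0 = female, as in the question.
data = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dan", "Eve", "Fay"],
    "students": [0, 1, 1, 1, 0, 1],
    "maths": [9, 7, 8, 5, 10, 6],
    "computer": [8, 6, 9, 7, 9, 5],
    "physics": [7, 9, 6, 8, 8, 4],
})

# Filter once, so every later step only sees rows with students == 1.
boys = data[data["students"] == 1]

# One intermediate result per subject, each a Series keyed by row index.
top_maths = boys["maths"].nlargest(2)
top_computer = boys["computer"].nlargest(2)
top_physics = boys["physics"].nlargest(2)

# Combine side by side on the row index, then attach the names.
# A NaN means the student is not in the top n for that subject.
marks = pd.concat([top_maths, top_computer, top_physics], axis=1)
marks = marks.join(data["name"])
print(marks)
```

Assigning each piece to its own variable makes it easy to inspect the intermediate results and avoids the unbalanced brackets that cause the syntax error in the one-line version.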

Answers

1

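Here is one way to show the top 10 students in each subject. Of course, if you want combined rather than individual performance, you can add up the three scores and select the students with the highest totals (see further below).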

import random
import numpy as np
import pandas as pd

# Random sample data: 100 students with 8-letter names and a 0/1 'students' flag (1 = male)
df1 = pd.DataFrame(data={'name': [''.join(random.choice('abcdefgh') for _ in range(8)) for i in range(100)], 
         'students': np.random.randint(0, 2, size=100)}) 
df2 = pd.DataFrame(data=np.random.randint(0, 10, size=(100, 3)), columns=['math', 'physics', 'computers']) 
data = pd.concat([df1, df2], axis=1) 

data.info() 

RangeIndex: 100 entries, 0 to 99 
Data columns (total 5 columns): 
name   100 non-null object 
students  100 non-null int64 
math   100 non-null int64 
physics  100 non-null int64 
computers 100 non-null int64 
dtypes: int64(4), object(1) 
memory usage: 4.0+ KB 

res = pd.concat([data.loc[:, ['name']], 
         data.loc[data['students'] == 1, 'math'].nlargest(n=10), 
         data.loc[data['students'] == 1, 'physics'].nlargest(n=10), 
         data.loc[data['students'] == 1, 'computers'].nlargest(n=10)], axis=1) 

res.dropna(how='all', subset=['math', 'physics', 'computers']) 

        name  math  physics  computers
0   geghhbce   NaN      9.0        NaN
1   hbbdhcef   NaN      7.0        NaN
4   ghgffgga   NaN      NaN        8.0
6   hfcaccgg   8.0      NaN        NaN
14  feechdec   NaN      NaN        8.0
15  dfaabcgh   9.0      NaN        NaN
16  ghbchgdg   9.0      NaN        NaN
23  fbeggcha   NaN      NaN        9.0
27  agechbcf   8.0      NaN        NaN
28  bcddedeg   NaN      NaN        9.0
30  hcdgbgdg   NaN      8.0        NaN
38  fgdfeefd   NaN      NaN        9.0
39  fbcgbeda   9.0      NaN        NaN
41  agbdaegg   8.0      NaN        9.0
49  adgbefgg   NaN      8.0        NaN
50  dehdhhhh   NaN      NaN        9.0
55  ccbaaagc   NaN      8.0        NaN
68  hhggfffe   8.0      9.0        NaN
71  bhggbheg   NaN      9.0        NaN
84  aabcefhf   NaN      NaN        9.0
85  feeeefbd   9.0      NaN        NaN
86  hgeecacc   NaN      8.0        NaN
88  ggedgfeg   9.0      8.0        NaN
89  faafgbfe   9.0      NaN        9.0
94  degegegd   NaN      8.0        NaN
99  beadccdb   NaN      NaN        9.0


data['total'] = data.loc[:, ['math', 'physics', 'computers']].sum(axis=1) 
data[data.students==1].nlargest(10, 'total').sort_values('total', ascending=False) 

        name  students  math  physics  computers  total
29  fahddafg         1     8        8          8     24
79  acchhcdb         1     8        9          7     24
9   ecacceff         1     7        9          7     23
16  dccefaeb         1     9        9          4     22
92  dhaechfb         1     4        9          9     22
47  eefbfeef         1     8        8          5     21
60  bbfaaada         1     4        7          9     20
82  fbbbehbf         1     9        3          8     20
18  dhhfgcbb         1     8        8          3     19
1   ehfdhegg         1     5        7          6     18
+0

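Thank you, Stefan. Your code is very useful. I tried it, and an error occurs in the last two lines: data['total'] = data.loc[:, ['math', 'physics', 'computers']].sum(axis=1) data[data.students == 1].nlargest(10, 'total')... I need to display the high-scoring students in descending order. Thanks. –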

+0

See the update: you can use `.sort_values(ascending=False)`, indicating the column to sort by. http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html Which error did you get? – Stefan

+0

It gives this error: 'DataFrame' object has no attribute 'sort_values'! –
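
For context: as far as I know, `DataFrame.sort_values` was introduced in pandas 0.17.0, so this error usually points to an older installed version, and upgrading pandas is the straightforward fix. A minimal sketch of a workaround follows, where `sort_desc` is a hypothetical helper (not part of pandas) that falls back to the older `DataFrame.sort(columns=...)` call when `sort_values` is missing; the small DataFrame is invented for illustration:

```python
import pandas as pd

print(pd.__version__)  # check which pandas version is actually installed


def sort_desc(df, column):
    """Return df sorted by `column`, highest value first.

    Hypothetical helper: uses sort_values when available and otherwise
    assumes a pre-0.17 pandas that still provides DataFrame.sort.
    """
    if hasattr(df, "sort_values"):
        return df.sort_values(column, ascending=False)
    # Assumption: old pandas API, removed in later releases.
    return df.sort(columns=column, ascending=False)


# Invented sample totals, only to show the call.
df = pd.DataFrame({"name": ["a", "b", "c"], "total": [18, 24, 22]})
out = sort_desc(df, "total")
print(out)
```

On a current pandas installation only the first branch runs; the fallback matters only for installations old enough to lack `sort_values`.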