Set index and sort by a specific column in pandas

I am trying to index this dataset in a specific format:

import pandas as pd 

voting = pd.read_json("GE2000.json") 
voting.set_index(['county_fips','candidate_name','pty','vote_pct'],inplace=True) 

print(voting)

This returns:

          vote 
county_fips candidate_name pty vote_pct 
2000  Howard Phillips CS 0   596 
      John Hagelin NL 0   919 
      Harry Browne LB 1   2636 
      George W. Bush R 59  167398 
      Al Gore   D 28   79004 
1001  Howard Phillips I 0    9 
      John Hagelin I 0    5 
      Harry Browne LB 0    51 
      George W. Bush R 70   11993 
      Al Gore   D 29   4942

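After this, I want to sort by vote_pct and keep only the row with the largest value for each county_fips, like this (I tried sort_values, sort_index, etc., but could not get them to produce the desired output):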

          vote 
county_fips candidate_name pty vote_pct 
2000  George W. Bush R 59  167398 
1001  George W. Bush R 70   11993

Here is a sample of my data:

[ 

    { 
    "office" : "PRESIDENT", 
    "county_name" : "Alaska", 
    "vote_pct" : "0", 
    "county_fips" : "2000", 
    "pty" : "CS", 
    "candidate_name" : "Howard Phillips", 
    }, 
    { 
    "office" : "PRESIDENT", 
    "county_name" : "Alaska", 
    "vote_pct" : "0", 
    "county_fips" : "2000", 
    "pty" : "NL", 
    "candidate_name" : "John Hagelin", 
    } 
]

The data continues in the same way.

Source

2016-12-08 sn4ke

Can you provide a sample of the raw data? –

@juanpa.arrivillaga updated, thank you – sn4ke

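You can get the row with the largest vote_pct for each county by using groupby and apply first, and only then setting the index. This lets you use groupby on a column rather than on the index (which is awkward):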

voting = pd.read_json("GE2000.json") 

get_largest_vote_pct = lambda row: row[row.vote_pct == row.vote_pct.max()] 

largest = voting.groupby('county_fips').apply(get_largest_vote_pct) 

largest.set_index(['county_fips','candidate_name','pty','vote_pct'],inplace=True) 

print(largest) 

              vote 
county_fips candidate_name pty vote_pct   
1001  George W. Bush R 70   11993 
2000  George W. Bush R 59  167398
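The same filter can also be written without apply. A minimal sketch, assuming the frame is rebuilt in memory from the rows printed in the question (GE2000.json itself is not available here) and that vote_pct may arrive as strings, as in the JSON sample, so it is converted with pd.to_numeric first:

```python
import pandas as pd

# Rows copied from the output printed in the question; the real file has more columns.
voting = pd.DataFrame({
    "county_fips": [2000] * 5 + [1001] * 5,
    "candidate_name": ["Howard Phillips", "John Hagelin", "Harry Browne",
                       "George W. Bush", "Al Gore"] * 2,
    "pty": ["CS", "NL", "LB", "R", "D", "I", "I", "LB", "R", "D"],
    "vote_pct": ["0", "0", "1", "59", "28", "0", "0", "0", "70", "29"],
    "vote": [596, 919, 2636, 167398, 79004, 9, 5, 51, 11993, 4942],
})

# Make sure the comparison is numeric rather than lexicographic.
voting["vote_pct"] = pd.to_numeric(voting["vote_pct"])

# Broadcast each county's maximum back onto its rows, then keep the rows that match it.
county_max = voting.groupby("county_fips")["vote_pct"].transform("max")
largest = voting[voting["vote_pct"] == county_max]
largest = largest.set_index(["county_fips", "candidate_name", "pty", "vote_pct"])

print(largest)
```

Like the lambda above, this keeps every tied row if two candidates share a county's maximum, and the rows stay in their original order rather than being sorted by county_fips.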

Source

2016-12-08 19:13:29 bunji

This is perfect, thank you – sn4ke

This is better than my answer ;) –

You can use groupby, for example voting.groupby('county_fips')['candidate_name'].max().

There is a more detailed answer here: Python : Getting the Row which has the max value in groups using groupby
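Note that calling max() on a single column returns only that column's largest value per county, not the row it belongs to. One common way to recover the whole row is idxmax combined with loc. A minimal sketch on a small frame built from values in the question's output, assuming vote_pct is already numeric and the frame has a unique index:

```python
import pandas as pd

# A reduced frame using values from the question's printed output.
voting = pd.DataFrame({
    "county_fips": [2000, 2000, 1001, 1001],
    "candidate_name": ["George W. Bush", "Al Gore", "George W. Bush", "Al Gore"],
    "pty": ["R", "D", "R", "D"],
    "vote_pct": [59, 28, 70, 29],
    "vote": [167398, 79004, 11993, 4942],
})

# idxmax returns, per county, the index label of the row with the highest vote_pct.
winner_labels = voting.groupby("county_fips")["vote_pct"].idxmax()

# loc then selects those full rows; on a tie, idxmax picks the first occurrence.
winners = voting.loc[winner_labels]

print(winners.set_index(["county_fips", "candidate_name", "pty", "vote_pct"]))
```

Unlike a boolean filter on the maximum, this returns exactly one row per county, so it is the better fit only when ties should be dropped.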

Source

2016-12-08 19:05:23
