2016-12-08 47 views
1

Sorting by a specific column in pandas while indexing this dataset in a particular format

import pandas as pd 

voting = pd.read_json("GE2000.json") 
voting.set_index(['county_fips','candidate_name','pty','vote_pct'],inplace=True) 

print(voting) 

This returns:

                                            vote
county_fips candidate_name  pty vote_pct
2000        Howard Phillips CS  0            596
            John Hagelin    NL  0            919
            Harry Browne    LB  1           2636
            George W. Bush  R   59        167398
            Al Gore         D   28         79004
1001        Howard Phillips I   0              9
            John Hagelin    I   0              5
            Harry Browne    LB  0             51
            George W. Bush  R   70         11993
            Al Gore         D   29          4942
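After this, I want to sort by vote_pct and grab the largest value for each county, like this (I have tried sort_values, sort_index, etc., and could not get any of them to produce the desired output):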

                                            vote
county_fips candidate_name  pty vote_pct
2000        George W. Bush  R   59        167398
1001        George W. Bush  R   70         11993

Here is a sample of my data:

[ 

    { 
    "office" : "PRESIDENT", 
    "county_name" : "Alaska", 
    "vote_pct" : "0", 
    "county_fips" : "2000", 
    "pty" : "CS", 
    "candidate_name" : "Howard Phillips"
    }, 
    { 
    "office" : "PRESIDENT", 
    "county_name" : "Alaska", 
    "vote_pct" : "0", 
    "county_fips" : "2000", 
    "pty" : "NL", 
    "candidate_name" : "John Hagelin", 
    } 
] 

The data continues in the same way.

+0

Could you provide a sample of the original data? –

+0

@juanpa.arrivillaga updated for you – sn4ke

Answers

2

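You can pick out the largest row for each county with groupby and apply first, and only then set the index. This lets you use groupby on a column rather than on an index level (which is awkward):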

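import pandas as pd

voting = pd.read_json("GE2000.json")

# Keep the row(s) of a group whose vote_pct equals that group's maximum
get_largest_vote_pct = lambda group: group[group.vote_pct == group.vote_pct.max()]

# Group on the county_fips column while it is still an ordinary column
largest = voting.groupby('county_fips').apply(get_largest_vote_pct)

largest.set_index(['county_fips','candidate_name','pty','vote_pct'],inplace=True)

print(largest)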

                                            vote
county_fips candidate_name  pty vote_pct
1001        George W. Bush  R   70         11993
2000        George W. Bush  R   59        167398
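
For comparison, here is a minimal sketch of two other ways to reach the same result without apply. It is not part of the answer above. The `rows` list is a hypothetical stand-in for GE2000.json, built from some of the rows printed in the question, and the names `index_cols`, `largest_idxmax` and `largest_sorted` are made up for illustration. The first variant uses idxmax to find the winning row label per county; the second sorts by vote_pct and keeps the first row seen for each county.

```python
import pandas as pd

# Hypothetical stand-in for GE2000.json (a subset of the rows shown in the question)
rows = [
    {"county_fips": 2000, "candidate_name": "Howard Phillips", "pty": "CS", "vote_pct": 0, "vote": 596},
    {"county_fips": 2000, "candidate_name": "George W. Bush", "pty": "R", "vote_pct": 59, "vote": 167398},
    {"county_fips": 2000, "candidate_name": "Al Gore", "pty": "D", "vote_pct": 28, "vote": 79004},
    {"county_fips": 1001, "candidate_name": "Harry Browne", "pty": "LB", "vote_pct": 0, "vote": 51},
    {"county_fips": 1001, "candidate_name": "George W. Bush", "pty": "R", "vote_pct": 70, "vote": 11993},
    {"county_fips": 1001, "candidate_name": "Al Gore", "pty": "D", "vote_pct": 29, "vote": 4942},
]
voting = pd.DataFrame(rows)

index_cols = ["county_fips", "candidate_name", "pty", "vote_pct"]

# Variant 1: idxmax returns, per county, the row label of the highest vote_pct
top_labels = voting.groupby("county_fips")["vote_pct"].idxmax()
largest_idxmax = voting.loc[top_labels].set_index(index_cols)

# Variant 2: sort descending by vote_pct, then keep the first row per county
largest_sorted = (
    voting.sort_values("vote_pct", ascending=False)
    .drop_duplicates("county_fips")
    .sort_values("county_fips")
    .set_index(index_cols)
)

print(largest_idxmax)
```

One difference to be aware of: both variants keep exactly one row per county, whereas the apply-based filter above keeps every row that ties for the maximum vote_pct.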
+0

This is perfect, thank you – sn4ke

+0

This is better than my answer ;) –