Python pandas: how to split a sorted dictionary in a column of a DataFrame

I have a DataFrame like this:

id asn  orgs 
0 3320 {'Deutsche Telekom AG': 2288} 
1 47886 {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7} 
2 47601 {'fusion services': 1024, 'GCE Global Maritime':16859} 
3 33438 {'Highwinds Network Group': 893}

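I want to sort the "orgs" column, which actually holds dictionaries, and then extract the (k, v) pair with the highest value into two separate columns, like this: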

id asn  org      value 
0 3320 'Deutsche Telekom AG'  2288 
1 47886 'Joyent'     16 
2 47601 'GCE Global Maritime'  16859 
3 33438 'Highwinds Network Group' 893

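Currently I am running this code, but it does not sort correctly, and I don't know how to extract the pair with the highest value afterwards.

import operator
df.orgs.apply(lambda x : sorted(x.items(),key=operator.itemgetter(1),reverse=True))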

This gives me lists like this:

id asn  orgs 
0 3320 [('Deutsche Telekom AG', 2288)] 
1 47886 [('Joyent', 16),('Equinix (Netherlands) B.V.', 7)] 
2 47601 [('GCE Global Maritime',16859),('fusion services', 1024)] 
3 33438 [('Highwinds Network Group', 893)]

Now how can I put the key and the highest value into two separate columns? Can anyone help?
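A minimal sketch of one way to finish the approach from the question: after sorting, the first tuple in each list is the highest-valued pair, and pandas' .str accessor can index into lists and tuples as well as strings, so .str[0] and .str[1] pull out the key and value. This assumes every dict has at least one entry; the column name org is taken from the desired output above.

```python
import operator

import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "asn": [3320, 47886, 47601, 33438],
    "orgs": [
        {"Deutsche Telekom AG": 2288},
        {"Joyent": 16, "Equinix (Netherlands) B.V.": 7},
        {"fusion services": 1024, "GCE Global Maritime": 16859},
        {"Highwinds Network Group": 893},
    ],
})

# Same call as in the question: sort each dict's items by value, largest first.
ranked = df["orgs"].apply(
    lambda d: sorted(d.items(), key=operator.itemgetter(1), reverse=True)
)

# .str[0] takes the first element of each list, i.e. the (key, value) pair
# with the highest value; .str[0] / .str[1] then split that tuple.
top = ranked.str[0]
df["org"] = top.str[0]
df["value"] = top.str[1]

print(df[["id", "asn", "org", "value"]])
```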


2015-04-20 UserYmY

So all you are asking for is the max value; the sorting is somewhat irrelevant, no? – EdChum

@EdChum No, because I want the key and the value of the max pair in separate columns. – UserYmY

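Another approach: define a function that calls max on the dict and returns a Series, so that you can assign the result to multiple columns (function body taken from @Alex Martelli's answer):

In [17]: 

import pandas as pd

def func(x): 
    # key with the highest value, returned together with that value
    k = max(x, key=x.get) 
    return pd.Series([k, x[k]]) 

df[['orgs', 'value']] = df['orgs'].apply(func) 
df 

Out[17]: 
    asn id      orgs value 
0 3320 0   Deutsche Telekom AG 2288 
1 47886 1       Joyent    16 
2 47601 2   GCE Global Maritime 16859 
3 33438 3  Highwinds Network Group 893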
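The heart of func is the max(x, key=x.get) idiom. This standalone snippet, using the dict from row 2 of the example, shows what it returns, and why calling min with the same key picks the lowest-valued pair instead, which is not what the question asks for.

```python
# The orgs dict from row 2 of the example DataFrame.
orgs = {"fusion services": 1024, "GCE Global Maritime": 16859}

# Iterating a dict yields its keys; key=orgs.get ranks each key by its value.
best = max(orgs, key=orgs.get)
worst = min(orgs, key=orgs.get)

print(best, orgs[best])    # GCE Global Maritime 16859
print(worst, orgs[worst])  # fusion services 1024
```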

Edit

If your data has empty dicts, then you can just test len:

In [34]: 

df = pd.DataFrame({'id':[0,1,2,3,4], 
        'asn':[3320,47886,47601,33438,56], 
        'orgs':[{'Deutsche Telekom AG': 2288}, 
          {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7}, 
          {'fusion services': 1024, 'GCE Global Maritime':16859}, 
          {'Highwinds Network Group': 893},{}]}) 
df 
Out[34]: 
    asn id            orgs 
0 3320 0      {'Deutsche Telekom AG': 2288} 
1 47886 1 {'Equinix (Netherlands) B.V.': 7, 'Joyent': 16} 
2 47601 2 {'GCE Global Maritime': 16859, 'fusion service... 
3 33438 3     {'Highwinds Network Group': 893} 
4  56 4             {} 
In [36]: 

import numpy as np

def func(x): 
    # empty dicts get NaN in both columns instead of raising ValueError
    if len(x) > 0: 
        k = max(x, key=x.get) 
        return pd.Series([k, x[k]]) 
    return pd.Series([np.nan, np.nan]) 

df[['orgs', 'value']] = df['orgs'].apply(func) 
df 

Out[36]: 
    asn id      orgs value 
0 3320 0   Deutsche Telekom AG 2288 
1 47886 1       Joyent    16 
2 47601 2   GCE Global Maritime 16859 
3 33438 3  Highwinds Network Group 893 
4  56 4       NaN NaN
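As a variation, here is a sketch that builds both columns at once from a plain list of tuples instead of returning a Series per row. The helper name top_pair and the org column name are illustrative choices, and the isinstance check is an assumption added so that a NaN cell (not just an empty dict) is also tolerated.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "asn": [3320, 47886, 47601, 33438, 56],
    "orgs": [
        {"Deutsche Telekom AG": 2288},
        {"Joyent": 16, "Equinix (Netherlands) B.V.": 7},
        {"fusion services": 1024, "GCE Global Maritime": 16859},
        {"Highwinds Network Group": 893},
        {},
    ],
})


def top_pair(d):
    """Hypothetical helper: (key, value) with the largest value, or (NaN, NaN)
    for an empty dict or a non-dict cell such as NaN."""
    if not isinstance(d, dict) or not d:
        return (np.nan, np.nan)
    k = max(d, key=d.get)
    return (k, d[k])


# One tuple per row; reusing df.index keeps the rows aligned for the join.
pairs = pd.DataFrame([top_pair(d) for d in df["orgs"]],
                     columns=["org", "value"], index=df.index)
df = df.drop(columns="orgs").join(pairs)
print(df)
```

Because the last row is NaN, the value column comes out as float64 in this sketch.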


2015-04-20 09:12:09 EdChum

Thanks EdChum. I get this error: ValueError: min() arg is an empty sequence. I guess that's because I also have some empty cells. How can I handle this case? – UserYmY

You can test whether the value is empty, or use try/except; I'll update my answer. – EdChum

Is it empty or NaN? – EdChum

This should work:

In [1]: import pandas as pd 
In [2]: import operator 
In [3]: df = pd.DataFrame({ 'id' : [0,1,2,3], 
    ...:      'asn' : [3320, 47886, 47601, 33438], 
    ...:      'orgs' : [{'Deutsche Telekom AG': 2288}, {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7}, {'fusion services': 1024, 'GCE Global Maritime':16859}, {'Highwinds Network Group': 893}] 
    ...:     }) 

In [4]: df.orgs, df['value'] = zip(*df.orgs.apply(lambda x : sorted(x.items(),key=operator.itemgetter(1),reverse=True)[0])) 

In [5]: df 
Out[5]: 
    asn id      orgs value 
0 3320 0  Deutsche Telekom AG 2288 
1 47886 1     Joyent  16 
2 47601 2  GCE Global Maritime 16859 
3 33438 3 Highwinds Network Group 893

I used zip(*<first element of the sorted dict items>) and assigned the two results to df.orgs and df['value'].
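For anyone unfamiliar with the zip(*...) trick used here: unpacking a sequence of pairs into zip regroups them by position, giving one tuple of keys and one tuple of values. The pairs below are simply the top pairs from the example typed out by hand. Note also that the new column has to be created with df['value']; assigning to a non-existent attribute like df.value sets a Python attribute rather than a column (pandas warns about this).

```python
# Top (key, value) pair per row, as produced by the sorted(...)[0] lambda above.
pairs = [
    ("Deutsche Telekom AG", 2288),
    ("Joyent", 16),
    ("GCE Global Maritime", 16859),
    ("Highwinds Network Group", 893),
]

# zip(*pairs) passes each pair as a separate argument, so zip groups
# all first elements together and all second elements together.
names, values = zip(*pairs)

print(names)   # ('Deutsche Telekom AG', 'Joyent', 'GCE Global Maritime', 'Highwinds Network Group')
print(values)  # (2288, 16, 16859, 893)
```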

With empty dicts:

In [3]: df = pd.DataFrame({ 'id' : [0,1,2,3], 
    ...:      'asn' : [3320, 47886, 47601, 33438], 
    ...:      'orgs' : [{'Deutsche Telekom AG': 2288}, {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7}, {'fusion services': 1024, 'GCE Global Maritime':16859}, {}] 
    ...:     }) 
In [4]: df.orgs.apply(lambda x : sorted(x.items(),key=operator.itemgetter(1),reverse=True)[0] if len(x) else ('','')) 
Out[4]: 
0  (Deutsche Telekom AG, 2288) 
1     (Joyent, 16) 
2 (GCE Global Maritime, 16859) 
3       (,) 
Name: orgs, dtype: object 

In [5]: df.orgs, df['value'] = zip(*df.orgs.apply(lambda x : sorted(x.items(),key=operator.itemgetter(1),reverse=True)[0] if len(x) else ('',''))) 

In [6]: df 
Out[6]: 
    asn id     orgs value 
0 3320 0 Deutsche Telekom AG 2288 
1 47886 1    Joyent  16 
2 47601 2 GCE Global Maritime 16859 
3 33438 3
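A possible simplification of the same idea, sketched here as an alternative rather than the answer's method: since only the top pair is needed, max() over the items finds it in a single pass without sorting, and its default= argument (Python 3.4+) supplies the ('', '') fallback for empty dicts, replacing the len(x) check.

```python
import operator

import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "asn": [3320, 47886, 47601, 33438],
    "orgs": [
        {"Deutsche Telekom AG": 2288},
        {"Joyent": 16, "Equinix (Netherlands) B.V.": 7},
        {"fusion services": 1024, "GCE Global Maritime": 16859},
        {},
    ],
})

# max() with key=itemgetter(1) returns the (key, value) item with the largest value;
# default=("", "") is returned for an empty dict instead of raising ValueError.
top = df["orgs"].apply(
    lambda d: max(d.items(), key=operator.itemgetter(1), default=("", ""))
)

# Same zip(*...) split as in the answer above.
df["orgs"], df["value"] = zip(*top)
print(df)
```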


2015-04-20 09:07:34 DTing

I have the same problem here; how should I handle orgs that are empty dicts? – UserYmY
