2016-09-23

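Sorting and arranging a list using pandas

I have an input file like the one shown below. The entries need to be arranged in ascending order of key, and keys that are missing from a row need to be printed last. I am getting the data in the required format, but the order is lost.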

I tried using the sort() method, but it reports "list has no attribute sort". Please suggest a solution, and let me know whether anything else needs to be modified.

Input file:

3=1388|4=1388|5=IBM|8=157.75|9=88929|1021=1500|854=n|388=157.75|394=157.75|474=157.75|1584=88929|444=20160713|459=93000546718000|461=7|55=93000552181000|22=89020|400=157.75|361=0.73|981=0|16=1468416600.6006|18=1468416600.6006|362=0.46 
3=1388|4=1388|5=IBM|8=157.73|9=100|1021=0|854=p|394=157.73|474=157.749977558|1584=89029|444=20160713|459=93001362639104|461=26142|55=93001362849000|22=89120|361=0.71|981=0|16=1468416601.372|18=1468416601.372|362=0.45 
3=1388|4=1388|5=IBM|8=157.69|9=100|1021=600|854=p|394=157.69|474=157.749910415|1584=89129|444=20160713|459=93004178882560|461=27052|55=93004179085000|22=89328|361=0.67|981=1|16=1468416604.1916|18=1468416604.1916|362=0.43 

Code I tried:

import pandas as pd 
import numpy as np 
df = pd.read_csv('inputfile', index_col=None, names=['text']) 
s = df.text.str.split('|') 
ds = [dict(w.split('=', 1) for w in x) for x in s] 
p = pd.DataFrame.from_records(ds) 
p1 = p.replace(np.nan,'n/a', regex=True) 
st = p1.stack(level=0,dropna=False) 
dfs = [g for i,g in st.groupby(level=0)] 
#print st 
i = 0 
while i < len(dfs):  
    #index of each column 
    print ('\nindex[%d]'%i) 
    for (_,k),v in dfs[i].iteritems(): 
     print k,'\t',v 
    i = i + 1 

Output obtained:

index[0] 
1021 1500 
1584 88929 
16 1468416600.6006 
18 1468416600.6006 
22 89020 
3  1388 
361 0.73 
362 0.46 
388 157.75 
394 157.75 
4  1388 
400 157.75 
444 20160713 
459 93000546718000 
461 7 
474 157.75 
5  IBM 
55 93000552181000 
8  157.75 
854 n 
9  88929 
981 0 

index[1] 
1021 0 
1584 89029 
16 1468416601.372 
18 1468416601.372 
22 89120 
3  1388 
361 0.71 
362 0.45 
388 n/a 
394 157.73 
4  1388 
400 n/a 
444 20160713 
459 93001362639104 
461 26142 
474 157.749977558 
5  IBM 
55 93001362849000 
8  157.73 
854 p 
9  100 
981 0 

Expected output:

index[0] 
3  1388 
4  1388 
5  IBM 
8  157.75 
9  88929 
16 1468416600.6006 
18 1468416600.6006 
22 89020 
55 93000552181000 
361 0.73 
362 0.46 
388 157.75 
394 157.75 
400 157.75 
444 20160713 
459 93000546718000 
461 7 
474 157.75 
854 n 
981 0 
1021 1500 
1584 88929 

index[1] 
3  1388 
4  1388 
5  IBM 
8  157.75 
9  88929 
16 1468416600.6006 
18 1468416600.6006 
22 89020 
55 93000552181000 
361 0.73 
362 0.46 
394 157.75 
444 20160713 
459 93000546718000 
461 7 
474 157.75 
854 n 
981 0 
1021 1500 
1584 88929 
388 n/a 
400 n/a 

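It looks like you are storing the tags as strings and sorting those. If you convert the tag numbers to integers before storing them, you should get the expected sorted output. – danio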


I think you should answer [this question](http://stackoverflow.com/q/39648855/3765319) – Kartik

Answers


Replace your ds line with

ds = [{int(pair[0]): pair[1] for pair in [w.split('=', 1) for w in x]} for x in s] 

to convert the keys to integers, so that they sort in numeric order rather than as strings.
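To see why the conversion matters, here is a minimal sketch that uses a handful of tag numbers from the input file. The list name `tags` is only for illustration.

```python
# Tag numbers as they come out of str.split: plain strings.
tags = ['3', '4', '1021', '854', '16', '55']

# Strings compare character by character, so '1021' sorts before '16'.
as_strings = sorted(tags)

# Converting to int first gives numeric order.
as_ints = sorted(int(t) for t in tags)

print(as_strings)  # ['1021', '16', '3', '4', '55', '854']
print(as_ints)     # [3, 4, 16, 55, 854, 1021]
```

The string ordering matches the "output obtained" listing in the question, where 1021 and 1584 come before 16.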

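To print the n/a values at the end, you can use pandas selection to output the non-null values first and then the null values (output_series is defined in the full script below), for example: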

for (ix, series) in p.iterrows(): 
    print('\nindex[%d]' % ix) 
    output_series(ix, series[pd.notnull]) 
    output_series(ix, series[pd.isnull].fillna('n/a')) 
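The same idea can be tried in isolation with a small in-memory frame. This is only a sketch: the two sample rows and the helper name `ordered_pairs` are hypothetical, and the explicit `sort_index(axis=1)` is an assumption added so that the columns are in numeric order even if the installed pandas version keeps dict insertion order.

```python
import pandas as pd

# Hypothetical two-row sample; tag 388 is missing from the second row.
records = [
    {3: '1388', 1021: '1500', 388: '157.75'},
    {3: '1388', 1021: '0'},
]

# Sort the integer column labels so iteration follows numeric tag order.
p = pd.DataFrame.from_records(records).sort_index(axis=1)


def ordered_pairs(series):
    """Return (tag, value) pairs: present tags first, missing ones last as 'n/a'."""
    present = series[series.notnull()]
    missing = series[series.isnull()].fillna('n/a')
    return list(present.items()) + list(missing.items())


for ix, series in p.iterrows():
    print('\nindex[%d]' % ix)
    for tag, value in ordered_pairs(series):
        print(tag, '\t', value)
```

For the second row this prints tags 3 and 1021 first and then 388 with n/a, which is the layout the question asks for.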

By the way, you can also simplify your stack, groupby and print steps to:

for (ix, series) in p1.iterrows(): 
    print('\nindex[%d]' % ix) 
    # items() replaces iteritems(), which was removed in pandas 2.0 
    for tag, value in series.items(): 
        print(tag, '\t', value) 

So the whole script becomes:

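import pandas as pd 

def output_series(ix, series): 
    # items() replaces iteritems(), which was removed in pandas 2.0 
    for tag, value in series.items(): 
        print(tag, '\t', value) 

df = pd.read_csv('inputfile', index_col=None, names=['text']) 
s = df.text.str.split('|') 
# integer keys, so tags compare numerically rather than as strings 
ds = [{int(pair[0]): pair[1] for pair in [w.split('=', 1) for w in x]} for x in s] 
# sort the columns explicitly; newer pandas versions may keep insertion order 
p = pd.DataFrame.from_records(ds).sort_index(axis=1) 
for (ix, series) in p.iterrows(): 
    print('\nindex[%d]' % ix) 
    output_series(ix, series[pd.notnull])                 # tags present in this row 
    output_series(ix, series[pd.isnull].fillna('n/a'))    # missing tags last 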

Here:

import pandas as pd 
import numpy as np 

df = pd.read_csv('inputfile', index_col=None, names=['text']) 
s = df.text.str.split('|') 
ds = [dict(w.split('=', 1) for w in x) for x in s] 
p1 = pd.DataFrame.from_records(ds).fillna('n/a') 
st = p1.stack(level=0,dropna=False) 
for k, v in st.groupby(level=0): 
    print(k, v.sort_index())
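Note that sort_index() on string keys orders them lexicographically. If the stacked approach is kept, one possible way to get numeric order with the n/a entries last is to sort each group with a custom key. This is a sketch with hypothetical sample rows and a hypothetical helper `sort_key`; it assumes 'n/a' never occurs as a real value in the data.

```python
import pandas as pd

# Hypothetical sample rows with string tags, as produced by dict(w.split('=', 1) ...).
ds = [
    {'3': '1388', '1021': '1500', '388': '157.75', '16': '1468416600.6006'},
    {'3': '1388', '1021': '0', '16': '1468416601.372'},
]
p1 = pd.DataFrame.from_records(ds).fillna('n/a')
st = p1.stack()  # no NaN is left after fillna, so nothing is dropped


def sort_key(item):
    (_, tag), value = item
    # False sorts before True, so 'n/a' entries go last;
    # within each group the tags are ordered numerically.
    return (value == 'n/a', int(tag))


result = {}
for k, v in st.groupby(level=0):
    result[k] = [(tag, value) for (_, tag), value in sorted(v.items(), key=sort_key)]
    print('\nindex[%d]' % k)
    for tag, value in result[k]:
        print(tag, '\t', value)
```

The dictionary `result` is kept only so the ordering can be inspected afterwards; the printing loop alone is enough for the output format in the question.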