2016-03-12 71 views

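Create a sorted list of unique numbers from a DataFrame

I'm writing keywords and their corresponding page numbers to a text file via LaTeX, and then processing that file with Python. How can I create a sorted list of page numbers for each keyword?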

The code below gives me the list of unique pages, but it isn't sorted.

import pandas as pd 

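def unique(liste): 
    a = liste.split(',')                                # e.g. "7,9,10" -> ['7', '9', '10']
    a = [int(numeric_string) for numeric_string in a]   # convert to integers
    a = sorted(a)                                       # numeric sort
    a = map(str,a)                                      # back to strings
    b = set(a)                                          # drops duplicates, but a set is unordered
    return ','.join(b) 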

df = pd.DataFrame({'keyword': ["foo","foo","foo","foo","foo","foo","foo","foo","bar","bar","bar"], "page": [1,2,3,3,4,5,6,7,7,9,10]}) 
df['page'] = df['page'].astype(str) 
print(df) 

grouped = df.groupby('keyword',as_index=False).agg(lambda col: ','.join(col)) 
grouped = pd.DataFrame(grouped) 
grouped['unique'] = grouped['page'].apply(unique) 
print(grouped) 

which yields

    keyword  page
0       foo     1
1       foo     2
2       foo     3
3       foo     3
4       foo     4
5       foo     5
6       foo     6
7       foo     7
8       bar     7
9       bar     9
10      bar    10
  keyword             page         unique
0     bar           7,9,10         9,7,10
1     foo  1,2,3,3,4,5,6,7  3,7,6,4,5,2,1
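The scrambled order comes from the last step of unique: the integers are sorted correctly, but set(a) then throws that order away, because iterating over a set gives no ordering guarantee. A minimal sketch of a corrected helper, assuming the same comma-joined strings that the groupby step produces, removes duplicates first and sorts the integers last:

```python
def unique(liste):
    """Return the distinct page numbers in a comma-joined string, sorted numerically."""
    # Convert to int first so that 10 sorts after 9 (numeric, not lexicographic, order).
    pages = {int(s) for s in liste.split(',')}
    # The set removes duplicates; sorting afterwards restores a defined order,
    # since a set itself is unordered.
    return ','.join(str(p) for p in sorted(pages))


# The two comma-joined strings produced by the groupby step in the question.
for joined in ["7,9,10", "1,2,3,3,4,5,6,7"]:
    print(unique(joined))
```

Plugged into the question's code unchanged, grouped['page'].apply(unique) should then fill the unique column with 7,9,10 and 1,2,3,4,5,6,7.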

What is your desired output? – Alexander

Answer

import numpy as np 
import pandas as pd 

df = pd.DataFrame(
    {'keyword': ["foo","foo","foo","foo","foo","foo","foo","foo","bar","bar","bar"], 
    "page": [1,2,3,3,4,5,6,7,7,9,10]}) 

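# page must stay numeric so that np.unique sorts 10 after 9; if it was
# converted to str (as in the question), convert it back first:
# df['page'] = df['page'].astype(int) 
# np.unique deduplicates and sorts each group; astype(str) converts only for the join.
result = df.groupby(['keyword'])['page'].agg(lambda x: ','.join(np.unique(x).astype(str))) 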

print(result) 

which yields

keyword
bar           7,9,10
foo    1,2,3,4,5,6,7
Name: page, dtype: object

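  • np.unique returns an array of the sorted unique values. We want the page values sorted as integers (not as strings), so keep page as integers. After calling np.unique, you can convert the values to strings with astype(str) and then join them with ','.join.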
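The order of operations the answer relies on (deduplicate, sort as integers, convert to strings only for the join) can also be sketched in plain Python. This is an illustrative alternative, not part of the original answer: it uses collections.defaultdict(set) in place of np.unique and rebuilds the question's (keyword, page) pairs as two lists. The first two prints show why the page values should stay integers.

```python
from collections import defaultdict

# Same (keyword, page) pairs as the DataFrame in the question.
keywords = ["foo"] * 8 + ["bar"] * 3
pages = [1, 2, 3, 3, 4, 5, 6, 7, 7, 9, 10]

# Strings sort character by character, so "10" lands before "7".
print(sorted(["7", "9", "10"]))  # ['10', '7', '9']
print(sorted([7, 9, 10]))        # [7, 9, 10]

# Group the pages per keyword, deduplicating with a set.
groups = defaultdict(set)
for keyword, page in zip(keywords, pages):
    groups[keyword].add(page)

# Sort the integers first, convert to str only for the final join.
result = {kw: ','.join(map(str, sorted(ps))) for kw, ps in sorted(groups.items())}
print(result)  # {'bar': '7,9,10', 'foo': '1,2,3,4,5,6,7'}
```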