Applying a function to a DataFrame column

I have a pandas DataFrame:

name sample 
1 a  Category 1: qwe, asd (line break) Category 2: sdf, erg 
2 b  Category 2: sdf, erg(line break) Category 5: zxc, eru 
... 
30 p  Category 1: asd, Category PE: 2134, EFDgh, Pdr tke, err

I want to end up with:

name qwe asd sdf erg zxc eru 2134 EFDgh Pdr tke err 
1 a  1  1  1  1 0  0 0  0  0  0 
2 b  0  0  1  1 1  1 0  0  0  0 
... 
30 p  0 1  0  0 0  0 0  1  1  0

I created the following function:

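def cleanattributes(istring):
    # Split the cell into lines ('\\n' is a literal backslash followed by 'n')
    istring = str(istring)
    istring = istring.rstrip().split('\\n')

    # Keep only the text after the last ': ' on each line
    for counter, line in enumerate(istring):
        istring[counter] = line.rpartition(': ')[-1]

    # Turn the list into a string and strip the quote characters
    istring = str(istring)
    istring = istring.replace("'", "")
    istring = istring.replace("\"", "")
    return istring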

This function returns the expected result: the category contents without the category headers (the idea is to then use get_dummies to get the columns).

teststring="Category 1: qwe, asd\\nCategory 2: sdf, erg" 
cleanattributes(teststring) 
OUTPUT: '[qwe, asd, sdf, erg]'

I don't know how best to apply this function to every record so that the DataFrame looks like this:

name sample 
1 a  qwe, asd, sdf, erg 
2 b  sdf, erg, zxc, eru 
... 
30 p  asd, 2134, EFDgh, Pdr tke, err

Or whether this is even the best way to approach the problem.

As requested:

df['sample'].iat[0] 
OUTPUT: 'Category 1: qwe, asd\nCategory 2: sdf, erg'
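
For context, a function like this is usually run over every record with `Series.apply`. The following is only a minimal sketch, assuming the cells contain real newline characters (as the `iat[0]` output above suggests) rather than the literal `\\n` of the test string. `strip_headers` is a hypothetical helper written for this illustration, not the function above.

```python
import pandas as pd

def strip_headers(cell):
    """Hypothetical helper: drop the 'Category X: ' header from each line of a cell."""
    lines = str(cell).rstrip().split('\n')  # real newline, not a literal backslash-n
    # Keep only what follows the last ': ' on each line, then rejoin the lines
    return ', '.join(line.rpartition(': ')[-1] for line in lines)

df = pd.DataFrame({
    'name': ['a', 'b'],
    'sample': ['Category 1: qwe, asd\nCategory 2: sdf, erg',
               'Category 2: sdf, erg\nCategory 5: zxc, eru'],
})

# Series.apply calls the function once per cell and returns a new Series
df['sample'] = df['sample'].apply(strip_headers)
```

Because `rpartition` keeps only the text after the last `': '`, a line that holds two headers (as in record 30) would lose the items of the first category with this sketch.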

Source

2016-04-05 M Arroyo

What is the EXACT output of `df['sample'].iat[0]`? – Alexander

The output is 'Category 1: qwe, asd\nCategory 2: sdf, erg' (edit: removed an extra \n that I had accidentally added for testing purposes) –

import pandas as pd

df = pd.DataFrame(
    {'name': ['a', 'b'],
     'sample': ['Category 1: asd, Category PE: 2134, EFDgh, Pdr tke, err',
                'Category 2: sdf, erg\nCategory 5: zxc, eru\nCategory 1: asd, Category PE: 2134, EFDgh, Pdr tke, err']})

df2 = pd.concat([df.name,
                 df['sample']
                 .str.replace(r"(Category .*:)+", '', regex=True)  # Remove "Category [*]:" (greedy: up to the last ':' on each line)
                 .str.replace(r'\n', '', regex=True)  # Remove "\n"
                 .str.split(', ', expand=True)],
                axis=1)

df3 = pd.melt(df2, id_vars='name')[['name', 'value']] 

>>> pd.concat([df3['name'], pd.get_dummies(df3['value'])], axis=1) 
    name 2134 EFDgh Pdr tke ergzxc err eru2134 sdf 
0  a  1  0  0  0 0  0 0 
1  b  0  0  0  0 0  0 1 
2  a  0  1  0  0 0  0 0 
3  b  0  0  0  1 0  0 0 
4  a  0  0  1  0 0  0 0 
5  b  0  0  0  0 0  1 0 
6  a  0  0  0  0 1  0 0 
7  b  0  1  0  0 0  0 0 
8  a  0  0  0  0 0  0 0 
9  b  0  0  1  0 0  0 0 
10 a  0  0  0  0 0  0 0 
11 b  0  0  0  0 1  0 0
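
Two details of the output above are worth noting. The `.*` in the pattern is greedy, so on a line with two headers it removes everything up to the last colon, which is why `asd` does not appear among the columns. Deleting the newline without inserting a separator also merges the last item of one line with the first item of the next. The following is a hedged sketch of one possible variation, not the answer's method: it removes each header separately, turns newlines into the list separator, and uses `Series.str.get_dummies` to get one row per record. The sample data and the names `cleaned`, `dummies` and `result` are assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['a', 'b'],
    'sample': ['Category 1: asd, Category PE: 2134, EFDgh, Pdr tke, err',
               'Category 2: sdf, erg\nCategory 5: zxc, eru'],
})

cleaned = (df['sample']
           # Remove each "Category <label>:" header on its own (no greedy '.*')
           .str.replace(r'Category [^:\n]*:\s*', '', regex=True)
           # Replace line breaks with the same ', ' separator used inside a line
           .str.replace(r'\s*\n\s*', ', ', regex=True))

# One indicator column per distinct item, one row per original record
dummies = cleaned.str.get_dummies(sep=', ')
result = pd.concat([df['name'], dummies], axis=1)
```

With this layout, each name keeps a single row and every item that appears in any record becomes its own 0/1 column.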

Source

2016-04-05 21:31:47 Alexander
