2016-11-07 74 views
5

Python pandas: convert a list of comma-separated values to a DataFrame

I have a list of strings that looks like this:

["Name: Alice, Department: HR, Salary: 60000", "Name: Bob, Department: Engineering, Salary: 45000"] 

I want to convert this list into a DataFrame that looks like this:

Name  | Department  | Salary
--------------------------
Alice | HR          | 60000
Bob   | Engineering | 45000

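What is the simplest way to do this? My gut says to dump the data into a CSV and strip the headers with a regex like "^ *:", but there must be a simpler way.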

+0

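This is very simple. So, before we give you the answer, what have you done to find the answer yourself? *Hint:* this is an array of strings, each holding comma-separated k => v pairs (key and value separated by ':'). – Fallenreaper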

Answers

8

With a little string processing you can get a list of dictionaries and pass it to the DataFrame constructor:

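import pandas as pd

lst = ["Name: Alice, Department: HR, Salary: 60000",
       "Name: Bob, Department: Engineering, Salary: 45000"]
# Split each record into "key: value" fields, build one dict per record,
# and let the DataFrame constructor turn the dict keys into columns.
pd.DataFrame([dict([kv.split(': ') for kv in record.split(', ')]) for record in lst])
Out:
    Department   Name Salary
0           HR  Alice  60000
1  Engineering    Bob  45000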
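
The one-liner above depends on every separator being exactly ', ' and ': '. Here is a slightly more forgiving sketch, assuming field values never contain commas and that the Salary column should end up numeric. The names parse_record and records_to_frame are made up for this illustration and are not part of pandas:

```python
def parse_record(record):
    """Turn 'Name: Alice, Department: HR' into {'Name': 'Alice', 'Department': 'HR'}."""
    # Split on the first ':' only, so a value may itself contain a colon.
    pairs = (field.split(":", 1) for field in record.split(","))
    # Strip whitespace so ' Department' and 'Department' become the same key.
    return {key.strip(): value.strip() for key, value in pairs}


def records_to_frame(records):
    """Build a DataFrame from the records (hypothetical helper)."""
    import pandas as pd  # imported here so parse_record works without pandas

    df = pd.DataFrame([parse_record(record) for record in records])
    # Assumption: a 'Salary' column exists and holds numbers stored as text.
    df["Salary"] = pd.to_numeric(df["Salary"])
    return df
```

Calling records_to_frame(lst) on the list from the question should give the same three columns, with Salary stored as numbers rather than strings.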
3

You can do it like this (assuming io, re, and pandas as pd are already imported):

In [271]: s 
Out[271]: 
['Name: Alice, Department: HR, Salary: 60000', 
'Name: Bob, Department: Engineering, Salary: 45000'] 

In [272]: pd.read_csv(io.StringIO(re.sub(r'\s*(Name|Department|Salary):\s*', r'', '~'.join(s))), 
    ...:    names=['Name','Department','Salary'], 
    ...:    header=None, 
    ...:    lineterminator=r'~' 
    ...:) 
    ...: 
Out[272]:
    Name   Department  Salary
0  Alice           HR   60000
1    Bob  Engineering   45000
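
To make the intermediate step easier to follow, here is a small sketch of just the label-stripping part, using only the standard library. It assumes, like the answer above, that the three key names are known in advance. The names LABEL, csv_text and rows are illustrative only:

```python
import csv
import io
import re

records = ["Name: Alice, Department: HR, Salary: 60000",
           "Name: Bob, Department: Engineering, Salary: 45000"]

# Drop the "Name:", "Department:" and "Salary:" labels together with the
# whitespace around them, leaving plain comma-separated values.
LABEL = re.compile(r"\s*(Name|Department|Salary):\s*")
csv_text = "\n".join(LABEL.sub("", record) for record in records)

# csv_text is now ordinary CSV; the csv module can read it back as rows.
rows = list(csv.reader(io.StringIO(csv_text)))
```

Joining with a newline instead of '~' means the text could be handed to pd.read_csv without the lineterminator argument; header=None and names=[...] would still be needed because the text has no header row.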
3

A bit more creative:

# \s* outside the groups keeps leading spaces out of the keys and values
s.str.extractall(r'\s*(?P<key>[^,:]+?)\s*:\s*(?P<value>[^,]+)') \
    .reset_index('match', drop=True) \
    .set_index('key', append=True).value.unstack()


Setup

import pandas as pd

l = ["Name: Alice, Department: HR, Salary: 60000",
     "Name: Bob, Department: Engineering, Salary: 45000"]
s = pd.Series(l)
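
To see what a key/value pattern of this kind captures before handing it to str.extractall, it can be tried with the standard re module. This is only a sketch: the pattern below is a variant that keeps surrounding whitespace out of both groups, and extract_pairs is a hypothetical helper name. It assumes neither keys nor values contain commas.

```python
import re

# key: anything up to the first ':' (no commas); value: anything up to the next ','
PAIR = re.compile(r"\s*(?P<key>[^,:]+?)\s*:\s*(?P<value>[^,]+)")


def extract_pairs(record):
    """Return the (key, value) tuples found in one record string."""
    return [(m.group("key"), m.group("value")) for m in PAIR.finditer(record)]


pairs = extract_pairs("Name: Alice, Department: HR, Salary: 60000")
```

Each tuple corresponds to one row that extractall would produce for that record, which is why the answer then moves the key into the index and unstacks it into columns.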