Formatting the contents of a pandas column: removing trailing text and digits

I have used BeautifulSoup and pandas to create a CSV with a column that contains error codes followed by their corresponding error messages.

Before formatting, the column looks like this:

-132456ErrorMessage 
-3254Some other Error 
-45466You've now used 3 different examples. 2 more to go. 
-10240 This time there was a space.  
-1232113That was a long number.

I have successfully isolated the text from the codes like this:

dfDSError['text'] = dfDSError['text'].map(lambda x: x.lstrip('-0123456789'))

This returns exactly what I want.

But I have been struggling to come up with a solution for the codes.

I tried this:

dfDSError['codes'] = dfDSError['codes'].replace(regex=True,to_replace= r'\D',value=r'')

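However, this removes every non-digit character, so any digits in the error message end up appended to the code. For the third example above, instead of 45466 I get 4546632. I would also like to keep the leading minus sign.

I thought I could perhaps combine rstrip() with a regular expression to find the point where a non-digit or a space follows the code and remove everything after it, but I have been unsuccessful.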

for_removal = re.compile(r'\d\D*') 
dfDSError['codes'] = dfDSError['codes'].map(lambda x: x.rstrip(re.findall(for_removal,x)))       
TypeError: rstrip arg must be None, unicode or str

Any suggestions? Thanks!

Source

2017-02-19 Aaron Paul

You can use extract:

dfDSError[['code','text']] = dfDSError.text.str.extract('([-0-9]+)(.*)', expand=True) 
print(dfDSError)
                                                text      code
0                                       ErrorMessage   -132456
1                                   Some other Error     -3254
2  You've now used 3 different examples. 2 more t...    -45466
3                       This time there was a space.    -10240
4                            That was a long number.  -1232113
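
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample rows from the question and applies the same str.extract idea. The stricter pattern (anchored at the start, optional minus sign, surrounding whitespace left out of the message), the intermediate name `extracted`, and the integer conversion are assumptions added for illustration, not part of the one-liner above.

```python
import pandas as pd

# Sample rows copied from the question; the column name 'text' follows the thread.
dfDSError = pd.DataFrame({
    "text": [
        "-132456ErrorMessage ",
        "-3254Some other Error ",
        "-45466You've now used 3 different examples. 2 more to go. ",
        "-10240 This time there was a space.  ",
        "-1232113That was a long number.",
    ]
})

# Assumption: every row starts with an optional minus sign and a run of digits.
# Group 1 captures that leading code; group 2 lazily captures the message,
# while the \s* parts keep leading and trailing whitespace out of the group.
pattern = r"^(-?\d+)\s*(.*?)\s*$"
extracted = dfDSError["text"].str.extract(pattern, expand=True)

# Unnamed groups come back as columns 0 and 1.
dfDSError["code"] = extracted[0].astype(int)  # optional: store codes as integers
dfDSError["text"] = extracted[1]

print(dfDSError)
```

Because only the leading run of digits is captured, digits inside the message (the "3" and "2" in the third row) stay in the text and never reach the code column. Rows that do not match the pattern would produce NaN, in which case the astype(int) step would need to be dropped or guarded.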

Source

2017-02-19 19:18:33 jezrael
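
A side note on the TypeError in the question: re.findall returns a list, while str.rstrip expects a string (or None), and even with a string it treats the argument as a set of characters to strip rather than as a suffix. If a plain-Python route is preferred over str.extract, a rough sketch could look like the following. The helper name `extract_code` and the pattern are hypothetical and assume each row begins with an optional minus sign followed by digits.

```python
import re

# Assumption: a code is an optional minus sign followed by digits at the
# start of the string, possibly preceded by whitespace.
leading_code = re.compile(r"\s*(-?\d+)")


def extract_code(raw):
    """Return the leading code as a string, or None if the row has none."""
    match = leading_code.match(raw)
    return match.group(1) if match else None


sample = "-45466You've now used 3 different examples. 2 more to go. "

# re.findall gives back a list, which is what str.rstrip rejected.
found = re.findall(r"\d\D*", sample)
print(type(found))

# Only the leading run of digits is kept, together with the minus sign.
print(extract_code(sample))
```

With a helper like this, something along the lines of dfDSError['codes'].map(extract_code) would fill the codes column one row at a time, although the vectorised str.extract call is usually the more idiomatic choice in pandas.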
