2016-03-15 132 views

Suppose I have a DataFrame in Python and want to split one of its columns on the pipe character:

userid    subcategory            timestamp                 smartexpenseid                                      companyid
20648196  SmartExpense Declined  2016-03-06T16:44:55.702Z  11771712||91164585||||                              9797
43124398  SmartExpense Declined  2016-03-06T17:09:06.033Z  11111111|249178181?CARRT?266298850196|93461910||||  63177
76764125  SmartExpense Declined  2016-03-06T19:44:19.078Z  137177|250155900?HOTEL?270593373724|92826286||||    199412

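I want to split the smartexpenseid column into separate columns within the same pandas DataFrame. For example, the value 11111111|249178181?CARRT?266298850196|93461910|||| follows the layout "CctKey | TripId?SegType?SegId | EreceiptId | PctKey | MeKey | RcKey | CapKey".

Can someone please suggest the best way to do this in Python?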

Answer


Try this:

(?<CctKey>\d+)\|(?<TripId>\d*)\??(?<SegType>[^?]*)\??(?<SegId>\d*)\|(?<EreceiptId>\d+)\|(?<PctKey>[^|]*)\|(?<MeKey>[^|]*)\|(?<RcKey>[^|]*)\|(?<CapKey>[^|\n\s]*) 

Demo

In Python, remove the ?<name> syntax from every group:

(\d+)\|(\d*)\??([^?]*)\??(\d*)\|(\d+)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|\n\s]*) 
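As a minimal sketch of how the unnamed-group pattern above could be applied to the DataFrame, the example below uses pandas' `Series.str.extract`, which returns one column per capture group. The DataFrame `df`, the constant names `PATTERN` and `KEY_COLUMNS`, and the choice to join the result back onto `df` are assumptions made for illustration; only the pattern and the sample values come from the question and answer.

```python
import pandas as pd

# Sample rows copied from the question; only two columns are kept for brevity.
df = pd.DataFrame({
    "userid": [20648196, 43124398, 76764125],
    "smartexpenseid": [
        "11771712||91164585||||",
        "11111111|249178181?CARRT?266298850196|93461910||||",
        "137177|250155900?HOTEL?270593373724|92826286||||",
    ],
})

# The unnamed-group pattern from the answer, written as a raw string so that
# the backslashes reach the regex engine unchanged.
PATTERN = (
    r"(\d+)\|(\d*)\??([^?]*)\??(\d*)\|(\d+)\|"
    r"([^|]*)\|([^|]*)\|([^|]*)\|([^|\n\s]*)"
)

# Column names taken from the layout described in the question.
KEY_COLUMNS = ["CctKey", "TripId", "SegType", "SegId", "EreceiptId",
               "PctKey", "MeKey", "RcKey", "CapKey"]

# str.extract yields one column per capture group, in group order.
parts = df["smartexpenseid"].str.extract(PATTERN, expand=True)
parts.columns = KEY_COLUMNS

# Attach the new columns to the original frame (indexes line up by row).
df = df.join(parts)
print(df[["userid"] + KEY_COLUMNS])
```

The extracted values are strings, so a numeric key such as CctKey would still need an explicit conversion if numbers are wanted.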

regex = "(?<CctKey>\d+)\|(?<TripId>\d*)\??(?<SegType>[^?]*)\??(?<SegId>\d*)\|(?<EreceiptId>\d+)\|(?<PctKey>[^|]*)\|(?<MeKey>[^|]*)\|(?<RcKey>[^|]*)\|(?<CapKey>[^|\n\s]*)" df2['smartexpenseid']. I tried adding it in Python, and it gave me an error that says only "syntax error".


Remove the ?<name> part from every group: '(\d+)\|(\d*)\??([^?]*)\??(\d*)\|(\d+)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|\n\s]*)' https://regex101.com/r/fV5cM3/2


Thank you very much, TIm007. This really helped!
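On the error mentioned in the comments: Python's `re` module does not accept the `(?<name>...)` spelling and writes named groups as `(?P<name>...)`, so dropping the names is one fix and respelling them is another. The sketch below shows the respelled pattern under that assumption; the names `NAMED_PATTERN`, `sample`, and `fields` are illustrative, and the exact cause of the commenter's "syntax error" is not known from the thread.

```python
import re

# Same pattern as in the answer, but using Python's (?P<name>...) spelling
# for named groups. Raw strings keep the backslashes intact.
NAMED_PATTERN = re.compile(
    r"(?P<CctKey>\d+)\|(?P<TripId>\d*)\??(?P<SegType>[^?]*)\??(?P<SegId>\d*)\|"
    r"(?P<EreceiptId>\d+)\|(?P<PctKey>[^|]*)\|(?P<MeKey>[^|]*)\|"
    r"(?P<RcKey>[^|]*)\|(?P<CapKey>[^|\n\s]*)"
)

# One smartexpenseid value from the question.
sample = "11111111|249178181?CARRT?266298850196|93461910||||"

match = NAMED_PATTERN.search(sample)
# groupdict() maps each group name to the text it captured.
fields = match.groupdict() if match else {}
print(fields)
```

With named groups in place, `Series.str.extract` uses the group names as column names, so the separate renaming step is no longer needed.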