I'm trying to find a simple way to break up the following pandas DataFrame by duplicating rows:
COL_A COL_B COL_C COL_D
VAL1 VAL2 VAL3 OFFER1|OFFER2|OFFER3
into
COL_A COL_B COL_C COL_D COL_Y
VAL1 VAL2 VAL3 ... OFFER1
VAL1 VAL2 VAL3 ... OFFER2
VAL1 VAL2 VAL3 ... OFFER3
Let's use pd.concat, str.split, and ffill:
pd.concat([df, df.COL_D.str.split('|', expand=True).T], axis=1).rename(columns={0: 'COL_Y'}).ffill()
Output:
COL_A COL_B COL_C COL_D COL_Y
0 VAL1 VAL2 VAL3 OFFER1|OFFER2|OFFER3 OFFER1
1 VAL1 VAL2 VAL3 OFFER1|OFFER2|OFFER3 OFFER2
2 VAL1 VAL2 VAL3 OFFER1|OFFER2|OFFER3 OFFER3
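The one-liner above relies on index alignment between df and the transposed split, so it only produces this output when df has a single row. Below is a minimal sketch that handles any number of rows. It assumes pandas 0.25 or later (where DataFrame.explode was added) and no missing values in COL_D. The two-row sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical two-row sample; column names follow the question.
df = pd.DataFrame({
    "COL_A": ["VAL1", "VAL4"],
    "COL_B": ["VAL2", "VAL5"],
    "COL_C": ["VAL3", "VAL6"],
    "COL_D": ["OFFER1|OFFER2|OFFER3", "OFFER4|OFFER5"],
})

# Split COL_D into Python lists (a single-character pattern is matched
# literally), then explode: each list element gets its own row and the
# other columns are repeated alongside it.
out = (
    df.assign(COL_Y=df["COL_D"].str.split("|"))
      .explode("COL_Y")
      .reset_index(drop=True)
)
print(out)
```

Because explode repeats rows only as many times as each cell has offers, the result grows linearly with the total number of offers.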
This looks promising, but I'm getting a MemoryError at around 520,000 rows. Is it memory-intensive?
Not extremely; your rows may be multiplying because of your data.
I even cut it down to: Index 80, RO_NUMBER 4104184, VIN 4104184, OFFERS 4104184, dtype: int64, and I still get a MemoryError even though I have plenty of memory.
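A likely cause, assuming the concat one-liner was run unchanged on the full frame: str.split(..., expand=True) returns one row per original row, and .T turns each original row into a column. pd.concat(axis=1) then outer-joins on the index, so an n-row input yields a frame of roughly n rows by n columns. With n around 520,000 that is on the order of 2.7 × 10^11 cells, no matter how small the input itself is. The sketch below uses a hypothetical n = 4 and only shows the shapes.

```python
import pandas as pd

# Hypothetical frame: n rows, each holding 3 pipe-separated offers.
n = 4
df = pd.DataFrame({"COL_D": ["OFFER1|OFFER2|OFFER3"] * n})

wide = df["COL_D"].str.split("|", expand=True)  # n rows x 3 columns
transposed = wide.T                              # 3 rows x n columns

# concat aligns on the index (outer join), so the result keeps all n rows
# and gains one column per original row: n x (n + 1) here.
combined = pd.concat([df, transposed], axis=1)

print(wide.shape, transposed.shape, combined.shape)
```

Only column 0 gets renamed to COL_Y; the remaining n - 1 transposed columns stay in the result, which is where the memory goes.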
Hope the link helps: https://stackoverflow.com/questions/35166359/how-to-unnest-cells-in-a-dataframe-employing-pandas-and-python – Wen
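The linked thread covers this "unnest" problem in general. One common variant also works on pandas versions older than 0.25, which lack explode. It repeats each row by its number of offers and attaches the flattened offers as a new column, so memory grows with the total number of offers rather than quadratically. This is a sketch under stated assumptions (hypothetical sample data, no missing values in COL_D), not necessarily the code from the linked answers.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row sample; column names follow the question.
df = pd.DataFrame({
    "COL_A": ["VAL1", "VAL4"],
    "COL_B": ["VAL2", "VAL5"],
    "COL_C": ["VAL3", "VAL6"],
    "COL_D": ["OFFER1|OFFER2|OFFER3", "OFFER4|OFFER5"],
})

# Split once into a Series of lists and count the offers in each row.
parts = df["COL_D"].str.split("|")
lengths = parts.str.len().to_numpy()

# Repeat each original row once per offer, then attach the flattened
# offers in the same order as the new COL_Y column.
out = df.loc[df.index.repeat(lengths)].reset_index(drop=True)
out["COL_Y"] = np.concatenate(parts.tolist())
print(out)
```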