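I have a DataFrame data with 2 columns, ID and Text. The goal is to split the values in the Text column into multiple columns based on the dates. Normally a date starts a run of string values that belongs in one column, unless the date sits at the end of the string (in which case it is treated as part of the string that started with the previous date).
How can I use the dates to split one DataFrame column into multiple columns in Python?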
data:
ID Text
10 6/26/06 begin tramadol, penicilin X 6 CYCLES. 1000mg tylenol X 1 YR after 11/2007
20 7/17/06-advil, qui;
10 7/19/06-ibuprofen. 8/31/06-penicilin, tramadol;
40 9/26/06-penicilin, tramadol;
91 5/23/06-penicilin, amoxicilin, tylenol;
84 10/20/06-ibuprofen, tramadol;
17 12/19/06-vit D, tramadol. 12/1/09 -6/18/10 vit D only for 5 months. 3/7/11 f/up
23 12/19/06-vit D, tramadol; 12/1/09 -6/18/10 vit D; 3/7/11 video follow-up
15 Follow up appt. scheduled
69 talk to care giver
32 12/15/06-2/16/07 everyday Follow-up; 6/8/16 discharged after 2 months
70 12/1/06?Follow up but no serious allergies
70 12/12/06-tylenol, vit D,advil; 1/26/07 scheduled surgery but had to cancel due to severe allergic reactions to advil
Expected output:
ID Text Text2 Text3
10 6/26/06 begin tramadol, penicilin X 6 CYCLES. 1000mg tylenol X 1 YR after 11/2007
20 7/17/06-advil, qui;
10 7/19/06-ibuprofen. 8/31/06-penicilin, tramadol;
40 9/26/06-penicilin, tramadol;
91 5/23/06-penicilin, amoxicilin, tylenol;
84 10/20/06-ibuprofen, tramadol;
17 12/19/06-vit D, tramadol. 12/1/09 -6/18/10 vit D only for 5 months. 3/7/11 f/up
23 12/19/06-vit D, tramadol; 12/1/09 -6/18/10 vit D; 3/7/11 video follow-up
15 Follow up appt. scheduled
69 talk to care giver
32 12/15/06-2/16/07 everyday Follow-up; 6/8/16 discharged after 2 months
70 12/1/06?Follow up but no serious allergies
70 12/12/06-tylenol, vit D,advil; 1/26/07 scheduled surgery but had to cancel due to severe allergic reactions to advil
My code so far:
import re
import datefinder

d = []
for i in data.Text:
    d = list(datefinder.find_dates(i))  # I can get the dates so far, but I still want to format the date values as %m/%d/%Y
    if len(d) > 1:  # checks for every record that has more than 1 date
        for j in range(0, len(d)):
            i = " " + " ".join(re.split(r'[^a-z 0-9/-]', i.lower())) + " "  # cleans the text strings of any special characters
            # data.Text[j] = d[j]r'[/^(.*?)]'d[j+1]'/'  # this is not working
# The goal is for the Text column to retain the string from the first date up to just before the second date. Then create a new Text2 holding everything from the second date up to just before the third date. If there are more dates, create Textn and so on.
# Exception: if a date immediately follows a date (e.g. 12/1/09 -6/18/10) or a date ends a value string (e.g. 6/26/06 begin tramadol, penicilin X 6 CYCLES. 1000mg tylenol X 1 YR after 11/2007), they should be kept in the same column.
Any ideas on how to make this work would save my day. Thanks!
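One possible way to express the rules described above is sketched here. It is not the asker's code: it swaps datefinder for a plain regular expression and assumes every date that should start a column looks like M/D/YY or M/D/YYYY. The names DATE_RE and split_on_dates and the small sample frame are made up for illustration.

```python
import re
import pandas as pd

# Assumption: column-starting dates look like 6/26/06 or 12/01/2009.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")

def split_on_dates(text):
    """Split text into segments, each beginning at a date (hypothetical helper)."""
    matches = list(DATE_RE.finditer(text))
    starts = []
    for k, m in enumerate(matches):
        if k > 0:
            # Date directly after another date (a range such as 12/1/09 -6/18/10).
            gap = text[matches[k - 1].end():m.start()]
            if gap.strip(" -") == "":
                continue
            # Date that ends the string stays with the previous segment.
            if text[m.end():].strip(" .;,") == "":
                continue
        starts.append(m.start())
    if not starts:
        return [text.strip()]  # no dates at all: keep the text in one piece
    starts[0] = 0  # any text before the first date stays in the first segment
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]

# Small sample taken from the rows in the question.
data = pd.DataFrame({
    "ID": [10, 17, 15, 32],
    "Text": [
        "7/19/06-ibuprofen. 8/31/06-penicilin, tramadol;",
        "12/19/06-vit D, tramadol. 12/1/09 -6/18/10 vit D only for 5 months. 3/7/11 f/up",
        "Follow up appt. scheduled",
        "12/15/06-2/16/07 everyday Follow-up; 6/8/16 discharged after 2 months",
    ],
})

parts = data["Text"].apply(split_on_dates)
wide = pd.DataFrame(parts.tolist(), index=data.index)  # short rows are padded with missing values
wide.columns = ["Text"] + [f"Text{n}" for n in range(2, wide.shape[1] + 1)]
result = pd.concat([data[["ID"]], wide], axis=1)
print(result)
```

Rows without any date stay whole in Text, and the number of Textn columns follows the row with the most segments. A month/year value such as 11/2007 does not match the pattern, so it never opens a new column.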
Will all the relevant dates be in MM/DD/YY format? –
@Brad Solomon - preferably in mm/dd/yyyy. Thanks! – CodeLearner
I meant in your input data –
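On the formatting side of this exchange, a minimal sketch of rewriting dates in the text as mm/dd/yyyy could look like the following. It assumes the input dates are month/day/year with a 2- or 4-digit year; FULL_DATE_RE and normalize_dates are hypothetical names, and two-digit years follow the standard library's strptime pivot (00-68 map to 2000-2068, 69-99 to 1969-1999).

```python
import re
from datetime import datetime

# Assumption: dates are month/day/year with a 2- or 4-digit year.
FULL_DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})\b")

def normalize_dates(text):
    """Rewrite every M/D/YY or M/D/YYYY date in text as mm/dd/yyyy (hypothetical helper)."""
    def repl(m):
        fmt = "%m/%d/%Y" if len(m.group(3)) == 4 else "%m/%d/%y"
        try:
            return datetime.strptime(m.group(0), fmt).strftime("%m/%d/%Y")
        except ValueError:
            # Not a real calendar date (e.g. 13/45/06): leave it untouched.
            return m.group(0)
    return FULL_DATE_RE.sub(repl, text)

print(normalize_dates("12/1/09 -6/18/10 vit D only for 5 months"))
```

If this is applied to the Text column, for example with data["Text"].apply(normalize_dates), the rest of each string is left as it was; partial dates such as 11/2007 are not touched.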