2017-05-08 15 views

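Looking for an algorithm to parse medication notes from an EHR

I want to use Python and NLTK to parse doctors' notes that describe medication prescriptions. I am looking for a way to determine the numeric quantity (the number of items) and the frequency at which the items are taken.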

1 TABLET DAILY 
TAKE 1 TABLET DAILY 
ONE TABLET TWICE DAILY 
2 DAILY 
TWO TABLETS DAILY 
ONE PILL AT BEDTIME 
1/2 PILL TWICE DAILY 
ROLLING WALKER WITH SEAT ATTACHMENT AND HAND BRAKES 
ONE PILL DAILY 
1 TAB PO DAILY 
ONE PILL TWICE A DAY WITH MEALS AS NEEDED 
1 TABLET TWICE DAILY 
300 MG BID 
ONE DAILY 
1 TABLET 3 TIMES DAILY AS NEEDED 
1 DAILY 
TAKE 1 CAPSULE BY MOUTH 4 (FOUR) TIMES A DAY. 
1 TABLET EVERY 4 TO 6 HOURS AS NEEDED 
1 TABLET BY MOUTH TWICE DAILY 
INJECT 34 U TWICE A DAY 

Any suggestions?


This might help put you on the right path: http://stackoverflow.com/questions/33337410/nltk-reading-in-word-numbers-to-float-numbers – tatlar


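You could also take a look at this project. I could not get the Earley parser Python code to run, but the author seems to have been working on the same problem. http://www.mit.edu/~6.863/spring2009/projects/project16.html – griffinc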

Answers


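There are usually multiple variations in how doctors write these in clinical notes. For example:

1 TABLET DAILY

can also be written as

1 tab qd

If you are looking for a quick solution, writing regular expressions in a Python script may help. But if you want something more long-term, you could look at the data and submissions from the i2b2 Medication Information Extraction Challenge.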
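To illustrate the regular-expression route, here is a minimal sketch that pulls a quantity, a unit, and a times-per-day frequency out of lines like the ones in the question. It uses only the standard library rather than NLTK, and the names `parse_sig`, `WORD_NUMBERS`, `QTY_RE` and `FREQ_PATTERNS` are made up for this example. The vocabulary is an assumption based on the sample lines and would need extending for real notes.

```python
import re
from fractions import Fraction

# Hypothetical lookup table of spelled-out numbers; extend for your data.
WORD_NUMBERS = {"ONE": 1, "TWO": 2, "THREE": 3, "FOUR": 4}

# Optional leading verb, then a quantity (fraction, integer, or number word),
# then an optional unit. The unit list is an assumption from the sample lines.
QTY_RE = re.compile(
    r"^(?:TAKE|INJECT)?\s*(\d+/\d+|\d+|" + "|".join(WORD_NUMBERS) + r")\b\s*"
    r"(TABLETS?|TABS?|PILLS?|CAPSULES?|MG|U)?\b"
)

# (pattern, times per day). None means "read the count from group 1".
# Order matters: more specific phrases come before the plain DAILY rule.
FREQ_PATTERNS = [
    (re.compile(r"\b(\d+)\s*(?:\(\w+\)\s*)?TIMES\s+(?:A\s+DAY|DAILY)\b"), None),
    (re.compile(r"\bTWICE\s+(?:A\s+DAY|DAILY)\b|\bBID\b"), 2),
    (re.compile(r"\bTID\b"), 3),
    (re.compile(r"\bQID\b"), 4),
    (re.compile(r"\bDAILY\b|\bQD\b|\bAT\s+BEDTIME\b"), 1),
]

AS_NEEDED_RE = re.compile(r"\bAS\s+NEEDED\b|\bPRN\b")


def parse_sig(text):
    """Return quantity, unit, per_day and as_needed for one instruction line.

    Any field the rules cannot recognise is left as None.
    """
    text = text.upper().strip()
    result = {"quantity": None, "unit": None, "per_day": None,
              "as_needed": bool(AS_NEEDED_RE.search(text))}

    qty_match = QTY_RE.match(text)
    if qty_match:
        raw = qty_match.group(1)
        # Fraction accepts ints as well as strings such as "1/2" or "300".
        result["quantity"] = Fraction(WORD_NUMBERS.get(raw, raw))
        result["unit"] = qty_match.group(2)

    for pattern, count in FREQ_PATTERNS:
        freq_match = pattern.search(text)
        if freq_match:
            result["per_day"] = count if count is not None else int(freq_match.group(1))
            break
    return result


if __name__ == "__main__":
    for line in ["1/2 PILL TWICE DAILY",
                 "TAKE 1 CAPSULE BY MOUTH 4 (FOUR) TIMES A DAY.",
                 "ROLLING WALKER WITH SEAT ATTACHMENT AND HAND BRAKES"]:
        print(line, "->", parse_sig(line))
```

Lines that are not medication instructions at all (the rolling walker entry) simply come back with every field empty, and interval phrasing such as "EVERY 4 TO 6 HOURS" is deliberately left unparsed here; that kind of variation is where a rule list like this starts to get unwieldy.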