2017-07-25 39 views
1

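Python regex: find a dollar amount and a few surrounding words at the same time

I need to find, within a paragraph, both the dollar amounts and a few (3 or 4) words surrounding each amount.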

in-process research and development of $184.3 million and charges $120 of 
million for the impairment of long-lived assets. See Notes 2, 16 and21 to the 
Consolidated Financial Statements. Income from continuingoperations for the  
fiscal year ended September 30, 2001 also includes a netgain on sale of 
businesses and investments of $276.6 million and a net gainon the sale of 
common shares of a subsidiary of $64.1 million. 

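What I want to get is something like the following: [amount, amount plus its unit word, the amount with the 3-4 words before and after it].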

[$184.3 $184.3 million, research and development of $184.3 million],[$120, $120 of million,charges $120 of 
million for the impairment of long-lived assets ], [$276.6, $276.6 million, investments of $276.6 million] ,[ $64.1, $64.1 million, a subsidiary of $64.1 million.] 

What I have tried is this, which only finds the dollar amounts.

[\$]{1}\d+\.?\d{0,2} 

Thanks!

+0

Once you have found the dollar amount, you can use its index and string slicing to find the text around it – anon
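The index-and-slicing idea from the comment above could look roughly like the sketch below. The function name `amounts_with_context`, the `n_words` parameter and the exact amount pattern are assumptions made for illustration, not something taken from the thread.

```python
import re

# Assumed amount pattern: "$", digits, optional ",ddd" groups, optional decimals.
AMOUNT = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{1,2})?")

def amounts_with_context(text, n_words=4):
    """Return (amount, context) pairs with up to n_words words on each side."""
    results = []
    for m in AMOUNT.finditer(text):
        # Slice the text at the match boundaries, then keep the nearest words.
        before = text[:m.start()].split()[-n_words:]
        after = text[m.end():].split()[:n_words]
        results.append((m.group(), " ".join(before + [m.group()] + after)))
    return results

sample = ("research and development of $184.3 million and charges $120 of "
          "million for the impairment")
for amount, context in amounts_with_context(sample, n_words=3):
    print(amount, "|", context)
```

Because each amount is sliced independently, two amounts that sit close together can share context words, which a single consuming regex cannot do.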

+0

I don't think your desired output is achievable with regex alone, but this comes close: '([^\s]+\sand\s[^\s]+\s)?([^\s]+\s)([\$]{1}\d+\.?\d{0,2})([\w\s]*illion)' – depperm

Answers

1

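Let's name the pattern you already have (with a comma added to the digit class so that thousands separators also match):

amount_patt = r"\$[\d,]+\.?\d{0,2}"

The amount together with the word that follows it should then be defined using the above:

digit_word_patt = amount_patt + r" (\w+)"

Now, for the surrounding 3-4 words, do the following:

words_patt = r"(?:\S+\s+){3,4}" + amount_patt + r"(?:\s+\S+){3,4}"

Note that the quantifier must be written {3,4} with no space, otherwise Python's re treats it as literal text, and each repeated group has to include the whitespace between words.

You're done! Now simply use the re methods to extract the strings.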
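As a rough sketch of how such patterns might be applied, the example below returns the three pieces the question asks for. It is an assumption-laden variant rather than the answer's exact code: the names `context_re` and `extract` are made up here, the word counts are relaxed to "up to 3" so that amounts near the start or end of the text still match, and the trailing words are placed in a lookahead so that a second amount nearby is not swallowed by the first match.

```python
import re

# Assumed amount pattern (illustrative): "$", digits, optional ",ddd" groups,
# optional one or two decimals.
amount_patt = r"\$\d+(?:,\d{3})*(?:\.\d{1,2})?"

context_re = re.compile(
    r"((?:\S+\s+){0,3})"                 # group 1: up to 3 words before
    r"((" + amount_patt + r")\s+\w+)"    # group 2: amount + next word, group 3: amount
    r"(?=((?:\s+\S+){0,3}))"             # group 4: up to 3 words after, not consumed
)

def extract(text):
    """Return [amount, amount + word, context] for every amount found."""
    rows = []
    for m in context_re.finditer(text):
        before, amount_word, amount, after = m.groups()
        # Collapse line breaks and repeated spaces inside the captured pieces.
        context = " ".join((before + amount_word + after).split())
        rows.append([amount, " ".join(amount_word.split()), context])
    return rows

text = "sale of businesses and investments of $276.6 million and a net gain"
print(extract(text))
```

On the sample string this yields one row: the amount `$276.6`, the amount with its unit word `$276.6 million`, and the context `and investments of $276.6 million and a net`.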

+0

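Thanks, may I ask one more thing? The dollar amounts I posted are only in the hundreds, but the real data also contains amounts in the thousands. How can I find both the hundreds and the thousands at the same time? – MMM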

+0

Do the amounts in the thousands follow the same pattern as the current ones? –

+0

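Yes, that paragraph is just one specific example, but the actual file has many paragraphs containing things like "10056 million dollars pre-tax", so I want to extract amounts that contain commas too! – MMM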
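For amounts written with thousands separators, one possible approach is to allow optional groups of a comma plus three digits after the leading digits. The sketch below is only an illustration: the helper name `parse_amounts` and the sample sentence are hypothetical, and the numeric conversion is an extra step not requested in the thread.

```python
import re

# Matches "$120", "$184.3" and "$1,005.6" alike; a comma is only accepted
# when exactly three digits follow it, so trailing punctuation is left out.
AMOUNT_WITH_COMMAS = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")

def parse_amounts(text):
    """Return (matched text, numeric value) for each dollar amount."""
    found = []
    for m in AMOUNT_WITH_COMMAS.finditer(text):
        raw = m.group()
        # Drop the "$" and the separators before converting to a float.
        found.append((raw, float(raw[1:].replace(",", ""))))
    return found

# Hypothetical sample sentence, not taken from the original filing.
print(parse_amounts("pre-tax charges of $1,005.6 million and $120, net"))
```

The same pattern still matches amounts without separators, so hundreds and thousands are found in a single pass.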
