2016-09-30 38 views
1

Regex to get the n lines after a match

I have some text, an example of which is as follows:

Lactose Hydrogen Breath Test 
    Time 
     Time Point (min) 
     H2 (ppm) 
     H2 Change 

    (ppm) 
     Hydrogen (ppm) 

     0937 
     0 
     0/0 

     Time point (min) 

     0 
     10 
     20 
     30 
     40 
     50 
     60 
     70 
     80 
     90 
     100 


     Notes: Measurements at 120 and 150 mins are insignificant changes and are most probably due to sporadic error. 

     Results are not consistent with Lactose malabsorption. 

     Lactose intolerance is not suggested. 

This is now some other text that can be anything 

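I just want to extract the first five lines starting at "Notes:" and leave out everything else (in this case, everything up to "Lactose intolerance is not suggested."; any kind of text can follow after that).

I am currently using this Java to extract it: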

import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public Map<String,String> LactoseTestExtractor(String str){ 

     String lact = null;

     // Attempt 1: from the test heading up to the end of the "Interpretation" line
     Pattern match_pattern = Pattern.compile("Lactose Hydrogen Breath Test(.*?Interpretation[^\\r|^\\n]*)",Pattern.DOTALL); 
     Matcher matchermatch_pattern = match_pattern.matcher(str); 

     // Attempt 2: from the test heading, capture "Notes:" and the lines that follow it
     Pattern match_pattern2 = Pattern.compile("Lactose Hydrogen Breath Test.*?(Notes:.*?\\r|\\n[\\r|\\n]?.*?\\r|\\n[\\r|\\n]?)",Pattern.DOTALL); 
     Matcher matchermatch_pattern2 = match_pattern2.matcher(str); 

     if (matchermatch_pattern.find()) { 
      lact=matchermatch_pattern.group(1).trim(); 
      System.out.println("lact1"+lact); 
     } 
     else if (matchermatch_pattern2.find()){ 
      lact=matchermatch_pattern2.group(1).trim(); 
      System.out.println("lact2"+lact); 
     } 
     // ... (the rest of the method is not shown)

But I am getting the entire match rather than just what I want, which is:

Measurements at 120 and 150 mins are insignificant changes and are most probably due to sporadic error. 

     Results are not consistent with Lactose malabsorption. 

     Lactose intolerance is not suggested. 

How do I correct this? I don't know whether it is a Java problem or a regex problem.

+1

Your input does not contain "Lactose Hydrogen Breath Test", so why would either of these patterns match? –

+2

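Why do you want to use a regex for this? Just search for the position of 'Note', take the substring from there, split it with '[\r\n]+', and finally take the first five elements of the resulting array. – A4L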
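
The approach suggested in the comment above could be sketched roughly as follows. This is only an illustration: the class name `NotesExtractor` and the helper `linesAfterNotes` are hypothetical, and the marker is assumed to be the literal `Notes:` from the sample. One caveat: splitting on `[\r\n]+` collapses blank lines, so the five physical lines in the sample come back as three elements, which is why the count is passed in as a parameter here.

```java
import java.util.ArrayList;
import java.util.List;

final class NotesExtractor {

    // Hypothetical helper: returns up to maxLines non-blank lines that
    // follow the literal marker "Notes:" (the marker itself is dropped).
    static List<String> linesAfterNotes(String text, int maxLines) {
        List<String> result = new ArrayList<>();
        int start = text.indexOf("Notes:");
        if (start < 0) {
            return result; // marker not found: empty list
        }
        String tail = text.substring(start + "Notes:".length());
        // Runs of \r and \n are treated as a single separator,
        // so blank lines never show up as elements.
        for (String line : tail.split("[\\r\\n]+")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // e.g. a leading empty element or a whitespace-only line
            }
            result.add(trimmed);
            if (result.size() == maxLines) {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "Time point (min)\n\n"
                + "     Notes: Measurements at 120 and 150 mins are insignificant changes. \n\n"
                + "     Results are not consistent with Lactose malabsorption. \n\n"
                + "     Lactose intolerance is not suggested. \n\n"
                + "This is now some other text that can be anything \n";
        for (String line : linesAfterNotes(sample, 3)) {
            System.out.println(line);
        }
    }
}
```

Because the text is only cut at the marker and at line breaks, this avoids the alternation and `DOTALL` pitfalls of the patterns in the question, at the cost of losing the original blank lines.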

+0

@Andy Turner - apologies, I have changed the example –

Answers
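
A regex-only variant is also possible. Two details of the patterns in the question are worth noting: `|` has the lowest precedence in a regex, so an ungrouped `\\r|\\n` splits the surrounding expression into separate alternatives, and inside a character class such as `[\\r|\\n]` the `|` is a literal pipe character. The sketch below sidesteps both by not using `DOTALL`, so that `.` stops at line breaks, and by using the linebreak matcher `\R` (available since Java 8). The class name `NotesRegex` and the helper `firstFiveLinesAfterNotes` are hypothetical, and "five lines" is assumed to mean five physical lines, blank ones included, as in the sample.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NotesRegex {

    // "Notes:", optional spaces or tabs, then the rest of that line
    // plus up to four more physical lines (blank lines count).
    // Without DOTALL, '.' does not cross line breaks; \R matches one line break.
    private static final Pattern NOTES =
            Pattern.compile("Notes:[ \\t]*(.*(?:\\R.*){0,4})");

    // Hypothetical helper: returns the captured block with leading and
    // trailing whitespace removed, or null when "Notes:" is absent.
    static String firstFiveLinesAfterNotes(String text) {
        Matcher m = NOTES.matcher(text);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String sample = "Time point (min)\n\n"
                + "     Notes: Measurements at 120 and 150 mins are insignificant changes. \n\n"
                + "     Results are not consistent with Lactose malabsorption. \n\n"
                + "     Lactose intolerance is not suggested. \n\n"
                + "This is now some other text that can be anything \n";
        System.out.println(firstFiveLinesAfterNotes(sample));
    }
}
```

The `{0,4}` bound is what limits the capture to five lines in total; if fewer lines follow the marker, the match simply ends at the end of the input.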