2011-08-05

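I need help getting a regular expression right. I want a regex that finds multiple occurrences of my pattern on a single line. Note: I've only been using regular expressions for about an hour... =/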

For example:

<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a> 

should match twice:

1) <a href="G2532" id="1">back</a> 
2) <a href="G2564" id="2">next</a> 

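I think the answer lies in a proper understanding of greedy vs. reluctant vs. possessive quantifiers, but I can't seem to get it to work...

I think I'm close. The regex string I've come up with so far is: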

(<a href=").*(" id="1">).*(</a>) 

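But the regex returns a single match, the whole string...

I have a (compiled) Java regex test harness; its code is below. Here are my most recent (useless) attempts run through that program. The output should be pretty self-explanatory.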

Enter your regex: (<a href=").*(" id="1">).*(</a>) 
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a> 
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63. 

Enter your regex: (<a href=").*(" id="1">).*(</a>)? 
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a> 
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63. 

Enter your regex: (<a href=").*(" id="1">).*(</a>)+ 
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a> 
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63. 

Enter your regex: (<a href=").*(" id="1">).*(</a>)? 
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a> 
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63. 

Enter your regex: ((<a href=").*(" id="1">).*(</a>))? 
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a> 
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63. 
I found the text "" starting at index 63 and ending at index 63. 

Enter your regex: ((<a href=").*(" id="1">).*(</a>))+? 
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a> 
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63. 

Enter your regex: (((<a href=").*(" id="1">).*(</a>))+?) 
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a> 
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63. 

Here is the Java code:

import java.io.BufferedReader; 
import java.io.IOException; 
import java.io.InputStreamReader; 
import java.util.regex.Pattern; 
import java.util.regex.Matcher; 

public class RegexTestHarness {

    public static void main(String[] args) {
        try {
            while (true) {

                System.out.print("\nEnter your regex: ");

                BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
                Pattern pattern = Pattern.compile(reader.readLine());

                System.out.print("Enter input string to search: ");
                Matcher matcher = pattern.matcher(reader.readLine());

                boolean found = false;
                while (matcher.find()) {
                    System.out.println("I found the text \"" + matcher.group() + "\" starting at " +
                        "index " + matcher.start() + " and ending at index " + matcher.end() + ".");
                    found = true;
                }
                if (!found) {
                    System.out.println("No match found.");
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(-1);
        }

    }
}

[You shouldn't try to parse HTML with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Bohemian

Answers

Try this:

<a href=".*?" id="1">.*?</a> 

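I've added a ? after each .*, which makes the quantifier reluctant (non-greedy), so it matches as little as possible.

However, when in doubt about converting a greedy match to a non-greedy one, you can use this trick: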
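To make the difference concrete, here is a minimal sketch that runs the greedy and the reluctant version of the pattern against the sample line from the question. The class name `GreedyVsReluctant` and the helper `findAll` are made up for illustration; only `Pattern` and `Matcher` come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {

    // Hypothetical helper: collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "<a href=\"G2532\" id=\"1\">back</a> <a href=\"G2564\" id=\"2\">next</a>";

        // Greedy: the second .* runs on to the last </a> in the line.
        System.out.println(findAll("<a href=\".*\" id=\"1\">.*</a>", input));

        // Reluctant: each .*? stops as soon as the rest of the pattern can match.
        System.out.println(findAll("<a href=\".*?\" id=\"1\">.*?</a>", input));
    }
}
```

On the sample line, the greedy pattern produces a single match covering the whole line, while the reluctant one produces only the id="1" link.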

<a href="[^"]*" id="1">[^<]*</a> 

[^"]* means any number of characters that are not a double quote
[^<]* means any number of characters that are not a left angle bracket

That way you don't have to worry about greedy vs. non-greedy at all.
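A small sketch of why the negated classes can be the safer choice. It searches for the second link (the id="2" variant is my own illustrative assumption, as are the names `NegatedClassDemo` and `findAll`) and compares the reluctant pattern with the negated-class one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {

    // Hypothetical helper: collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "<a href=\"G2532\" id=\"1\">back</a> <a href=\"G2564\" id=\"2\">next</a>";

        // Reluctant .*? starts at the first <a and is still allowed to grow
        // across the first tag until it reaches id="2".
        System.out.println(findAll("<a href=\".*?\" id=\"2\">.*?</a>", input));

        // [^"]* cannot step over a quote and [^<]* cannot step over a tag,
        // so the match has to stay inside a single link.
        System.out.println(findAll("<a href=\"[^\"]*\" id=\"2\">[^<]*</a>", input));
    }
}
```

Here the reluctant pattern still swallows the whole line, because a match attempt beginning at the first `<a` can succeed by expanding far enough. The negated-class pattern returns only the id="2" link.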

Bohemian, you got me started on the right track. I used your technique, but I realized I had to change id="1" to id="[1-9]+". Now it finally works. Thanks. – Ryan
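The exact regex Ryan ended up with isn't shown, so the following is only a sketch that assumes he combined the negated-class pattern with id="[1-9]+". The names `FinalPatternDemo` and `findAll` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FinalPatternDemo {

    // Assumed final pattern: negated classes plus a generalized id.
    static final String LINK_REGEX = "<a href=\"[^\"]*\" id=\"[1-9]+\">[^<]*</a>";

    // Hypothetical helper: collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "<a href=\"G2532\" id=\"1\">back</a> <a href=\"G2564\" id=\"2\">next</a>";

        // Each link is reported as its own match.
        for (String link : findAll(LINK_REGEX, input)) {
            System.out.println(link);
        }
    }
}
```

One thing to keep in mind: [1-9]+ does not accept the digit 0, so an id such as "10" would not match. If such ids can occur, [0-9]+ is a possible variation.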