Separating the contents of custom tags using regex in Java

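I need a piece of code that grabs all the values enclosed by tags in a Java String and returns them as a string array, provided the tag name matches one of a list of keywords. The tags are plain-text words surrounded by "<>", and every tag that is opened has a matching closing tag (written in the form "<name/>" in the example below).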

For example, given the text

<name>stuff<name/> 
    <locations>example of text<locations/> 
    <storybattles>more text somehow<storybattles/> 
    <maincharacter>characters n stuff <maincharacter/> 
//continues on with random tag text values

returns-

"stuff" 
"example of text" 
"more text somehow" 
"characters n stuff"

Ideal use case:

String inputText="pretend there are tags in here"; 
//Please pretend I added several keywords to the keywords list 
ArrayList<String> keywords=new ArrayList<String>(); 
String[] allTheAnswers=kindStackOverflowMentorMethod(inputText,keywords);

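I could do this myself with my limited knowledge of regex, but I cringe because I know it can be done better. You get bonus points from me if you include an explanation of each part of the regex you use (or of whatever other solution a clever mind might come up with).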

Source

2016-04-06 Edge363

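I have tried my own approach, but it is basically me hacking away with .split, whittling the string down by trimming off the ends, and so on. I just need a small, understandable regex snippet to get my feet on the ground. – Edge363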

Here is a working example of how I would do it:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final String DATA = "<name>stuff<name/>\n" +
        " <locations>example of text<locations/>\n" +
        " <storybattles>more text somehow<storybattles/>\n" +
        " <maincharacter>characters n stuff <maincharacter/>";

private static final List<String> KEYWORDS = Arrays.asList("name", "locations");

// %1$s is a String.format() placeholder that is replaced by the keyword
private static final String PATTERN = "<%1$s>(.+?)<%1$s/>";

public static void main(String[] args) {

    List<String> strs = new ArrayList<>();
    for (String keyword : KEYWORDS) {
        // Build the regex for this keyword, e.g. <name>(.+?)<name/>
        String tempPattern = String.format(PATTERN, keyword);
        Pattern pattern = Pattern.compile(tempPattern);
        Matcher matcher = pattern.matcher(DATA);

        // Group 1 holds the text between the opening and closing tag
        while (matcher.find()) {
            strs.add(matcher.group(1));
        }
    }
}
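The question asked for a method that returns a `String[]`. As a rough sketch of how the loop above could be wrapped to do that: the class name `TagExtractor` and the method name `extract` are made up for illustration and are not part of the answer, and `Pattern.quote` is an added precaution so that a keyword containing regex metacharacters is matched literally. Note that the results come back grouped by keyword, not in the order the tags appear in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExtractor {

    // Hypothetical helper: collects the content of every <keyword>...<keyword/> pair.
    static String[] extract(String text, List<String> keywords) {
        List<String> found = new ArrayList<>();
        for (String keyword : keywords) {
            // Pattern.quote makes the keyword match literally (an added precaution).
            String quoted = Pattern.quote(keyword);
            Pattern pattern = Pattern.compile("<" + quoted + ">(.+?)<" + quoted + "/>");
            Matcher matcher = pattern.matcher(text);
            while (matcher.find()) {
                found.add(matcher.group(1));
            }
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String data = "<name>stuff<name/>\n"
                + " <locations>example of text<locations/>\n"
                + " <storybattles>more text somehow<storybattles/>";
        String[] values = extract(data, Arrays.asList("name", "locations"));
        System.out.println(Arrays.toString(values)); // prints [stuff, example of text]
    }
}
```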

Regex101 Fiddle

Source

2016-04-06 18:52:05 dambros

Thank you very much! It works now. Could you tell me how the "%1$" and the "s" between the "<" and ">" work? – Edge363

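That value is a placeholder for 'String.format()'; it is not part of the regex. Before building the Pattern object, I replace it with a literal string such as "name", so the regex becomes '<name>(.+?)<name/>'. Each iteration over the keyword list changes the regex so that the current keyword sits between the '<>'. – dambros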
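To make the placeholder substitution concrete, here is a small sketch. The class name `FormatDemo` and the helper `regexFor` are hypothetical names used only for this illustration; the template string is the one from the answer. `%1$s` means "the first argument, formatted as a string", which is why a single argument can fill both occurrences.

```java
class FormatDemo {

    // Same template as in the answer: both %1$s refer to the first format argument.
    static final String TEMPLATE = "<%1$s>(.+?)<%1$s/>";

    // Hypothetical helper: builds the regex source for one keyword.
    static String regexFor(String keyword) {
        return String.format(TEMPLATE, keyword);
    }

    public static void main(String[] args) {
        System.out.println(regexFor("name"));      // prints <name>(.+?)<name/>
        System.out.println(regexFor("locations")); // prints <locations>(.+?)<locations/>
    }
}
```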

Is this what you are looking for?

import java.util.ArrayList; 
import java.util.regex.Matcher; 
import java.util.regex.Pattern; 


public static void main(String[] args) { 

    String inputText=" <name>stuff<name/>\n"+ 
     " <locations>example of text<locations/>\n"+ 
     " <storybattles>more text somehow<storybattles/>\n"+ 
     " <maincharacter>characters n stuff <maincharacter/>"; 

    //Please pretend I added several keywords to the keywords list 
    ArrayList<String> keywords=new ArrayList<>(); 
    keywords.add("locations"); 
    keywords.add("maincharacter"); 

    //Call the function 
    ArrayList<String> allTheAnswers=kindStackOverflowMentorMethod(inputText,keywords); 

} 

public static ArrayList<String> kindStackOverflowMentorMethod(String inputText, ArrayList<String> keywords){ 
    ArrayList<String> values=new ArrayList<>(); 
    Matcher m = Pattern.compile("<([a-z][a-z0-9]*)>(.*?)<(?:\\1)\\/>").matcher(inputText); 
    while (m.find()){ 
     if (keywords.indexOf(m.group(1)) > -1) { 
      values.add(m.group(2));    
     } 
    } 
    return values; 
}

Explanation of the regex

<     # match < literally 
([a-z][a-z0-9]*) # first capturing group - match TAG name 
         should start with a letter, followed by 
         0 or more letters or numbers 
>     # match > literally 
(.*?)    # 2nd capturing group - match content surrounded by TAGs 
         non-greedy match 
<     # match < literally 
(?:\1)    # non-capturing group - match previous matched TAG name 
\/>     # match /> literally
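The breakdown above can also live inside the code itself. A minimal sketch, assuming you are happy to compile the pattern with the standard `Pattern.COMMENTS` flag, which ignores whitespace in the pattern and treats `#` to end of line as a comment. The class name `CommentedRegexDemo` is made up for this illustration, and the backreference is written as a plain `\1` without the non-capturing group, which matches the same text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentedRegexDemo {

    // With Pattern.COMMENTS, whitespace in the pattern is ignored and # starts a comment.
    static final Pattern TAG = Pattern.compile(
            "<                 # opening angle bracket\n"
          + "([a-z][a-z0-9]*)  # group 1: tag name\n"
          + ">                 # closing angle bracket\n"
          + "(.*?)             # group 2: content, non-greedy\n"
          + "<                 # start of the closing tag\n"
          + "\\1               # same tag name as group 1\n"
          + "/>                # end of the closing tag\n",
            Pattern.COMMENTS);

    public static void main(String[] args) {
        Matcher m = TAG.matcher("<locations>example of text<locations/>");
        if (m.find()) {
            System.out.println(m.group(1)); // prints locations
            System.out.println(m.group(2)); // prints example of text
        }
    }
}
```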

Source

2016-04-06 20:08:09 Quinn

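Absolutely! Thank you very much for the explanation. So "?:\1" means "grab the previously matched sequence of characters"? That, together with the changing tag names being searched for, was what kept me from moving forward. Also, if I may ask another question: does the loop check whether the keyword is located inside the text, or does it find a position and use that to look up the specific value? – Edge363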

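A non-capturing group is denoted by '(?:pattern)', and '\1' refers to the sequence of characters previously matched by group 1. 'if (keywords.indexOf(m.group(1)) > -1)' checks whether the tag name captured in group 1 of the current match is in the keyword list; if it is, the corresponding value is added to the list. Each match yields a tag name together with its value, so there is no need to look up positions. :) – Quinn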
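A short sketch of what the backreference buys you: the closing tag must repeat exactly the name that group 1 captured, so a mismatched pair does not match at all. The class name `BackreferenceDemo` and the helper `firstContent` are hypothetical; the regex is the one from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {

    // The regex from the answer: \\1 must repeat whatever group 1 captured.
    static final Pattern TAG = Pattern.compile("<([a-z][a-z0-9]*)>(.*?)<(?:\\1)\\/>");

    // Hypothetical helper: content of the first well-formed tag pair, or null if none.
    static String firstContent(String text) {
        Matcher m = TAG.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstContent("<name>stuff<name/>"));      // prints stuff
        System.out.println(firstContent("<name>stuff<locations/>")); // prints null
    }
}
```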
