Parsing an HTTP XML response with a regular expression in Java

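I am making an API call, and now I need to pull one specific piece of data out of the response. I need the DocumentID of the document whose Description is "Invoice", which in the case below is 110107.

I have already written a method that gets the data from a single tag, like this: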

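public synchronized String getTagFromHTTPResponseAsString(String tag, String body) throws IOException {

    // Quote the tag name so any regex metacharacters in it are matched literally
    final String quotedTag = Pattern.quote(tag);
    final Pattern pattern = Pattern.compile("<" + quotedTag + ">(.+?)</" + quotedTag + ">");
    final Matcher matcher = pattern.matcher(body);

    // group(1) throws IllegalStateException unless find() succeeded, so check it first
    if (!matcher.find()) {
        return null; // tag not present in the body
    }

    return matcher.group(1);

} // end getTagFromHTTPResponseAsString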

My problem, however, is that this result set contains several elements with the same tag, and I need one particular one. Here is the response:

<?xml version="1.0" encoding="utf-8"?> 
<Order TrackingID="351535" TrackingNumber="TEST-843245" xmlns=""> 
    <ErrorMessage /> 
    <StatusDocuments> 
    <StatusDocument NUM="1"> 
     <DocumentDate>7/14/2017 6:52:00 AM</DocumentDate> 
     <FileName>4215.pdf</FileName> 
     <Type>Sales Contract</Type> 
     <Description>Uploaded Document</Description> 
     <DocumentID>110098</DocumentID> 
     <DocumentPlaceHolder /> 
    </StatusDocument> 
    <StatusDocument NUM="2"> 
     <DocumentDate>7/14/2017 6:52:00 AM</DocumentDate> 
     <FileName>Apex_Shortcuts.pdf</FileName> 
     <Type>Other</Type> 
     <Description>Uploaded Document</Description> 
     <DocumentID>110100</DocumentID> 
     <DocumentPlaceHolder /> 
    </StatusDocument> 
    <StatusDocument NUM="3"> 
     <DocumentDate>7/14/2017 6:52:00 AM</DocumentDate> 
     <FileName>CRAddend.pdf</FileName> 
     <Type>Other</Type> 
     <Description>Uploaded Document</Description> 
     <DocumentID>110104</DocumentID> 
     <DocumentPlaceHolder /> 
    </StatusDocument> 
    <StatusDocument NUM="4"> 
     <DocumentDate>7/14/2017 6:52:00 AM</DocumentDate> 
     <FileName>test.pdf</FileName> 
     <Type>Other</Type> 
     <Description>Uploaded Document</Description> 
     <DocumentID>110102</DocumentID> 
     <DocumentPlaceHolder /> 
    </StatusDocument> 
    <StatusDocument NUM="5"> 
     <DocumentDate>7/14/2017 6:55:00 AM</DocumentDate> 
     <FileName>Invoice.pdf</FileName> 
     <Type>Invoice</Type> 
     <Description>Invoice</Description> 
     <DocumentID>110107</DocumentID> 
     <DocumentPlaceHolder /> 
    </StatusDocument> 
    </StatusDocuments> 
</Order>

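I tried building and testing my regular expression on https://regex101.com/ and got the following expression working there, but I can't get it to translate correctly into my Java code: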

<Description>Invoice<\/Description> 
     <DocumentID>(.*?)<\/DocumentID>
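
For illustration, here is one way the regex101 expression above might be carried over to Java. This is only a sketch under some assumptions: the class name `InvoiceRegexSketch` and the method `findInvoiceDocumentId` are made up for this example, and the literal line break and indentation between the two tags are replaced with `\s*` so the pattern does not depend on the exact whitespace in the response. Inside a Java string literal every backslash has to be doubled, and `/` needs no escaping at all.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original code.
class InvoiceRegexSketch {

    // "\\s*" in the Java literal is the regex \s*: any whitespace, including line breaks.
    // (.*?) lazily captures the text between the DocumentID tags.
    private static final Pattern INVOICE_ID = Pattern.compile(
            "<Description>Invoice</Description>\\s*<DocumentID>(.*?)</DocumentID>");

    // Returns the DocumentID that directly follows the "Invoice" description, if any.
    static Optional<String> findInvoiceDocumentId(String body) {
        Matcher m = INVOICE_ID.matcher(body);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String body =
                "<Description>Uploaded Document</Description> \n"
              + "     <DocumentID>110102</DocumentID> \n"
              + "     <Description>Invoice</Description> \n"
              + "     <DocumentID>110107</DocumentID> \n";
        // prints 110107
        System.out.println(findInvoiceDocumentId(body).orElse("not found"));
    }
}
```

Returning an `Optional` avoids calling `group(1)` when `find()` did not match, which would otherwise throw an `IllegalStateException`.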


2017-07-14 Dustin N.

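Don't use regular expressions to parse XML. Use an XML parser. – jsheeran

Regular expressions are for string matching, not for parsing XML. I would recommend using one of the many XML parsing libraries. Also, in my experience regular expressions can be hard to work with and to maintain. – MartinByers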
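
As a rough illustration of the XML-parser approach suggested in these comments, the lookup could be done with the XPath support that ships with the JDK, so no extra library is needed. This is a sketch, not the commenters' code: the class name `InvoiceXPathSketch` and the method `findInvoiceDocumentId` are hypothetical, and the sample XML in `main` is a shortened version of the response from the question.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original thread.
class InvoiceXPathSketch {

    // Returns the DocumentID of the first StatusDocument whose Description is "Invoice",
    // or an empty string if there is no such element.
    static String findInvoiceDocumentId(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(
                    "//StatusDocument[Description='Invoice']/DocumentID",
                    new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            // Thrown, for example, when the response cannot be parsed as XML
            throw new IllegalArgumentException("Could not evaluate XPath on response", e);
        }
    }

    public static void main(String[] args) {
        String xml =
                "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
              + "<Order TrackingID=\"351535\" TrackingNumber=\"TEST-843245\" xmlns=\"\">"
              + "<StatusDocuments>"
              + "<StatusDocument NUM=\"4\"><Description>Uploaded Document</Description>"
              + "<DocumentID>110102</DocumentID></StatusDocument>"
              + "<StatusDocument NUM=\"5\"><Description>Invoice</Description>"
              + "<DocumentID>110107</DocumentID></StatusDocument>"
              + "</StatusDocuments></Order>";
        // prints 110107
        System.out.println(findInvoiceDocumentId(xml));
    }
}
```

Because the predicate `[Description='Invoice']` selects the element by its content rather than by position, the lookup keeps working if the order or formatting of the `StatusDocument` entries changes.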

Try this example with Jsoup:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class InvoiceIdExample {
    public static void main(String[] args) throws Exception {
        String xml = "yourXML"; // replace with the response body
        Document doc = Jsoup.parse(xml);
        // Select every StatusDocument element in the response
        Elements statusDocuments = doc.select("StatusDocument");
        for (Element e : statusDocuments) {
            // Print the DocumentID of the entry whose Description is "Invoice"
            if (e.select("Description").text().equals("Invoice")) {
                System.out.println(e.select("DocumentID").text());
            }
        }
    }
}


2017-07-14 12:55:06 Eritrean

What I did to solve this was to use a StringBuilder to convert the response into a string, and then use this code to get the DocumentID:

// Create the pattern and matcher; \\s* tolerates whitespace or line breaks between the tags
Pattern p = Pattern.compile("<Description>Invoice<\\/Description>\\s*<DocumentID>(.*?)<\\/DocumentID>");
Matcher m = p.matcher(responseText);

// If an occurrence of the pattern was found in the given string...
if (m.find()) {
    // ...then the group() methods can be used.
    System.out.println("group0 = " + m.group(0)); // whole matched expression
    System.out.println("group1 = " + m.group(1)); // contents of the first capturing group (the DocumentID)

    // Set the documentID for the Invoice; group(1) is only valid after a successful find()
    documentID = m.group(1);
}

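It looks like this may not be the best way to go about it, but it is working for now. I will come back and clean it up with a more correct solution based on the suggestions given here.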


2017-07-14 13:15:03

@Eritrean's answer works well and is much cleaner. I am implementing that solution –
