2013-06-05

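Retrieving values from XML tags in Java

I have a set of XML strings output by a natural-language tool. I need to retrieve the values from them and also return null for any tag in the inventory that does not appear in a given output string. I tried the Java code provided in Extracting data from XML using Java, but it does not seem to work.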

Current sample list of tags:

<TimeStamp>, <Role>, <SpeakerId>, <Person>, <Location>, <Organization> 

Sample XML output string:

<TimeStamp>00.00.00</TimeStamp> <Role>Speaker1</Role><SpeakerId>1234</SpeakerId>Blah, blah, blah. 

Desired output:

TimeStamp: 00.00.00 
Role: Speaker1 
SpeakerId: 1234 
Person: null 
Location: null 
Organization: null 

To use the Java code provided at the link above (the updated version), I wrapped the output in <Dummy></Dummy> as follows:

<Dummy><TimeStamp>00.00.00</TimeStamp><Role>Speaker1</Role><SpeakerId>1234</SpeakerId>Blah, blah, blah.</Dummy> 
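Some wrapper is needed because the raw tool output is not a well-formed XML document on its own: it has several top-level elements plus loose text, while XML allows exactly one root element. A minimal sketch of the difference, using only the JDK's built-in DOM parser; the class `WrapperDemo` and method `rootName` are hypothetical names for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo: the raw NER output has several top-level elements plus loose text,
// so it only becomes a well-formed XML document once it is wrapped in one root element.
class WrapperDemo {

    // Returns the root element's tag name, or null if the string is not well-formed XML.
    static String rootName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTagName();
        } catch (SAXException e) {
            return null; // parse error: not a well-formed document
        } catch (Exception e) {
            throw new RuntimeException(e); // parser configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        String raw = "<TimeStamp>00.00.00</TimeStamp> <Role>Speaker1</Role>"
                + "<SpeakerId>1234</SpeakerId>Blah, blah, blah.";
        System.out.println(rootName(raw));                          // null
        System.out.println(rootName("<Dummy>" + raw + "</Dummy>")); // Dummy
    }
}
```

The JDK parser may also print a `[Fatal Error]` line to standard error for the unwrapped string; the exception is what the code relies on.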

However, it only returns dummy and null. Since I am still new to Java, a detailed explanation would be appreciated.
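A minimal sketch of the whole task, assuming the tool output becomes well-formed once wrapped in a single root element (the `<Dummy>` wrapper above) and that only the first occurrence of each tag matters. The class `NerTagExtractor` and method `extract` are hypothetical; only the JDK's DOM API (`javax.xml.parsers`, `org.w3c.dom`) is used. Each tag in the inventory is looked up by its exact, case-sensitive name and mapped to `null` when absent.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: maps each expected tag to its text, or null if the tag is absent.
class NerTagExtractor {

    // Tag inventory from the question (element names are case-sensitive in XML).
    static final List<String> TAGS = Arrays.asList(
            "TimeStamp", "Role", "SpeakerId", "Person", "Location", "Organization");

    static Map<String, String> extract(String nerOutput) throws Exception {
        // Wrap the raw output so the parser sees exactly one root element.
        String wrapped = "<Dummy>" + nerOutput + "</Dummy>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(wrapped)));

        Map<String, String> values = new LinkedHashMap<>(); // keeps inventory order
        for (String tag : TAGS) {
            NodeList matches = doc.getElementsByTagName(tag);
            // Take the first match's text; store null when the tag never appears.
            values.put(tag, matches.getLength() > 0
                    ? matches.item(0).getTextContent().trim()
                    : null);
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<TimeStamp>00.00.00</TimeStamp> <Role>Speaker1</Role>"
                + "<SpeakerId>1234</SpeakerId>Blah, blah, blah.";
        for (Map.Entry<String, String> e : extract(sample).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Free text outside the tags ("Blah, blah, blah.") is simply ignored. If that text can contain characters such as `&` or a bare `<`, the wrapped string is no longer well-formed and `parse` will throw, so it would need escaping first.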


Show the code you are using, and use the actual XML as input. – acdcjunior

Answers


This is what I ended up doing for my Java wrapper (showing TimeStamp only):

import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NERPost {

    public String convertXML(String input) {
        String nerOutput = input;
        StringBuilder result = new StringBuilder();
        try {
            DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
            InputSource is = new InputSource();
            is.setCharacterStream(new StringReader(nerOutput));
            Document doc = docBuilder.parse(is);

            // normalize text representation
            doc.getDocumentElement().normalize();
            // XML tag names are case-sensitive: the wrapper is <Dummy>, so "dummy" would match nothing
            NodeList listOfDummies = doc.getElementsByTagName("Dummy");

            for (int s = 0; s < listOfDummies.getLength(); s++) {
                Node firstDummyNode = listOfDummies.item(s);
                if (firstDummyNode.getNodeType() == Node.ELEMENT_NODE) {
                    Element firstDummyElement = (Element) firstDummyNode;

                    // Convert each entity label --------------------------------

                    // TimeStamp
                    if (nerOutput.contains("<TimeStamp>")) {
                        // getElementsByTagName already searches all descendants
                        NodeList timeStampList = firstDummyElement.getElementsByTagName("TimeStamp");
                        for (int i = 0; i < timeStampList.getLength(); i++) {
                            Element timeStampElement = (Element) timeStampList.item(i);
                            // getTextContent() also works for an empty tag, where item(0) would be null
                            String timeStampOutput = timeStampElement.getTextContent().trim();
                            result.append("<TimeStamp>").append(timeStampOutput).append("</TimeStamp>\n");
                        } // end for
                    } // end if
                    // other XML tags
                    // .....
                } // end if
            } // end for
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        } // end try
        return result.toString();
    }
}
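If a version of this code still finds nothing, the tag-name spelling is worth checking: XML element names are case-sensitive, so a lookup such as `getElementsByTagName("dummy")` does not match a `<Dummy>` element. A small sketch of that behaviour; the class `CaseSensitivityDemo` and method `count` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo: DOM tag-name lookups are case-sensitive.
class CaseSensitivityDemo {

    // Parses the string and counts the elements whose name matches tagName exactly.
    static int count(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tagName).getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Dummy><TimeStamp>00.00.00</TimeStamp>Blah, blah, blah.</Dummy>";
        System.out.println(count(xml, "dummy")); // 0: no element is named "dummy"
        System.out.println(count(xml, "Dummy")); // 1
        System.out.println(count(xml, "*"));     // 2: "*" matches Dummy and TimeStamp
    }
}
```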

Try this approach :D Hope it helps.

File fXmlFile = new File("yourfile.xml"); 
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance(); 
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder(); 
Document doc = dBuilder.parse(fXmlFile); 

You can get the list of elements with a given tag name as follows:

NodeList nList = doc.getElementsByTagName("staff"); 

Then get an individual item like this:

Node nNode = nList.item(temp); 

Example Site
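Putting these fragments together, a sketch that parses from a string instead of a file (the NLP output in the question is a string) and walks the list with the same `item(temp)` loop. The `<staff>` sample document is made up for illustration, and `StaffListDemo`/`textOf` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: parse a String (not a File) and collect the text
// of every element with the given tag name, in document order.
class StaffListDemo {

    static List<String> textOf(String xml, String tagName) throws Exception {
        DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = dBuilder.parse(new InputSource(new StringReader(xml)));
        NodeList nList = doc.getElementsByTagName(tagName); // matches all descendants

        List<String> texts = new ArrayList<>();
        for (int temp = 0; temp < nList.getLength(); temp++) {
            Node nNode = nList.item(temp);
            texts.add(nNode.getTextContent().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document for illustration only.
        String xml = "<company><staff>Alice</staff><staff>Bob</staff></company>";
        System.out.println(textOf(xml, "staff")); // [Alice, Bob]
    }
}
```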


This is what I ended up doing: –


Glad I could help :D –