2014-04-29

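Parsing an XML file in Java to get specific text content

I am parsing XML files that represent research papers/articles and storing the results in a MySQL database. The files have the XML structure below.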

<article>
    <article-meta></article-meta>
    <body>
        <p>
        Extensible Markup Language (XML) is a markup language that defines a set of
        rules for encoding documents in a format that is both human-readable and
        machine-readable <ref id="1"/>. It is defined in the XML 1.0 Specification
        produced by the W3C, and several other related specifications
        </p>
        <p>
        Many application programming interfaces (APIs) have been developed to aid
        software developers with processing XML <ref id="2"/>. data, and several schema
        systems exist to aid in the definition of XML-based languages.
        </p>
    </body>
    <back>
        <ref-list>
            <ref id="1">Details about this reference </ref>
            <ref id="2">Details about this reference </ref>
        </ref-list>
    </back>
</article>

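I am parsing the file with a DOM parser. One of the requirements is that, for each ref id, I have to extract about 150 characters to the left and to the right of the position where it is cited inside the body tag. How can I do this?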

 refId  leftText                 rightText
 1      150 chars on left side   150 chars on right side

Use XPath. – MadProgrammer
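One possible reading of this suggestion, as a minimal sketch: use the standard `javax.xml.xpath` API to select only the `<ref>` elements that appear inside `<body>`, so the entries in `<ref-list>` are skipped. The class name `BodyRefFinder` and the method `citedIds` are hypothetical, and the sketch assumes the XML is well-formed, with each in-body citation written as an empty element such as `<ref id="1"/>`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BodyRefFinder {
    // Returns the id of every <ref> element found under <body>.
    // Assumption: the input is well-formed XML with quoted id attributes.
    static List<String> citedIds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // "//body//ref" matches <ref> descendants of <body> only,
            // not the definitions listed under <back>/<ref-list>.
            NodeList refs = (NodeList) xpath.evaluate(
                    "//body//ref", doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < refs.getLength(); i++) {
                ids.add(((Element) refs.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<article><body><p>machine-readable <ref id=\"1\"/>. It is defined</p>"
                + "<p>processing XML <ref id=\"2\"/> data</p></body>"
                + "<back><ref-list><ref id=\"1\">Details</ref></ref-list></back></article>";
        System.out.println(citedIds(xml));
    }
}
```

XPath locates the citations, but the surrounding characters still have to be collected from the neighbouring text nodes.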

Answer


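Assume you have already used DOM to get the <ref> tag element with id = 1 from the XML, and that its content value is "Details about this reference". Store the <ref> tag's content value in a String variable; you can then use the substring method to get the left and right characters like this.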

String text = "Details about this reference";
String leftText = text.substring(0, 7); // first 7 characters from the left side
String rightText = text.substring(text.length() - 2); // last 2 characters from the right side; replace 2 with the count you need

Result

leftText:Details rightText:ce 

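Note: check the string length before extracting. If the string is shorter than the number of characters requested (150 in your case), substring will throw a StringIndexOutOfBoundsException.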
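The length check in the note can be folded into two small helpers. This is a minimal sketch: the class name `ContextSubstring` and the methods `left` and `right` are hypothetical, and they clamp the requested count to the string length so that short strings are returned whole instead of raising an exception.

```java
class ContextSubstring {
    // Returns up to n characters from the start of text.
    static String left(String text, int n) {
        return text.substring(0, Math.min(n, text.length()));
    }

    // Returns up to n characters from the end of text.
    static String right(String text, int n) {
        return text.substring(Math.max(0, text.length() - n));
    }

    public static void main(String[] args) {
        String text = "Details about this reference";
        System.out.println(left(text, 7));   // Details
        System.out.println(right(text, 2));  // ce
        // Asking for 150 characters from a 28-character string
        // returns the whole string rather than throwing.
        System.out.println(left(text, 150));
    }
}
```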


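I have to extract it from the body. For example, for the <ref> tag element with id = 1, the left and right characters would be "human-readable and machine-readable" and "It is defined in the XML 1.0 Specification". – Abhilash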
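For the case described in this comment, where the context has to come from the body and not from the reference entry, one possible approach is to walk the body's DOM nodes in document order, build up the running text, and remember the text offset at which each `<ref>` is met. The sketch below is an illustration under stated assumptions, not a confirmed solution from this thread: the class `RefContext` and its methods are hypothetical, the XML is assumed to be well-formed with citations written as `<ref id="1"/>`, whitespace runs are collapsed to single spaces, and a ref id cited more than once keeps only its last position.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RefContext {
    // Appends text nodes to sb in document order and records the current
    // text offset whenever a <ref> element is encountered.
    static void walk(Node node, StringBuilder sb, Map<String, Integer> offsets) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                // Collapse line breaks and indentation into single spaces.
                sb.append(c.getNodeValue().replaceAll("\\s+", " "));
            } else if (c.getNodeType() == Node.ELEMENT_NODE) {
                if ("ref".equals(c.getNodeName())) {
                    offsets.put(((Element) c).getAttribute("id"), sb.length());
                }
                walk(c, sb, offsets);
            }
        }
    }

    // Returns ref id -> {leftText, rightText}, each at most n characters.
    static Map<String, String[]> contexts(String xml, int n) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the XML", e);
        }
        // Only <body> is walked, so <ref> entries under <back> are ignored.
        Node body = doc.getElementsByTagName("body").item(0);
        StringBuilder sb = new StringBuilder();
        Map<String, Integer> offsets = new LinkedHashMap<>();
        walk(body, sb, offsets);
        String text = sb.toString();
        Map<String, String[]> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : offsets.entrySet()) {
            int at = e.getValue();
            // Clamp both ends so short documents do not throw.
            String left = text.substring(Math.max(0, at - n), at);
            String right = text.substring(at, Math.min(text.length(), at + n));
            result.put(e.getKey(), new String[] {left, right});
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<article><body>"
                + "<p>both human-readable and machine-readable <ref id=\"1\"/>. "
                + "It is defined in the XML 1.0 Specification</p>"
                + "<p>processing XML <ref id=\"2\"/> data</p></body>"
                + "<back><ref-list><ref id=\"1\">Details</ref></ref-list></back></article>";
        for (Map.Entry<String, String[]> e : contexts(xml, 150).entrySet()) {
            System.out.println(e.getKey() + " | " + e.getValue()[0] + " | " + e.getValue()[1]);
        }
    }
}
```

Each entry of the returned map corresponds to one row of the refId / leftText / rightText table in the question, so the values can be passed to whatever code writes to MySQL.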


Did you extract the ref tag's content value? –