Illegal character '&' in raw string: REXML parsing

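Hello, I am trying to parse an XML file with REXML. When the file contains an illegal character, parsing fails outright.

Is there any way to replace or remove these characters?

Parsing fails with the REXML error: Illegal character "&" in raw string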

<head> Negative test for underlying BJSPRICEENG N4&N5 
</head> 


doc = REXML::Document.new(File.open(file_name,"r:iso-8859-1:utf-8")) 

testfile.elements["head"].text 





doc = REXML::Document.new(content)
dir_path = doc.elements["TestBed/TestDir"].attributes["path"].to_s
doc.elements.each("TestBed/TestDir") do |directory|
  directory.elements.each("file") do |testfile|
    t = testfile.elements["head"].text
  end
end




<file name="toptstocksensbybjs.m">
    <MCheck></MCheck>
    <TestExtension></TestExtension>
    <TestType></TestType>

    <fcn name="lvlTwoDocExample" linenumber="20">
<head> P1><& 
</head>
    </fcn>
</file>


2013-06-21 Vinay

In your case, you can get the file to parse by escaping the illegal & characters first:

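require "rexml/document"

# Escape every & that does not already start one of the five predefined XML entities.
content = File.read(file_name, encoding: "iso-8859-1:utf-8")
content.gsub!(/&(?!(?:amp|lt|gt|quot|apos);)/, '&amp;')
doc = REXML::Document.new(content)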

However, the other illegal characters, especially unpaired <, >, ' or ", are much harder to deal with.
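
One way to narrow the problem, sketched here, is to escape the body of elements that are known to hold only text. This assumes that <head> never contains child elements, comments, or CDATA, and that the string </head> never appears inside the text itself. The helper name escape_text_elements is made up for this illustration and is not part of REXML.

```ruby
require "rexml/document"

# Hypothetical helper: escapes the body of every <tag>...</tag> pair,
# assuming the element holds plain text only (no child elements).
def escape_text_elements(xml, tag)
  name = Regexp.escape(tag)
  pattern = /(<#{name}(?:\s[^>]*)?>)(.*?)(<\/#{name}>)/m
  xml.gsub(pattern) do
    # Save the captures before the nested gsub calls overwrite them.
    open_tag, body, close_tag = $1, $2, $3
    escaped = body
      .gsub(/&(?!(?:amp|lt|gt|quot|apos);)/, "&amp;") # stray & only
      .gsub("<", "&lt;")
      .gsub(">", "&gt;") # > is legal in text, escaping it is harmless
    "#{open_tag}#{escaped}#{close_tag}"
  end
end

raw = %(<fcn name="lvlTwoDocExample" linenumber="20">\n<head> P1><& \n</head>\n</fcn>)
doc = REXML::Document.new(escape_text_elements(raw, "head"))
puts doc.elements["fcn/head"].text
```

With the sample from the question, the parsed text should come back as the original P1><& because REXML unescapes the entities when the text is read. The approach breaks down as soon as the element can legitimately contain markup, which is why a general fix is hard.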


2013-06-21 14:30:43

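You are listing a few common entities, but the full list is long... – samuil

@samuil XML has only these 5, unlike HTML. –

@ArieShaw Can you explain what is happening here? Only & will be replaced with &amp;, but what about the other characters in the string, such as > < ' "? – Vinay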
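
To illustrate what the substitution does, here is a small sketch. The pattern uses a negative lookahead, so an & is rewritten to &amp; only when it does not already start one of the five predefined XML entities. The characters >, ' and " are left alone, which is fine inside element text; a raw < is not touched either and still has to be handled separately. The variant here also skips numeric character references such as &#38;, which the pattern in the answer would double-escape. The method name escape_stray_ampersands is illustrative only.

```ruby
# Illustrative helper: rewrites an & that does not begin a predefined
# entity (&amp; &lt; &gt; &quot; &apos;) or a numeric reference (&#38; &#x26;).
def escape_stray_ampersands(text)
  text.gsub(/&(?!(?:amp|lt|gt|quot|apos|#\d+|#x\h+);)/, "&amp;")
end

samples = ["N4&N5", "N4&amp;N5", "a &#38; b", "P1>'\"&"]
samples.each do |s|
  puts "#{s.inspect} -> #{escape_stray_ampersands(s).inspect}"
end
```

Only the first and last samples change: the bare & becomes &amp;, while the already escaped forms and the quote characters pass through untouched.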
