Malformed UTF-8 character (fatal) error when parsing XML with XML::LibXML
I am using XML::LibXML to parse an XML file. For the XML entry below, I get this error:
Malformed UTF-8 character (fatal) at C:/Perl64/site/lib/XML/LibXML/Error.pm line 217
That line of Error.pm is:
$context=~s/[^\t]/ /g;
The entry in the XML file is below:
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">15177811</PMID>
<DateCreated>
<Year>2004</Year>
<Month>06</Month>
<Day>04</Day>
</DateCreated>
<DateCompleted>
<Year>2004</Year>
<Month>08</Month>
<Day>11</Day>
</DateCompleted>
<DateRevised>
<Year>2011</Year>
<Month>04</Month>
<Day>07</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Print">0278-2626</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>55</Volume>
<Issue>2</Issue>
<PubDate>
<Year>2004</Year>
<Month>Jul</Month>
</PubDate>
</JournalIssue>
<Title>Brain and cognition</Title>
<ISOAbbreviation>Brain Cogn</ISOAbbreviation>
</Journal>
<ArticleTitle>Efficiency of orientation channels in the striate cortex for distributed categorization process.</ArticleTitle>
<Pagination>
<MedlinePgn>352-4</MedlinePgn>
</Pagination>
<Affiliation>Cognitive Science Department, Université de Liège, Belgium. [email protected]</Affiliation>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Mermillod</LastName>
<ForeName>Martial</ForeName>
<Initials>M</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Chauvin</LastName>
<ForeName>Alan</ForeName>
<Initials>A</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Guyader</LastName>
<ForeName>Nathalie</ForeName>
<Initials>N</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType>Journal Article</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>Brain Cogn</MedlineTA>
<NlmUniqueID>8218014</NlmUniqueID>
<ISSNLinking>0278-2626</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<CommentsCorrectionsList>
<CommentsCorrections RefType="ErratumIn">
<RefSource>Brain Cogn. 2005 Jul;58(2):245</RefSource>
</CommentsCorrections>
<CommentsCorrections RefType="RepublishedIn">
<RefSource>Brain Cogn. 2005 Jul;58(2):246-8</RefSource>
<PMID Version="1">16044513</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="Y">Neural Networks (Computer)</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Neurons</DescriptorName>
<QualifierName MajorTopicYN="N">physiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Orientation</DescriptorName>
<QualifierName MajorTopicYN="Y">physiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Pattern Recognition, Visual</DescriptorName>
<QualifierName MajorTopicYN="Y">physiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Visual Cortex</DescriptorName>
<QualifierName MajorTopicYN="Y">physiology</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
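From this entry I want to extract PMID, DateRevised, PubDate, ArticleTitle, CommentsCorrectionsList and MeshHeadingList. If I delete the Affiliation element, which contains the non-ASCII characters, the error goes away. How should I resolve this error?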
Is your file actually saved as UTF-8? I suspect it is not, but LibXML thinks it is and chokes when it hits "Université de Liège". –
@XavierHolt By that, do you mean the `<?xml version="1.0" encoding="UTF-8"?>` at the beginning of the file? If so, the file has that line. Sorry if this is a silly question, I am not from this field. – smandape
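That is half of it. That declaration tells your XML parser which character encoding to expect. The other half is the encoding the file was actually saved in on disk. For example, if you save the file as UTF-8, the "é" character is represented by the two-byte sequence 0xC3 0xA9, but if you save it as Windows-1252, it is represented by the single byte 0xE9. If LibXML expects UTF-8 but encounters bytes that are not valid UTF-8, it raises an error. –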
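To make the byte-level difference concrete, here is a minimal Perl sketch using only the core Encode module. It prints the bytes for "é" in both encodings, checks whether a byte string is strict UTF-8, and re-encodes it if not. The helper names (`hex_bytes`, `is_valid_utf8`, `to_utf8_bytes`) are made up for illustration, and the fallback to Windows-1252 is an assumption based on the comment above, not something that can be detected from the bytes alone.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Render a byte string as space-separated hex values.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('0x%02X', ord($_)) } split(//, $bytes);
}

# True if the byte string decodes cleanly as strict UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;    # work on a copy; decode() may modify its input
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# Hypothetical helper: return UTF-8 bytes, ASSUMING that any input which is
# not valid UTF-8 was saved as Windows-1252 (an assumption, not a detection).
sub to_utf8_bytes {
    my ($bytes) = @_;
    return $bytes if is_valid_utf8($bytes);
    return encode('UTF-8', decode('cp1252', $bytes));
}

my $e_acute      = "\x{E9}";                     # the character U+00E9
my $utf8_bytes   = encode('UTF-8',  $e_acute);   # as saved in UTF-8
my $cp1252_bytes = encode('cp1252', $e_acute);   # as saved in Windows-1252

print 'UTF-8:        ', hex_bytes($utf8_bytes),   "\n";
print 'Windows-1252: ', hex_bytes($cp1252_bytes), "\n";
print 'Windows-1252 byte valid as UTF-8? ',
    (is_valid_utf8($cp1252_bytes) ? 'yes' : 'no'), "\n";
print 'After re-encoding: ', hex_bytes(to_utf8_bytes($cp1252_bytes)), "\n";
```

If the file really is Windows-1252, one option is to read it in raw mode, pass the bytes through something like `to_utf8_bytes`, and hand the result to `XML::LibXML->load_xml(string => ...)`. Another is to leave the bytes alone and change the XML declaration to name the encoding the file is actually saved in.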