2012-12-30

Wikipedia API: ignore cite errors or footnotes

I am making this request:

http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=self-administration&prop=revisions&rvprop=content&rvparse=&rvsection=0 

My goal is to get the plain text of the article's introduction.

It gives me HTML inside an XML document. After running strip_tags and a preg_replace to remove the citations, I get this:

Self-administration is, in its medical sense, the process of a subject administering a pharmacological substance to him-, her-, or itself. [...] Cite error: There are tags on this page, but the references will not show without a {{Reflist}} template or a tag; see the help page.

I want to remove this part:

Cite error: There are tags on this page, but the references will not show without a {{Reflist}} template or a tag; see the help page.

How can I get rid of it, either with PHP (preg_replace?) or in my initial query (ignore errors?)?

Answer

$bad = ' <br /><strong class="error">Cite error: There are <code>&lt;ref&gt;</code> tags on this page, but the references will not show without a <code>&#123;&#123;Reflist&#125;&#125;</code> template or a <code>&lt;references /&gt;</code> tag; see the <a href="/wiki/Help:Cite_errors/Cite_error_refs_without_references" title="Help:Cite errors/Cite error refs without references">help page</a>.</strong> '; 

$good = str_replace($bad, '', $intro);
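
The exact-string match above stops working as soon as the error markup changes by even one character (different whitespace, a changed help link, and so on). A slightly more tolerant variant, offered only as a rough sketch, is to remove the whole error element with preg_replace before calling strip_tags. The helper name remove_cite_errors and the sample $intro string are hypothetical, and the pattern assumes the error is always wrapped in <strong class="error">...</strong> with no nested <strong> inside, as in the markup shown above.

```php
<?php
// Hypothetical helper, not part of the original answer: removes every
// <strong class="error">...</strong> element, together with an optional
// <br /> directly in front of it, from the parsed HTML.
function remove_cite_errors(string $html): string
{
    // ~s lets "." match newlines; ".*?" is lazy, so the match ends at the
    // first closing </strong> (assumes no nested <strong> in the error).
    $pattern = '~(?:<br\s*/?>\s*)?<strong class="error">.*?</strong>~s';

    // preg_replace returns null on failure; fall back to the input then.
    return preg_replace($pattern, '', $html) ?? $html;
}

// Shortened sample input for illustration; real API output is longer.
$intro = '<p>Self-administration is the process.</p> '
       . '<br /><strong class="error">Cite error: There are '
       . '<code>&lt;ref&gt;</code> tags on this page; see the '
       . '<a href="/wiki/Help:Cite_errors">help page</a>.</strong> ';

// Drop the error element first, then strip the remaining tags.
$good = trim(strip_tags(remove_cite_errors($intro)));
echo $good, PHP_EOL;
```

Running the removal before strip_tags matters: once the tags are gone, the error message is no longer distinguishable from ordinary article text except by its wording.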