Parsing the response from the Wikipedia API

I want to parse the response from the Wikipedia API (MediaWiki). The URL I am using has the following form:

https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&titles=Argo_(2012_film)

The response has an XML tag that appears to contain the Wikipedia content, for example (this is only a partial sample):

{{Use mdy dates|date=October 2012}} {{Infobox film | name = Argo | image = 
Argo2012Poster.jpg | alt = <!-- See: WP:ALT --> | caption = Theatrical release poster | 
tagline = "The movie was fake. The mission was real." | director = [[Ben Affleck]] | 
producer = [[Grant Heslov]]<br />Ben Affleck<br />[[George Clooney]] | based on = {{Based 
on|''The Master of Disguise''|[[Tony Mendez|Antonio J. Mendez]]}}<br />{{Based on|''The 
Great Escape''|[[Joshuah Bearman]]}} | screenplay = [[Chris Terrio]] | starring = Ben 
Affleck<br />[[Bryan Cranston]]<br />[[Alan Arkin]]<br />[[John Goodman]] | music = 
[[Alexandre Desplat]] | cinematography = [[Rodrigo Prieto]] | editing = [[William 
Goldenberg]] | studio = [[Graham King|GK Films]]<br />[[Smokehouse Pictures]] | distributor = 
[[Warner Bros.]] | released = {{Film date|2012|08|31|Telluride Film 
Festival|2012|10|12|United States}} | runtime = 120 minutes<ref> ...continued

This really does not look like JSON or XML. How do I parse it?

Source

2013-12-15 Ankit Rustagi

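It looks like it is giving you the wikicode for the page. Click Edit on the page in question and you will see... a lot of it. – SimplyPanda

Yes, you are right, but is there any way I can parse this? –

What do you want to do with this data? – supersam654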

If you want the content parsed into HTML, add &rvparse to the query.

For example, when you run the query

https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&titles=Argo_%282012_film%29&rvparse

the response contains something like this (after skipping the infobox):

<i><b>Argo</b></i> is a 2012 American <a href="/wiki/Political_thriller" 
title="Political thriller">political thriller</a> film directed by <a 
href="/wiki/Ben_Affleck" title="Ben Affleck">Ben Affleck</a>.
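As a rough sketch of how the two query URLs in this thread could be built in Python: the function name build_revisions_url and the parsed flag are made up for illustration and are not part of the MediaWiki API. It only constructs the URL and does not send a request. Note that the thread dates from 2013, so rvparse may behave differently on current MediaWiki versions.

```python
from urllib.parse import urlencode

# Endpoint taken from the URLs quoted in the thread.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_revisions_url(title, parsed=False):
    """Build a revisions query URL (hypothetical helper, not a MediaWiki API).

    With parsed=True the bare "rvparse" flag from the answer is appended,
    which asks for HTML instead of raw wikicode.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "format": "xml",
        "titles": title,
    }
    # urlencode percent-encodes the parentheses in the title.
    url = API_ENDPOINT + "?" + urlencode(params)
    if parsed:
        url += "&rvparse"
    return url


if __name__ == "__main__":
    print(build_revisions_url("Argo_(2012_film)", parsed=True))
```

Called with parsed=False it yields the wikicode query from the question (with the parentheses percent-encoded); with parsed=True it yields the query shown above.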

Source

2013-12-15 17:29:05 svick

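Thanks, this should make things easier. –

If I use 'curl' to fetch the address you mentioned, I get something like this: 'title=&quot;Political thriller&quot;&gt;'. Is there a way to get the non-escaped version without a browser? – hashier

@hashier The response is HTML inside XML. If you want to use it, use an XML parser. – svick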
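To illustrate svick's point, here is a minimal sketch using Python's standard xml.etree.ElementTree. The SAMPLE_RESPONSE string is a hand-written stand-in modeled on the snippet above, not a real API response, and the element layout (a rev element holding the content) and the helper name extract_revision_html are assumptions for illustration. The XML parser decodes the entities, so the text it returns is the plain HTML.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the API output; the real response layout is
# assumed here and may differ.
SAMPLE_RESPONSE = (
    "<api><query><pages><page><revisions><rev>"
    "&lt;i&gt;&lt;b&gt;Argo&lt;/b&gt;&lt;/i&gt; is a 2012 American "
    "&lt;a href=&quot;/wiki/Political_thriller&quot; "
    "title=&quot;Political thriller&quot;&gt;political thriller&lt;/a&gt; film."
    "</rev></revisions></page></pages></query></api>"
)


def extract_revision_html(xml_text):
    """Return the unescaped text of the first <rev> element, or None."""
    root = ET.fromstring(xml_text)
    rev = root.find(".//rev")
    # The parser has already turned &lt; &gt; &quot; back into < > ".
    return rev.text if rev is not None else None


if __name__ == "__main__":
    print(extract_revision_html(SAMPLE_RESPONSE))
```

The returned string can then be handed to an HTML parser if the markup itself needs to be processed further.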
