Removing &#039; from the output of file_get_contents

I'm working on a tool for Wikipedia. I use file_get_contents to retrieve the page https://de.wikipedia.org/wiki/Spezial:Linkliste/Hans_Jansen_(Arabist). Then I extract all list items by locating the list and exploding it on \n.

After that I want to retrieve the text of the articles named by the list items. For that I do:

file_get_contents("https://de.wikipedia.org/w/index.php?action=raw&title=".urlencode($article));

Everything goes well until it reaches an article named Ka'b ibn As'ad. When I copy the article name as plain text, resulting in

https://de.wikipedia.org/w/index.php?action=raw&title=Ka

retrieval works fine:

$article = "Ka'b ibn As'ad"; 
$page = "https://".$server."/w/index.php?action=raw&title=".urlencode($article);

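Comparing the urlencode output for the manually entered name with the output for the $article retrieved from the website shows a difference: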

manually: Ka%27b+ibn+As%27ad
website:  Ka%26%23039%3Bb%20ibn%20As%26%23039%3Bad

Comparing the output of htmlspecialchars() is even more telling:

manually: Ka'b ibn As'ad
website:  Ka&#039;b ibn As&#039;ad
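
The mismatch can be reproduced without touching the network. This is a minimal sketch, assuming the name scraped from the HTML page still carries the numeric entity `&#039;` in place of each apostrophe; the variable names are illustrative and not taken from the original script.

```php
<?php
// Name typed by hand: contains literal apostrophes.
$manual = "Ka'b ibn As'ad";

// Name as it is assumed to arrive from the scraped HTML:
// each apostrophe is still the entity &#039;.
$scraped = "Ka&#039;b ibn As&#039;ad";

// urlencode() percent-encodes the entity's own characters (&, #, ;),
// so the request asks for a title that does not exist.
$manualEncoded  = urlencode($manual);
$scrapedEncoded = urlencode($scraped);

echo $manualEncoded, "\n";
echo $scrapedEncoded, "\n";
```

Note that urlencode() writes spaces as `+`; the `%20` form would come from rawurlencode(). Either way, the `%26%23039%3B` runs are the encoded entity, not an encoded apostrophe.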

How do I get rid of these &#039; special characters? Apparently htmlspecialchars_decode() does not work.

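htmlspecialchars_decode() only converts the handful of entities that htmlspecialchars() produces, and it leaves &#039; alone unless the ENT_QUOTES flag is set (before PHP 8.1 the default flags do not include it). You need to use html_entity_decode() with ENT_QUOTES!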
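
A minimal sketch of that fix, assuming the scraped title is UTF-8 and still contains `&#039;`. The sample string and the variable names are illustrative; the actual fetch is left out so the snippet runs offline.

```php
<?php
// Hypothetical scraped title, still carrying numeric entities.
$article = "Ka&#039;b ibn As&#039;ad";

// ENT_QUOTES tells the decoder to convert single-quote entities too.
// Passing it explicitly keeps the behaviour the same on PHP versions
// older than 8.1, where the default flags skip single quotes.
$decoded = html_entity_decode($article, ENT_QUOTES, 'UTF-8');

// Only now is the title percent-encoded for the raw-text request.
$url = "https://de.wikipedia.org/w/index.php?action=raw&title=" . urlencode($decoded);

echo $decoded, "\n";
echo $url, "\n";
// Next step in the real tool would be: $text = file_get_contents($url);
```

Decoding has to happen before urlencode(); decoding the already encoded string would find no entities left to convert.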

2016-11-27 15:33:24

Answers