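Regular expression breaks my UTF-8 XML (PHP)
I have a problem. I have code that downloads some XML files and removes some tags I don't need. Up to this point everything works fine: my XML files are UTF-8 and I have no problems.
But since I added code that replaces and changes the title values, my XML file is no longer valid UTF-8, and I get this error message: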
"D:\Anwendung\PHP 7\php-win.exe" C:\Users\Jan\PhpstormProjects\censored\test.php
PHP Warning: DOMDocument::load(): Input is not proper UTF-8, indicate encoding !
Bytes: 0xE3 0xA4 0x63 0x68 in file:/C:/Users/Jan/PhpstormProjects/censored/data/gamesplanet.xml, line: 1423 in C:\Users\Jan\PhpstormProjects\censored\test.php on line 18
PHP Fatal error: Uncaught Error: Call to a member function getElementsByTagName() on null in C:\Users\Jan\PhpstormProjects\censored\test.php:23
Stack trace:
#0 C:\Users\Jan\PhpstormProjects\censored\test.php(86): countAd('data/gamesplane...')
#1 {main}
thrown in C:\Users\Jan\PhpstormProjects\censored\test.php on line 23
Process finished with exit code 255
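Line 1423 now reads: W㥣hter Von Mittelerde
If I don't run the file through the code below, I get no error message, and line 1423 reads: Wächter von Mittelerde
Does anyone have an idea that could help me?
Code: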
function loadTitles($tagName, $path){
$dom = new DOMDocument('1.0', 'utf-8');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->load($path);
$marker = $dom->getElementsByTagName($tagName);
for ($i = $marker->length - 1; $i >= 0; $i--) {
$word = $marker->item($i)->textContent;
$escapedWord = escapWord($word);
$escapedWord = modifyWord($escapedWord);
$marker->item($i)->textContent = $escapedWord;
}
$dom->saveXML();
$dom->save($path);
}
function escapWord($string){
$replaceNothing = [":", ",", ";", "`", "#", "'", "´", "–", "!", "(", ")", ".", "@", "’", "+", "™"];
$replaceSpace = ["-", "–", "_", "/", ":"];
$delete = ["Steam", "Eu", "Key", "CD", "Gift", "Edition", "Pack", "Uplay", "Required", "Collection", "Origin", "HD", "Complete", "Digital", "Download", "EA", "Europa", "RPG", "Activated", "Access", "Code", "Limited", "Direct", "Bundle", "Special", "CDKEY", "GLOBAL", "EARLY", "ACCESS", "Card", "Cartel", "Player", "Trade", "DE", "GOG", "Multilanguage", "Multi", "Full", "Only", "UNCUT", "Cut", "Box", "Ps Vita", "VIP", "Rockstar", "Subscription"];
$string= str_replace($replaceNothing, '', $string);
$string= str_replace($replaceSpace, ' ', $string);
$string= preg_replace('~\b(?:' . implode('|', $delete) . ')\b~i', '', $string);
$string= str_replace("&", ' & ', $string);
$string= strtolower($string);
$string= ucwords($string);
$string= preg_replace('/\bAsia\b/i', 'ASIA', $string);
$string= preg_replace('/\buk\b/i', 'UK', $string);
$string= preg_replace('/\bAU\b/i', 'AU', $string);
$string= preg_replace('/\bXBOX\b/i', 'XBOX ', $string);
$string= preg_replace('/\bpc\b/i', 'PC', $string);
$string= preg_replace('/\bus\b/i', 'US', $string);
$string= preg_replace('/\bru\b/i', 'RUS', $string);
$string= preg_replace('/\bRUS\b/i', 'RUS', $string);
$string= preg_replace('/\bPS4\b/i', 'PS4', $string);
$string= preg_replace('/\bAddon\b/i', 'AddOn', $string);
$string= preg_replace('/\bPlay Station 4\b/i', 'PS4', $string);
$string= preg_replace('/\bPs4\b/i', 'PS4', $string);
$string= preg_replace('/\bPs3\b/i', 'PS3', $string);
$string= preg_replace('/\bPlayStation 4\b/i', 'PS4', $string);
$string= preg_replace('/\bPlay Station 3\b/i', 'PS3', $string);
$string= preg_replace('/\bPlayStation 3\b/i', 'PS3', $string);
$string= preg_replace('/\bPlayStation Network\b/i', 'PSN', $string);
$string= preg_replace('/\bPSN\b/i', 'PSN', $string);
$string= preg_replace('/\bXX\b/i', 'XX', $string);
$string= preg_replace('/\bXIX\b/i', 'XIX', $string);
$string= preg_replace('/\bXVIII\b/i', 'XVIII', $string);
$string= preg_replace('/\bXVII\b/i', 'XVII', $string);
$string= preg_replace('/\bXVI\b/i', 'XVI', $string);
$string= preg_replace('/\bXV\b/i', 'XV', $string);
$string= preg_replace('/\bXIV\b/i', 'XIV', $string);
$string= preg_replace('/\bXiii\b/i', 'XIII', $string);
$string= preg_replace('/\bXii\b/i', 'XII', $string);
$string= preg_replace('/\bXi\b/i', 'XI', $string);
$string= preg_replace('/\bIX\b/i', 'IX', $string);
$string= preg_replace('/\bVIII\b/i', 'VIII', $string);
$string= preg_replace('/\bVII\b/i', 'VII', $string);
$string= preg_replace('/\bVI\b/i', 'VI', $string);
$string= preg_replace('/\bV\b/i', 'V', $string);
$string= preg_replace('/\bIV\b/i', 'IV', $string);
$string= preg_replace('/\bIII\b/i', 'III', $string);
$string= preg_replace('/\bII\b/i', 'II', $string);
$string= preg_replace('/\bdlc\b/i', 'DLC', $string);
$string= trim(preg_replace('/\s\s+/', ' ', str_replace("\n", " ", $string)));
return $string;
}
function modifyWord($string){
if(strpos($string, "Counter Strike Offensive") !== false){
$newstring = explode("Offensive", $string);
$newstring[0] = $newstring[0] . "Global Offensive";
$string = $newstring[0] . $newstring[1];
}
return $string;
}
Regards, and thank you!
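The problem is that you are processing a multibyte (UTF-8) string with functions that do not support multibyte characters ('str_replace', 'ucwords', 'strtolower', and 'preg_replace' without the u modifier). Use the 'mb_' functions instead, and add the u modifier to your 'preg_replace' patterns.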
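To make "multibyte" and "u modifier" concrete, here is a minimal sketch. It assumes the mbstring extension is enabled, and the variable names are only for illustration. The byte-oriented functions see "ä" as two bytes, while the 'mb_' functions and patterns ending in /u see it as one character.

```php
<?php
// "\u{E4}" is "ä"; the escape keeps the example independent of how this file is saved.
$word = "W\u{E4}chter";

// strlen() counts bytes, mb_strlen() counts characters.
$byteLength = strlen($word);             // 8, because "ä" occupies two bytes in UTF-8
$charLength = mb_strlen($word, 'UTF-8'); // 7

// Without /u the dot matches a single byte, so it cannot cover the two-byte "ä".
$byteMatch = preg_match('/^W.chter$/', $word);     // 0
// With /u the pattern and subject are treated as UTF-8, so the dot matches "ä".
$unicodeMatch = preg_match('/^W.chter$/u', $word); // 1

echo "$byteLength $charLength $byteMatch $unicodeMatch\n"; // prints "8 7 0 1"
```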
Note that 'preg_replace' accepts arrays as its first and second arguments.
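A rough sketch of the array form, which could replace the long run of single preg_replace calls in escapWord(). The helper name fixAbbreviations() and the shortened pattern list are only for illustration and are not part of the original code.

```php
<?php
// Hypothetical helper: applies several case fixes in a single preg_replace() call.
function fixAbbreviations(string $title): string
{
    // Pattern N is replaced by replacement N; the pairs are applied in order.
    $patterns     = ['/\bpc\b/iu', '/\bdlc\b/iu', '/\bps4\b/iu'];
    $replacements = ['PC', 'DLC', 'PS4'];

    // With /u, preg_replace() returns null if the subject is not valid UTF-8,
    // so fall back to the unchanged input in that case.
    return preg_replace($patterns, $replacements, $title) ?? $title;
}

echo fixAbbreviations("W\u{E4}chter Von Mittelerde Dlc Pc"), "\n";
// prints "Wächter Von Mittelerde DLC PC"
```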
Could you give me a code snippet showing how to do this? I don't know what the mb_ functions are or what the "u modifier" means. – Jan
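A minimal sketch of what a multibyte-safe version of the casing step might look like, assuming the mbstring extension is enabled in php.ini. The function name normalizeTitle() is hypothetical and the pattern list is shortened. In UTF-8, "ä" is the bytes 0xC3 0xA4, while the warning reports 0xE3 0xA4. That is what you would expect if strtolower() lowercased the byte 0xC3 on its own under a single-byte locale, so the sketch replaces strtolower() plus ucwords() with mb_convert_case().

```php
<?php
// Hypothetical, shortened variant of escapWord() showing only the multibyte-safe parts.
function normalizeTitle(string $title): string
{
    // Title-case whole characters instead of single bytes.
    $title = mb_convert_case($title, MB_CASE_TITLE, 'UTF-8');

    // The trailing "u" tells PCRE to treat pattern and subject as UTF-8.
    $title = preg_replace(['/\bpc\b/iu', '/\bdlc\b/iu'], ['PC', 'DLC'], $title) ?? $title;

    // Collapse runs of whitespace and trim the ends.
    return trim(preg_replace('/\s+/u', ' ', $title) ?? $title);
}

$result = normalizeTitle("W\u{C4}CHTER  von mittelerde dlc");
echo $result, "\n";                                 // prints "Wächter Von Mittelerde DLC"
var_dump(mb_check_encoding($result, 'UTF-8'));      // bool(true)
```

As far as I know, str_replace() is less of a concern here. It compares bytes, but when both the search strings and the subject are valid UTF-8 it cannot cut a character in half. That only holds if the PHP source file itself is saved as UTF-8, because otherwise literals such as "´" or "™" would be single bytes. Note also that MB_CASE_TITLE decides where words start slightly differently from ucwords(), so it is worth checking a few titles by hand.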