Parsing XML from a URL in Python: response comes back in a strange format

I have several URLs that serve XML, and so far everything has worked fine, but for the next URL I get the XML in a strange format. To fetch the XML from the URL I use:

from urllib.request import Request, urlopen

# Browser-like request headers; note that Accept-Encoding advertises gzip support
req = Request("http://www.someUrlWithXml.com",
              headers={'Connection': 'keep-alive',
                       'Cache-Control': 'max-age=0',
                       'Upgrade-Insecure-Requests': '1',
                       'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
                       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
                       'Accept-Encoding': 'gzip, deflate, sdch',
                       'Accept-Language': 'cs,en-GB;q=0.8,en;q=0.6'})

# read() returns the raw response body as bytes
xml = urlopen(req).read()

When I print the variable xml, I get output in a strange format:

b'\x1f\x8b\x08\x00\xc7\xf6-Y\x00\xff\xed}{o\x1c9\x92 ...

The original XML is:

<?xml version="1.0" encoding="utf-8"?> 
<!-- 0 FT--> 
<!-- 1 1st Half--> 
<!-- 2 2nd Half--> 
<!-- 3 1st Quarter--> 
<!-- 4 2nd Quarter--> 
<!-- 5 3rd Quarter--> 
<!-- 6 4th Quarter--> 
<!-- 7 Total Team Goals--> 
<!-- 8 OutRight--> 
<!-- 9 Match Props--> 
<!-- 10 Total Booking In Match--> 
<!-- 11 Red Cards--> 
<!-- 12 First Booking-->

Asked 2017-05-30 by EdWood

b'\x1f\x8b\x08\x00\xc7\xf6-Y\x00\xff\xed}{o\x1c9\x92 ...

The b at the start of the response means it is a bytes object, not a string. To turn it into a string, use decode:

xml.decode('utf-8')
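
As a small illustration of the bytes-versus-str distinction, here is a minimal sketch. The sample document is made up for the example and is not the asker's data. Note that decode only succeeds when the bytes really hold UTF-8 encoded text.

```python
# A made-up XML snippet, encoded to bytes the way a server would send plain text
raw = '<?xml version="1.0" encoding="utf-8"?><root>žluťoučký</root>'.encode('utf-8')

# bytes -> str; this works because raw genuinely contains UTF-8 text
text = raw.decode('utf-8')

print(type(raw).__name__, type(text).__name__)
```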

Answered 2017-05-31 00:31:06

I tried decoding and now I get: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte – EdWood

@EdWood: Can you try xml.decode('utf-8', 'ignore') and see whether that gives you a useful result?

It is not usable, it gives me something like this: .Y} ko9P'「INw8e：N2'：-K] [Id％*Vɗnj}）fr： – EdWood
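
A note on the errors above: the leading bytes \x1f\x8b in the printed output are the gzip magic number, and the request sends 'Accept-Encoding': 'gzip, deflate, sdch', so the body is most likely gzip-compressed rather than mis-encoded text. That would also explain why 0x8b at position 1 fails to decode as UTF-8. Below is a minimal sketch of decompressing before decoding. The helper name decompress_if_gzip and the sample XML are hypothetical, and the compressed sample stands in for the real urlopen(req).read() result so the example runs without network access.

```python
import gzip
import xml.etree.ElementTree as ET

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream


def decompress_if_gzip(raw: bytes) -> bytes:
    """Return raw decompressed if it starts with the gzip magic number, else unchanged."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


# Hypothetical stand-in for urlopen(req).read(): made-up XML, gzip-compressed
sample = b'<?xml version="1.0" encoding="utf-8"?><markets><market id="0">FT</market></markets>'
compressed = gzip.compress(sample)

body = decompress_if_gzip(compressed)  # plain XML bytes again
text = body.decode('utf-8')            # now decoding succeeds
root = ET.fromstring(body)             # ElementTree accepts bytes directly

print(text)
print(root.tag, root.find('market').text)
```

With the original code this would become decompress_if_gzip(urlopen(req).read()). Another option that may work is to drop the 'Accept-Encoding' header from the request so that the server has no reason to compress the body, though whether the server honours that depends on the server.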
