RSS feed has a "\n" at the start. How do I remove it? - Python

I want to extract data from this feed:

http://realbusiness.co.uk/feed/

But it looks different from the other feeds I pull from. They look like this:

https://www.ft.com/companies?format=rss

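When I pull from 'https://www.ft.com/companies?format=rss', getting the data I need is very simple, because I use minidom to select elements by tag name and pull out everything I need, like this: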

from xml.dom import minidom 
from urllib.request import urlopen 

url = 'https://www.ft.com/companies?format=rss&page=1' 
html = urlopen(url) 
dom = minidom.parse(html) 
item = dom.getElementsByTagName('item') 
for node in item: 
    pubdate = node.getElementsByTagName('pubDate')[0].childNodes[0].nodeValue 
    link = node.getElementsByTagName('link')[0].childNodes[0].nodeValue 
    title = node.getElementsByTagName('title')[0].childNodes[0].nodeValue

However, when I try to do the same for 'http://realbusiness.co.uk/feed/' using the code below:

from xml.dom import minidom 
from urllib.request import urlopen 

url = 'http://realbusiness.co.uk/feed/' 
html = urlopen(url) 
dom = minidom.parse(html)

I get the following error:

Traceback (most recent call last): 
    File "C:/Users/NAME/Desktop/Scripts/scrapesites/deleteme.py", line 6, in <module> 
    dom = minidom.parse(html) 
    File "C:\Python36\lib\xml\dom\minidom.py", line 1958, in parse 
    return expatbuilder.parse(file) 
    File "C:\Python36\lib\xml\dom\expatbuilder.py", line 913, in parse 
    result = builder.parseFile(file) 
    File "C:\Python36\lib\xml\dom\expatbuilder.py", line 207, in parseFile 
    parser.Parse(buffer, 0) 
xml.parsers.expat.ExpatError: XML or text declaration not at start of entity: line 2, column 0

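My conclusion is that this happens because the RSS structure of the two sites differs slightly. 'http://realbusiness.co.uk/feed/' has a '\n' on the first line of the page, whereas 'https://www.ft.com/companies?format=rss' does not.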
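The diagnosis can be checked without touching the network. The sketch below is only an illustration: `SAMPLE` is a made-up stand-in for the feed body (a newline followed by an XML declaration), and `try_parse` is a hypothetical helper, not part of minidom.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical stand-in for the feed body: a newline precedes the XML declaration.
SAMPLE = (b'\n<?xml version="1.0" encoding="UTF-8"?>'
          b'<rss version="2.0"><channel><title>demo</title></channel></rss>')


def try_parse(data):
    """Return the root tag name, or the ExpatError message if parsing fails."""
    try:
        return minidom.parseString(data).documentElement.tagName
    except ExpatError as exc:
        return "ExpatError: " + str(exc)


# Fails: the declaration is not the first thing in the document.
print(try_parse(SAMPLE))
# Parses once the leading whitespace is stripped.
print(try_parse(SAMPLE.lstrip()))
```

If the first call reports the same "XML or text declaration not at start of entity" message as the traceback, the leading newline is the likely cause.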

How can I remove the "\n" so that I can parse the data?

If I am wrong about the cause, what is the correct solution?

Thanks in advance.


2017-06-28 semiflex

I don't think this is the right way to do it... urlopen does not return a string.

It might work if you read the \n character before parsing, like this:

html = urlopen(url)
html.read(1)  # consume the leading '\n' so the XML declaration comes first
dom = minidom.parse(html)
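Reading exactly one byte assumes the feed always starts with a single newline. A slightly more defensive variant, sketched here as an assumption rather than a tested fix for this particular feed, reads the whole response, strips any leading whitespace, and hands the bytes to `minidom.parseString`. The helper name `parse_feed_bytes` and the `SAMPLE` data are hypothetical.

```python
from xml.dom import minidom


def parse_feed_bytes(raw):
    """Parse feed bytes after dropping whitespace that precedes the XML declaration."""
    return minidom.parseString(raw.lstrip())


# Hypothetical sample: newline, space and tab before the declaration.
SAMPLE = (b'\n \t<?xml version="1.0"?>'
          b'<rss><channel><item><title>Hello</title></item></channel></rss>')

dom = parse_feed_bytes(SAMPLE)
titles = [node.childNodes[0].nodeValue
          for node in dom.getElementsByTagName('title')]
print(titles)

# Usage against the live feed (network access assumed):
# from urllib.request import urlopen
# with urlopen('http://realbusiness.co.uk/feed/') as response:
#     dom = parse_feed_bytes(response.read())
```

This loads the whole feed into memory, which is usually acceptable for RSS, and it also covers feeds that start with spaces or tabs.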


2017-06-28 12:11:13 ikkuh

The code uses 'minidom.parse', which takes a file-like object rather than a 'string'. It crashes when the file starts with a newline, space, or tab. – ikkuh

I see, my answer was wrong. I seriously misunderstood something. I am deleting my answer and reversing the downvote. Sorry for any trouble. Cheers.
