2009-05-20 211 views
4

Beautiful Soup and uTidy

I want to pass the output of uTidy to Beautiful Soup, like this:

import urllib2
import tidy
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen(url)
options = dict(output_xhtml=1, add_xml_decl=0, indent=1, tidy_mark=0)
cleaned_html = tidy.parseString(page.read(), **options)
soup = BeautifulSoup(cleaned_html)

Running it produces the following error:

Traceback (most recent call last):
  File "soup.py", line 34, in <module>
    soup = BeautifulSoup(cleaned_html)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1245, in _feed
    smartQuotesTo=self.smartQuotesTo, isHTML=isHTML)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1751, in __init__
    self._detectEncoding(markup, isHTML)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1899, in _detectEncoding
    xml_encoding_match = re.compile(xml_encoding_re).match(xml_data)
TypeError: expected string or buffer

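I gather that uTidy returns an XML document object, whereas BeautifulSoup expects a string. Is there a way to convert cleaned_html? Or am I going about this the wrong way, and should I take a different approach?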
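
The traceback ends in a regular-expression match, which only accepts string-like input. The failure can be sketched without either library. In the snippet below, `FakeTidyDocument` is a hypothetical stand-in for the object returned by `tidy.parseString` (not uTidy's real class), and the pattern is a simplified illustration rather than BeautifulSoup's actual `xml_encoding_re`:

```python
import re


class FakeTidyDocument(object):
    """Hypothetical stand-in for the object returned by tidy.parseString."""

    def __init__(self, markup):
        self._markup = markup

    def __str__(self):
        # Assumption: the document renders to its markup when passed to str().
        return self._markup


# Simplified pattern for illustration; not BeautifulSoup's actual regex.
xml_decl = re.compile(r"\s*<\?xml")

doc = FakeTidyDocument('<?xml version="1.0"?><html></html>')

try:
    xml_decl.match(doc)  # re rejects objects that are not strings or buffers
    failed = False
except TypeError:
    failed = True

# Converting to a string first lets the match proceed.
matched = xml_decl.match(str(doc)) is not None
```

Here `failed` ends up true for the raw object, while the match on `str(doc)` succeeds.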

Answers

11

Just wrap cleaned_html in str() before passing it to BeautifulSoup.

2

Convert the value you pass to BeautifulSoup into a string. In your case, make the following edit to the last line:

soup = BeautifulSoup(str(cleaned_html))
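
If the same conversion is needed in several places, it could be wrapped in a small helper. This is only a sketch: `to_markup` and `StubDocument` are hypothetical names introduced here for illustration and are not part of uTidy or BeautifulSoup.

```python
def to_markup(value):
    """Return value unchanged if it is already text or bytes, else str(value)."""
    if isinstance(value, (str, bytes)):
        return value
    return str(value)


class StubDocument(object):
    """Hypothetical stand-in for a uTidy document; not the real class."""

    def __str__(self):
        return "<html><body><p>hello</p></body></html>"


# A document-like object is converted; plain text passes through untouched.
markup = to_markup(StubDocument())
already_text = to_markup("<p>unchanged</p>")
```

With such a helper, the last line of the question's script would read soup = BeautifulSoup(to_markup(cleaned_html)).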