BeautifulSoup parse error: junk characters

Code: I don't know what I did to make BeautifulSoup (BS) stop working.

import mechanize 
import urllib2 
from bs4 import BeautifulSoup 

#create a browser object to login 
browser = mechanize.Browser() 

#tell mechanize to ignore robots.txt, which it would otherwise obey and refuse to fetch disallowed pages 
browser.set_handle_robots(False) 

browser.addheaders = [('User-Agent','Mozilla/5.0 (Windows U; Windows NT 6.0; en-US; rv:9.0.6')] 
#url 
url = 'https://www.google.com.au/search?q=python' 
#open the url in our virtual browser 
browser.open(url) 
html = browser.response().read() 
print html 
soup = BeautifulSoup(html) 
print(soup.prettify())

Error

HTMLParseError: junk characters in start tag: u'{t:1}); class="gbzt ', at line 1, column 42892 

<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en-AU"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/google_favicon_128.png" itemprop="image"><title>python - Google Search</title><style>#gb{font:13px/27px Arial,sans-serif;height:30px}#gbz,#gbg{position:absolute;white-space:nowrap;top:0;height:30px;z-index:1000}#gbz{left:0;padding-left:4px}#gbg{right:0;padding-right:5px}#gbs{background:transparent;position:absolute;top:-999px;v

Source

2014-05-18 yoshiserry

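You seem to be hitting the error because you are pulling in the CSS embedded in the page and trying to parse it as HTML. Here is a similar question that might help: http://stackoverflow.com/questions/10401110/using-beautiful-soup-to-convert-css-attributes-to-individual-html-attributes – Craicerjack

@yoshiserry The code runs fine for me. Which version of Python are you using? –

2.7? Should I install the lxml parser? Maybe? – yoshiserry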
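
On the parser question, here is a minimal sketch of naming the parser explicitly instead of letting BeautifulSoup pick one. `make_soup` and `SAMPLE_HTML` are hypothetical names used only for illustration, and the sample string is made up rather than taken from the real Google response. Whether lxml or html5lib is installed is an assumption, so the sketch falls back to the built-in html.parser. It has not been checked against the actual page that raised the HTMLParseError.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical stand-in for the downloaded page (not the real Google markup).
SAMPLE_HTML = (
    "<html><head><title>python - Google Search</title></head>"
    '<body><a class="gbzt" href="/x">link</a></body></html>'
)


def make_soup(html, parsers=("lxml", "html5lib", "html.parser")):
    """Try each parser name in order; return (soup, name_of_parser_used)."""
    for name in parsers:
        try:
            return BeautifulSoup(html, name), name
        except FeatureNotFound:
            # This parser library is not installed; try the next one.
            continue
    raise RuntimeError("no usable parser found")


soup, used = make_soup(SAMPLE_HTML)
print(used)
print(soup.title.string)
```

The parsers differ in how they recover from malformed markup, so naming one makes the behaviour the same from one machine to the next.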

Try using requests:

import requests 
from bs4 import BeautifulSoup 
#url 
url = 'https://www.google.com.au/search?q=python' 
r=requests.get(url) 
html = r.text 
print html 
soup = BeautifulSoup(html) 
print(soup.prettify())

Source

2014-05-18 23:06:16

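Thanks a lot, it works. I'm confused about why BeautifulSoup doesn't work. – yoshiserry

BeautifulSoup doesn't work with the code above? –

It does work with requests, but the output isn't prettified; on its own, it spits out what are apparently the junk characters I downloaded. – yoshiserry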
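
On that last comment: prettify() only re-indents the markup, so inline `<style>` and `<script>` content still shows up in the output. Below is a minimal sketch, assuming the goal is readable page text rather than indented HTML. `visible_text` and `SAMPLE_HTML` are hypothetical names, and the sample string is a made-up stand-in, not the real response.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not the real Google markup).
SAMPLE_HTML = (
    "<html><head><title>python - Google Search</title>"
    "<style>#gb{font:13px/27px Arial}</style>"
    "<script>var x = {t:1};</script></head>"
    "<body><h3>Welcome to Python.org</h3></body></html>"
)


def visible_text(html):
    """Drop <script> and <style> elements, then return the remaining text."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()  # remove the element and its contents from the tree
    return soup.get_text(" ", strip=True)


text = visible_text(SAMPLE_HTML)
print(text)
```

If the indented HTML is still wanted, calling `soup.prettify()` after the `decompose()` loop would give the markup without the inline CSS and JavaScript.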
