I get an error when I run this script: NameError: name 'htmltext' is not defined
Original code:
import urllib.request
import urllib.parse
from bs4 import BeautifulSoup

url = "http://nytimes.com,http://nytimes.com"
urls = [url]     # stack of urls to scrape
visited = [url]  # historic record of urls

while len(urls) > 0:
    try:
        htmltext = urllib.request.urlopen(urls[0]).read()
    except:
        print(urls[0])
    soup = BeautifulSoup(htmltext)
    urls.pop(0)
    print(soup.findAll('a', href=True))
Error:
socket.gaierror: [Errno -2] Name or service not known
urllib.error.URLError: urlopen error [Errno -2] Name or service not known
Traceback (most recent call last):
NameError: name 'htmltext' is not defined
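So what happens if you put 'http://nytimes.com,http://nytimes.com' into your browser's address bar? Also, your title doesn't match the description (but *of course* 'htmltext' isn't defined in the 'except' case - you're there because the assignment *failed*). – jonrsharpe 2014-10-26 18:53:40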
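To illustrate the point about the address, here is a small sketch using only the standard library. It shows how the comma-joined string is parsed as a single URL rather than two; the variable names are illustrative and not from the original post.

```python
from urllib.parse import urlsplit

# The value from the question: two addresses joined by a comma in ONE string.
combined = "http://nytimes.com,http://nytimes.com"

# urlsplit treats the whole string as a single URL, so the host portion
# still contains the comma and is not a resolvable host name.
parts = urlsplit(combined)
print("netloc:", parts.netloc)
print("path:  ", parts.path)

# Splitting on the comma gives two separate, well-formed URLs instead.
separate = combined.split(",")
print(separate)
```

A host name containing a comma cannot be resolved, which would be consistent with the "Name or service not known" error in the traceback.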
I don't know how that's possible, but it works now, sorry – gaia 2014-10-26 19:06:52
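I understand why it works now: I removed the second address from the 'url' value. Maybe there was a conflict during the connection request because the address was doubled? – gaia 2014-10-26 20:13:36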
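Putting the comments together, one possible way to restructure the loop is sketched below. It is only a sketch under stated assumptions: the function name fetch_pages and its fetch parameter are hypothetical (introduced here so the loop can be tried without network access), and each address is expected to be its own list entry. The key change is that a failed request skips the rest of the iteration, so the page content is never used when it was never assigned.

```python
import urllib.request


def fetch_pages(start_urls, fetch=None):
    """Fetch each URL in turn, skipping any that fail to load.

    fetch_pages and the fetch parameter are hypothetical helpers for
    illustration; they are not part of the original script.
    """
    if fetch is None:
        # Default fetcher: a plain urllib request with a timeout.
        def fetch(u):
            with urllib.request.urlopen(u, timeout=10) as resp:
                return resp.read()

    queue = list(start_urls)  # URLs still to fetch
    pages = {}                # url -> raw bytes of the page
    failed = []               # (url, exception) pairs
    while queue:
        current = queue.pop(0)
        try:
            html = fetch(current)
        except (OSError, ValueError) as exc:
            # URLError and socket.gaierror are both subclasses of OSError.
            failed.append((current, exc))
            continue  # html was never assigned, so do not touch it
        pages[current] = html
    return pages, failed


if __name__ == "__main__":
    # Each address is a separate list item, not one comma-joined string.
    pages, failed = fetch_pages(["http://nytimes.com"])
    for bad_url, exc in failed:
        print("could not fetch", bad_url, "-", exc)
```

The bytes stored in pages could then be handed to BeautifulSoup as in the original script. Catching specific exception types instead of a bare except also keeps unrelated bugs from being silently swallowed.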