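I'm working on a scraping assignment in Python using BeautifulSoup and am getting some odd errors. The traceback mentions strip(), which I never call, but I'm guessing it has something to do with BeautifulSoup's internals? Why am I getting an error about strip() when I'm not using it? (Python)
In the assignment, I'm trying to go to the original URL, find the 18th link, follow that link 7 times, and then return the name from the 18th link on the 7th page. I'm trying to use a function that gets the href from the 18th link and then updates a global variable, so that each call runs against a different URL. Any advice on what I'm missing would be very helpful. Here are the code and the error: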
from bs4 import BeautifulSoup
import urllib
import re

nameList = []
urlToUse = "http://python-data.dr-chuck.net/known_by_Basile.html"

def linkOpen():
    global urlToUse
    html = urllib.urlopen(urlToUse)
    soup = BeautifulSoup(html, "lxml")
    tags = soup("li")
    count = 0
    for tag in tags:
        if count == 17:
            tagUrl = re.findall('href="([^ ]+)"', str(tag))
            nameList.append(tagUrl)
            urlToUse = tagUrl
            count = count + 1
        else:
            count = count + 1
            continue

bigCount = 0
while bigCount < 9:
    linkOpen()
    bigCount = bigCount + 1
print nameList[8]
Error:
Traceback (most recent call last):
  File "assignmentLinkScrape.py", line 26, in <module>
    linkOpen()
  File "assignmentLinkScrape.py", line 10, in linkOpen
    html = urllib.urlopen(urlToUse)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 185, in open
    fullurl = unwrap(toBytes(fullurl))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1075, in unwrap
    url = url.strip()
AttributeError: 'list' object has no attribute 'strip'
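I've changed it to re.search and still get an error. When I use str(tag), I get an error about there being no strip attribute: `AttributeError: '_sre.SRE_Match' object has no attribute 'strip'` – McLeodx
@McLeodx `re.search` doesn't return a string, it returns a `MatchObject`. Read the [documentation](https://docs.python.org/2/library/re.html#match-objects). – rrauenza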
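To make the point in the comments concrete, here is a minimal sketch of what the two `re` calls return. The markup in `tag_html` is a made-up stand-in for one `<li>` tag (it is not taken from the real page), and the variable names are only for illustration. It is written so it should run on Python 3 as well as Python 2.7.

```python
import re

# Hypothetical sample markup standing in for str(tag) of one <li> element.
tag_html = '<li><a href="http://example.com/known_by_Ann.html">Ann</a></li>'
pattern = r'href="([^ ]+)"'

# re.findall returns a list of the captured groups, even for a single match.
found = re.findall(pattern, tag_html)
print(type(found).__name__, found)

# re.search returns a match object (or None); group(1) is the captured string.
match = re.search(pattern, tag_html)
url_from_search = match.group(1) if match else None

# Indexing the findall result also yields a plain string.
url_from_findall = found[0] if found else None
print(url_from_search, url_from_findall)
```

If that is what is happening in the question's code, then `urlToUse` ends up holding a list (or a match object) rather than a string, and `urlopen` fails when it calls `strip()` on it internally. Assigning `tagUrl[0]`, or `match.group(1)` when using `re.search`, would probably be enough to get past this particular error.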