Why am I getting an error about strip() when I'm not using it? (Python)

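I'm working on a scraping assignment in Python using BeautifulSoup and I'm getting a strange error. It mentions strip, which I'm not using, but I'm guessing it may be related to BeautifulSoup's internal processing?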

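In the assignment, I'm trying to go to the original URL, find the 18th link, follow that link 7 times, and then return the name from the 18th link on the 7th page. I'm trying to use a function to get the href from the 18th link, then adjust a global variable so that each call recurses with a different URL. Any suggestions about what I'm missing would be very helpful. I'll list the code and the error: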

from bs4 import BeautifulSoup 
import urllib 
import re 

nameList = [] 
urlToUse = "http://python-data.dr-chuck.net/known_by_Basile.html" 

def linkOpen(): 
    global urlToUse 
    html = urllib.urlopen(urlToUse) 
    soup = BeautifulSoup(html, "lxml") 
    tags = soup("li") 
    count = 0 
    for tag in tags: 
     if count == 17: 
      tagUrl = re.findall('href="([^ ]+)"', str(tag)) 
      nameList.append(tagUrl) 
      urlToUse = tagUrl 
      count = count + 1 
     else: 
      count = count + 1 
      continue 

bigCount = 0 
while bigCount < 9: 
    linkOpen() 
    bigCount = bigCount + 1 

print nameList[8]

Error:

Traceback (most recent call last):
  File "assignmentLinkScrape.py", line 26, in <module>
    linkOpen()
  File "assignmentLinkScrape.py", line 10, in linkOpen
    html = urllib.urlopen(urlToUse)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 185, in open
    fullurl = unwrap(toBytes(fullurl))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1075, in unwrap
    url = url.strip()
AttributeError: 'list' object has no attribute 'strip'


2016-06-20 McLeodx

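re.findall() returns a list of matches. After the first call, urlToUse is therefore a list, and you are passing it to urlopen(), which expects a URL string. The strip() in the traceback is called inside urllib itself, on whatever you pass in as the URL.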
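As a minimal sketch of the difference, the snippet below runs the same pattern on a made-up `<li>` string standing in for `str(tag)`. The markup and the example.com URL are assumptions for illustration, not data from the assignment page.

```python
import re

# Hypothetical markup standing in for str(tag) from the question.
tag_html = '<li><a href="http://example.com/known_by_Ann.html">Ann</a></li>'

# findall always returns a list, even when there is exactly one match.
matches = re.findall('href="([^ ]+)"', tag_html)
print(type(matches).__name__)

# Take the first element to get a plain string that urlopen() can accept.
url = matches[0] if matches else None
print(url)
```

Assigning `matches[0]` rather than `matches` to `urlToUse` is one way to keep the global a string between calls.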


2016-06-20 00:47:40 alecxe

I've changed it to re.search and I still get an error. When I use str(tag), I get an error about there being no strip attribute: AttributeError: '_sre.SRE_Match' object has no attribute 'strip' – McLeodx

@McLeodx `re.search` doesn't return a string, it returns a `MatchObject`. Read the [documentation](https://docs.python.org/2/library/re.html#match-objects). – rrauenza
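For illustration, a small sketch of pulling the string out of the match object with `group(1)`, again on a made-up `<li>` string (an assumption, not the real page). `re.search` returns `None` when nothing matches, so the sketch guards against that case.

```python
import re

# Hypothetical markup standing in for str(tag) from the question.
tag_html = '<li><a href="http://example.com/known_by_Ann.html">Ann</a></li>'

match = re.search('href="([^ ]+)"', tag_html)

# group(1) is the text captured by the first parenthesised group.
url = match.group(1) if match else None
print(url)
```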

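alecxe explains your error, but you don't need a regular expression at all. You just want the 18th li tag and the href of the anchor inside it, which you can get with find and find_all: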

from bs4 import BeautifulSoup 
import requests 

soup = BeautifulSoup(requests.get("http://python-data.dr-chuck.net/known_by_Basile.html").content,"lxml") 

url = soup.find("ul").find_all("li", limit=18)[-1].a["href"]

Or use a CSS selector:

url = soup.select_one("ul li:nth-of-type(18) a")["href"]
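To check that the two approaches pick the same element without hitting the network, here is a small sketch run against a generated list of 20 links. The page content and the example.com URLs are invented for the test, and `html.parser` is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical page: a <ul> with 20 numbered links.
items = "".join(
    '<li><a href="http://example.com/p{0}.html">Name{0}</a></li>'.format(i)
    for i in range(1, 21)
)
soup = BeautifulSoup("<ul>{}</ul>".format(items), "html.parser")

# First 18 <li> tags, then the last of those, i.e. the 18th.
via_find_all = soup.find("ul").find_all("li", limit=18)[-1].a["href"]

# The same element via a CSS selector.
via_selector = soup.select_one("ul li:nth-of-type(18) a")["href"]

print(via_find_all)
print(via_selector)
```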

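So, to get the name after following the link seven times, put the logic in a function: visit the initial URL, then visit the extracted anchor's URL and extract the next anchor seven times, and after the last visit just print the text of the anchor you extracted: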

from bs4 import BeautifulSoup 
import requests 

soup = BeautifulSoup(requests.get("http://python-data.dr-chuck.net/known_by_Basile.html").content,"lxml") 

def get_nth(n, soup): 
    return soup.select_one("ul li:nth-of-type({}) a".format(n)) 

start = get_nth(18, soup) 
for _ in range(7): 
    soup = BeautifulSoup(requests.get(start["href"]).content,"html.parser") 
    start = get_nth(18, soup) 
print(start.text)
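The same loop can be sketched in a form that is easy to try offline. The function name `follow_nth_link`, its `fetch` parameter, and the tiny in-memory "site" below are all hypothetical, made up for this illustration. With a real site, `fetch` could be something like `lambda u: requests.get(u).content`.

```python
from bs4 import BeautifulSoup


def follow_nth_link(start_url, n, hops, fetch):
    """Follow the n-th list link `hops` times, then return the text of
    the n-th link on the page reached last.

    `fetch` is a hypothetical callable that takes a URL and returns HTML.
    """
    selector = "ul li:nth-of-type({}) a".format(n)
    url = start_url
    for _ in range(hops):
        soup = BeautifulSoup(fetch(url), "html.parser")
        url = soup.select_one(selector)["href"]
    soup = BeautifulSoup(fetch(url), "html.parser")
    return soup.select_one(selector).text


def make_page(next_url, name):
    # Two links per page; the second one leads to the next page.
    return ('<ul><li><a href="http://example.com/x.html">Other</a></li>'
            '<li><a href="{}">{}</a></li></ul>').format(next_url, name)


# Invented pages keyed by URL, used instead of real HTTP requests.
pages = {
    "http://example.com/a.html": make_page("http://example.com/b.html", "Bea"),
    "http://example.com/b.html": make_page("http://example.com/c.html", "Cal"),
    "http://example.com/c.html": make_page("http://example.com/d.html", "Dee"),
}

print(follow_nth_link("http://example.com/a.html", 2, 2, pages.get))
```

Passing the fetcher in as an argument is only a design choice for testability. For the assignment described in the question, the call would use n=18 and hops=7.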


2016-06-20 09:01:53
