2017-06-19 35 views
1

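Python 'list' object has no attribute error

I'm new to Python, and I'm trying to write a website scraper that collects links from subreddits, which I can later pass to another class to automatically download the images from Imgur.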

In this snippet, I'm just trying to read the subreddit and scrape any Imgur links from the hrefs, but I get the following error:

AttributeError: 'list' object has no attribute 'timeout' 

Any idea why this might be happening? Here is the code:

from bs4 import BeautifulSoup
from urllib2 import urlopen
import sys
from urlparse import urljoin

def get_category_links(base_url):
    url = base_url
    html = urlopen(url)
    soup = BeautifulSoup(html)
    posts = soup('a',{'class':'title may-blank loggedin outbound'})
    #get the links with the class "title may-blank "
    #which is how reddit defines posts
    for post in posts:
        print post.contents[0]
        #print the post's title

        if post['href'][:4] =='http':
            print post['href']
        else:
            print urljoin(url,post['href'])
        #print the url.
        #if the url is a relative url,
        #print the absolute url.


get_category_links(sys.argv)
+1

Post the full traceback or mention the line number. –

+1

Are you using `.read()` on urlopen? –

+2

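Please post the full error message, including the traceback. The error isn't caused directly by your code; it comes from one of the libraries you are using. – kindall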

Answers

4

Look at how you call the function:

get_category_links(sys.argv) 

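sys.argv here is the list of script arguments, whose first item is the name of the script itself. This means your base_url argument is a list, which is what makes urlopen fail: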

>>> from urllib2 import urlopen 
>>> urlopen(["I am", "a list"]) 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen 
    return opener.open(url, data, timeout) 
      │   │ │  └ <object object at 0x105e2c120> 
      │   │ └ None 
      │   └ ['I am', 'a list'] 
      └ <urllib2.OpenerDirector instance at 0x105edc638> 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in open 
    req.timeout = timeout 
    │    └ <object object at 0x105e2c120> 
    └ ['I am', 'a list'] 
AttributeError: 'list' object has no attribute 'timeout' 

You meant to take the second item of sys.argv (the first command-line argument) and pass it to get_category_links:

get_category_links(sys.argv[1]) 
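Indexing sys.argv[1] directly raises an IndexError when the script is run with no argument, so it can help to check the length first. Here is a minimal sketch of that check. The helper name `pick_base_url` and the example URL are my own, not part of the original code:

```python
def pick_base_url(argv):
    """Return the first command-line argument, or None if it is missing.

    `argv` is expected to look like sys.argv: argv[0] is the script
    name, and the real arguments start at index 1.
    (pick_base_url is a hypothetical helper name used for illustration.)
    """
    if len(argv) < 2:
        return None
    return argv[1]


# Simulated argument lists, so the behaviour is visible without a shell.
print(pick_base_url(["scraper.py", "https://www.reddit.com/r/pics"]))
print(pick_base_url(["scraper.py"]))
```

In the real script you would call `pick_base_url(sys.argv)` and print a usage message when it returns None, instead of passing the whole list along.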

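It's interesting, though, how cryptic and hard to understand the error is in this case. It comes from the way the "url opener" works in Python 2.7. If the url value (the first argument) is not a string, it assumes it is a Request instance and tries to set a timeout value on it: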

def open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
    # accept a URL or a Request object
    if isinstance(fullurl, basestring):
        req = Request(fullurl, data)
    else:
        req = fullurl
        if data is not None:
            req.add_data(data)

    req.timeout = timeout # <-- FAILS HERE

Note that this behavior has not actually changed in the latest stable release, 3.6, either.
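To see this on Python 3, here is a small sketch using `urllib.request`, which is where `urlopen` and `Request` live there. The `reproduce_error` and `open_url` names are my own, and `open_url` is just one possible way to get a clearer message, not something the library provides:

```python
from urllib.request import Request, urlopen


def reproduce_error():
    """Pass a list to urlopen and return the resulting error message."""
    try:
        # Fails while the opener sets req.timeout, before any network I/O.
        urlopen(["I am", "a list"])
    except AttributeError as exc:
        return str(exc)
    return None


def open_url(url, timeout=10):
    """Hypothetical wrapper: reject anything that is not a str or Request."""
    if not isinstance(url, (str, Request)):
        raise TypeError(
            "expected a URL string or Request, got %s" % type(url).__name__
        )
    return urlopen(url, timeout=timeout)


print(reproduce_error())
```

With a wrapper like this, passing sys.argv by mistake fails with a TypeError that names the offending type instead of complaining about a missing timeout attribute.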

+0

Could you share how you print a traceback as nice as the one in your answer? Thanks. – zhenguoli

+0

@zhenguoli Sure, this is the [`better-exceptions`](https://github.com/Qix-/better-exceptions) project, very cool and handy. Thanks. – alecxe

+0

Thank you very much. That's very kind of you. – zhenguoli
