2013-07-19 48 views

Getting TypeError: expected string or buffer in Python

I have this simple code:

#!/usr/bin/python

from bs4 import BeautifulSoup 
import requests 
import tldextract 

def scrap(url): 
    main_domain = tldextract.extract(url) 
    r = requests.get(url) 
    data = r.text 
    soup = BeautifulSoup(data) 
    list = [] 
    for href in soup.find_all('a'):
        link_domain = tldextract.extract(href.get('href'))
        print link_domain
        print

I get this error:

Traceback (most recent call last): 
  File "cloud.py", line 20, in <module>
    scrap("--- url here -- ")
  File "cloud.py", line 14, in scrap
    link_domain = tldextract.extract(href.get('href'))
  File "/usr/lib/python2.6/site-packages/tldextract/tldextract.py", line 196, in extract
    return TLD_EXTRACTOR(url)
  File "/usr/lib/python2.6/site-packages/tldextract/tldextract.py", line 127, in __call__
    netloc = SCHEME_RE.sub("", url) \

TypeError: expected string or buffer 

How can I fix this?


Can you paste the full traceback? –


File "/usr/lib/python2.6/site-packages/tldextract/tldextract.py", line 196, in extract return TLD_EXTRACTOR(url) File "/usr/lib/python2.6/site-packages/tldextract/tldextract.py", line 127, in __call__ netloc = SCHEME_RE.sub("", url) \ TypeError: expected string or buffer – Alisha

Answer


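Some of your a tags do not have an href attribute, so .get('href') returns None, and tldextract.extract() then fails when it passes None to a regular expression.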
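The failure can be reproduced without fetching any page. The following is only a minimal sketch: SCHEME_RE here is a stand-in pattern (the real one inside tldextract may differ), strip_scheme is a hypothetical helper, and a plain dict stands in for the attributes of a tag such as <a name="top">, since .get() on a tag looks attributes up the same way dict.get() does.

```python
import re

# Stand-in for the scheme-stripping pattern; tldextract's real SCHEME_RE may differ.
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")

# Attributes of a tag like <a name="top">: there is no 'href' key at all.
attrs_without_href = {"name": "top"}


def strip_scheme(url):
    """Hypothetical helper: remove a leading 'scheme://' from url."""
    return SCHEME_RE.sub("", url)


# .get() without a default returns None for a missing attribute.
missing = attrs_without_href.get("href")

try:
    strip_scheme(missing)  # re.sub() only accepts strings, so None is rejected
    error = None
except TypeError as exc:
    error = exc

print(repr(missing))
print(type(error).__name__)
```

On Python 2 the message reads "expected string or buffer"; newer Python versions word it differently, but the exception type is the same.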

Use:

link_domain = tldextract.extract(href.get('href', '')) 

to return an empty string in that case, or test for the attribute first:

href = href.get('href') 
if not href: 
    continue 

link_domain = tldextract.extract(href) 
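
For reference, here is a self-contained sketch of the "test for the attribute first" approach. It is not the code from the question: to stay runnable without third-party packages or network access, it assumes Python 3 and swaps in the standard library's html.parser and urllib.parse in place of BeautifulSoup and tldextract. LinkCollector and link_hosts are hypothetical names, and urlparse(...).netloc gives the full host rather than the registered domain that tldextract would return.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the raw href of every <a> tag, or None when it is absent."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Like href.get('href'): yields None when the tag has no href.
            self.hrefs.append(dict(attrs).get("href"))


def link_hosts(html):
    """Return the host of each link, skipping <a> tags without a usable href."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    hosts = []
    for href in collector.hrefs:
        if not href:
            continue  # missing or empty href: nothing to extract
        # Note: relative links such as "/about" would give an empty host here.
        hosts.append(urlparse(href).netloc)
    return hosts


sample = (
    '<a href="http://example.com/x">x</a>'
    '<a name="top">anchor</a>'
    '<a href="">empty</a>'
    '<a href="https://docs.python.org/3/">docs</a>'
)
print(link_hosts(sample))
```

The same guard (if not href: continue) carries over unchanged to the BeautifulSoup loop in the question.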

Thanks for your help :) It works! – Alisha