2015-10-19

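ValueError: can only parse strings (Python)

I want to collect a set of links with XPath and then scrape each linked page in turn, but I keep getting the error "can only parse strings". I checked the type of lk, and after casting it is a string. What is going wrong?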

import unicodedata
import urllib2
from lxml import etree  # assumed: etree.HTML and this error message come from lxml

def unicode_to_string(types):
    # Convert unicode to an ASCII byte string; return the input unchanged on failure.
    try:
        types = unicodedata.normalize("NFKD", types).encode('ascii', 'ignore')
        return types
    except:
        return types

def getData():
    req = "http://analytical360.com/access-points"
    page = urllib2.urlopen(req)
    tree = etree.HTML(page.read())
    i = 0
    for lk in tree.xpath('//a[@class="sabai-file sabai-file-image sabai-file-type-jpg "]//@href'):
        print "Scraping Vendor #" + str(i)
        trees = etree.HTML(urllib2.urlopen(unicode_to_string(lk)))
        for ll in trees.xpath('//table[@id="archived"]//tr//td//a//@href'):
            final = etree.HTML(urllib2.urlopen(unicode_to_string(ll)))

Can you post the full traceback? – jgritty

In one place you have 'page = urllib2.urlopen(req); etree.HTML(page.read())', and in the next you have 'etree.HTML(urllib2.urlopen(unicode_to_string(ll)))', which is missing the '.read()' on the object returned by urlopen. – TessellatingHeckler

You need to pass a string, not a urllib2.urlopen object, to 'unicode_to_string' –

Answers

You should pass a string to etree.HTML, not the object returned by urllib2.urlopen.

You could change the code like this:

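for i, lk in enumerate(tree.xpath('//a[@class="sabai-file sabai-file-image sabai-file-type-jpg "]//@href')):
    print "Scraping Vendor #" + str(i)
    # .read() returns the page body as a string, which is what etree.HTML expects
    trees = etree.HTML(urllib2.urlopen(unicode_to_string(lk)).read())
    for ll in trees.xpath('//table[@id="archived"]//tr//td//a//@href'):
        final = etree.HTML(urllib2.urlopen(unicode_to_string(ll)).read())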

Also, you never seem to increment i; enumerate can keep that counter for you.
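
If it helps, here is a minimal stdlib-only sketch of both points, written for Python 3 (where urllib2 became urllib.request). The helper name read_markup is hypothetical and not part of lxml or urllib, and io.BytesIO only stands in for the response object that urlopen would return, so nothing here touches the network. The vendor paths are made up for illustration.

```python
import io


def read_markup(source):
    """Return markup as str/bytes, reading it from a file-like object if needed.

    Hypothetical helper: it just makes the missing .read() call explicit.
    """
    if isinstance(source, (str, bytes)):
        return source
    if hasattr(source, "read"):
        return source.read()
    raise TypeError("expected str, bytes, or a file-like object, got %r" % type(source))


# io.BytesIO stands in for the object returned by urlopen (assumption for the demo).
fake_response = io.BytesIO(b"<html><body><a href='/next'>next</a></body></html>")
markup = read_markup(fake_response)  # bytes, safe to hand to a string-only parser

# enumerate supplies the counter, so no manual `i += 1` is needed.
links = ["/vendor-a", "/vendor-b"]  # made-up paths
labels = ["Scraping Vendor #%d: %s" % (i, link) for i, link in enumerate(links)]
```

With lxml installed, the result of read_markup(...) is what you would pass to etree.HTML instead of the raw response object.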