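ValueError: can only parse strings (Python)

I want to collect a bunch of links using XPath. They have to be scraped from the page each link leads to, but I keep getting the error "can only parse strings". I checked the type of lk, and after casting it is a string. What seems to be wrong?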
import unicodedata
import urllib2

from lxml import etree


def unicode_to_string(types):
    # Convert a unicode value to an ASCII byte string; return the input unchanged on failure.
    try:
        types = unicodedata.normalize("NFKD", types).encode('ascii', 'ignore')
        return types
    except:
        return types


def getData():
    req = "http://analytical360.com/access-points"
    page = urllib2.urlopen(req)
    tree = etree.HTML(page.read())
    i = 0
    for lk in tree.xpath('//a[@class="sabai-file sabai-file-image sabai-file-type-jpg "]//@href'):
        print "Scraping Vendor #" + str(i)
        trees = etree.HTML(urllib2.urlopen(unicode_to_string(lk)))
        for ll in trees.xpath('//table[@id="archived"]//tr//td//a//@href'):
            final = etree.HTML(urllib2.urlopen(unicode_to_string(ll)))
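Can you post the full traceback? – jgritty
In one place you have 'page = urllib2.urlopen(req); etree.HTML(page.read())', and in the next you have 'etree.HTML(urllib2.urlopen(unicode_to_string(ll)))', which is missing the '.read()' on the object returned by urlopen. – TessellatingHeckler
You need to pass a string, not a urllib2.urlopen object, to 'unicode_to_string' –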
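The fix suggested in the comments can be sketched roughly as follows: read the response body before handing it to etree.HTML(). This is only a minimal sketch, not the asker's final code. The helper name fetch_tree, its opener parameter, the fake_opener stand-in, and the example.invalid URL are all hypothetical and exist only so the snippet can run without network access. The import fallback is an assumption so the same code runs on Python 2 (urllib2, as in the question) and Python 3.

```python
import io

try:
    from urllib2 import urlopen          # Python 2, as used in the question
except ImportError:
    from urllib.request import urlopen   # Python 3 equivalent

from lxml import etree


def fetch_tree(url, opener=urlopen):
    """Download `url` and parse the response body with lxml (hypothetical helper)."""
    response = opener(url)
    try:
        # etree.HTML() expects the markup itself, so call .read() on the
        # file-like response object instead of passing the object directly.
        return etree.HTML(response.read())
    finally:
        response.close()


def fake_opener(url):
    """Hypothetical offline stand-in for urlopen: returns a file-like object."""
    html = (b'<html><body><table id="archived">'
            b'<tr><td><a href="/report/1">report</a></td></tr>'
            b'</table></body></html>')
    return io.BytesIO(html)


# Same XPath as the inner loop in the question, run against the fake page.
tree = fetch_tree("http://example.invalid/vendor", opener=fake_opener)
links = tree.xpath('//table[@id="archived"]//tr//td//a//@href')
```

In the question's loops, the two calls of the form etree.HTML(urllib2.urlopen(...)) would become fetch_tree(...), or equivalently gain a .read() on the urlopen result.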