2015-10-04

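I have been trying to encode the list price['value'], and I get the error AttributeError: 'list' object has no attribute 'encode'. Once I realized what the problem was, I tried many different ways of encoding the text before adding it to the list, but none of them worked. In this case, how do I use .encode('utf-8') correctly so that the price['value'] results contain non-Unicode data, by encoding the text rather than the list? How do I iterate over the list and encode its text instead of the list itself?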

import mechanize 
from lxml import html 
import csv 
import io 
from time import sleep 

def save_products(products, writer):

    for product in products:

        writer.writerow([product["title"][0].encode('utf-8')])
        for price in product['prices']:
            writer.writerow([price["value"]])

f_out = open('pcdResult.csv', 'wb') 
writer = csv.writer(f_out) 

links = ["http://purechemsdirect.com/ourprices.html/" ] 

br = mechanize.Browser() 
br.set_handle_robots(False) 
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')] 


for link in links:

    print(link)
    r = br.open(link)

    content = r.read()

    products = []
    tree = html.fromstring(content)
    product_nodes = tree.xpath('//div[@class="col-md-6 col-lg-6 col-sm-12"]')

    for product_node in product_nodes:

        product = {}
        try:
            product['title'] = product_node.xpath('.//p/strong/text()')

        except:
            product['title'] = ""

        price_nodes = product_node.xpath('.//ul')

        product['prices'] = []
        for price_node in price_nodes:

            price = {}
            try:
                price['value'] = price_node.xpath('.//li/text()')

            except:
                price['value'] = ""

            product['prices'].append(price)
        products.append(product)
    save_products(products, writer)

f_out.close() 
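The error comes from the shape of the data. An XPath query ending in text() returns a list of strings, so price['value'] is a list, and .encode() exists only on strings, not on lists. The minimal Python 3 sketch below reproduces this with a hand-made, hypothetical price dict standing in for what the scraper builds. In Python 3, str.encode returns bytes.

```python
# Hypothetical stand-in for one entry of product['prices'];
# the XPath './/li/text()' query yields a list of strings like this.
price = {'value': ['1g: €10', '5g: €40']}

# Calling .encode() on the list itself fails: list has no encode method.
try:
    price['value'].encode('utf-8')
except AttributeError as exc:
    error_message = str(exc)

# Encoding each string element individually works.
encoded = [x.encode('utf-8') for x in price['value']]

print(error_message)  # prints: 'list' object has no attribute 'encode'
print(encoded)
```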

Answer


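Maybe try a list comprehension, since price['value'] is a list. I am assuming that the values inside price['value'] are strings and not further lists. If there are lists nested inside, this answer will not work.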

def save_products(products, writer):

    for product in products:

        writer.writerow([product["title"][0].encode('utf-8')])
        for price in product['prices']:
            # Encode each string in the list, not the list itself
            writer.writerow([x.encode('utf-8') for x in price['value']])
# ...
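The code in this thread targets Python 2, where the csv module works with byte strings, which is why each item is passed through .encode('utf-8'). In Python 3 the csv module writes str directly, so the per-item encoding is not needed and the encoding is set once when the file is opened, for example open('pcdResult.csv', 'w', newline='', encoding='utf-8'). The sketch below is a Python 3 adaptation under that assumption. It uses an in-memory io.StringIO and hypothetical sample products in place of the scraped data.

```python
import csv
import io


def save_products(products, writer):
    # One row with the product title, then one row per list of prices.
    for product in products:
        writer.writerow([product["title"][0]])
        for price in product["prices"]:
            # price["value"] is already a list of str; use it as the row.
            writer.writerow(price["value"])


# Hypothetical data shaped like what the scraper builds.
products = [
    {"title": ["Sample product"],
     "prices": [{"value": ["1g: €10", "5g: €40"]}]},
]

# A real script would write to
# open('pcdResult.csv', 'w', newline='', encoding='utf-8') instead.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
save_products(products, writer)
print(buffer.getvalue())
```

Passing price["value"] straight to writerow puts each li text in its own column. This is the same layout the list comprehension in the answer produces.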

@qwertyuio9 Yes, you're right: what is stored there are strings, not other lists. Where do I put the code? – McLeodx


@McLeodx I edited my answer; let me know if this doesn't work. – qwertyuip9


Worked perfectly! Thanks. – McLeodx