2016-04-24

I keep running into problems getting clean CSV output. Whitespace in CSV output - Python

Here is the program:

import csv 
import requests 
from lxml import html 

page = requests.get('http://www.mediamarkt.be/mcs/productlist/_108-tot-127-cm-43-tot-50-,98952,501090.html?langId=-17') 
tree = html.fromstring(page.content) 

outfile = open("./tv_test1.csv", "wb") 
writer = csv.writer(outfile) 

rows = tree.xpath('//*[@id="category"]/ul[2]/li') 
writer.writerow(["Product Name", "Price"]) 

for row in rows: 
    price = row.xpath('div/aside[2]/div[1]/div[1]/div/text()') 
    product_ref = row.xpath('div/div/h2/a/text()') 
    writer.writerow([product_ref,price]) 

outfile.close() 

Current output:

['\r\n\t\t\t\t\tTV SAMSUNG UE48JU6640UXXN 48" LCD FULL LED Smart Ultra HD Curved\r\n\t\t\t\t'],"['999,-']" 

Desired output:

TV SAMSUNG UE48JU6640UXXN 48" LCD FULL LED Smart Ultra HD Curve,999,- 

Answers


Found it:

import csv 
import requests 
from lxml import html 

page = requests.get('http://www.mediamarkt.be/mcs/productlist/_108-tot-127-cm-43-tot-50-,98952,501090.html?langId=-17')
tree = html.fromstring(page.content)

outfile = open("./tv_test1.csv", "wb")
writer = csv.writer(outfile)

rows = tree.xpath('//*[@id="category"]/ul[2]/li') 
writer.writerow(["Product Name", "Price"]) 

for row in rows: 
    price = row.xpath('normalize-space(div/aside[2]/div[1]/div[1]/div/text())') 
    product_ref = row.xpath('normalize-space(div/div/h2/a/text())') 
    writer.writerow([product_ref,price]) 

outfile.close() 
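
For anyone wondering why this helps: the original text() queries return a list of strings, and csv.writer writes the str() form of that list, brackets included. normalize-space() returns a single string with leading and trailing whitespace trimmed and internal runs collapsed. Here is a rough sketch of the same effect in plain Python, so it runs without the network. Note that normalize_space is a hypothetical helper of mine (not part of lxml), and the sample values are shortened from the output in the question.

```python
import csv
import io


def normalize_space(text):
    """Rough stand-in for XPath normalize-space(): trim the ends and
    collapse internal whitespace runs to one space. (Python's split()
    also treats some extra Unicode whitespace as separators.)"""
    return " ".join(text.split())


# Hypothetical sample shaped like the lists a text() query returns.
raw_names = ['\r\n\t\t\t\t\tTV SAMSUNG UE48JU6640UXXN 48" LCD\r\n\t\t\t\t']
raw_prices = ['999,-']

buffer = io.StringIO()
writer = csv.writer(buffer)

# Passing the lists themselves: csv writes their str() form, brackets and all.
writer.writerow([raw_names, raw_prices])

# Passing cleaned scalar strings: only the text ends up in the row.
writer.writerow([normalize_space(raw_names[0]), normalize_space(raw_prices[0])])

lines = buffer.getvalue().splitlines()
for line in lines:
    print(line)
```

One thing to keep in mind: the csv module quotes any field containing a comma or a double quote, so a price like 999,- is written inside quotes rather than bare as in the desired output shown in the question.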

You can simply remove the \n, \r and \t characters from the data before writing it to the CSV file:

import csv 
import requests 
from lxml import html 

page = requests.get('http://www.mediamarkt.be/mcs/productlist/_108-tot-127-cm-43-tot-50-,98952,501090.html?langId=-17') 
tree = html.fromstring(page.content) 

outfile = open("./tv_test1.csv", "wb") 
writer = csv.writer(outfile) 

rows = tree.xpath('//*[@id="category"]/ul[2]/li') 
writer.writerow(["Product Name", "Price"]) 

for row in rows:
    price = row.xpath('div/aside[2]/div[1]/div[1]/div/text()')
    for i in range(len(price)):
        price[i] = price[i].replace("\n", "")
        price[i] = price[i].replace("\t", "")
        price[i] = price[i].replace("\r", "")

    product_ref = row.xpath('div/div/h2/a/text()')
    for i in range(len(product_ref)):
        product_ref[i] = product_ref[i].replace("\n", "")
        product_ref[i] = product_ref[i].replace("\t", "")
        product_ref[i] = product_ref[i].replace("\r", "")
    if len(product_ref) and len(price):
        writer.writerow([product_ref, price])

outfile.close() 

With this, the \n, \r and \t characters no longer end up in the output file.

Note that I also check the lengths of price and product_ref before storing them in the file.
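
A small sketch of that non-empty check on its own, in case it helps. Two assumptions on my side: the scraped list is made-up sample data standing in for the XPath results, and the file is opened the Python 3 way (text mode with newline=""), since the "wb" mode used above only suits Python 2.

```python
import csv
import os
import tempfile

# Hypothetical results: one (names, prices) pair per list item on the page.
scraped = [
    (["\r\n\tTV SAMSUNG UE48JU6640UXXN\r\n"], ["999,-"]),
    ([], []),  # a list item that holds no product, e.g. a banner
]

# Temporary location so the sketch does not touch the working directory.
path = os.path.join(tempfile.mkdtemp(), "tv_test1.csv")

# Python 3: csv wants a text-mode file opened with newline="".
with open(path, "w", newline="", encoding="utf-8") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["Product Name", "Price"])
    for names, prices in scraped:
        # Skip items where either query came back empty.
        if names and prices:
            writer.writerow([names[0].strip(), prices[0].strip()])

# Read the file back to inspect what was written.
with open(path, newline="", encoding="utf-8") as infile:
    rows = list(csv.reader(infile))
print(rows)
```

Taking the first element of each list, rather than writing the list itself, also keeps the square brackets out of the file.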


'strip()' would also work, since these characters are at the ends


@cricket_007 Yes, but with this approach, characters inside the string can be removed as well. – EbraHim
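
To make the difference between the two approaches in these comments concrete, here is a quick comparison on a made-up string (the inner tab is my own addition for illustration; it was not in the scraped data shown in the question):

```python
# Hypothetical value with whitespace at both ends and one tab inside.
raw = "\r\n\t\tTV SAMSUNG\tUE48JU6640UXXN\r\n\t"

# strip() only trims the ends, so the inner tab survives.
stripped = raw.strip()

# Chained replace() removes the characters everywhere, including inside,
# which here fuses the two words around the tab.
replaced = raw.replace("\n", "").replace("\t", "").replace("\r", "")

print(repr(stripped))
print(repr(replaced))
```

So strip() is enough when the junk is only at the ends, while replace() also reaches inner characters at the cost of possibly gluing words together.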