
I am trying to save all of the data (i.e. from all pages) into a single CSV file, but this code only saves the last page's data. For example, here url contains two URLs, and the final CSV contains only the second URL's data. I am clearly doing something wrong in the loop, but I can't see what. Each page also contains 100 data points, yet this code writes only 44 rows. Please help with this problem. (Python web scraping and writing data to CSV)

from bs4 import BeautifulSoup
import requests
import csv

url = ["http://sfbay.craigslist.org/search/sfc/npo", "http://sfbay.craigslist.org/search/sfc/npo?s=100"]
for ur in url:
    r = requests.get(ur)
    soup = BeautifulSoup(r.content)
    g_data = soup.find_all("a", {"class": "hdrlnk"})
    gen_list = []
    for row in g_data:
        try:
            name = row.text
        except:
            name = ''
        try:
            link = "http://sfbay.craigslist.org" + row.get("href")
        except:
            link = ''
        gen = [name, link]
        gen_list.append(gen)

with open('filename2.csv', 'wb') as file:
    writer = csv.writer(file)
    for row in gen_list:
        writer.writerow(row)

Answers


gen_list is re-initialized on every pass of your loop over the URLs, so the rows collected from the first page are thrown away.

gen_list=[] 

Move this line outside the for loop:

... 
url = ["http://sfbay.craigslist.org/search/sfc/npo","http://sfbay.craigslist.org/search/sfc/npo?s=100"] 
gen_list=[] 
for ur in url: 
... 
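For completeness, a minimal sketch of the corrected script with that one change applied. It assumes Python 3, so the output file is opened with "w" and newline='' instead of the question's "wb", and the "html.parser" argument is passed explicitly only to avoid BeautifulSoup's missing-parser warning; everything else is carried over from the question.

from bs4 import BeautifulSoup
import requests
import csv

url = ["http://sfbay.craigslist.org/search/sfc/npo",
       "http://sfbay.craigslist.org/search/sfc/npo?s=100"]

gen_list = []                          # accumulate rows from every page
for ur in url:
    r = requests.get(ur)
    soup = BeautifulSoup(r.content, "html.parser")
    for row in soup.find_all("a", {"class": "hdrlnk"}):
        name = row.text
        link = "http://sfbay.craigslist.org" + row.get("href")
        gen_list.append([name, link])

# Python 3: text mode with newline='' so csv does not insert blank lines
with open('filename2.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for row in gen_list:
        writer.writerow(row)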

Thank you so much..... – Arunkumar


After finding your post, I wanted to try this approach:

import requests
from bs4 import BeautifulSoup
import csv

final_data = []
url = "https://sfbay.craigslist.org/search/sss"
r = requests.get(url)
data = r.text

soup = BeautifulSoup(data, "html.parser")
get_details = soup.find_all(class_="result-row")

for details in get_details:
    getclass = details.find_all(class_="hdrlnk")
    for link in getclass:
        link1 = link.get("href")
        final_data.append([link1])     # one-column row per listing link
print(final_data)

filename = "sfbay.csv"
# newline='' stops the csv module from adding blank lines on Windows
with open("./" + filename, "w", newline="") as csvfile:
    writer = csv.writer(csvfile, delimiter=",")
    for row in final_data:
        writer.writerow(row)
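To tie this back to the original question (several result pages written to one CSV, with the listing title as well as the link), a possible extension is sketched below. The s= offset parameter comes from the URLs in the question and the result-row/hdrlnk class names from the snippet above, so they may need adjusting if Craigslist changes its markup; the header names are only illustrative.

import requests
from bs4 import BeautifulSoup
import csv

base = "https://sfbay.craigslist.org/search/sfc/npo"
final_data = []

# Craigslist pages its results 100 at a time via the s= offset parameter
for offset in (0, 100):
    r = requests.get(base, params={"s": offset})
    soup = BeautifulSoup(r.text, "html.parser")
    for row in soup.find_all(class_="result-row"):
        for link in row.find_all(class_="hdrlnk"):
            # keep both the listing title and its URL in one CSV row
            final_data.append([link.text, link.get("href")])

with open("sfbay.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["title", "link"])   # illustrative header row
    writer.writerows(final_data)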