2017-09-03 275 views
0

[免責聲明]我已經通過了該地區的其他許多答案,但他們似乎並不適合我。將BeautifulSoup的數據導出到CSV

我想能夠導出我已經抓取的數據作爲CSV文件。

我的問題是如何編寫將數據輸出到CSV的代碼片段?從代碼

目前代碼

import requests 
from bs4 import BeautifulSoup 

url = "http://implementconsultinggroup.com/career/#/6257" 
r = requests.get(url) 

req = requests.get(url).text 
soup = BeautifulSoup(r.content) 
links = soup.find_all("a") 

for link in links: 
    if "career" in link.get("href") and 'COPENHAGEN' in link.text: 
      print "<a href='%s'>%s</a>" %(link.get("href"), link.text) 

輸出

View Position 

</a> 
<a href='/career/management-consultants-to-help-our-customers-succeed-with- 
it/'> 
Management consultants to help our customers succeed with IT 
COPENHAGEN • At Implement Consulting Group, we wish to make a difference in 
the consulting industry, because we believe that the ability to create Change 
with Impact is a precondition for success in an increasingly global and 
turbulent world. 




View Position 

</a> 
<a href='/career/management-consultants-within-process-improvement/'> 
Management consultants within process improvement 
COPENHAGEN • We are looking for consultants with profound 
experience in Six Sigma, Lean and operational 
management 

代碼我已經試過

with open('ImplementTest1.csv',"w") as csv_file: 
    writer = csv.writer(csv_file) 
    writer.writerow(["link.get", "link.text"]) 
    csv_file.close() 

輸出CSV格式

1列:URL鏈接

2列:職位描述

1列:/職業/管理諮詢顧問對求助我們,客戶-succeed - 與 - 它/

2列:管理顧問,以幫助我們的客戶提供IT成功 哥本哈根•在實施顧問集團,我們希望因爲我們認爲在影響力創造變化 的能力是在日益全球化和動盪的世界取得成功的先決條件。

+0

你必須存儲在列表中的結果。 –

+0

感謝Adam。我對Python很陌生,你能快速展示如何創建/存儲結果作爲列表嗎? –

+0

這裏是我對類似問題的回答:[extract-data-from-html-to-csv-using-beautifulsoup](https://stackoverflow.com/questions/45675705/extract-data-from-html-to- csv-using-beautifulsoup/45676970#45676970) –

回答

1

試試這個腳本,並獲得CSV輸出:

import csv ; import requests 
from bs4 import BeautifulSoup 

outfile = open('career.csv','w', newline='') 
writer = csv.writer(outfile) 
writer.writerow(["job_link", "job_desc"]) 

res = requests.get("http://implementconsultinggroup.com/career/#/6257").text 
soup = BeautifulSoup(res,"lxml") 
links = soup.find_all("a") 

for link in links: 
    if "career" in link.get("href") and 'COPENHAGEN' in link.text: 
     item_link = link.get("href").strip() 
     item_text = link.text.replace("View Position","").strip() 
     writer.writerow([item_link, item_text]) 
     print(item_link, item_text) 
outfile.close() 
+0

謝謝Shahin - 這工作正如我想要的。唯一的功能不工作它的最後一塊:outfile。關閉(): 文件「」,第7行 outfile.close() ^ 語法錯誤:無效的語法 –

+0

這是因爲我想,你使用Python 2,而我使用Python 3,我不知道,但!但是,它的運行完美無瑕。 – SIM