How to parse data by substituting part of a URL with values from a list saved in a CSV file

I would like to download some data from a website, for example:

http://www.ncbi.nlm.nih.gov/pubmed/23626827

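I would like to write a Python script that opens this URL for every number in a list stored in a CSV file (the number is the part after pubmed/) and downloads all the data into another CSV file.

So I have to use urllib2, loops and strings, but I just can't get it right.

I'm not asking for a complete script; just help me get started, or give me an idea.

Thanks a lot!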

2013-04-30 Viki

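Try to explain more clearly. What do the numbers in the CSV file represent? Which data do you want to download? What is "pubmed"? – 2013-04-30 10:19:33

This site works best when you post your code and explain how it is not doing what you expect. – JosefAssad 2013-04-30 10:26:30

Sorry. Pubmed is a website containing articles. The numbers are the article IDs. I want to download the entire content. – Viki 2013-04-30 10:29:40

Here is a simple example of how to read the numbers (IDs) from the input CSV with csv, load the content via urllib2, parse it via lxml, and write the results to an output CSV file: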

import urllib2
import csv
import lxml.html


URL = "http://www.ncbi.nlm.nih.gov/pubmed/"

# read IDs from the input csv file
# (binary mode, as the Python 2 csv module expects)
with open('input.csv', 'rb') as csvfile:
    numbers = [row[0] for row in csv.reader(csvfile)]

# get the article and collect its title for each ID
output = []
for number in numbers:
    response = urllib2.urlopen(URL + number)
    html = response.read()

    tree = lxml.html.document_fromstring(html)
    output.append(tree.xpath('//div[@class="rprt abstract"]/h1')[0].text)

# write article titles to the output csv, one title per row
with open('output.csv', 'wb') as csvfile:
    csvwriter = csv.writer(csvfile)
    for row in output:
        csvwriter.writerow([row])

Contents of input.csv:

23626827
23626828
23626829

You will get one article title per line in output.csv.
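
The example above is written for Python 2 (urllib2). For readers on Python 3, the same read, fetch, parse, write loop could be sketched roughly as follows. This is only a sketch under several assumptions: urllib.request stands in for urllib2, the standard-library html.parser is swapped in for lxml and simply takes the first h1 on the page rather than evaluating the XPath above, the page is assumed to be UTF-8, and the names FirstH1Parser, extract_title, fetch_html and collect_titles are made up for illustration. The page layout may also differ from what the XPath above expects.

```python
import csv
import urllib.request
from html.parser import HTMLParser

BASE_URL = "http://www.ncbi.nlm.nih.gov/pubmed/"  # URL prefix from the question


class FirstH1Parser(HTMLParser):
    """Collect the text of the first <h1> element in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self._done:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._done = True

    def handle_data(self, data):
        if self._in_h1:
            self.parts.append(data)


def extract_title(html):
    """Return the stripped text of the first <h1>, or '' if there is none."""
    parser = FirstH1Parser()
    parser.feed(html)
    return "".join(parser.parts).strip()


def fetch_html(url):
    """Download a page and decode it (assumes UTF-8)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_titles(infile, outfile, fetch=fetch_html):
    """Read IDs from infile, fetch each page, write one title per row."""
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        if not row:  # skip blank lines
            continue
        article_id = row[0].strip()
        writer.writerow([extract_title(fetch(BASE_URL + article_id))])
```

To run it against real files, collect_titles could be called with open('input.csv', newline='') and open('output.csv', 'w', newline=''). Passing the download function as the fetch parameter keeps the loop easy to try out without network access.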

Hope that helps.

2013-04-30 10:33:27 alecxe

Yes! Thank you very much – Viki 2013-04-30 10:40:26

You're welcome. Consider accepting the answer if it helped. – alecxe 2013-04-30 10:41:21

Read the CSV file and get the data:

import csv

f = open('yourCSV.csv', 'rb')
try:
    reader = csv.reader(f)
    for row in reader:
        print row[0]  # first column of each row holds the ID
finally:
    f.close()

Then append it to the URL.
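
The "append it to the URL" step might look something like the sketch below. It is written for Python 3 rather than the Python 2 used above, the helper name urls_from_csv is hypothetical, and it assumes the ID sits in the first column of each row. An in-memory file stands in for yourCSV.csv so the snippet runs on its own.

```python
import csv
import io

BASE_URL = "http://www.ncbi.nlm.nih.gov/pubmed/"  # URL prefix from the question


def urls_from_csv(csvfile, base_url=BASE_URL):
    """Return one URL per non-empty row, using the first column as the ID."""
    urls = []
    for row in csv.reader(csvfile):
        if row and row[0].strip():  # ignore blank lines and empty IDs
            urls.append(base_url + row[0].strip())
    return urls


# In-memory stand-in for the CSV file, using the IDs from this thread
sample = io.StringIO("23626827\n23626828\n23626829\n")
for url in urls_from_csv(sample):
    print(url)
```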

2013-04-30 10:35:49
