2017-01-03

This is a simple and basic question, I think, but I haven't managed to find a clear and simple answer. Here is my problem: opening and parsing the URLs in a .txt file one by one with Python.

I have a .txt file with one URL per line (about 300 of them), which I obtained from another Python script. I want to open these URLs one by one and run this script on each of them to get the information I'm interested in:

import urllib2
from bs4 import BeautifulSoup

page = urllib2.urlopen("http://www.aerodromes.fr/aeroport-de-saint-martin-grand-case-ttfg-a413.html")
soup = BeautifulSoup(page, "html.parser")

# find the places of each info
info_tag = soup.find_all('b')
info_nom = info_tag[2].string
info_pos = info_tag[4].next_sibling
info_alt = info_tag[5].next_sibling
info_pis = info_tag[6].next_sibling
info_vil = info_tag[7].next_sibling

print(info_nom + "," + info_pos + "," + info_alt + "," + info_pis + "," + info_vil)
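One way to reuse this snippet across many URLs is to move the parsing into a function that takes the page HTML and returns the fields. This is only a sketch: parse_airport_page is a name introduced here, and the indices into find_all('b') assume the aerodromes.fr page layout stays the same.

```python
from bs4 import BeautifulSoup


def parse_airport_page(html):
    # The <b>-tag indices mirror the snippet above and assume the
    # aerodromes.fr page layout does not change.
    soup = BeautifulSoup(html, "html.parser")
    info_tag = soup.find_all('b')
    info_nom = info_tag[2].string
    info_pos = info_tag[4].next_sibling
    info_alt = info_tag[5].next_sibling
    info_pis = info_tag[6].next_sibling
    info_vil = info_tag[7].next_sibling
    return (info_nom, info_pos, info_alt, info_pis, info_vil)
```

Each page can then be fetched once and handed to this function, which keeps the fetching and the parsing separate.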

aero-url.txt

http://www.aerodromes.fr/aeroport-de-la-reunion-roland-garros-fmee-a416.html, 
http://www.aerodromes.fr/aeroport-de-saint-pierre---pierrefonds-fmep-a417.html, 
http://www.aerodromes.fr/base-aerienne-de-moussoulens-lf34-a433.html, 
http://www.aerodromes.fr/aerodrome-d-yvetot-lf7622-a469.html, 
http://www.aerodromes.fr/aerodrome-de-dieppe---saint-aubin-lfab-a1.html, 
http://www.aerodromes.fr/aeroport-de-calais---dunkerque-lfac-a2.html, 
http://www.aerodromes.fr/aerodrome-de-compiegne---margny-lfad-a3.html, 
http://www.aerodromes.fr/aerodrome-d-eu---mers---le-treport-lfae-a4.html, 
http://www.aerodromes.fr/aerodrome-de-laon---chambry-lfaf-a5.html, 
http://www.aerodromes.fr/aeroport-de-peronne---saint-quentin-lfag-a6.html, 
http://www.aerodromes.fr/aeroport-de-nangis-les-loges-lfai-a7.html, 
... 
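Note that every line of aero-url.txt ends with a trailing comma as well as a newline, so both need to be stripped before the URL can be opened. A minimal stdlib-only sketch (clean_url is a hypothetical helper name, not part of the original scripts):

```python
def clean_url(line):
    # Remove surrounding whitespace (including the newline), then the
    # trailing comma left over from the file format shown above.
    return line.strip().rstrip(',')
```

For example, clean_url("http://www.aerodromes.fr/aerodrome-d-yvetot-lf7622-a469.html,\n") returns the bare URL with no comma or newline.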

I think I have to use a loop, with something like this:

import urllib2 
from bs4 import BeautifulSoup 

# Open the file for reading 
infile = open("aero-url.txt", 'r') 

# Read every single line of the file into an array of lines 
lines = infile.readline().rstrip('\n\r') 

for line in infile 

page = urllib2.urlopen(lines) 
soup = BeautifulSoup(page, "html.parser") 

#find the places of each info 
info_tag = soup.find_all('b') 
info_nom =info_tag[2].string 
info_pos =info_tag[4].next_sibling 
info_alt =info_tag[5].next_sibling 
info_pis =info_tag[6].next_sibling 
info_vil =info_tag[7].next_sibling 

#Print them on the terminal. 
print(info_nom +","+ info_pos+","+ info_alt +","+ info_pis +","+info_vil) 

I will write these results to a txt file afterwards, but my problem is how to apply my parsing script to my text file of URLs.


'lines' is not a list of lines. Since you seem to intend to loop over every line in 'infile' anyway, I believe 'lines' isn't needed. You are also missing some indentation, among other things. –

Answer


Use line instead of lines in the urlopen call:

page = urllib2.urlopen(line) 

Since you are already looping over infile, you don't need this line at all:

lines = infile.readline().rstrip('\n\r') 

The indentation of the loop body is also wrong.
With these corrections, your code should look like this:

import urllib2 
from bs4 import BeautifulSoup 

# Open the file for reading 
infile = open("aero-url.txt", 'r') 

for line in infile:
    # strip the newline and the trailing comma before opening the URL
    url = line.strip().rstrip(',')

    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page, "html.parser")

    # find the places of each info
    info_tag = soup.find_all('b')
    info_nom = info_tag[2].string
    info_pos = info_tag[4].next_sibling
    info_alt = info_tag[5].next_sibling
    info_pis = info_tag[6].next_sibling
    info_vil = info_tag[7].next_sibling

    # print them on the terminal
    print(info_nom + "," + info_pos + "," + info_alt + "," + info_pis + "," + info_vil)

Hi Anbarasan, thank you for your answer. Unfortunately, I get a syntax error on the 'for line in infile' line; I'll try to figure out what's wrong. If you have an idea, I'll take it! – Befup


The colon at the end of the for line was missing. I have edited the answer to include it. – Anbarasan