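Python IDLE 2.7: cannot loop through URLs
I am trying to write a web crawler that loops over several pages to get a list of all the company names. This is the first URL: http://app.core-apps.com/weftec2014/exhibitors/list/A
The code below works for each single page if I manually change the last letter of the URL, 26 times in all, for example http://app.core-apps.com/weftec2014/exhibitors/list/Z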
import urllib2
response = urllib2.urlopen('http://app.core-apps.com/weftec2014/exhibitors/list/A')
page = response.read()
page = page[4632:]

def get_next_target(page):
    start_link = page.find("<a href='/weftec2014/exhibitors/")
    if start_link == -1:
        return None, 0
    else:
        start_place = start_link+73 #to get company names after the first <div>
        end_place = page.find("</div>", start_place)
        item = page[start_place:end_place]
        return item, end_place

def print_all_com(page): #return company names
    results = []
    while True:
        item, end_place = get_next_target(page)
        if item:
            results.append([ item.strip() ])
            #print item
            page = page[end_place:]
        else:
            break
    return results

data = print_all_com(page)

import csv
with open('weftec.csv','w') as f:
    writer = csv.writer(f)
    writer.writerows(data)
But I want Python to loop from A to Z for me and return all the company names at once, so I added another block of code after the previous script:
letter = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
url = 'http://app.core-apps.com/weftec2014/exhibitors/list/'
for n in range(0, len(letter)):
    target = []
    url_letter = url+letter[n]
    response = urllib2.urlopen(url_letter)
    page = response.read()
    page = page[4632:]
    data = print_all_com(page)
    target.append(data)
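I think something is wrong with the script above, because len(target) is 1 rather than the total number of companies from A to Z.
When I save the result to a CSV file, I get a very strange result: only the company names from the Z page. The exact result is shown below.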
['ZAPS Technologies, Inc'] ['Zoeller Engineered Products']
I think something went wrong in the second block, but I can't quite figure it out...
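One likely cause, offered as a sketch rather than a confirmed answer: `target = []` sits inside the `for` loop, so the list is emptied on every pass and only the Z page survives, which would explain why len(target) is 1. The example below creates the list once, before the loop. The names `collect_all`, `fake_fetch` and `BASE_URL` are hypothetical and not part of the original script; `fake_fetch` stands in for the download plus `print_all_com` step so the sketch runs without network access.

```python
import string

# Assumed base URL, copied from the question.
BASE_URL = 'http://app.core-apps.com/weftec2014/exhibitors/list/'

def collect_all(fetch_names, letters=string.ascii_uppercase):
    """Gather the rows from every letter page into one list."""
    target = []  # created once, before the loop, so earlier pages are kept
    for letter in letters:
        rows = fetch_names(BASE_URL + letter)  # one list of rows per page
        target.extend(rows)  # add each row, not the whole per-page list
    return target

# Hypothetical stand-in for urlopen + print_all_com: two fake rows per page.
def fake_fetch(url):
    return [['Company ' + url[-1] + '1'], ['Company ' + url[-1] + '2']]

all_rows = collect_all(fake_fetch)
print(len(all_rows))
```

In the real script, `fetch_names` would download the page and pass it to `print_all_com`; the resulting list can then be handed to `writer.writerows` unchanged, since each element is already a one-item row.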
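Thanks, that works. I also changed target.append(data) to target.extend(data), which makes the output cleaner. – Yumi
Yes, .append takes the whole data structure and adds it to the list as a single element, whereas .extend takes the contents of the data structure and adds them to the end of the list. For example, one gives [a, [b]], the other gives [a, b]. –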
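The difference described in the comment above can be checked with a short example. The variable names `nested` and `flat` are illustrative only.

```python
nested = ['a']
nested.append(['b'])   # the list ['b'] is added as one element
print(nested)          # ['a', ['b']]

flat = ['a']
flat.extend(['b'])     # the items of ['b'] are added one by one
print(flat)            # ['a', 'b']
```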