I have the code below. How can I get the direct download link from within the page?
import urllib
from bs4 import BeautifulSoup
f = open('log1.txt', 'w')
url ='http://www.brothersoft.com/tamil-font-513607.html'
pageUrl = urllib.urlopen(url)
soup = BeautifulSoup(pageUrl)
for a in soup.select("div.class1.coLeft a[href]"):
    try:
        suburl = ('http://www.brothersoft.com'+a['href']).encode('utf-8','replace')
        f.write ('http://www.brothersoft.com'+a['href']+'\n')
    except:
        print 'cannot read'
        f.write('cannot read:'+'http://www.brothersoft.com'+a['href']+'\n')
        pass
    content = urllib.urlopen(suburl)
    soup = BeautifulSoup(content)
    for a in soup.select("div.Sever1.coLeft a[href]"):
        try:
            suburl2 = ('http://www.brothersoft.com'+a['href']).encode('utf-8','replace')
            f.write ('http://www.brothersoft.com'+a['href']+'\n')
        except:
            print 'cannot read'
            f.write('cannot read:'+'http://www.brothersoft.com'+a['href']+'\n')
            pass
        content = urllib.urlopen(suburl2)
        soup = BeautifulSoup(content)
        for a in soup.select("span.p a[href]"):
            try:
                print (a['href']).encode('utf-8','replace')
                f.write ('http://www.brothersoft.com'+a['href']+'\n')
            except:
                print 'cannot read'
                f.write('cannot read:'+'http://www.brothersoft.com'+a['href']+'\n')
                pass
f.close()
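The loops above build every address by prefixing the href with 'http://www.brothersoft.com'. That works for relative hrefs, but an href that is already absolute would end up with the host written twice in the log. A minimal sketch of a safer join, assuming Python 3 (in Python 2.7 the same function is imported with `from urlparse import urljoin`); the helper name `absolute` and the constant `BASE` are only illustrative:

```python
from urllib.parse import urljoin

# Site root taken from the script above.
BASE = "http://www.brothersoft.com"

def absolute(href, base=BASE):
    """Resolve href against base; absolute hrefs are returned unchanged."""
    return urljoin(base, href)

# A relative href gets the host prepended.
print(absolute("/tamil-font-513607.html"))
# An href that is already absolute is left as it is.
print(absolute("http://ask.brothersoft.com/"))
```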
When I run my script, I get this result:
http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font
http://ask.brothersoft.com/ask-a-question/?topic=1
http://ask.brothersoft.com/
http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Fusfiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font
http://ask.brothersoft.com/ask-a-question/?topic=1
http://ask.brothersoft.com/
But all I need is the direct download link, like this:
Why do I get two download links? http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Fusfiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font –
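The two d.php links differ only in the host inside their `url` query parameter (files.brothersoft.com versus usfiles.brothersoft.com), and that parameter appears to hold the percent-encoded address of the file itself. A minimal sketch of decoding it, assuming Python 3 (in Python 2.7 the import would be `from urlparse import urlparse, parse_qs`); the function name `direct_link` is hypothetical:

```python
from urllib.parse import urlparse, parse_qs

def direct_link(redirect_url):
    """Return the decoded 'url' query parameter, or None if it is absent."""
    query = parse_qs(urlparse(redirect_url).query)
    values = query.get("url")
    return values[0] if values else None

redirect = ("http://www.brothersoft.com/d.php?soft_id=513607"
            "&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics"
            "%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font")

print(direct_link(redirect))
# → http://files.brothersoft.com/photograph_graphics/font_tools/keyman.exe
```

Whether the site serves the file from that address without going through d.php is not confirmed here; the sketch only shows how to read the parameter.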
@wanmohdpayed, because there are two download mirrors at the second step. You can use 'soup.find("div.Sever1.coLeft a[href]")' instead of the loop. Let me know if you have any problems. Thanks. – alecxe
I get this error:
Traceback (most recent call last):
  File "C:\Users\EXT-chermo\Desktop\soup5.py", line 32, in <module>
    content = urllib.urlopen(suburl2)
  File "C:\Python27\lib\urllib.py", line 86, in urlopen
    return opener.open(url)
  File "C:\Python27\lib\urllib.py", line 179, in open
    fullurl = unwrap(toBytes(fullurl))
  File "C:\Python27\lib\urllib.py", line 1056, in unwrap
    url = url.strip()
AttributeError: 'NoneType' object has no attribute 'strip' –
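A possible cause of the None in this traceback: `find()` in Beautiful Soup matches on tag name and attributes, not on CSS selectors, so a selector string passed to it finds nothing. `select_one()` takes the same CSS selector as `select()` and returns the first match, or None. A minimal sketch, assuming Python 3 and a bs4 version that provides `select_one()`; the markup and hrefs in `SAMPLE_HTML` are invented stand-ins for the real mirror page, and `first_mirror_href` is a hypothetical helper:

```python
from bs4 import BeautifulSoup

# Invented markup standing in for the real mirror page (an assumption).
SAMPLE_HTML = """
<div class="Sever1 coLeft">
  <a href="/mirror-one.html">Mirror 1</a>
  <a href="/mirror-two.html">Mirror 2</a>
</div>
"""

def first_mirror_href(html, selector="div.Sever1.coLeft a[href]"):
    """Return the href of the first link matching the CSS selector, or None."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one(selector)
    if link is None:
        # Nothing matched: let the caller skip the download step.
        return None
    return link["href"]

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# find() treats the whole string as a tag name, so it matches nothing.
print(soup.find("div.Sever1.coLeft a[href]"))
# select_one() understands the selector and returns only the first mirror.
print(first_mirror_href(SAMPLE_HTML))
```

Checking the result for None before calling `urlopen` avoids passing an empty value further down.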