How do I get the direct download link from a page?

I have the following code:

import urllib
from bs4 import BeautifulSoup

BASE = 'http://www.brothersoft.com'
url = 'http://www.brothersoft.com/tamil-font-513607.html'

with open('log1.txt', 'w') as f:
    soup = BeautifulSoup(urllib.urlopen(url))

    # Step 1: links in the description block of the software page
    for a in soup.select("div.class1.coLeft a[href]"):
        try:
            suburl = (BASE + a['href']).encode('utf-8', 'replace')
            f.write(suburl + '\n')
        except Exception:
            print 'cannot read'
            f.write('cannot read: %r\n' % a.get('href'))
            continue  # don't go on to fetch a stale or undefined suburl

        soup2 = BeautifulSoup(urllib.urlopen(suburl))

        # Step 2: links on the download page (one per mirror)
        for a2 in soup2.select("div.Sever1.coLeft a[href]"):
            try:
                suburl2 = (BASE + a2['href']).encode('utf-8', 'replace')
                f.write(suburl2 + '\n')
            except Exception:
                print 'cannot read'
                f.write('cannot read: %r\n' % a2.get('href'))
                continue

            soup3 = BeautifulSoup(urllib.urlopen(suburl2))

            # Step 3: links on the mirror page; these hrefs are already
            # absolute (see the output below), so no BASE prefix here
            for a3 in soup3.select("span.p a[href]"):
                try:
                    href = a3['href'].encode('utf-8', 'replace')
                    print href
                    f.write(href + '\n')
                except Exception:
                    print 'cannot read'
                    f.write('cannot read: %r\n' % a3.get('href'))

When I run it, I get this output:

http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font
http://ask.brothersoft.com/ask-a-question/?topic=1
http://ask.brothersoft.com/
http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Fusfiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font
http://ask.brothersoft.com/ask-a-question/?topic=1
http://ask.brothersoft.com/

But all I need is the direct download link, like this one:

http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font
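Of the links printed above, only the www.brothersoft.com/d.php ones are download links; the ask.brothersoft.com entries are other anchors that also match span.p a[href]. A minimal sketch, assuming every download link uses that host and the /d.php path (read off the output above, not something the site documents), filters them by URL structure instead of by position. It is Python 3 (in Python 2.7 the same function is urlparse.urlsplit), and the name direct_download_links is hypothetical:

```python
from urllib.parse import urlsplit


def direct_download_links(hrefs):
    """Keep only links to the /d.php download redirect on www.brothersoft.com.

    Assumption: download links look like the d.php URLs in the question's output.
    """
    result = []
    for href in hrefs:
        parts = urlsplit(href)
        if parts.netloc == "www.brothersoft.com" and parts.path == "/d.php":
            result.append(href)
    return result


links = [
    "http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font",
    "http://ask.brothersoft.com/ask-a-question/?topic=1",
    "http://ask.brothersoft.com/",
    "http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Fusfiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font",
]
for link in direct_download_links(links):
    print(link)
```

With the output above this still leaves two d.php links, one per download mirror.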


2013-08-29 wan mohd payed

Instead of this last block:

for a in soup.select("span.p a[href]"):
    try:
        print (a['href']).encode('utf-8','replace')
        f.write ('http://www.brothersoft.com'+a['href']+'\n')
    except:
        print 'cannot read'
        f.write('cannot read:'+'http://www.brothersoft.com'+a['href']+'\n')

read the URL from the onload attribute of the body tag:

print soup.find('body')['onload'][10:-2]
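The [10:-2] slice depends on the exact number of JavaScript characters around the URL, so it silently returns garbage if that wrapper ever changes. A slightly more defensive sketch reads the same onload attribute and pulls out the first quoted http(s) URL with a regular expression. It uses only the Python 3 standard library (html.parser instead of BeautifulSoup), the names BodyOnloadParser and url_from_onload are hypothetical, and the sample markup is made up for illustration; the real attribute on brothersoft.com may look different.

```python
import re
from html.parser import HTMLParser


class BodyOnloadParser(HTMLParser):
    """Remember the onload attribute of the first <body> tag that has one."""

    def __init__(self):
        super().__init__()
        self.onload = None

    def handle_starttag(self, tag, attrs):
        if tag == "body" and self.onload is None:
            # attrs is a list of (name, value) pairs; values are already unescaped
            self.onload = dict(attrs).get("onload")


# First single- or double-quoted http(s) URL inside the JavaScript snippet.
QUOTED_URL = re.compile(r"""['"](https?://[^'"]+)['"]""")


def url_from_onload(html):
    parser = BodyOnloadParser()
    parser.feed(html)
    parser.close()
    if not parser.onload:
        return None
    match = QUOTED_URL.search(parser.onload)
    return match.group(1) if match else None


# Hypothetical page; only the shape of the onload attribute matters here.
sample = ("<html><body onload=\"location='http://www.brothersoft.com/d.php"
          "?soft_id=513607&amp;name=Tamil%20Font';\">...</body></html>")
print(url_from_onload(sample))
```

Because the parser unescapes attribute values, the &amp; in the markup comes back as a plain &, and a page without an onload attribute yields None instead of an exception.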


2013-08-29 14:24:25 alecxe

Why do I get two download links? http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Fusfiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font

@wanmohdpayed, because the second step has two download mirrors. You can use 'soup.find("div.Sever1.coLeft a[href]")' instead of the loop. Let me know if you have any problems. Thanks. – alecxe
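The two links differ only in the mirror host inside their url parameter (files.brothersoft.com vs usfiles.brothersoft.com); the file path is the same. If one link per file is enough, a sketch can decode that parameter with parse_qs and keep the first link for each file path. It is Python 3 (urlparse.parse_qs in Python 2.7), and one_per_file is a hypothetical helper name:

```python
from urllib.parse import parse_qs, urlsplit


def one_per_file(links):
    """Keep the first d.php link for each distinct file path in its url parameter."""
    seen = set()
    kept = []
    for link in links:
        # parse_qs percent-decodes the value, e.g. http%3A%2F%2F -> http://
        target = parse_qs(urlsplit(link).query).get("url", [""])[0]
        # Compare only the path so the mirror host is ignored; links without
        # a url parameter are keyed by themselves and always kept once.
        key = urlsplit(target).path if target else link
        if key not in seen:
            seen.add(key)
            kept.append(link)
    return kept


mirrors = [
    "http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font",
    "http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Fusfiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font",
]
print(one_per_file(mirrors))
```

Keeping the first occurrence preserves the page order, so the files.brothersoft.com mirror wins here.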

I get this error:

Traceback (most recent call last):
  File "C:\Users\EXT-chermo\Desktop\soup5.py", line 32
    content = urllib.urlopen(suburl2)
  File "C:\Python27\lib\urllib.py", line 86, in urlopen
    return opener.open(url)
  File "C:\Python27\lib\urllib.py", line 179, in open
    fullurl = unwrap(toBytes(fullurl))
  File "C:\Python27\lib\urllib.py", line 1056, in unwrap
    url = url.strip()
AttributeError: 'NoneType' object has no attribute 'strip'
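The traceback says urlopen received None. Assuming suburl2 now comes from the soup.find(...) call suggested above, that is expected: in BeautifulSoup 4, find() takes a tag name plus attribute filters, not a CSS selector, so "div.Sever1.coLeft a[href]" is treated as a tag name, nothing matches, and find() returns None. A CSS selector belongs in select(), which returns a list (check it is non-empty, then take [0]); newer releases also offer select_one(). Whatever produces the link, it helps to check for None and build the absolute URL with urljoin before fetching, since urljoin also copes with hrefs that are already absolute, which plain string concatenation does not. A Python 3 sketch (urlparse.urljoin in Python 2.7); next_url is a hypothetical helper:

```python
from urllib.parse import urljoin

BASE = "http://www.brothersoft.com/"  # base for relative hrefs, as in the question


def next_url(href):
    """Turn an href (or None when no matching tag was found) into an absolute URL.

    Returns None so the caller can skip the fetch instead of letting
    urlopen fail deep inside urllib.
    """
    if not href:
        return None
    return urljoin(BASE, href)


for href in [None, "/d.php?soft_id=513607", "http://ask.brothersoft.com/"]:
    url = next_url(href)
    if url is None:
        print("no link found, skipping")
        continue
    print(url)
```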
