Copying URLs to a file when the page contains specific terms

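I'm trying to collect every URL in a range whose page contains the phrase "Recipe adapted from" or "Recipe from". The script below writes the matching links to the file until it reaches 7496, then fails with an HTTPError 404. What am I doing wrong? I also tried using BeautifulSoup and requests, but I still can't get it to work.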

import urllib2

with open('recipes.txt', 'w+') as f:
    for i in range(14477):
        url = "http://www.tastingtable.com/entry_detail/{}".format(i)
        page_content = urllib2.urlopen(url).read()
        if "Recipe adapted from" in page_content:
            print url
            f.write(url + '\n')
        elif "Recipe from" in page_content:
            print url
            f.write(url + '\n')
        else:
            pass

Source: user2656931, 2013-08-06

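Some of the URLs you are trying to scrape do not exist, so `urlopen` raises an HTTPError for them. One option is to catch the exception and skip those pages: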

import urllib2

with open('recipes.txt', 'w+') as f:
    for i in range(14477):
        url = "http://www.tastingtable.com/entry_detail/{}".format(i)
        try:
            page_content = urllib2.urlopen(url).read()
        except urllib2.HTTPError as error:
            if 400 <= error.code < 500:
                continue  # client errors: not found, unauthorized, etc.
            raise  # other errors we want to know about
        if "Recipe adapted from" in page_content or "Recipe from" in page_content:
            print url
            f.write(url + '\n')
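
For anyone on Python 3, where `urllib2` has been split into `urllib.request` and `urllib.error`, the same idea might look roughly like the sketch below. The helper names `fetch` and `matching_urls`, and the `opener` and `fetch_page` parameters, are my own illustrative choices and are not part of the code above; decoding the body as UTF-8 is also an assumption about the site.

```python
import urllib.error
import urllib.request

# Phrases to look for, taken from the question.
TERMS = ("Recipe adapted from", "Recipe from")


def fetch(url, opener=urllib.request.urlopen):
    """Return the page body as text, or None if the server answers with a 4xx status.

    `opener` is a hypothetical hook so the function can be exercised without
    network access; by default it is the real urlopen.
    """
    try:
        with opener(url) as response:
            # Assumption: pages are UTF-8; undecodable bytes are replaced.
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        if 400 <= error.code < 500:
            return None  # not found, unauthorized, etc.
        raise  # other errors we want to know about


def matching_urls(urls, fetch_page=fetch, terms=TERMS):
    """Yield each URL whose page content contains at least one of the terms."""
    for url in urls:
        content = fetch_page(url)
        if content is not None and any(term in content for term in terms):
            yield url


def save_matches(path, urls, fetch_page=fetch):
    """Write every matching URL to `path`, one per line, and echo it."""
    with open(path, "w") as f:
        for url in matching_urls(urls, fetch_page=fetch_page):
            print(url)
            f.write(url + "\n")
```

To reproduce the original loop you would build the URLs with `"http://www.tastingtable.com/entry_detail/{}".format(i)` for `i` in `range(14477)` and pass them to `save_matches("recipes.txt", urls)`. Keeping the fetch step separate from the filtering step makes it easy to swap in a different HTTP client later without touching the matching logic.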

Source: 2013-08-06 13:28:42
