2016-12-31 51 views

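I can't get the code below to print the correct link, the one that contains the keywords. The goal is to set a variable equal to the line where the keywords are found: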

import requests
from bs4 import BeautifulSoup
from random import randint
import time
from lxml import etree
from time import sleep

a = requests.get('http://properlbc.com/sitemap.xml')
#time.sleep(1)
scrape = BeautifulSoup(a.text, 'lxml')
linkz = scrape.find_all('loc')
for linke in linkz:
    if "products" in linke.text:
        sitemap = str(linke.text)
        break

while True:
    # sleep(randint(4,6))
    keyword1 = "properlbc"
    keyword2 = "products"
    keyword3 = "bb1296"
    r = requests.get(sitemap)
    # time.sleep(1)
    soup = BeautifulSoup(r.text, 'lxml')
    links = soup.find_all('loc')
    for link in links:
        while (keyword1 in link.text and keyword2 in link.text and keyword3 in link.text):
            continue
        print("LINK SCRAPED")
        print(str(link.text) + "link scraped")
        break

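The code loops successfully until the link with the keywords is found, but it does not print that specific link. It prints a different one instead of "https://properlbc.com/collections/new-arrival/products/bb1296".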

Answer

You need to use `if` instead of `while`/`continue`:

for link in links:
    if keyword1 in link.text and keyword2 in link.text and keyword3 in link.text:
        print("LINK SCRAPED")
        print(str(link.text) + "link scraped")

Or even, reading `link.text` into a variable once:

for link in links:
    text = link.text
    if keyword1 in text and keyword2 in text and keyword3 in text:
        print("LINK SCRAPED")
        print(text, "link scraped")
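
The three `in` checks can also be written with the built-in `all()`, which makes it easier to change the number of keywords later. The following is only a minimal sketch: the helper name `first_matching_link` and the `aa0001` sample URL are made up for illustration, and the plain list of strings stands in for the `link.text` values taken from the sitemap.

```python
def first_matching_link(urls, keywords):
    """Return the first URL that contains every keyword, or None if none does."""
    for url in urls:
        if all(keyword in url for keyword in keywords):
            return url
    return None


# Hypothetical sample data standing in for the <loc> texts of the sitemap.
sample_urls = [
    "https://properlbc.com/collections/new-arrival/products/aa0001",
    "https://properlbc.com/collections/new-arrival/products/bb1296",
]

match = first_matching_link(sample_urls, ["properlbc", "products", "bb1296"])
print(match)
```

With real data you would pass something like `[link.text for link in links]` as `urls`.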

EDIT: to leave the loop once the link is found:

keyword1 = "properlbc"
keyword2 = "products"
keyword3 = "bb1296"

found = False

while not found:
    #sleep(randint(4,6))
    r = requests.get(sitemap)
    soup = BeautifulSoup(r.text, 'lxml')
    links = soup.find_all('loc')
    for link in links:
        text = link.text
        if keyword1 in text and keyword2 in text and keyword3 in text:
            print("LINK SCRAPED")
            print(text, "link scraped")
            found = True  # to leave `while` loop
            break  # to leave `for` loop
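
With `while` and `continue`, your code never reaches the `print` for the link that contains the keywords, so that link is never printed.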

Yes, but will it keep looping until the link is added to the site? – ColeWorld

To check whether the link has been added, you have to read the page again. Looping over the links you already have is useless. – furas

You can use `found = False` and `while not found:` instead of `while True` to leave the loop when the link is found. Then set `found = True` inside `if keyword1 ...`. – furas
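
The point made in the comments, that the page has to be fetched again on every pass, can be sketched as a small polling helper. This is an illustration under assumptions, not the answer's exact code: `wait_for_link`, its `fetch_urls` parameter, `delay`, and `max_attempts` are all hypothetical names, and `fetch_urls` stands in for whatever re-downloads the sitemap and returns the `<loc>` texts (for example a function wrapping `requests.get` and `soup.find_all('loc')`). Returning from the function replaces the `found` flag.

```python
import time


def wait_for_link(fetch_urls, keywords, delay=5.0, max_attempts=None):
    """Re-fetch the URL list until one URL contains every keyword.

    fetch_urls: hypothetical callable returning a fresh list of URL strings.
    Returns the matching URL, or None if max_attempts passes without a match.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        for url in fetch_urls():  # the page is read again on every pass
            if all(keyword in url for keyword in keywords):
                return url  # leaves both loops at once
        time.sleep(delay)  # wait before asking the site again
    return None


# Simulated sitemap that only gains the product link on the third fetch.
snapshots = iter([
    ["https://properlbc.com/products/aa0001"],
    ["https://properlbc.com/products/aa0001"],
    ["https://properlbc.com/products/aa0001",
     "https://properlbc.com/collections/new-arrival/products/bb1296"],
])

found = wait_for_link(lambda: next(snapshots),
                      ["properlbc", "products", "bb1296"],
                      delay=0, max_attempts=5)
print(found)
```

Passing the fetch step in as a callable keeps the loop testable without touching the network; the `aa0001` URLs are invented sample data.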
