2017-07-09

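I'm having trouble getting the href links for the next pages of the url. I got as far as pulling out all the text and everything else the tag contains, but I can't wrap my head around dropping the text I don't need, keeping just the href, and navigating through the pages. How do I get the href of the next pagination link?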

Here is my code:

import requests 
from bs4 import BeautifulSoup 
import webbrowser 
import time 

jobsearch = input("What type of job?: ") 
location = input("What is your location: ") 
url = ("https://ca.indeed.com/jobs?q=" + jobsearch + "&l=" + location) 
base_url = 'https://ca.indeed.com/' 

r = requests.get(url) 
rcontent = r.content 
prettify = BeautifulSoup(rcontent, "html.parser") 

filter_words = ['engineering', 'instrumentation', 'QA'] 
all_job_url = [] 
nextpages = [] 
filtered_job_links = [] 
http_flinks = [] 
flinks = [] 

def all_next_pages():
    pages = prettify.find_all('div', {'class':'pagination'})
    for next_page in pages:
        next_page.find_all('a')
        nextpages.append(next_page)
        print(next_page)

all_next_pages() 

Answer

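Here is one way to get the links of the search result items. Find the divs with the 'row result' class, then find the a tag inside each one; it contains all the information you need.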

import requests 
from bs4 import BeautifulSoup 
import webbrowser 
import time 

jobsearch = input("What type of job?: ") 
location = input("What is your location: ") 
url = ("https://ca.indeed.com/jobs?q=" + jobsearch + "&l=" + location) 
base_url = 'https://ca.indeed.com/' 

r = requests.get(url) 
rcontent = r.text 
prettify = BeautifulSoup(rcontent, "lxml") 

filter_words = ['engineering', 'instrumentation', 'QA'] 
all_job_url = [] 
nextpages = [] 
filtered_job_links = [] 
http_flinks = [] 
flinks = [] 

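def all_next_pages():
    # select() tests each class separately, so the match does not depend on
    # the exact whitespace inside the class attribute (' row result').
    pages = prettify.select('div.row.result')
    for next_page in pages:
        info = next_page.find('a')
        if info is None:  # skip result blocks that have no link
            continue
        url = info.get('href')
        title = info.get('title')
        print(title, url)

all_next_pages()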
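
For the pagination part of the question, the same idea should carry over: select the a tags inside the 'pagination' div and read their href attribute instead of printing the whole div. The following is only a rough sketch. SAMPLE_HTML is made-up markup standing in for the real page, pagination_links is a hypothetical helper name, and the sketch assumes the site still wraps its page links in a div with the 'pagination' class, as in the question.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://ca.indeed.com/"

# Hypothetical markup, shaped like the 'pagination' div the question searches for.
SAMPLE_HTML = """
<div class="pagination">
  <b>1</b>
  <a href="/jobs?q=engineer&amp;l=Toronto&amp;start=10">2</a>
  <a href="/jobs?q=engineer&amp;l=Toronto&amp;start=20">3</a>
  <a href="/jobs?q=engineer&amp;l=Toronto&amp;start=10">Next</a>
</div>
"""


def pagination_links(html, base_url=BASE_URL):
    """Return absolute URLs for every <a href> inside div.pagination, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.select("div.pagination a[href]"):
        # hrefs on the page are relative, so join them onto the site root
        absolute = urljoin(base_url, anchor["href"])
        if absolute not in links:  # in this sample the "Next" link repeats page 2
            links.append(absolute)
    return links


next_pages = pagination_links(SAMPLE_HTML)
for page_url in next_pages:
    print(page_url)
```

With the real page you would pass r.text instead of SAMPLE_HTML, then call requests.get on each returned URL to walk through the result pages.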