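Getting links from a web page using BeautifulSoup and Selenium

I wrote this code to log in to my FB account and get all the group links on the page using Selenium and BeautifulSoup, but the BeautifulSoup part does not work.

I would like to know how to use Selenium and BeautifulSoup together in the same code.

I don't want to use the Facebook API; I want to use Selenium and BeautifulSoup.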

from selenium import webdriver 
from selenium.webdriver.common.keys import Keys 
from selenium.webdriver.common.by import By 
import httplib2 
from BeautifulSoup import BeautifulSoup, SoupStrainer 


usr = raw_input('--> ') 
pwd = raw_input('--> ') 
poo = raw_input('--> ') 

driver = webdriver.Firefox() 
# or you can use Chrome(executable_path="/usr/bin/chromedriver") 
driver.get("https://www.facebook.com/groups/?category=membership") 
assert "Facebook" in driver.title 
elem = driver.find_element_by_id("email") 
elem.send_keys(usr) 
elem = driver.find_element_by_id("pass") 
elem.send_keys(pwd) 
elem.send_keys(Keys.RETURN) 

scheight = .1 
while scheight < 9.9: 
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight/%s);" % scheight) 
    scheight += .01 
soup = BeautifulSoup(html) 
http = httplib2.Http() 
status, response = ('https://www.facebook.com/groups/?category=membership') 

count = 0 
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')): 
    count = count + 1 
print 'Count: ', count 

for tag in BeautifulSoup(('a')): 
    if link.has_key('href'): 
     if '/groups/' in link['href']: 

      print link['href'] 


elem = driver.find_element_by_css_selector(".input.textInput") 
elem.send_keys(poo) 
elem = driver.find_element_by_css_selector(".selected") 
elem.send_keys(Keys.RETURN) 
elem.click() 
time.sleep(5)

2015-03-18 elsharkawey

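You need to clarify. What does "*BeautifulSoup usage does not work*" mean? What happens, and how does that differ from the expected behavior? – Celeo 2015-03-18 20:51:52

The result: Traceback (most recent call last): File "tk.py", line 28, in soup = BeautifulSoup(html) NameError: name 'html' is not defined – elsharkawey 2015-03-18 20:53:21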

You never defined html.

Selenium's webdriver has a page_source attribute that you can use:

soup = BeautifulSoup(driver.page_source)
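As a rough sketch of how the two libraries can fit together: Selenium drives the browser and BeautifulSoup parses whatever the browser currently shows. This assumes Python 3 and the bs4 package rather than the older BeautifulSoup 3 import used in the question; the helper names extract_group_links and group_links_from_driver are made up for illustration.

```python
from bs4 import BeautifulSoup  # assumption: bs4 is installed (pip install beautifulsoup4)


def extract_group_links(html):
    """Return the href of every <a> tag whose href contains '/groups/'."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips anchors that have no href attribute at all
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if "/groups/" in href:
            links.append(href)
    return links


def group_links_from_driver(driver):
    """Parse the page the browser is showing; driver is a Selenium WebDriver."""
    # page_source is an attribute holding the HTML of the current page
    return extract_group_links(driver.page_source)
```

In the question's script, calling group_links_from_driver(driver) after the login and scrolling steps would replace the soup = BeautifulSoup(html) line. Whether the group links are actually present in the page source at that moment depends on how the page loads its content.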

Update for the second error

Your line,

status, response = ('https://www.facebook.com/groups/?category=membership')

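tries to assign a single string to both status and response. The parentheses do not make a tuple, and nothing actually requests the page, so response never gets a value and remains undefined.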
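To see why that line cannot work, here is a small standalone sketch (Python 3, no network access). The helper try_unpack is hypothetical and only exists to show the two outcomes side by side.

```python
url = "https://www.facebook.com/groups/?category=membership"

# Parentheses around a single value do not create a tuple; this is still a str.
wrapped = (url)
print(type(wrapped).__name__)


def try_unpack(value):
    """Attempt the two-name unpacking from the question and report what happens."""
    try:
        status, response = value
    except ValueError as exc:
        # A long string has far more than two characters to unpack
        return "ValueError: %s" % exc
    return (status, response)


# Unpacking the URL string fails, so response would never be assigned.
print(try_unpack(wrapped))

# Unpacking works once the right-hand side really is a pair of values.
print(try_unpack(("200", "<html></html>")))
```

With httplib2, the pair would normally come from calling http.request(url), which returns the response headers and the body. Note that such a request runs outside the browser, so it presumably would not share the login session Selenium created; reading driver.page_source avoids that problem.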

2015-03-18 21:12:53 Celeo

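Traceback (most recent call last): File "tk.py", line 33, in for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')): NameError: name 'response' is not defined – elsharkawey 2015-03-18 21:40:00

Updated for the second error. Did you copy this code from somewhere, or are you writing it yourself? – Celeo 2015-03-18 21:49:19

I wrote some of this code and got the rest from Google – elsharkawey 2015-03-18 21:51:25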

I take it BeautifulSoup is not returning the right links?

I would do it in BeautifulSoup with:

soup = BeautifulSoup(html)
for i in soup.find_all('a'):
    href = i.get('href')
    # i.get returns None for anchors without an href, so guard before testing
    if href and '/groups/' in href:
        print(href)

2015-03-21 00:48:08 Jajo

soup = BeautifulSoup(html) NameError: name 'html' is not defined – elsharkawey 2015-03-21 11:45:22

Then in your code you never defined html ... – Jajo 2015-03-21 12:13:52

How do I define html? – elsharkawey 2015-03-21 12:24:26
