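Beautiful Soup - hyphenated keyword, SyntaxError: keyword can't be an expression

I am using Selenium and then Beautiful Soup to try to scrape a web page. The page uses JavaScript to load some of its content. Selenium gives me the plain HTML; I have checked this with print and found that it does contain the part I am trying to scrape. My problem is with Beautiful Soup.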

I want to find the div tags with

class="comment-detail"

so I use

comments = soup.find_all("div", class_="comment-detail")

But this returns an empty list, maybe because the actual div tags also have

data-selenium="reviews-comments"

The exact tag in the HTML is

<div data-selenium="reviews-comments" class="comment-detail">

So I tried the following:

comments = soup.find_all("div", data-selenium="reviews-comments", class_="comment-detail")

but this gave the error

SyntaxError: keyword can't be an expression

because

data-selenium

is parsed as a subtraction operation, when it is actually just a hyphenated word. I tried wrapping it in quotes, but that did not help.

I have also tried

dct = { 
    'div': '', 
    'data-selenium': 'reviews-comments', 
    'class': 'comment-detail' 

} 
comments = soup.find_all(**dct)

but

len(comments)

returns zero, i.e. comments is empty.

For clarity, here is the code that gets me my soup:

from selenium import webdriver 
from selenium.common.exceptions import NoSuchElementException 
from selenium.webdriver.common.keys import Keys 
from bs4 import BeautifulSoup 

browser = webdriver.Firefox() 
browser.get('http://www.agoda.com/the-coast-resort-koh-phangan/hotel/koh-phangan-th.html/') 
html_source = browser.page_source 
browser.quit() 

soup = BeautifulSoup(html_source,'html.parser')

Any ideas on how to proceed here?

Source

2016-05-28 Runner Bean

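The problem stems from the URL: you have an extra slash at the end of it, which returns a 404 page instead of the page you actually want. Just remove it and your code works fine.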
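One way to guard against this kind of slip is to normalize the URL before passing it to the browser. The following is only a minimal sketch: `strip_trailing_slash` is a hypothetical helper, not part of Selenium or Beautiful Soup, and it assumes the target site wants the path without a trailing slash, which is not true of every server.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_trailing_slash(url):
    """Return url with any trailing slashes removed from its path."""
    parts = urlsplit(url)
    # Only the path is touched; scheme, host, query and fragment are kept.
    return urlunsplit(parts._replace(path=parts.path.rstrip("/")))


url = "http://www.agoda.com/the-coast-resort-koh-phangan/hotel/koh-phangan-th.html/"
print(strip_trailing_slash(url))
```

The result can then be passed to browser.get() in place of the original string.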

Here is the code I used, just in case:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

# Load the page in a real browser so the JavaScript-rendered content is present.
browser = webdriver.Firefox()
# Note: no trailing slash after ".html".
browser.get('http://www.agoda.com/the-coast-resort-koh-phangan/hotel/koh-phangan-th.html')
html_source = browser.page_source
browser.quit()

soup = BeautifulSoup(html_source, 'html.parser')

# Collect every div that carries the class "comment-detail".
comments = soup.find_all("div", class_="comment-detail")

print(comments)
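As for the hyphenated attribute from the question: a name such as data-selenium cannot be written as a Python keyword argument, but find_all also accepts an attrs dictionary, where any string can be a key. Below is a minimal sketch run against an inline HTML snippet instead of the live page. The sample markup and comment text are made up for illustration; only the opening div tag mirrors the one quoted in the question.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; only the first opening div tag mirrors the question.
html = """
<div data-selenium="reviews-comments" class="comment-detail">Great stay</div>
<div class="comment-detail">No data attribute here</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Hyphenated attribute names go in the attrs dictionary instead of a keyword.
by_attrs = soup.find_all(
    "div",
    attrs={"data-selenium": "reviews-comments"},
    class_="comment-detail",
)

# The same filter written as a CSS selector.
by_css = soup.select('div.comment-detail[data-selenium="reviews-comments"]')

print(len(by_attrs), len(by_css))
```

Both calls should pick out only the first div here, since the second one lacks the data-selenium attribute.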

Source

2016-05-28 07:16:43 bmcculley

Thank you so much!
