Scraping JavaScript text with Python and Selenium

I'm trying to pull the latitude and longitude of restaurants from TripAdvisor. The information isn't visibly displayed on the web page, but I did find it in the page's HTML.

I tried to get all the information with this code:

#import libraries 
import requests 
from bs4 import BeautifulSoup 
from selenium import webdriver 
from selenium.common.exceptions import NoSuchElementException 
from selenium.webdriver.common.keys import Keys 

for i in range(0, 30, 30): 
    #need this here for when you want more than 30 
    while i <= range: 
     i = str(i) 
     #url format offsets the restaurants in increments of 30 after the oa 
     url1 = 'https://www.tripadvisor.com/Restaurants-g294217-oa' + i + '-Hong_Kong.html#EATERY_LIST_CONTENTS' 
     r1 = requests.get(url1) 
     data1 = r1.text 
     soup1 = BeautifulSoup(data1, "html.parser") 
     for link in soup1.findAll('a', {'property_title'}): 
      #print 'https://www.tripadvisor.com/Restaurant_Review-g294217-' + link.get('href') 
      restaurant_url = 'https://www.tripadvisor.com/Restaurant_Review-g294217-' + link.get('href') 
      browser = webdriver.Chrome('C:\Python27\Chromedriver\chromedriver.exe') 
      # use xpath to get to the information in the JS 
      print browser.find_element_by_xpath("""/html/body/script[22]""")

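When I run the code, it tells me it can't find the element. Maybe I'm a bit brain-dead right now, but if a fresh set of eyes could look this over and tell me whether I'm doing it wrong, or whether there's a different way to go about it, I'm all ears.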


2016-11-11 dtrinh

Not sure about your question, but 'while i <= range:' is invalid, because 'range' is a function. – Brian

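Thanks, I'll look into it. If you look at the image I linked, that's the information I'm trying to extract. But when I run the code, it tells me it can't find the element at that XPath. – dtrinh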
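Following up on the comment above: in 'while i <= range:' the name 'range' is the built-in itself, not a number, so the comparison can't express "keep going until the last offset", and 'range(0, 30, 30)' only ever yields 0 anyway. A minimal sketch of the pagination step, assuming the '-oa<offset>-' URL pattern and the 30-restaurants-per-page step from the question; 'list_page_urls' and 'max_offset' are hypothetical names:

```python
# Hypothetical helper: build the TripAdvisor list-page URLs for Hong Kong (geo id g294217).
# Assumes the "-oa<offset>-" pattern from the question, with 30 restaurants per page.
BASE = ('https://www.tripadvisor.com/Restaurants-g294217-oa{offset}'
        '-Hong_Kong.html#EATERY_LIST_CONTENTS')


def list_page_urls(max_offset, step=30):
    """Return list-page URLs for offsets 0, step, 2*step, ... up to max_offset inclusive."""
    # range() already produces the offsets, so no inner while loop is needed,
    # and the offset is only converted to a string inside the URL template.
    return [BASE.format(offset=offset) for offset in range(0, max_offset + 1, step)]


for url in list_page_urls(60):
    print(url)
```

With max_offset=60 this produces the URLs for offsets 0, 30 and 60.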

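There's no point in using the requests and BeautifulSoup packages when you're using Selenium WebDriver: Selenium can open the web page itself (the job requests was doing) and read its content (the job BeautifulSoup was doing). Below is a rough structure of what you're trying to accomplish with Selenium.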

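from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Start one browser and reuse it for every page.
# Selenium 4 style; Selenium 3 took the driver path directly: webdriver.Chrome(path)
browser = webdriver.Chrome(service=Service(r'C:\Python27\Chromedriver\chromedriver.exe'))
# Offsets go up in steps of 30; raise the stop value to get more than 30 restaurants
for offset in range(0, 30, 30):
    url1 = ('https://www.tripadvisor.com/Restaurants-g294217-oa' + str(offset)
            + '-Hong_Kong.html#EATERY_LIST_CONTENTS')
    browser.get(url1)  # this navigates the browser to the webpage
    # use xpath to get to the information in the JS
    try:
        script = browser.find_element(By.XPATH, '/html/body/script[22]')
        # <script> contents aren't rendered, so read the source instead of .text
        print(script.get_attribute('innerHTML'))
    except NoSuchElementException:
        print('No script[22] found on ' + url1)
browser.quit()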
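The structure above only loads the list pages, while the coordinates live on each restaurant's own page. Note that in the question's code the browser is never sent anywhere (there is no browser.get(restaurant_url) before find_element_by_xpath), so it searches an empty start page, which by itself would produce the not-found error. The question also builds each restaurant URL by string concatenation. A minimal sketch of that step, assuming the list-page links carry site-relative href values such as '/Restaurant_Review-...html': the standard library's urljoin resolves them against the site root and passes already-absolute URLs through unchanged. 'restaurant_urls' and the example path are hypothetical:

```python
from urllib.parse import urljoin

SITE = 'https://www.tripadvisor.com/'


def restaurant_urls(hrefs):
    """Turn href values scraped from a list page into absolute, de-duplicated URLs.

    Assumes hrefs are site-relative paths (e.g. '/Restaurant_Review-...html');
    urljoin leaves hrefs that are already absolute unchanged.
    """
    seen = set()
    urls = []
    for href in hrefs:
        if not href:  # skip <a> tags that have no href
            continue
        url = urljoin(SITE, href)
        if url not in seen:  # keep the first occurrence, preserve page order
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical href, for illustration only
for url in restaurant_urls(['/Restaurant_Review-g294217-Example-Hong_Kong.html']):
    print(url)
```

With Selenium, the hrefs could come from something like [a.get_attribute('href') for a in browser.find_elements(By.CSS_SELECTOR, 'a.property_title')], assuming the links still have the property_title class the question searches for (Selenium typically returns the resolved absolute URL here, which urljoin leaves as-is). Then call browser.get(url) on each result before looking for the script.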


2016-11-11 19:48:00 falloutcoder
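Hard-coding '/html/body/script[22]' breaks as soon as the page gains or loses a <script> tag, and .text on a <script> element is usually empty because script contents aren't rendered. A sturdier approach is to collect the source of every script, for example [s.get_attribute('innerHTML') for s in browser.find_elements(By.TAG_NAME, 'script')], and search it. The sketch below uses only the standard library and assumes, hypothetically, that the coordinates appear as 'lat: <number>, lng: <number>' pairs (quoted keys also match); check the real key names in the page source and adjust COORD_RE accordingly:

```python
import re

# Hypothetical pattern: assumes the page's JavaScript contains pairs such as
#   lat: 22.2819, lng: 114.158   or   "lat": 22.2819, "lng": 114.158
# \b keeps words like "flat" from matching. Adjust the key names to the real page.
COORD_RE = re.compile(
    r'["\']?\blat\b["\']?\s*:\s*(-?\d+(?:\.\d+)?)\s*,\s*'
    r'["\']?\blng\b["\']?\s*:\s*(-?\d+(?:\.\d+)?)'
)


def find_coordinates(script_sources):
    """Return (lat, lng) from the first script whose source matches COORD_RE, else None."""
    for source in script_sources:
        if not source:  # skip empty or missing script bodies
            continue
        match = COORD_RE.search(source)
        if match:
            return float(match.group(1)), float(match.group(2))
    return None


sample = ['var x = 1;', 'window.map = {lat: 22.2819, lng: 114.158, zoom: 15};']
print(find_coordinates(sample))  # (22.2819, 114.158)
```

Searching by content rather than position means the code keeps working if TripAdvisor reorders its scripts, as long as the key names stay the same.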
