PhantomJS browser does not load JavaScript for some URLs

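I am trying to download Google Trends data, using PhantomJS to load the page and extract the data I need. When I run the code with a single keyword in the URL (e.g. https://www.google.com/trends/explore?date=today%203-m&geo=US&q=Blue), it works fine. As soon as I add a second keyword (e.g. https://www.google.com/trends/explore?date=today%203-m&geo=US&q=Blue,Red), PhantomJS no longer loads the page correctly and I cannot find the data I need. I have tried increasing the time the browser waits and have tried several different keywords, without success. I am out of ideas and don't understand why my program stops working after such a small change to the URL (the tags and page structure of the two pages are almost identical, so the problem is not that the tag differs from before). Here is the code in question: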

# Reading Google Trends data
import time

from bs4 import BeautifulSoup
from selenium import webdriver  # webdriver.PhantomJS exists only in Selenium versions before 4

term_to_trend = 'Blue,Red'  # search terms; also used in the retry message below
url = 'https://www.google.com/trends/explore?date=today%203-m&geo=US&q=' + term_to_trend
browser = webdriver.PhantomJS('...\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe')
google_trend_array = []
ran_smooth = False
time_to_sleep = 3
# ran_smooth makes sure the page has loaded and the data was extracted; if not, the page is loaded again
while not ran_smooth:
    browser.get(url)
    time.sleep(time_to_sleep)
    soup = BeautifulSoup(browser.page_source, "html.parser")  # BS object to use bs4
    table = soup.find('div', {'aria-label': 'A tabular representation of the data in the chart.'})
    # If the page didn't load, table is None and table.findAll raises AttributeError
    try:
        # Copy all the data out of the Google Trends table
        for col in table.findAll('td'):
            # The table has both dates and trend values; keep only the trend values
            # (col.string is None for cells with nested tags, so skip those)
            if col.string is not None and col.string.isdigit():
                google_trend_array.append(int(col.string))
        # The program ran through; leave the while loop
        ran_smooth = True
    except AttributeError:
        print('page not loading for term ' + str(term_to_trend) + ', trying again...')
        time_to_sleep += 1  # increase the wait so the page has more time to load
browser.quit()  # shut down the PhantomJS process
print(google_trend_array)
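The loop above hard-codes the URL and retries forever, sleeping one second longer each time. A minimal sketch (Python 3) of two small helpers for that pattern: one builds the explore URL from a list of terms, since the two URLs in the question differ only in the comma-separated `q` parameter, and one retries with the same growing wait but gives up after a fixed number of attempts. The names `explore_url` and `load_with_retries` and the `max_attempts` limit are hypothetical; this does not by itself explain or fix why PhantomJS fails to render the two-keyword page.

```python
import time
from urllib.parse import quote, urlencode

EXPLORE_URL = 'https://www.google.com/trends/explore'


def explore_url(keywords, date='today 3-m', geo='US'):
    """Build a Trends explore URL; multiple keywords share the q parameter, comma-separated."""
    params = {'date': date, 'geo': geo, 'q': ','.join(keywords)}
    # quote (instead of the default quote_plus) encodes the space as %20, as in the
    # question's URLs; safe=',' leaves the comma between keywords unescaped.
    return EXPLORE_URL + '?' + urlencode(params, safe=',', quote_via=quote)


def load_with_retries(start_load, read_html, extract,
                      initial_wait=3, max_attempts=5, sleep=time.sleep):
    """Load a page, wait, and try to extract data, waiting one second longer on each retry.

    start_load: e.g. lambda: browser.get(url)
    read_html:  e.g. lambda: browser.page_source
    extract:    returns the parsed values, or None if the data is not on the page yet
    """
    wait = initial_wait
    for _ in range(max_attempts):
        start_load()
        sleep(wait)  # give the page's JavaScript time to render
        values = extract(read_html())
        if values is not None:
            return values
        wait += 1  # same back-off as the question's loop
    # Unlike the original while loop, give up instead of retrying forever.
    raise RuntimeError('no data after %d attempts' % max_attempts)
```

With the question's code, `extract` would wrap the BeautifulSoup lookup and return `None` when `soup.find(...)` finds no table. Passing `sleep` as a parameter is only there so the helper can be exercised without real delays.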

2016-10-23 Marc vT

You should look at pytrends instead of reinventing the wheel.

Here is a small example of how to extract a DataFrame from Google Trends:

import pytrends.request 

google_username = "<your_login>@gmail.com" 
google_password = "<your_password>" 

# connect to Google 
pytrend = pytrends.request.TrendReq(google_username, google_password, custom_useragent='My Pytrends Script') 
trend_payload = {'q': 'Pizza, Italian, Spaghetti, Breadsticks, Sausage', 'cat': '0-71'} 
# trend = pytrend.trend(trend_payload) 

df = pytrend.trend(trend_payload, return_type='dataframe')

You will get:

            breadsticks  italian  pizza  sausage  spaghetti
Date
2004-01-01          0.0      9.0   34.0      3.0        3.0
2004-02-01          0.0     10.0   32.0      2.0        3.0
2004-03-01          0.0     10.0   32.0      2.0        3.0
2004-04-01          0.0      9.0   31.0      2.0        2.0
2004-05-01          0.0      9.0   32.0      2.0        2.0
2004-06-01          0.0      8.0   29.0      2.0        3.0
2004-07-01          0.0      8.0   34.0      2.0        3.0
[...]
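The `TrendReq(username, password, custom_useragent=...)` and `trend()` calls above match the pytrends version available when this answer was written. As far as I know, recent pytrends releases (4.x) no longer take a Google login and replace `trend()` with `build_payload()` followed by methods such as `interest_over_time()`. A minimal sketch under that assumption; the helper names `split_terms` and `fetch_interest` are hypothetical, the fetch needs network access, and the current pytrends README should be checked before relying on it.

```python
def split_terms(query):
    """Turn an old-style comma-separated 'q' string into the keyword list build_payload expects."""
    return [term.strip() for term in query.split(',') if term.strip()]


def fetch_interest(keywords, timeframe='today 3-m', geo='US'):
    """Return a DataFrame of search interest over time, one column per keyword (needs network)."""
    # Imported here so split_terms() works even where pytrends is not installed.
    from pytrends.request import TrendReq

    pytrend = TrendReq(hl='en-US', tz=360)  # assumed 4.x constructor: language and timezone offset
    pytrend.build_payload(kw_list=keywords, timeframe=timeframe, geo=geo)
    # In the versions I know, the result also carries an 'isPartial' column.
    return pytrend.interest_over_time()
```

For the question's terms this would be called as `fetch_interest(split_terms('Blue,Red'))`.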

2016-10-23 15:35:28

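Thanks for your reply, I will definitely try it. My only concern is that Google might block my account if I use a script like this often. Do you have any experience with this?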
