Problem scraping a website with BS4

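I can usually write a scraping script without trouble, but I have been stuck trying to scrape the table on this website for a research project I'm working on. I plan to verify that the script works for one state before entering the URLs of my target states.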

import requests
import bs4 as bs

url = "http://programs.dsireusa.org/system/program/detail/284"
dsire_get = requests.get(url)
soup = bs.BeautifulSoup(dsire_get.text, 'lxml')
table = soup.find_all('div', {'data-ng-controller': 'DetailsPageCtrl'})
# Print the result just to check that the table information I'm looking for is inside this section
print(table)

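I don't know whether the website is trying to block people from scraping, but all the information I'm trying to grab appears inside "&quot;" entities if you look at the printed table output.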
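For context, "&quot;" is the HTML entity for a double quote, so output like this may mean the data is sitting in an attribute as escaped JSON rather than in visible table markup. Here is a minimal sketch of decoding such a value using only the standard library. The attribute name `data-ng-init`, the `init(...)` wrapper and the sample payload are assumptions invented for illustration; they are not taken from the actual page.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample markup: the attribute name and payload are made up.
SAMPLE = (
    '<div data-ng-controller="DetailsPageCtrl" '
    'data-ng-init="init({&quot;program&quot;: {&quot;name&quot;: &quot;Net Metering&quot;}})">'
    '</div>'
)


class AttrCollector(HTMLParser):
    """Record every attribute seen; HTMLParser unescapes entities such as &quot;."""

    def __init__(self):
        super().__init__()
        self.attrs_seen = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            self.attrs_seen[name] = value


def extract_json(attr_value):
    """Parse the outermost {...} span of an attribute value as JSON."""
    start = attr_value.index("{")
    end = attr_value.rindex("}") + 1
    return json.loads(attr_value[start:end])


collector = AttrCollector()
collector.feed(SAMPLE)
payload = extract_json(collector.attrs_seen["data-ng-init"])
print(payload["program"]["name"])
```

If the real page stores its data differently, only the attribute lookup and the slicing in `extract_json` would need to change.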


2017-07-06 vlepore

Have you tried 'html.parser' instead of 'lxml'? – martinB0103

Which part of the page do you want? The section headed "Program Overview"? The one headed "Authorities"? Or something else? –

@BillBell I'm looking for the "Program Overview" – vlepore

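I finally managed to solve this. The code below worked for me to get the data from the JavaScript-rendered page, in case anyone else runs into the same problem scraping a JavaScript web page with Python on Windows (where dryscrape is not compatible).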

import bs4 as bs
from selenium import webdriver

# Let a real browser load the page so that its JavaScript runs
browser = webdriver.Chrome()
url = "http://programs.dsireusa.org/system/program/detail/284"
browser.get(url)
html_source = browser.page_source
browser.quit()

# Parse the rendered HTML rather than the raw HTTP response
soup = bs.BeautifulSoup(html_source, "html.parser")
table = soup.find('div', {'class': 'programOverview'})

# Collect the text of every bound field in the overview section
data = []
for n in table.find_all("div", {"class": "ng-binding"}):
    data.append(n.get_text())
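The strings collected in `data` are likely to carry stray whitespace from the page layout. A small clean-up step can be sketched as follows; the sample entries are hypothetical and only illustrate the shape of the input, not the real contents of the page.

```python
def clean_entries(entries):
    """Collapse runs of whitespace and drop entries that end up empty."""
    cleaned = []
    for entry in entries:
        text = " ".join(entry.split())
        if text:
            cleaned.append(text)
    return cleaned


# Hypothetical sample standing in for the scraped `data` list.
raw = ["  Implementing Sector:\n", "State", "", "\tCategory: ", "Regulatory  Policy"]
cleaned = clean_entries(raw)
print(cleaned)
```

Calling `clean_entries(data)` after the loop above would apply the same normalisation to the scraped values.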


2017-07-07 17:16:29 vlepore

The text is rendered by JavaScript. First render the page with dryscrape. (If you don't want to use dryscrape, see "Web-scraping JavaScript page with Python".)

Once the page has been rendered, the text can be extracted from a different location: the place on the page where it was rendered to.

As an example, this code extracts the HTML of the summary.

import bs4 as bs 
import dryscrape 

url = ("http://programs.dsireusa.org/system/program/detail/284") 
session = dryscrape.Session() 
session.visit(url) 
dsire_get = session.body() 
soup = bs.BeautifulSoup(dsire_get,'html.parser') 
table = soup.findAll('div', {'class': 'programSummary ng-binding'}) 
print(table[0])

Output:

<div class="programSummary ng-binding" data-ng-bind-html="program.summary"><p> 
<strong>Eligibility and Availability</strong></p> 
<p> 
Net metering is available to all "qualifying facilities" (QFs), as defined by the federal <i>Public Utility Regulatory Policies Act of 1978</i> (PURPA), which pertains to renewable energy systems and combined heat and power systems up to 80 megawatts (MW) in capacity. There is no statewide cap on the aggregate capacity of net-metered systems.</p> 
<p> 
All utilities subject to Public ...


2017-07-06 17:22:57

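Although this looks like it would work, dryscrape does not officially support Windows, so I can't use it. I'll follow the approach from the post you referenced for cases where dryscrape isn't used. – vlepore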

That's why I included the link. Whether you use dryscrape, Selenium, PyQt or something else, the approach is the same. –
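One way to picture "the approach is the same" is to keep the rendering step and the parsing step separate, so that the renderer can be swapped freely. The sketch below uses only the standard library for parsing. The `render` callable, `fake_render` and the sample HTML are hypothetical stand-ins for whatever dryscrape, Selenium or PyQt would return.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text inside the first <div> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # <div> nesting depth inside the target; 0 means outside
        self.done = False   # set once the target <div> has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth:
            self.depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def scrape(url, render, target_class="programSummary"):
    """Render the page with any callable returning HTML, then extract the text."""
    parser = ClassTextExtractor(target_class)
    parser.feed(render(url))
    return " ".join(" ".join(parser.chunks).split())


def fake_render(url):
    # Hypothetical stand-in for a real JavaScript-capable renderer.
    return (
        '<html><body><div class="programSummary ng-binding">'
        "<p><strong>Eligibility</strong></p><p>Net metering is available.</p>"
        "</div><div>footer</div></body></html>"
    )


summary = scrape("http://example.invalid/program", fake_render)
print(summary)
```

With a real renderer, `render` would be a function that loads the URL in a browser session and returns the page source; the parsing code would stay unchanged.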
