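This is a complete rewrite of my question, because judging by the answers I must have asked it poorly, so I will try to be clearer.
Getting inner HTML: Selenium, BeautifulSoup, Python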
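I have an object that I am trying to scrape. With my code running on my laptop, I have no problem getting this to work. When I moved to PythonAnywhere, I could no longer get the information I am looking for.
The code that works on my system is: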
from urllib.request import urlopen
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import csv
import time
import re
#68 lines of code for another section of the site above this working well on my system and on pythonanywhere.
pageSource = driver.page_source
bsObj = BeautifulSoup(pageSource)
try:
    parcel_number = bsObj.find(id="mParcelnumbersitusaddress_mParcelNumber")
    s_parcel_number = parcel_number.get_text()
except AttributeError as e:
    s_parcel_number = "Parcel Number not found"
# same kind of code (all working) that gets 10 more pieces of data
# Tax Year
try:
    pause = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.ID, "TaxesBalancePaymentCalculator")))
    taxes_owed_2015_yr = bsObj.findAll(id="mGrid_RealDataGrid")[1].findAll('tr')[1].findAll('td')[0]
except IndexError as e:
    s_taxes_owed_2015_yr = "No taxes due"
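This code works just fine on my laptop with Firefox. On PythonAnywhere, if I print the page source of the page I am trying to scrape, I get the following where my table should be: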
<table border="0" cellpadding="5" cellspacing="0" class="WithBorder" width="100%">
<tbody><tr>
<td id="TaxesBalancePaymentCalculator"><!--DONT_PRINT_START-->
<span class="InputFieldTitle" id="mTabGroup_Taxes_mTaxChargesBalancePaymentInjected_mReportProcessingNote">Please wait while your current taxes are calculated.</span><img src="images/progress.gif"/> <!--DONT_PRINT_FINISH--></td>
</tr> <!--DONT_PRINT_START-->
<script type="text/javascript">
function TaxesBalancePaymentCalculator_ScriptLoaded(pPageContent)
{
element('TaxesBalancePaymentCalculator').innerHTML = pPageContent;
}
function results_ready()
{
element('pay_button_area').style.display = 'block';
element('pay_button_area2').style.display = 'block';
element('pay_additional_things_area').style.display = 'block';
}
var no_taxes_calculator = '&nbsp;<' + 'span class="MessageTitle">The tax balance calculator is not available.<' + '/span>';
function no_taxes_calculator_available()
{
element('TaxesBalancePaymentCalculator').innerHTML = no_taxes_calculator;
}
function invalid()
{
element('TaxesBalancePaymentCalculator').innerHTML = no_taxes_calculator;
}
loadScript('injected/TaxesBalancePaymentCalculator.aspx?parcel_number=15-720-01-01-00-0-00-000');
</script><script id="injected_taxesbalancepaymentcalculator_ScriptTag" type="text/javascript"></script>
<tr id="pay_button_area" style="DISPLAY: none">
<td id="pay_button_area2">
<table border="0" cellpadding="2" cellspacing="0">
<tbody><tr>
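One possible reading of the dump above (an assumption, not something confirmed in the post): the page source was captured while the `TaxesBalancePaymentCalculator` cell still held the "Please wait" placeholder, before the injected script replaced its contents. A minimal sketch of a check for that state follows. The helper name `calculator_loaded` is hypothetical, and treating "placeholder span is gone" as "content has arrived" is also an assumption.

```python
from bs4 import BeautifulSoup

# id of the "Please wait..." span, copied from the page source shown above
PLACEHOLDER_ID = (
    "mTabGroup_Taxes_mTaxChargesBalancePaymentInjected_mReportProcessingNote"
)


def calculator_loaded(page_html: str) -> bool:
    """Hypothetical helper: True once the calculator cell exists and no
    longer contains the placeholder span (assumed to mean the injected
    content has replaced it)."""
    soup = BeautifulSoup(page_html, "html.parser")
    cell = soup.find(id="TaxesBalancePaymentCalculator")
    if cell is None:
        return False
    return cell.find(id=PLACEHOLDER_ID) is None
```

With Selenium, a predicate like this could be polled with `WebDriverWait(driver, 20).until(lambda d: calculator_loaded(d.page_source))`, and `driver.page_source` would then need to be read again before building `bsObj`, since a snapshot taken earlier does not update itself.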
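I poked around and found that if I get the innerHTML (as a str) of:
element('TaxesBalancePaymentCalculator').innerHTML = pPageContent;
that part has my data. The problem is that I cannot perform a findAll on a string, and I need certain rows from the table: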
taxes_owed_2015_yr = bsObj.findAll(id="mGrid_RealDataGrid")[1].findAll('tr')[1].findAll('td')[0]
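I need help with how to get the element as an object (rather than a string) so that I can use it to get at my data. I have tried so many things that I cannot list them all here. I would really appreciate some help.
Thanks in advance.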
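I don't recall any 'findAll' method in 'Python' itself. It's a 'bs4' method... did you import 'bs4' in your code? And what are you trying to do with 'bsObj'? – Andersson
Yes, it is a bs4 method, and I have imported bs4 a few hundred lines higher up. I am trying to get information from a table in the inner HTML. – Raymond
According to the documentation, driver.get_attribute returns a string, hence the error. – Steve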
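Following on from the last comment: since the inner HTML arrives as a plain string, one option is to hand that string back to BeautifulSoup, which returns an object that supports `find_all` (`findAll` is the older alias). The sketch below is a minimal illustration, not a confirmed fix for the page in question. The function name `first_cell_of_second_grid` and the sample markup are hypothetical; only the `mGrid_RealDataGrid` id and the indexing come from the question.

```python
from bs4 import BeautifulSoup


def first_cell_of_second_grid(inner_html: str) -> str:
    """Parse an innerHTML string and apply the same lookup as the question:
    second element with id mGrid_RealDataGrid, second row, first cell."""
    soup = BeautifulSoup(inner_html, "html.parser")
    grids = soup.find_all(id="mGrid_RealDataGrid")
    try:
        cell = grids[1].find_all("tr")[1].find_all("td")[0]
    except IndexError:
        # Same fallback text as the question's code
        return "No taxes due"
    return cell.get_text(strip=True)


# Hypothetical sample standing in for the injected calculator markup
sample = """
<table id="mGrid_RealDataGrid"><tr><td>Header A</td></tr></table>
<table id="mGrid_RealDataGrid">
  <tr><th>Tax Year</th><th>Balance</th></tr>
  <tr><td>2015</td><td>$100.00</td></tr>
</table>
"""

print(first_cell_of_second_grid(sample))  # prints 2015
```

In the Selenium script, the string would presumably come from something like `driver.find_element(By.ID, "TaxesBalancePaymentCalculator").get_attribute("innerHTML")`, fetched after the injected content has loaded.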