I have a problem: in the code below, BeautifulSoup fails to find an element by its ID in the page retrieved with Selenium.
import re
from lxml import html
from bs4 import BeautifulSoup as BS
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
import requests
import sys
import datetime
print('start!')
print(datetime.datetime.now())
list_file = 'list2.csv'
#This should be the regular input list
url_list=["http://www.genecards.org/cgi-bin/carddisp.pl?gene=ENO3&keywords=ENO3"]
#This is an example input instead
binary = FirefoxBinary('C:/Program Files (x86)/Mozilla Firefox/firefox.exe')
#I read somewhere that supplying the binary explicitly could help, but the program still fails at random with [WinError 6] Invalid Descriptor, with nothing different from the runs where it at least gets the page (even when it cannot perform any further operation).
for page in url_list:
    print(page)
    browser = webdriver.Firefox(firefox_binary=binary)
    #I tried this too to work around the [WinError 6], but it is not working
    browser.get(page)
    print("TEST BEGINS")
    soup = BS(browser.page_source, "lxml")
    soup = soup.find("summaries")
    #This fails here: it finds nothing, although there is a section with id "summaries".
    #soup.find_all("p") works, but I don't want all the p's outside of summaries.
    print(soup)  #It prints "None" indeed.
    print("TEST ENDS")
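For context on what that call actually does: `find("summaries")` searches for a tag *named* `<summaries>`, not for an element whose `id` attribute is `summaries`. A minimal sketch of the difference, using a static stand-in for `browser.page_source` (and the stdlib `html.parser` instead of `lxml`, just to keep it self-contained):

```python
from bs4 import BeautifulSoup

# Static stand-in for the relevant part of browser.page_source
html_doc = '<html><body><section id="summaries">text</section></body></html>'
soup = BeautifulSoup(html_doc, "html.parser")

# Looks for a tag literally named <summaries> -- there is none
print(soup.find("summaries"))  # None

# Looks for any <section> whose id attribute is "summaries"
print(soup.find("section", id="summaries"))
```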
I verified that the page source does contain "summaries". The first occurrence is
<li> <a href="#summaries" ng-click="scrollTo('summaries')">Summaries</a></li>
and then there is
<section id="summaries" data-ga-label="Summaries" data-section="Summaries">
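For what it's worth, when that second snippet is parsed on its own, an id lookup does match it (a minimal check, using the stdlib `html.parser` rather than `lxml`):

```python
from bs4 import BeautifulSoup

# The <section> element exactly as it appears in the page source, closed for parsing
snippet = '<section id="summaries" data-ga-label="Summaries" data-section="Summaries"></section>'
soup = BeautifulSoup(snippet, "html.parser")

summary = soup.find('section', attrs={'id': 'summaries'})
print(summary['data-ga-label'])  # Summaries
```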
As suggested by @alexce here (Webscraping in python: BS, selenium, and None error), I tried
summary = soup.find('section', attrs={'id':'summaries'})
(Edit: the suggestion was `_summaries`, but I tested `summaries` as well)
but it doesn't work either. So my questions are: why can't BS find "summaries", and why does Selenium keep breaking when I run the script too many times in a row (restarting the console works, on the other hand, but that is tedious), or with a list containing more than four entries? Thanks
I tested many of the solutions proposed [here](http://stackoverflow.com/questions/2136267/beautiful-soup-and-extracting-a-div-and-its-contents-by-id) and it doesn't work. So I guess it has to do with my specific page... I also tried things other than Selenium (RoboBrowser, MechanicalSoup), but those packages are hard to use under Windows... –