Parsing an Evernote shared notebook with Python

I want to get data from an Evernote "shared notebook". For example, from this one: https://www.evernote.com/pub/missrspink/evernoteexamples#st=p&n=56b67555-158e-4d10-96e2-3b2c57ee372c

I tried using BeautifulSoup:

import requests
from bs4 import BeautifulSoup

url = 'https://www.evernote.com/pub/missrspink/evernoteexamples#st=p&n=56b67555-158e-4d10-96e2-3b2c57ee372c'
r = requests.get(url)
bs = BeautifulSoup(r.text, 'html.parser')
bs

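The result does not contain any of the text from the notebook, only some code.

I have also seen suggestions to use Selenium and find elements by XPath. For example, I want to find the title of this note, 'Term 3 Week2'. In Google Chrome I found that its XPath is '/html/body/div[1]/div[1]/b/span/u/b'. So I tried this: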

from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get(url)
t = driver.find_element_by_xpath('/html/body/div[1]/div[1]/b/span/u/b')

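But that did not work either; the result is "NoSuchElementException: ......".

I am new to Python, and to parsing in particular, so I would be glad of any help. I am using Python 3.6.2 and Jupyter Notebook.

Thanks in advance.

Source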

2017-10-06 I. Petrov

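To add to what @blakev said: you will not get what you want with requests, because everything after the "#" in a URL is not sent to the server. You therefore only send a request to, and get a response from, "https://www.evernote.com/pub/missrspink/evernoteexamples". You need Selenium. – AceLewis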
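The point about the "#" can be checked without any network access. The following is a minimal sketch using only the standard library; `split_fragment` is a hypothetical helper name, not something from the thread. It separates the part of the URL that an HTTP client sends to the server from the fragment that stays in the browser.

```python
from urllib.parse import urldefrag

# The URL from the question, split over two lines for readability.
URL = ('https://www.evernote.com/pub/missrspink/evernoteexamples'
       '#st=p&n=56b67555-158e-4d10-96e2-3b2c57ee372c')


def split_fragment(url):
    """Return (part sent to the server, fragment kept client-side)."""
    result = urldefrag(url)
    return result.url, result.fragment


sent, kept = split_fragment(URL)
print(sent)  # the only part a server ever sees
print(kept)  # the note selector, handled by JavaScript in the browser
```

Since the note is selected by the fragment, the server returns the same page shell whichever note is named, which is consistent with the BeautifulSoup result containing only code.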

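The easiest way to interface with Evernote is to use their official Python API.

Once you have an API key configured, you can connect and then download and reference Notes and Notebooks.

Evernote notes use their own markup language, ENML (EverNote Markup Language), which is a subset of HTML. You will be able to use BeautifulSoup4 to parse the ENML and extract the elements you are looking for.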
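Because ENML is close to HTML, an ordinary HTML parser can read it. The sketch below is an illustration only: it uses the standard library's `html.parser` instead of BeautifulSoup4 so that it runs without extra packages, and `SAMPLE_ENML` is a made-up note body (only the title 'Term 3 Week2' comes from the question). `TextCollector` and `enml_to_text_chunks` are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical ENML document; a real one would come from the Evernote API.
SAMPLE_ENML = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\n'
    '<en-note><div><b>Term 3 Week2</b></div>'
    '<div>Homework: read chapter 4</div></en-note>'
)


class TextCollector(HTMLParser):
    """Collect the non-empty text nodes of a document, in order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace between tags
            self.chunks.append(text)


def enml_to_text_chunks(enml):
    """Return the list of text fragments found in an ENML string."""
    parser = TextCollector()
    parser.feed(enml)
    parser.close()
    return parser.chunks


print(enml_to_text_chunks(SAMPLE_ENML))
```

With BeautifulSoup4 installed, `BeautifulSoup(enml, 'html.parser')` gives the same kind of access with less code, plus searching by tag name.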

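If you are trying to extract information from a local installation (rather than their web app), you may also be able to get what you need from the executable. See "how to pass arguments" to the local installation to extract data. For this you will need to use the Python 3 subprocess module.

HOWEVER

If you want to use Selenium, this will get you started: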
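Calling a local executable from Python could look roughly like the sketch below. The name and arguments of the Evernote executable are not given in this thread, so they are deliberately left out: the example runs the Python interpreter itself as a stand-in command, and `run_local_tool` is a hypothetical helper. The keyword arguments used here also work on Python 3.6.

```python
import subprocess
import sys


def run_local_tool(args):
    """Run a local executable and return its standard output as text.

    Raises subprocess.CalledProcessError if the exit code is non-zero.
    """
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str
        check=True,
    )
    return completed.stdout


# Stand-in command: replace the list with the Evernote executable and the
# arguments described in Evernote's own documentation (not shown here).
output = run_local_tool([sys.executable, '-c', "print('ok')"])
print(output.strip())
```

Passing the command as a list rather than a single string avoids shell quoting problems when a path or query contains spaces.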

import selenium.webdriver.support.ui as ui 
from selenium.webdriver import Chrome 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support import expected_conditions as EC 

# your example URL 
URL = 'https://www.evernote.com/pub/missrspink/evernoteexamples#st=p&n=56b67555-158e-4d10-96e2-3b2c57ee372c' 

# create the browser interface, and a generic "wait" that we can use 
# to intelligently block while the driver looks for elements we expect. 
# 10: maximum wait in seconds 
# 0.5: polling interval in seconds 
driver = Chrome() 
wait = ui.WebDriverWait(driver, 10, 0.5) 

driver.get(URL) 

# Note contents are loaded in an iFrame element 
find_iframe = By.CSS_SELECTOR, 'iframe.gwt-Frame' 
find_html = By.TAG_NAME, 'html' 

# .. so we have to wait for the iframe to exist, switch our driver context 
# and then wait for that internal page to load. 
wait.until(EC.frame_to_be_available_and_switch_to_it(find_iframe)) 
wait.until(EC.visibility_of_element_located(find_html)) 

# since ENML is "just" HTML we can select the top tag and get all the 
# contents inside it. 
doc = driver.find_element(By.TAG_NAME, 'html')

print(doc.get_attribute('innerHTML')) # <-- this is what you want 

# cleanup our browser instance 
driver.quit()

Source

2017-10-06 15:52:39 blakev

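Make sure you have the right webdriver installed for the browser you want to use, otherwise it will fail at the 'driver = Chrome()' step. – jamescampbell

@blakev Thank you very much for such a complete answer! The "evernote" approach is perfectly suitable, though it has one drawback: there is no official Python 3 wrapper, so it may be a little more complicated to use. Thanks for your help! –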
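One way to check for the driver before starting the browser is sketched below. It assumes the driver executable is named `chromedriver` and is expected on PATH, which may not match every setup; `find_driver` is a hypothetical helper.

```python
import shutil


def find_driver(name='chromedriver'):
    """Return the full path of the driver executable on PATH, or None."""
    return shutil.which(name)


path = find_driver()
if path is None:
    print('chromedriver not found on PATH; Chrome() would likely fail')
else:
    print('chromedriver found at', path)
```

If the driver lives somewhere else, its location can be given to Selenium explicitly instead of relying on PATH.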
