How do I batch-parse web pages?

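I want to bulk-export a list of flashcard sets/decks from Quizlet. Rather than manually clicking the menu, choosing Export, ticking "Include pictures", copying, pasting into a new blank text file, and saving, it would be easier to write a script that does this.

How should I go about it? Could someone help me get started? I can work out the rest myself.

JavaScript? jQuery? Python?

The script needs to read a text file of URLs (direct links to each deck), for example https://quizlet.com/215441327/f1-u1a-making-friends-flash-cards/ and https://quizlet.com/218503855/f1-u1b-making-friends-flash-cards/, and export each one.

Update: is there a way to click the "More" button (the ellipsis dots) and then click "Export"? Then tick the "INCLUDE PICTURES" checkbox and grab the contents of the textarea?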

2017-09-19 And Wan

My preference is Python. The code below, which uses the BeautifulSoup package, is a starting point.

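from bs4 import BeautifulSoup
import requests

url = "https://quizlet.com/215441327/f1-u1a-making-friends-flash-cards/"
# Send a browser-like User-Agent with the request
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(url, headers=headers)
# The html5lib parser is a separate package (pip install html5lib)
soup = BeautifulSoup(page.text, "html5lib")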

To get the English words:

for en in soup.select(".TermText.notranslate.lang-en"): 
    print(en.text.strip())

Output:

enjoy 
cheerful 
everyone 
sporty 
sometimes 
practise 
practice 
friend 
favourite 
help

For the other language:

for ch in soup.select(".TermText.notranslate.lang-zh-TW"): 
    print(ch.text.strip())

Output:

請享用 
高興的 
每個人 
運動型的 
有時 
練習 
練習 
朋友 
最喜歡的 
幫助
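
To cover the batch part of the question, the same request can be repeated for every URL in a text file. The following is only a sketch under several assumptions: the file holds whitespace-separated deck URLs, the page still marks terms with the TermText class used above, and the helper names (parse_url_list, deck_name, fetch_terms, export_all) are made up for illustration.

```python
from urllib.parse import urlparse


def parse_url_list(text):
    """Return every whitespace-separated token that looks like a URL."""
    return [token for token in text.split() if token.startswith("http")]


def deck_name(url):
    """Use the last path segment of a deck URL as an output file name stem."""
    path = urlparse(url).path.strip("/")
    return path.split("/")[-1]


def fetch_terms(url):
    """Download one deck page and return every .TermText string in page order."""
    # Third-party imports live here so the helpers above work without them.
    import requests
    from bs4 import BeautifulSoup

    page = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    page.raise_for_status()  # stop early on HTTP errors
    soup = BeautifulSoup(page.text, "html5lib")
    # Assumption: terms are still tagged with the TermText class.
    return [el.get_text(strip=True) for el in soup.select(".TermText")]


def export_all(list_path):
    """Write one <deck name>.txt file per URL found in list_path."""
    with open(list_path, encoding="utf-8") as f:
        urls = parse_url_list(f.read())
    for url in urls:
        terms = fetch_terms(url)
        with open(deck_name(url) + ".txt", "w", encoding="utf-8") as out:
            out.write("\n".join(terms) + "\n")
```

Calling export_all("decks.txt"), where decks.txt is a hypothetical file name, would then write one text file per deck into the current directory.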

2017-09-19 15:40:10 sgetachew

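Thanks, that looks good. Is there a way to click the "More" button (the ellipsis dots) and then click "Export"? Then tick the "INCLUDE PICTURES" checkbox and grab the contents of the textarea?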

@AndWan See this link: https://stackoverflow.com/questions/9271365/how-to-pull-out-css-attributes-from-inline-styles-with-beautifulsoup. You can extract the images directly. – sgetachew
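
A rough sketch of what that comment suggests: if the pictures are referenced from inline style attributes (an assumption about the page markup, not something verified here), the url(...) values can be pulled out with a regular expression. The function names are hypothetical, and soup is expected to be a BeautifulSoup object like the one built in the answer above.

```python
import re

# Matches url(...) in a CSS declaration, with or without quotes around the URL.
URL_IN_STYLE = re.compile(r"url\(\s*['\"]?(.*?)['\"]?\s*\)")


def image_urls_from_style(style):
    """Return every URL referenced by url(...) in one inline style string."""
    return URL_IN_STYLE.findall(style)


def deck_image_urls(soup):
    """Collect url(...) values from every tag that has a style attribute."""
    urls = []
    for tag in soup.find_all(style=True):  # style=True: any tag with that attribute
        urls.extend(image_urls_from_style(tag["style"]))
    return urls
```

If the images turn out to be plain img tags instead, reading each tag's src attribute would be the simpler route.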

You can also use the Selenium Python library to interact with the web page:

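from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Raw string so the backslashes in the Windows path are not read as escapes
chromedriver = r"C:\Users\pappuj\Downloads\chromedriver"
# Selenium 4 style; older releases took the driver path as the first argument
driver = webdriver.Chrome(service=Service(chromedriver))
url = 'http://www.zoover.nl/cyprus'
driver.get(url)
# Click the element whose class is "next"
# (older releases: driver.find_element_by_class_name('next'))
driver.find_element(By.CLASS_NAME, 'next').click()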
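
Applied to the export flow from the question (More, then Export, then the pictures checkbox, then the textarea), the same idea might look like the sketch below. Every CSS selector in SELECTORS is a hypothetical placeholder that has to be replaced after inspecting the live page, and export_deck_text is an illustrative name. The function takes an already created driver and passes the locator strategy as the plain string "css selector", which is the value of Selenium's By.CSS_SELECTOR.

```python
CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR

# Hypothetical selectors: inspect the live page and replace every one of them.
SELECTORS = {
    "more": "button[aria-label='More']",
    "export": "button[aria-label='Export']",
    "pictures": "input[type='checkbox'][name='includePictures']",
    "output": "textarea",
}


def export_deck_text(driver, url, selectors=SELECTORS, wait_seconds=10):
    """Open a deck page, drive the export dialog, and return the textarea text."""
    driver.implicitly_wait(wait_seconds)  # let find_element poll for late elements
    driver.get(url)
    driver.find_element(CSS, selectors["more"]).click()
    driver.find_element(CSS, selectors["export"]).click()
    checkbox = driver.find_element(CSS, selectors["pictures"])
    if not checkbox.is_selected():  # only tick it when it is not ticked already
        checkbox.click()
    # A textarea's current content is exposed through its "value" attribute.
    return driver.find_element(CSS, selectors["output"]).get_attribute("value")
```

With a driver created as above, text = export_deck_text(driver, deck_url) would return the export text for one deck, and the call can be repeated for each URL in the list before driver.quit().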

2017-09-19 16:13:40
