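I started this small project to learn how to use mechanize. Right now it goes to urbandictionary, fills in the word "skid" in the search form, submits it, and prints out the HTML. How do I display a sentence from the site?
What I want to do is find the first definition and print it out. How would I go about doing that?
This is my source code so far: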
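import mechanize

br = mechanize.Browser()
# Open the Urban Dictionary home page.
page = br.open("http://www.urbandictionary.com/")
# Select the first form on the page (the search box) and fill in the term.
br.select_form(nr=0)
br["term"] = "skid"
br.submit()
# Print the raw HTML of the results page (print() works on Python 2 and 3).
print(br.response().read())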
Here is where the definition is stored:
<div class="definition">Canadian definition: Commonly used to refer to someone who stopped evolving, and bathing, during the 80's hair band era. Generally can be found wearing AC/DC muscle shirts, leather jackets, and sporting a <a href="/define.php?term=mullet">mullet</a>. The term "skid" is in part derived from "skid row", which is both a band enjoyed by those the term refers to, as well as their address. See also <a href="/define.php?term=white%20trash">white trash</a> and <a href="/define.php?term=trailer%20park%20trash">trailer park trash</a></div><div class="example">The skid next door got drunk and beat up his old lady.</div>
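As you can see, it is stored inside the definition div. I know how to search for the div in the source, but I don't know how to take everything between the tags and then display it.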
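One possible way to pull out "everything between the tags" is sketched below. It is only a sketch, written for Python 3 with the standard library's `html.parser` rather than a dedicated scraping library, and it assumes the page still wraps each definition in `<div class="definition">` as in the snippet above. The names `FirstDefinitionParser`, `first_definition`, and `sample` are made up for this illustration, and `sample` is a shortened copy of the HTML above.

```python
from html.parser import HTMLParser


class FirstDefinitionParser(HTMLParser):
    """Collect the text inside the first <div class="definition">."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0      # how many <div>s deep we are inside the target div
        self.done = False   # set once the first definition div has been closed
        self.parts = []     # text fragments found inside the target div

    def handle_starttag(self, tag, attrs):
        if self.done or tag != "div":
            return
        if self.depth:
            self.depth += 1  # a nested <div> inside the definition
        elif "definition" in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entering the first definition div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)  # also picks up the text of nested <a> links


def first_definition(html):
    """Return the text of the first definition div, or "" if there is none."""
    parser = FirstDefinitionParser()
    parser.feed(html)
    parser.close()
    # Join the fragments and collapse runs of whitespace.
    return " ".join("".join(parser.parts).split())


# Shortened copy of the HTML shown in the question (an assumption about the page).
sample = (
    '<div class="definition">Generally can be found sporting a '
    '<a href="/define.php?term=mullet">mullet</a>. See also '
    '<a href="/define.php?term=white%20trash">white trash</a></div>'
    '<div class="example">The skid next door got drunk.</div>'
)
print(first_definition(sample))
```

With the mechanize script above, the same function could presumably be called as `first_definition(br.response().read().decode("utf-8"))`, assuming the response is UTF-8 encoded bytes. The parser keeps the link text ("mullet", "white trash") but drops the `<a>` tags themselves, which is usually what you want when printing a definition.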
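I'm not familiar with mechanize, but anyway... my first thought is XPath (lxml) or BeautifulSoup – Sheena
Look into [Scrapy](http://scrapy.org/) and [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/) for this kind of task. If the site offers an API, that is probably the best option. For example, Urban Dictionary appears to have a JSON API, but it is not freely available to everyone. –
Welcome to StackOverflow! Please take a look at the FAQ; it will help us help you. You generally don't need to include a request or a thank-you in your post; your upvote is the measure of that. Be sure to accept an answer if it solves your problem. – Hooked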
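To give a feel for the XPath idea mentioned in the comments, here is a rough sketch. It swaps lxml for the standard library's `xml.etree.ElementTree`, which supports only a small subset of XPath and only parses well-formed XML. The `fragment` string is a shortened, hypothetical copy of the HTML from the question, wrapped in a made-up `<root>` element so that it parses; a real page is rarely well-formed XML, which is why a tolerant HTML parser such as lxml or BeautifulSoup is the more practical choice.

```python
import xml.etree.ElementTree as ET

# Assumption: the fragment is well-formed XML once wrapped in a single root element.
fragment = (
    '<div class="definition">Generally can be found sporting a '
    '<a href="/define.php?term=mullet">mullet</a>.</div>'
    '<div class="example">The skid next door got drunk.</div>'
)
root = ET.fromstring("<root>" + fragment + "</root>")

# XPath-style query: the first <div> at any depth whose class is exactly "definition".
node = root.find(".//div[@class='definition']")

# itertext() yields the div's own text and the text of its child <a> elements in order.
definition = "".join(node.itertext()) if node is not None else None
print(definition)
```

With lxml installed, the equivalent would probably look like `lxml.html.fromstring(html).xpath('//div[@class="definition"]')`, followed by `text_content()` on the first match; that variant copes with the messy markup of a live page.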