Extracting data from a website using Python

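I recently started learning Python, and my first project was to scrape updates from my son's classroom web page and send myself a notification whenever the site was updated. That turned out to be an easy project, so I wanted to expand on it and create a script that automatically checks whether any of our lottery numbers hit. Unfortunately, I haven't been able to figure out how to get the data from the website. Here is one of my attempts from last night.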

from bs4 import BeautifulSoup 
import urllib.request 

webpage = "http://www.masslottery.com/games/lottery/large-winningnumbers.html" 

websource = urllib.request.urlopen(webpage) 
soup = BeautifulSoup(websource.read(), "html.parser") 

span = soup.find("span", {"id": "winning_num_0"}) 
print (span) 

Output is here... 
<span id="winning_num_0"></span>

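The output listed above is also what I see when I use "View source" in a web browser. When I use "Inspect element" in the browser, I can see the winning numbers in the inspect panel. Unfortunately, I'm not even sure how or where the web browser is getting the data. Is it being loaded from another page, or by a script in the background? I thought the following tutorial would help me, but I wasn't able to get the data using similar commands.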

http://zevross.com/blog/2014/05/16/using-the-python-library-beautifulsoup-to-extract-data-from-a-webpage-applied-to-world-cup-rankings/

Any help is appreciated. Thanks.

Source

2016-09-15 gameoverman

If the content is dynamic, you will probably need an approach based on something like Selenium - http://selenium-python.readthedocs.io/api.html – ewcz

Possible duplicate of [Reading dynamically generated web pages using python](http://stackoverflow.com/questions/13960567/reading-dynamically-generated-web-pages-using-python) – Sandeep

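Checking what the page does from the developer console shows that it loads the data dynamically from here: http://www.masslottery.com/data/json/games/lottery/recent.json So you can write a script that loads that JSON-formatted data and checks the numbers from there. Much easier than scraping the HTML :) – lari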

If you look closely at the page's source (I just used curl), you can see this block:

<script type="text/javascript"> 
    // <![CDATA[ 
    var dataPath = '../../'; 
    var json_filename = 'data/json/games/lottery/recent.json'; 
    var games = new Array(); 
    var sessions = new Array(); 
    // ]]> 
</script>

That recent.json stuck out like a sore thumb (I actually missed the dataPath part at first).
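To make the dataPath step concrete, here is a minimal sketch of how the two variables in that script block could be combined into a full URL. It assumes the page's JavaScript simply concatenates dataPath and json_filename and resolves the result relative to the page's own URL, which the thread does not confirm. The helper names js_string_var and resolve_json_url are made up for this illustration.

```python
import re
from urllib.parse import urljoin

# URL of the page from the question.
PAGE_URL = "http://www.masslottery.com/games/lottery/large-winningnumbers.html"

# The two relevant lines from the inline script block quoted above.
SCRIPT_TEXT = """
var dataPath = '../../';
var json_filename = 'data/json/games/lottery/recent.json';
"""


def js_string_var(script, name):
    """Return the value of a single-quoted JS string variable, or None."""
    pattern = r"var\s+" + re.escape(name) + r"\s*=\s*'([^']*)'"
    match = re.search(pattern, script)
    return match.group(1) if match else None


def resolve_json_url(page_url, script):
    """Join dataPath and json_filename, then resolve against the page URL.

    Assumption: the page builds its data URL this way; this is a guess
    based on the variable names, not something the page documents.
    """
    data_path = js_string_var(script, "dataPath")
    filename = js_string_var(script, "json_filename")
    if data_path is None or filename is None:
        raise ValueError("dataPath or json_filename not found in script")
    return urljoin(page_url, data_path + filename)


json_url = resolve_json_url(PAGE_URL, SCRIPT_TEXT)
print(json_url)
```

Under that assumption, the two `..` segments climb out of /games/lottery/, so the result lands on the site root followed by the data/json path.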

After giving it a try, I came up with this:

curl http://www.masslottery.com/data/json/games/lottery/recent.json

Which, as lari pointed out in the comments, is way easier than scraping the HTML. It's this easy, in fact:

import json 
import urllib.request 
from pprint import pprint 

websource = urllib.request.urlopen('http://www.masslottery.com/data/json/games/lottery/recent.json') 
data = json.loads(websource.read().decode()) 
pprint(data)

data is now a dictionary, and you can do whatever dictionary-like things you want with it. Good luck ;)
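Since the layout of recent.json isn't shown here, one way to find where the winning numbers live is to walk the nested structure and print every path. The sketch below is just that, a sketch: iter_leaves is a hypothetical helper, and the sample dictionary is invented for illustration and is not the real feed.

```python
def iter_leaves(node, path=()):
    """Yield (path, value) for every non-container value in nested JSON data."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_leaves(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_leaves(value, path + (index,))
    else:
        yield path, node


# Hypothetical sample: the keys and values are made up, because the real
# structure of recent.json is not shown in this thread.
sample = {"games": [{"name": "Example Game", "winning_num": "1-2-3"}]}

leaves = list(iter_leaves(sample))
for path, value in leaves:
    print(path, repr(value))
```

Calling iter_leaves(data) on the dictionary loaded above should list every key path in the feed, which makes it easier to spot the field that holds the numbers you care about.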

Source

2016-09-15 12:34:02

Thanks. I'll try this tonight! – gameoverman

For added fun, you could always use Python's random module to guess the lottery numbers and see how much money it would make you. –

Haha. It can't do any worse than the office lottery pool... – gameoverman
