2016-12-07 31 views
0

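Python BeautifulSoup doesn't scrape this URL

I want to scrape some rows (tr) of player data from a URL, but when I run my code nothing happens. I'm fairly sure my code is fine, because it works with other statistics sites that contain tables. Can anyone tell me why nothing is happening? Thanks in advance.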

import urllib.request
from bs4 import BeautifulSoup


def make_soup(url):
    # Download the page and parse the HTML that the server returns
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata


soup = make_soup("https://www.whoscored.com/Regions/252/Tournaments/7/Seasons/6365/Stages/13832/PlayerStatistics/England-Championship-2016-2017")
# Print the text of every table row present in the downloaded HTML
for record in soup.find_all('tr'):
    print(record.text)

Answers

0

This page uses JavaScript to fetch its data. You can find the raw data at this link:

https://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics?category=summary&subcategory=all&statsAccumulationType=0&isCurrent=true&playerId=&teamIds=&matchId=&stageId=13832&tournamentOptions=7&sortBy=Rating&sortAscending=&age=&ageComparisonType=&appearances=&appearancesComparisonType=&field=Overall&nationality=&positionOptions=&timeOfTheGameEnd=&timeOfTheGameStart=&isMinApp=true&page=&includeZeroValues=&numberOfPlayersToPick=10 

Each field in the URL can be changed to get the data you need.
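A minimal sketch of changing those fields with the standard library's `urllib.parse`. The helper name `with_fields` is hypothetical, and the idea that `page=2` selects the second page of results is an assumption based only on the field name.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# The feed URL from the answer, split across lines for readability.
FEED_URL = (
    "https://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics"
    "?category=summary&subcategory=all&statsAccumulationType=0&isCurrent=true"
    "&playerId=&teamIds=&matchId=&stageId=13832&tournamentOptions=7"
    "&sortBy=Rating&sortAscending=&age=&ageComparisonType=&appearances="
    "&appearancesComparisonType=&field=Overall&nationality=&positionOptions="
    "&timeOfTheGameEnd=&timeOfTheGameStart=&isMinApp=true&page="
    "&includeZeroValues=&numberOfPlayersToPick=10"
)


def with_fields(url, **changes):
    """Return url with the given query fields replaced (hypothetical helper)."""
    parts = urlsplit(url)
    # keep_blank_values=True preserves empty fields such as "playerId="
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update({key: str(value) for key, value in changes.items()})
    return urlunsplit(parts._replace(query=urlencode(query)))


# Assumption: the "page" field selects which page of results is returned.
page_two = with_fields(FEED_URL, page=2)
print(page_two)
```

Editing the parsed query rather than concatenating strings keeps every other field, including the empty ones, exactly as the site sent them.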

0

This is because the site doesn't want you to scrape it.

Incapsula Protection

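I used selenium to send the request and took a screenshot of the simulated browser it created.

The site uses Incapsula, which is a security service (they even have some information about scraping their sites). Check it out, it's quite interesting.

  • This may help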
1

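Short answer: the player data you are looking for is not at that URL.

Then you might ask: I've already seen them on that page, so how can they not be there?

So I'll try to explain what happens when you visit that URL with a modern browser such as Chrome.

You: type in the URL and press Enter.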

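Chrome: Gotcha. I'll get that page for you right away, just a second. (Fetches the content from that URL.) Now I have it! But wait, let me read/parse it before I show it to you. (Reads what's inside the content.) Oh crap, this JavaScript tells me to get additional information from another URL; fine, I'll do that. Oh wait, here's another one telling me to load an ad in the header; I don't like it, but I just do what I'm told. Just a second, this CSS tells me to display the players' names in bold; OK, not bad. Oh, here's another picture from URL xxx that I need to load, no problem... Oh man, how much stuff do I have to deal with? I'm not happy with this site... (works on a bunch of other things...) Finally, everything is ready! Now check it out!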

You: Player xxx is actually quite good, let me check him out. (clicks on player XXX)

Chrome: ......

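As you can see, every time you browse a page, the browser does a lot of "behind the scenes" work to display it to the user. So basically: URL entered >> content fetched from the URL >> content parsed >> additional content fetched >> everything rendered >> page displayed (one or more steps may happen at the same time)

Your code only does the "content fetched from the URL" step, and the data you want happens to be "additional content" that has to be loaded from somewhere else. That is why you get nothing.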
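To illustrate the point, here is a small sketch using only the standard library's `html.parser`. The `STATIC_HTML` string is a hypothetical stand-in for what a plain HTTP fetch of such a page returns, not the actual markup of the site: the table exists, but its rows are supposed to be filled in later by a script that only a browser would run.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the fetched page: the table body is empty and a
# script is expected to fill it in later, inside the browser.
STATIC_HTML = """
<html><body>
  <table id="stats"><tbody></tbody></table>
  <script>/* a browser would run this and request the rows from another URL */</script>
</body></html>
"""


class RowCounter(HTMLParser):
    """Counts <tr> start tags in the markup it is fed."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1


counter = RowCounter()
counter.feed(STATIC_HTML)
print(counter.rows)  # no rows: the parser never executes the script
```

Any HTML parser, BeautifulSoup included, sees only the markup it is given, so a loop over `tr` elements has nothing to iterate over here.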

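So how do I get those stats? Once you know the URLs responsible for loading them, just request those URLs directly. How do I find those URLs? Well, you could always read the JavaScript... if you have enough patience...

The easiest way to get what you want is to analyze the traffic while the page is loading and identify all the behind-the-scenes requests. I'd recommend Fiddler, but you can use whatever tool you see fit.

Now, let's see what happens when you load the page: traffic analytics

There are actually hundreds of requests made to fully render the page you visited, and all you need to do is find out which one feeds the "actual" or "real" stats. This URL even contains "StatisticsFeed" in it; could it be the one? Let's take a look: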

https://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics?category=summary&subcategory=all&statsAccumulationType=0&isCurrent=true&playerId=&teamIds=&matchId=&stageId=13832&tournamentOptions=7&sortBy=Rating&sortAscending=&age=&ageComparisonType=&appearances=&appearancesComparisonType=&field=Overall&nationality=&positionOptions=&timeOfTheGameEnd=&timeOfTheGameStart=&isMinApp=true&page=&includeZeroValues=&numberOfPlayersToPick=10

{ 
    "playerTableStats": [{ 
     "name": "Conor Hourihane", 
     "firstName": "Conor", 
     "lastName": "Hourihane", 
     "playerId": 134172, 
     "height": 181, 
     "weight": 62, 
     "age": 25, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-MC-", 
     "positionText": "Midfielder", 
     "playedPositionsShort": "M(C)", 
     "teamId": 142, 
     "teamName": "Barnsley", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "ie", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.8705882352941181, 
     "ranking": 1, 
     "apps": 17, 
     "subOn": 0, 
     "minsPlayed": 1530, 
     "manOfTheMatch": 4, 
     "yellowCard": 5.0, 
     "redCard": 0.0, 
     "goal": 3, 
     "assistTotal": 8, 
     "shotsPerGame": 2.2352941176470589, 
     "aerialWonPerGame": 0.6470588235294118, 
     "passSuccess": 81.370449678800867 
    }, 
    { 
     "name": "Anthony Knockaert", 
     "firstName": "Anthony", 
     "lastName": "Knockaert", 
     "playerId": 86794, 
     "height": 172, 
     "weight": 69, 
     "age": 25, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-AML-AMR-", 
     "positionText": "Midfielder", 
     "playedPositionsShort": "AM(LR)", 
     "teamId": 211, 
     "teamName": "Brighton", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "fr", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.6722222222222216, 
     "ranking": 2, 
     "apps": 18, 
     "subOn": 1, 
     "minsPlayed": 1471, 
     "manOfTheMatch": 5, 
     "yellowCard": 4.0, 
     "redCard": 0.0, 
     "goal": 6, 
     "assistTotal": 0, 
     "shotsPerGame": 2.3888888888888888, 
     "aerialWonPerGame": 0.22222222222222221, 
     "passSuccess": 83.420593368237348 
    }, 
    { 
     "name": "Lewis Dunk", 
     "firstName": "Lewis", 
     "lastName": "Dunk", 
     "playerId": 86441, 
     "height": 192, 
     "weight": 88, 
     "age": 25, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-DC-", 
     "positionText": "Defender", 
     "playedPositionsShort": "D(C)", 
     "teamId": 211, 
     "teamName": "Brighton", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "gb-eng", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.660000000000001, 
     "ranking": 3, 
     "apps": 18, 
     "subOn": 0, 
     "minsPlayed": 1620, 
     "manOfTheMatch": 3, 
     "yellowCard": 8.0, 
     "redCard": 0.0, 
     "goal": 1, 
     "assistTotal": 1, 
     "shotsPerGame": 0.61111111111111116, 
     "aerialWonPerGame": 3.5, 
     "passSuccess": 79.72251867662753 
    }, 
    { 
     "name": "Tom Clarke", 
     "firstName": "Tom", 
     "lastName": "Clarke", 
     "playerId": 133974, 
     "height": 180, 
     "weight": 77, 
     "age": 28, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-DC-", 
     "positionText": "Defender", 
     "playedPositionsShort": "D(C)", 
     "teamId": 181, 
     "teamName": "Preston", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "gb-eng", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.6126315789473677, 
     "ranking": 4, 
     "apps": 19, 
     "subOn": 0, 
     "minsPlayed": 1692, 
     "manOfTheMatch": 4, 
     "yellowCard": 0.0, 
     "redCard": 0.0, 
     "goal": 2, 
     "assistTotal": 0, 
     "shotsPerGame": 0.89473684210526316, 
     "aerialWonPerGame": 5.4736842105263159, 
     "passSuccess": 66.666666666666657 
    }, 
    { 
     "name": "Pontus Jansson", 
     "firstName": "Pontus", 
     "lastName": "Jansson", 
     "playerId": 121123, 
     "height": 194, 
     "weight": 89, 
     "age": 25, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-DC-", 
     "positionText": "Defender", 
     "playedPositionsShort": "D(C)", 
     "teamId": 19, 
     "teamName": "Leeds", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "se", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.5976923076923066, 
     "ranking": 5, 
     "apps": 13, 
     "subOn": 0, 
     "minsPlayed": 1126, 
     "manOfTheMatch": 1, 
     "yellowCard": 6.0, 
     "redCard": 0.0, 
     "goal": 1, 
     "assistTotal": 0, 
     "shotsPerGame": 0.53846153846153844, 
     "aerialWonPerGame": 3.5384615384615383, 
     "passSuccess": 86.336633663366342 
    }, 
    { 
     "name": "Angus MacDonald", 
     "firstName": "Angus", 
     "lastName": "MacDonald", 
     "playerId": 110825, 
     "height": 184, 
     "weight": 70, 
     "age": 24, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-DC-", 
     "positionText": "Defender", 
     "playedPositionsShort": "D(C)", 
     "teamId": 142, 
     "teamName": "Barnsley", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "gb-eng", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.5066666666666677, 
     "ranking": 6, 
     "apps": 12, 
     "subOn": 0, 
     "minsPlayed": 1080, 
     "manOfTheMatch": 0, 
     "yellowCard": 3.0, 
     "redCard": 0.0, 
     "goal": 0, 
     "assistTotal": 0, 
     "shotsPerGame": 0.33333333333333331, 
     "aerialWonPerGame": 4.833333333333333, 
     "passSuccess": 72.147651006711413 
    }, 
    { 
     "name": "Marc Roberts", 
     "firstName": "Marc", 
     "lastName": "Roberts", 
     "playerId": 138949, 
     "height": 183, 
     "weight": 81, 
     "age": 26, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-DC-", 
     "positionText": "Defender", 
     "playedPositionsShort": "D(C)", 
     "teamId": 142, 
     "teamName": "Barnsley", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "gb-eng", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.503125, 
     "ranking": 7, 
     "apps": 16, 
     "subOn": 0, 
     "minsPlayed": 1440, 
     "manOfTheMatch": 1, 
     "yellowCard": 3.0, 
     "redCard": 0.0, 
     "goal": 2, 
     "assistTotal": 2, 
     "shotsPerGame": 0.625, 
     "aerialWonPerGame": 7.0625, 
     "passSuccess": 61.595547309833023 
    }, 
    { 
     "name": "Bradley Johnson", 
     "firstName": "Bradley", 
     "lastName": "Johnson", 
     "playerId": 12490, 
     "height": 178, 
     "weight": 68, 
     "age": 29, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-MC-ML-", 
     "positionText": "Midfielder", 
     "playedPositionsShort": "M(CL)", 
     "teamId": 20, 
     "teamName": "Derby", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "gb-eng", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.4954545454545443, 
     "ranking": 8, 
     "apps": 11, 
     "subOn": 0, 
     "minsPlayed": 952, 
     "manOfTheMatch": 1, 
     "yellowCard": 4.0, 
     "redCard": 0.0, 
     "goal": 2, 
     "assistTotal": 1, 
     "shotsPerGame": 1.3636363636363635, 
     "aerialWonPerGame": 4.0909090909090908, 
     "passSuccess": 71.908127208480565 
    }, 
    { 
     "name": "Christophe Berra", 
     "firstName": "Christophe", 
     "lastName": "Berra", 
     "playerId": 8287, 
     "height": 186, 
     "weight": 81, 
     "age": 31, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-DC-", 
     "positionText": "Defender", 
     "playedPositionsShort": "D(C)", 
     "teamId": 165, 
     "teamName": "Ipswich", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "gb-sct", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.4789473684210526, 
     "ranking": 9, 
     "apps": 19, 
     "subOn": 0, 
     "minsPlayed": 1710, 
     "manOfTheMatch": 3, 
     "yellowCard": 4.0, 
     "redCard": 0.0, 
     "goal": 0, 
     "assistTotal": 1, 
     "shotsPerGame": 0.94736842105263153, 
     "aerialWonPerGame": 6.2105263157894735, 
     "passSuccess": 58.636363636363633 
    }, 
    { 
     "name": "Adam Webster", 
     "firstName": "Adam", 
     "lastName": "Webster", 
     "playerId": 109922, 
     "height": 191, 
     "weight": 0, 
     "age": 21, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-DC-", 
     "positionText": "Defender", 
     "playedPositionsShort": "D(C)", 
     "teamId": 165, 
     "teamName": "Ipswich", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "gb-eng", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.4780000000000006, 
     "ranking": 10, 
     "apps": 15, 
     "subOn": 1, 
     "minsPlayed": 1227, 
     "manOfTheMatch": 2, 
     "yellowCard": 1.0, 
     "redCard": 0.0, 
     "goal": 0, 
     "assistTotal": 0, 
     "shotsPerGame": 0.2, 
     "aerialWonPerGame": 5.0666666666666664, 
     "passSuccess": 58.256029684601117 
    }], 
    "paging": { 
     "currentPage": 1, 
     "totalPages": 34, 
     "resultsPerPage": 10, 
     "totalResults": 338, 
     "firstRecordIndex": 1, 
     "lastRecordIndex": 10 
    }, 
    "statColumns": ["apps", 
    "subOn", 
    "minsPlayed", 
    "goal", 
    "assistTotal", 
    "yellowCard", 
    "redCard", 
    "shotsPerGame", 
    "passSuccess", 
    "aerialWonPerGame", 
    "manOfTheMatch"] 
} 

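Bingo! So what now? Simulate this request and parse the content. Since it is already JSON, the built-in json module does the job easily, and you don't even need BeautifulSoup.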
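A minimal sketch of the parsing step with the built-in `json` module. The `raw` string is an abbreviated copy of the response shown above (two players, a few fields each), embedded here so the example runs without a network call.

```python
import json

# Abbreviated copy of the response shown above.
raw = """
{
  "playerTableStats": [
    {"name": "Conor Hourihane", "teamName": "Barnsley", "rating": 7.8705882352941181, "goal": 3},
    {"name": "Anthony Knockaert", "teamName": "Brighton", "rating": 7.6722222222222216, "goal": 6}
  ],
  "paging": {"currentPage": 1, "totalPages": 34, "resultsPerPage": 10, "totalResults": 338}
}
"""

data = json.loads(raw)

# Each entry of "playerTableStats" corresponds to one row of the table on the page.
for player in data["playerTableStats"]:
    print(f'{player["name"]:<20} {player["teamName"]:<10} {player["rating"]:.2f} {player["goal"]}')

# The "paging" object reports how many pages of results exist in total.
print("pages:", data["paging"]["totalPages"])
```

The result is ordinary Python dicts and lists, so the rows can be filtered or written to CSV without any HTML parsing.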

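Maybe you'll ask: why do I get nothing when I browse this link directly? That's because they have set up restrictions on the server so that only requests with valid headers get the feed. So how do I get around this? Simulate the request "vividly", with the right parameters (mainly headers), so that they believe you.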
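A hedged sketch of sending browser-like headers with `urllib.request`. The thread does not say which headers the server checks, so the header set below (`User-Agent`, `Accept`, `X-Requested-With`, `Referer`) is an assumption based on what a browser typically sends with a background request; the helper names `build_request` and `fetch_json` are hypothetical. The site may still block the request, for example through the Incapsula protection mentioned in another answer.

```python
import json
import urllib.request

# Assumed header set: typical values a browser sends with an XHR request.
BROWSER_LIKE_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/55.0 Safari/537.36",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "X-Requested-With": "XMLHttpRequest",
}

PAGE_URL = (
    "https://www.whoscored.com/Regions/252/Tournaments/7/Seasons/6365"
    "/Stages/13832/PlayerStatistics/England-Championship-2016-2017"
)
FEED_URL = (
    "https://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics"
    "?category=summary&subcategory=all&statsAccumulationType=0&isCurrent=true"
    "&playerId=&teamIds=&matchId=&stageId=13832&tournamentOptions=7"
    "&sortBy=Rating&sortAscending=&age=&ageComparisonType=&appearances="
    "&appearancesComparisonType=&field=Overall&nationality=&positionOptions="
    "&timeOfTheGameEnd=&timeOfTheGameStart=&isMinApp=true&page="
    "&includeZeroValues=&numberOfPlayersToPick=10"
)


def build_request(feed_url, referer):
    """Create (but do not send) a request carrying browser-like headers."""
    headers = dict(BROWSER_LIKE_HEADERS, Referer=referer)
    return urllib.request.Request(feed_url, headers=headers)


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body (assumes a UTF-8 response)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request(FEED_URL, referer=PAGE_URL)
print(request.header_items())
# stats = fetch_json(request)  # network call; may still be rejected by the site
```

The `Referer` is set to the page that normally triggers the feed request, so the request looks like the one the browser makes. If the server still refuses, compare these headers against the ones captured in the traffic analysis and copy whatever is missing.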