2017-11-25 81 views

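Scraping an AJAX-loaded website in Python

I have been using Selenium to scrape the following website: https://www.eex-transparency.com/homepage/power/czech-republic/production/availability/non-usability/non-usability. I am scraping all of the table data. It works well, but the script takes quite a long time to run. So I started looking for alternatives and found several threads here on StackOverflow about sending requests to the server through its API, but after hours of trying and searching I gave up, because there are a few things I don't understand: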

  • How do I reverse engineer the API so that I send the correct request?
  • Which URL should I use?

This is what I came up with:

import json
import requests

url = "https://www.eex-transparency.com/ajax/en/navigation/ajaxGetNavi/12"

data = {
    "id": "16",
    "title": "Czech Republic",
    "url": "https:\\/\\/www.eex-transparency.com\\/homepage\\/power\\/czech-republic",
    "class": "country",
    "description": "",
    "children": [
        {
            "id": "649",
            "title": "Production",
            "url": False,
            "class": "",
            "description": "",
            "children": [
                {
                    "id": "650",
                    "title": "Capacity",
                    "url": False,
                    "class": "",
                    "description": "",
                    "children": [
                        {
                            "id": "651",
                            "title": "Installed Capacity",
                            "url": "https:\\/\\/www.eex-transparency.com\\/homepage\\/power\\/czech-republic\\/production\\/capacity\\/installed-capacity",
                            "class": "",
                            "description": ""
                        }
                    ]
                }
            ]
        }
    ]
}


response = requests.get(url, data=data)
file = response.json()

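More generally, could someone explain what steps I should follow to scrape a page like this? I am particularly interested in how to find the right information in Chrome (Inspect -> Network -> XHR) and how to build the data variable (i.e. what I pass to requests) from that information.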
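To make the question concrete, here is a minimal sketch of the kind of workflow being asked about. It rests on assumptions that are not confirmed anywhere in this post: that the nested dict above is what the ajaxGetNavi endpoint returns rather than something it expects as input, and that a plain GET with an AJAX-style header is enough. The helper names fetch_json and iter_pages are made up for illustration, and the headers are placeholders to be replaced with whatever the Network tab shows for the real request.

```python
def fetch_json(url, params=None):
    """GET an XHR endpoint and decode its JSON body.

    `params` is sent as the query string; a GET request normally has no body,
    so anything passed through `data=` would likely be ignored by the server.
    """
    import requests  # third-party; imported here so iter_pages works without it

    headers = {
        # Placeholder headers: copy the real ones from DevTools (Network -> XHR).
        "X-Requested-With": "XMLHttpRequest",
        "Accept": "application/json",
    }
    response = requests.get(url, params=params, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()


def iter_pages(node, trail=()):
    """Yield (breadcrumb, url) for every node in a navigation tree that has a URL.

    Assumes the node layout shown in the question: "title", "url" (a string,
    or False for intermediate nodes) and an optional "children" list.
    """
    trail = trail + (node.get("title", ""),)
    url = node.get("url")
    if url:  # skips nodes whose url is False or missing
        yield " > ".join(trail), url
    for child in node.get("children", []):
        yield from iter_pages(child, trail)
```

If those assumptions hold, calling fetch_json on the ajaxGetNavi URL above and passing the result (or each element, if it is a list) to iter_pages would list the pages the site exposes. The table data itself would presumably come from a different XHR request, which still has to be identified in the Network tab.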

Answers