2017-10-14

I am trying to write a Python script to scrape this webpage. I want the data in the second table ('class': 'char-pico-table'), and I am using this script to do so:

import requests

def getPICO(url):
    r = requests.get(url)  # fetches the raw HTML only; no JavaScript is executed
    print(r.content)

However, this prints:

b'<!DOCTYPE html>\n<html class="view">\n <head>\n <title>RobotReviewer: Automating evidence synthesis</title>\n <meta charset="utf-8">\n <meta name="viewport" content="width=device-width, initial-scale=1.0">\n <meta name="google" content="notranslate">\n\n <link rel="stylesheet" type="text/css" href="//maxcdn.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css">\n <link rel="stylesheet" type="text/css" href="/css/main.css">\n <link rel="stylesheet alternative prefetch" type=text/css href="/css/report.css">\n\n <!-- Preload examples -->\n <link rel="prefetch" href="/report_view/Tvg0-pHV2QBsYpJxE2KW-/html">\n <link rel="prefetch" href="/report_view/_fzGUEvWAeRsqYSmNQbBq/html">\n <link rel="prefetch" href="/report_view/HBkzX1I3Uz_kZEQYeqXJf/html">\n\n <!--/Preload examples -->\n\n\n <script src="/scripts/modernizr.js"></script>\n <script src="/scripts/spa/scripts/vendor/pdfjs/pdf.js"></script>\n <script src="/scripts/spa/scripts/vendor/compatibility.js"></script>\n <script data-main="/scripts/main" src="/scripts/require.js"></script>\n\n <script>\n  PDFJS.disableWebGL = false;\n  CSRF_TOKEN = "1508009356##6a03b1bf519972b27a0d871ae4823eb3a3366c0c";\n </script>\n </head>\n\n <body>\n <nav id="top-bar" class="top-bar" data-topbar role="navigation">\n  <div>\n  <ul class="title-area">\n   <li class="name">\n   <h1><a href="/"><img src="/img/logo.svg" width="190px"></a></h1>\n   </li>\n  </ul>\n\n  <section class="top-bar-section">\n   <ul class="right">\n   <li><a href="http://www.robotreviewer.net">About</a></li>\n   </ul>\n  </section>\n  </div>\n </nav>\n\n <div id="breadcrumbs"></div>\n\n <main id="main"></main>\n\n\n </body>\n</html>' 

This is not what I see when I view the page in a browser: it doesn't contain the data I want to scrape. Why is that?

When viewed in a web browser, the page looks like this:

[Screenshot: Expected Output]

What are you expecting to get? – roganjosh

Yes, what do you want as the output? –

The site loads its data with JavaScript; you need to use the Python library "selenium" to extract it. – Stack
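One way to confirm this diagnosis is to check whether the table class appears anywhere in the HTML that requests actually receives. The sketch below uses only the standard-library html.parser; the class name char-pico-table comes from the question, while `ClassFinder` and `has_class` are hypothetical names. Run against the response printed above, it finds no such element, which fits the idea that the table is presumably built later by the page's JavaScript (loaded via require.js) inside the empty `<main id="main"></main>` element.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any start tag carries the given CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # The class attribute may hold several space-separated names;
            # value is None for attributes written without a value.
            if name == "class" and value and self.target_class in value.split():
                self.found = True


def has_class(html, target_class):
    """Return True if an element with target_class appears in the static HTML."""
    finder = ClassFinder(target_class)
    finder.feed(html)
    finder.close()
    return finder.found


# Abridged from the response printed in the question.
static_html = ('<html class="view"><body><div id="breadcrumbs"></div>'
               '<main id="main"></main></body></html>')
print(has_class(static_html, "char-pico-table"))
print(has_class(static_html, "view"))
```

With requests, pass `r.text` (a decoded string) rather than `r.content` (bytes), since `HTMLParser.feed` expects a `str`.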

Answer

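Based on the comment from @Shahin, I wrote the following code, which gives me the data in JSON format, from which I can easily extract what I need.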

import requests  # report_id: the report's ID string (renamed from id, which shadows a built-in)
result = requests.get('https://robot-reviewer.vortext.systems/report_view/' + report_id + '/json').json()
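The answer's approach relies on each report being served at a `/json` URL alongside its HTML view; the prefetch links in the static HTML above follow the pattern `/report_view/<id>/html`. The sketch below assumes that pattern holds (it is not a documented API): it pulls report IDs out of those links and builds the matching JSON URLs. The base URL is the one used in the answer; `report_ids_from_html`, `json_url`, and `fetch_report` are hypothetical helpers, and the standard library's `urllib` stands in for requests.

```python
import json
import re
from urllib.request import urlopen

# Base URL taken from the answer; the /json suffix mirrors the /html prefetch links.
BASE_URL = "https://robot-reviewer.vortext.systems/report_view/"

# Matches hrefs such as /report_view/Tvg0-pHV2QBsYpJxE2KW-/html and captures the ID.
REPORT_LINK = re.compile(r"/report_view/([A-Za-z0-9_-]+)/html")


def report_ids_from_html(html):
    """Return report IDs found in /report_view/<id>/html links, in order."""
    return REPORT_LINK.findall(html)


def json_url(report_id):
    """Build the (assumed) JSON endpoint URL for a report ID."""
    return BASE_URL + report_id + "/json"


def fetch_report(report_id, timeout=30):
    """Download and decode a report's JSON; urlopen raises HTTPError on 4xx/5xx."""
    with urlopen(json_url(report_id), timeout=timeout) as resp:
        return json.load(resp)


html = ('<link rel="prefetch" href="/report_view/Tvg0-pHV2QBsYpJxE2KW-/html">'
        '<link rel="prefetch" href="/report_view/_fzGUEvWAeRsqYSmNQbBq/html">')
for rid in report_ids_from_html(html):
    print(json_url(rid))
```

`fetch_report(rid)` returns the decoded JSON as ordinary Python dicts and lists. The structure of that JSON isn't shown in the question, so inspect the result before extracting the table fields.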