2017-04-11 48 views

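BeautifulSoup4 can't find the HTML

I am using BeautifulSoup to scrape chat messages, but when I print the result it outputs None and the script exits with code 0. What am I doing wrong?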

# import libraries, pip install beautifulsoup4. 
import urllib2 
from bs4 import BeautifulSoup 
import csv 
from datetime import datetime 

quote_page = 'https://robertsspaceindustries.com/spectrum/community/SC/lobby/8'

#finding 
page = urllib2.urlopen(quote_page) 
soup = BeautifulSoup(page, 'html.parser') 
name = soup.find('messages-items', attrs={'message-item status-default': 
'content'}) 
print name 

#logging 
with open('index.csv', 'a') as csv_file: 
    writer = csv.writer(csv_file) 
    writer.writerow([name, datetime.now()]) 

Answer

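If you look closely with Chrome's network tools or Firebug open, you will see that the site calls a web service to fetch the data you want. The chat messages are therefore most likely not in the HTML that urllib2 downloads, which is why soup.find returns None.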

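You need to simulate a POST request with three parameters:

  • before: the last message ID you received, used to get new messages;
  • lobby_id: the lobby you want to fetch;
  • size: how many messages to fetch.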
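The three parameters could be assembled with a small helper. This is only a sketch in Python 3: `build_history_payload` is a hypothetical name, the default of 50 simply mirrors the example in this answer, and sending the values as strings is an assumption copied from that example. As far as I know, requests drops form fields whose value is None, so leaving `before` out should behave the same as passing None.

```python
def build_history_payload(lobby_id, size=50, before=None):
    """Return the form fields for the message-history POST request.

    Hypothetical helper: the field names come from the answer above,
    everything else (defaults, string conversion) is an assumption.
    """
    if int(size) <= 0:
        raise ValueError("size must be a positive number of messages")
    payload = {"lobby_id": str(lobby_id), "size": str(size)}
    # Only include 'before' when continuing from a known message ID.
    if before is not None:
        payload["before"] = str(before)
    return payload


print(build_history_payload(8))
```

The returned dictionary can be passed as the `data` argument of `requests.post`.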

It returns a JSON object, which you only need to parse to get the result you want.
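Parsing that JSON object might look like the sketch below (Python 3). The nesting (`data`, `messages`, `member.displayname`, `content_state.blocks[].text`) is taken from the example in this answer; `extract_messages` is a hypothetical helper, the `sample` string is hand-written to imitate that shape rather than a real response, and joining all blocks instead of reading only the first one is my own choice.

```python
import json


def extract_messages(payload):
    """Return (displayname, text) pairs from a decoded history response."""
    results = []
    messages = (payload.get("data") or {}).get("messages") or []
    for message in messages:
        name = (message.get("member") or {}).get("displayname", "unknown")
        blocks = (message.get("content_state") or {}).get("blocks") or []
        # Join every block so multi-paragraph messages are not cut short.
        text = "\n".join(block.get("text", "") for block in blocks)
        results.append((name, text))
    return results


# Hand-written sample that imitates the response shape (not real API output).
sample = json.loads(
    '{"data": {"messages": [{"member": {"displayname": "Antinov"},'
    ' "content_state": {"blocks": [{"text": "As if CIG listens to those."}]}}]}}'
)
print(extract_messages(sample))
```

Using `.get` with fallbacks means a message without text blocks yields an empty string instead of raising an IndexError.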

Here is an example:

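import requests

# Endpoint the page itself calls to load chat history (visible in the browser's network tools).
url = 'https://robertsspaceindustries.com/api/spectrum/message/history'
# No 'before' ID is known yet, so it is passed as None.
response = requests.post(url, data={'before': None, 'lobby_id': '8', 'size': '50'})
response.raise_for_status()  # stop on an HTTP error instead of parsing an error page
lobby_data = response.json()

for comment in lobby_data["data"]["messages"]:
    # Each message keeps its text in content_state blocks; print the first block.
    print("%s: %s" % (comment["member"]["displayname"], comment["content_state"]["blocks"][0]["text"]))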

Output:

Antinov: Esp since spectrum doesn't even open a new tab to view large images.... 
Sir Quentin Reginald Watson: write a suggestion about it 
Antinov: As if CIG listens to those. 
Sir Quentin Reginald Watson: you will never know if you don't try 
....
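To get back to the logging step from the question, each author and message could be appended to a CSV file with a timestamp. This is a sketch in Python 3, not part of the original answer: `log_messages` is a hypothetical helper, the three-column layout is an assumption, and the demo writes to an in-memory buffer with a fixed timestamp so it runs without touching the disk.

```python
import csv
import io
from datetime import datetime


def log_messages(rows, csv_file, now=None):
    """Write (author, text) pairs to an open CSV file, one row per message."""
    timestamp = (now or datetime.now()).isoformat()
    writer = csv.writer(csv_file)
    for author, text in rows:
        writer.writerow([author, text, timestamp])


# In-memory demo; for a real log, open the file in append mode instead:
# open("index.csv", "a", newline="", encoding="utf-8")
buffer = io.StringIO()
log_messages(
    [("Antinov", "As if CIG listens to those.")],
    buffer,
    now=datetime(2017, 4, 11, 12, 0, 0),
)
print(buffer.getvalue())
```

Passing `newline=""` when opening a real file lets the csv module control line endings itself, which avoids blank rows on Windows.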