Trying to scrape this website with Python, but unable to get the required data

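I want to get the company names from this page: https://siftery.com/microsoft-outlook. It lists companies that use Microsoft Outlook. I have tried BeautifulSoup, requests, urllib and urllib2, but I still cannot get the names of the companies that use Microsoft Outlook, not even those on the first page of the site.

The code I wrote is as follows: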

import requests

r = requests.get('http://siftery.com/microsoft-outlook')

print(str(r.content))
# r.content is bytes, so open the file in binary mode
with open('abc.txt', 'wb') as f:
    f.write(r.content)

and the part of the output I am interested in looks like this:

({"name":"Marketing","handle":"marketing","categories":[{"name":"Marketing Automation","handle":"marketing-automation","external_id":"tgJ_49k7v4J-wV","parent_handle":null,"categories":[{"name":"Marketing Automation Platforms","handle":"marketing-automation-platforms","external_id":"tgJLE9aHoLdneT","parent_handle":"marketing-automation"},

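BeautifulSoup gives me the same output, and so do the other libraries. It looks like "external_id" is where the company name should be? I am not sure. I also searched the saved output manually in gedit for a company name, for example Acxiom, but found no occurrences.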

Source

2016-09-07 Gabriel

The content you are looking for is generated by JavaScript after the page loads. I have used [Selenium](http://www.seleniumhq.org/) to solve similar problems. – SuperShoot

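The site loads its information with JavaScript. The data arrives asynchronously, so the HTML you get back from a plain request is the DOM before any of that information has been added. To scrape the rendered page you should use Selenium.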
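To illustrate the Selenium route, here is a minimal sketch rather than a tested scraper for this site. It assumes the third-party `selenium` package and a local Chrome install. The function name `fetch_rendered_html` and the `wait_selector` parameter are made up for this example, and the default selector is only a placeholder: you would need to inspect the page to find an element that appears once the company data has loaded.

```python
def fetch_rendered_html(url, driver=None, wait_selector="body", timeout=10):
    """Return the page HTML after the browser has run the page's JavaScript.

    `wait_selector` is a hypothetical CSS selector for an element that only
    exists once the data has loaded; inspect the page to choose a real one.
    """
    owns_driver = driver is None
    if owns_driver:
        # Third-party dependency: pip install selenium (plus a Chrome install).
        from selenium import webdriver
        driver = webdriver.Chrome()
    try:
        # Make element lookups retry for up to `timeout` seconds.
        driver.implicitly_wait(timeout)
        driver.get(url)
        # Returns once the element exists; raises if it never appears.
        driver.find_element("css selector", wait_selector)
        return driver.page_source
    finally:
        # Only close a browser that this function started itself.
        if owns_driver:
            driver.quit()
```

Calling `fetch_rendered_html('https://siftery.com/microsoft-outlook')` and passing the result to BeautifulSoup should then give you the DOM as the browser sees it, instead of the bare skeleton that `requests.get` returns.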

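Note: before you build a scraper, you should check whether the site has an API or an endpoint with CORS disabled. In your case, you can get the information by making a POST request to https://siftery.com/product-json/<product_name>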

Source

2016-09-07 03:36:57 arcegk

How did you determine that siftery.com has this JSON service? – SuperShoot

@SuperShoot You can use [Chrome devtools](https://developer.chrome.com/devtools) – arcegk

The data is available directly as JSON. You can get it with requests like this:

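import requests

# POST to the JSON endpoint instead of fetching the HTML page
r = requests.post('https://siftery.com/product-json/microsoft-outlook')
data = r.json()['content']
# 'companies' is a dict; each value holds the details of one company
companies = data['companies']
for company in companies:
    print(companies[company]['name'])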

Output

 
Public Technologies 
Consalta 
PagesJaunes.ca 
Chumbak 
Media Classified 
P.I. Works 
Saatchi & Saatchi Pro 
Tribeck Strategies 
Marketecture Solutions, LLC 
Trinity Ventures 
ARGOS 
CFN Services 
Last.Backend 
Saatchi & Saatchi USA 
Netcad 
Central Element 
NextGear Capital 
Masao 
Avalon 
Motiwe 
Bilge Adam 
Impakt Athletics 
SOZO Design 
ThroughTek 
Abovo42 
Acxiom 
ICEPAY 
Connexta 
Clearview 
Mortgage Coach

There is other information in there that you might want to investigate:

>>> data.keys() 
[u'product', u'vendor', u'users', u'group_members', u'companies', u'customers', u'other_categories', u'current_user', u'page_info', u'portfolio_products', u'primary_category', u'metadata']
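If you want a quick overview of those sections before digging in, something like the following could help. It is only a sketch: the helper names `summarize_sections` and `company_names` are made up here, and `sample` is invented data shaped like the response described above (a `companies` dict whose values have a `name` key), not real output from the site.

```python
def summarize_sections(data):
    """Map each top-level key to a short description of its value."""
    summary = {}
    for key, value in data.items():
        if isinstance(value, (dict, list)):
            summary[key] = "{} with {} items".format(type(value).__name__, len(value))
        else:
            summary[key] = type(value).__name__
    return summary


def company_names(data):
    """Return the sorted names found under data['companies']."""
    return sorted(entry["name"] for entry in data["companies"].values())


# Invented sample for illustration only; the real payload has more keys.
sample = {
    "product": {"name": "Microsoft Outlook"},
    "companies": {"c1": {"name": "Netcad"}, "c2": {"name": "Acxiom"}},
    "users": [],
}

for key, description in sorted(summarize_sections(sample).items()):
    print(key, "->", description)
print(company_names(sample))
```

With the real response you would pass `r.json()['content']` instead of `sample`.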

Source

2016-09-07 04:19:31 mhawke
