Searching an HTML page for certain text?

I want to play around with Python to learn it, so I've come up with a small project. One part of it requires me to search for a name in this list:

https://bughunter.withgoogle.com/characterlist/1

(The number 1 at the end has to be incremented to go through each page while searching for the name.)

So I'll be scraping the HTML. I'm new to Python and would appreciate it if someone could give me an example of how to make this work.

Source

2017-05-28 Ons Ali

Try [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)

I'll take a look, thanks. Can anyone provide an example?

Yes, see my answer below

import json

import requests
from bs4 import BeautifulSoup

URL = 'https://bughunter.withgoogle.com'


def get_page_html(page_num):
    # Download one page of the character list; raise on HTTP errors.
    r = requests.get('{}/characterlist/{}'.format(URL, page_num))
    r.raise_for_status()
    return r.text


def get_page_profiles(page_html):
    # Map each profile name found on the page to its absolute profile URL.
    page_profiles = {}
    # Name the parser explicitly so bs4 does not warn about guessing one.
    soup = BeautifulSoup(page_html, 'html.parser')
    for table_cell in soup.find_all('td'):
        profile_name = table_cell.find_next('h2').text
        profile_url = table_cell.find_next('a')['href']
        page_profiles[profile_name] = '{}{}'.format(URL, profile_url)
    return page_profiles


if __name__ == '__main__':
    all_profiles = {}
    # Walk pages 1 through 80 of the list.
    for page_number in range(1, 81):
        current_page_html = get_page_html(page_number)
        current_page_profiles = get_page_profiles(current_page_html)
        all_profiles.update(current_page_profiles)
    # Nothing is printed to the terminal; the results go into this file.
    with open('google_hall_of_fame_profiles.json', 'w') as f:
        json.dump(all_profiles, f, indent=2)

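Your question wasn't clear about what exactly you wanted to scrape, so I just stored the profiles in a dictionary (key/value pairs of {profile_name: profile_url}) and then dumped the result to a JSON file.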
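To get from that dictionary back to the original question (finding a particular name), one option is a case-insensitive substring search over the saved names. This is only a minimal sketch, assuming the JSON file has the {profile_name: profile_url} shape described above. The helpers `find_profiles` and `load_profiles` and the sample entries are hypothetical names made up for illustration; they are not part of the script.

```python
import json


def find_profiles(profiles, query):
    """Return the {name: url} entries whose name contains query, ignoring case."""
    needle = query.casefold()
    return {name: url for name, url in profiles.items()
            if needle in name.casefold()}


def load_profiles(path):
    """Load a {profile_name: profile_url} mapping from a JSON file."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)


# Made-up sample data standing in for the real file contents.
sample_profiles = {
    'Alice Example': 'https://example.com/profile/alice',
    'Bob Sample': 'https://example.com/profile/bob',
}

matches = find_profiles(sample_profiles, 'alice')
print(matches)
```

Once the scraper has run, something like `find_profiles(load_profiles('google_hall_of_fame_profiles.json'), 'some name')` should do the same lookup against the real data.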

Let me know if anything is unclear!

Source

2017-05-29 01:53:50

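Hey, the code doesn't even run or give any response, any ideas? I'm using Python 3 with bs4

@OnsAli ['requests'](http://docs.python-requests.org/en/master/) is a third-party library (like 'BeautifulSoup4') and must be installed with 'pip install requests'. I just tested it again and it runs fine; the output is saved to a JSON file named 'google_hall_of_fame_profiles.json'.

Unfortunately I still don't get any response; here is my terminal: http://i.imgur.com/pIRSnsg.png

Try this. You will need to install bs4 first (Python 3). This gets the names of all the people on the page: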

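from bs4 import BeautifulSoup
import urllib.request

# Fetch the first page; BeautifulSoup accepts the raw bytes and works out the encoding.
html = urllib.request.urlopen('https://bughunter.withgoogle.com/characterlist/1').read()
page = BeautifulSoup(html, 'html.parser')
# Print the text of the first element with class "item-list".
print(page.find_all(class_='item-list')[0].get_text())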
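If the goal is to check whether one specific name is on a page rather than print all of them, the same idea can be wrapped in a small function. This is a rough sketch: `page_contains_name` is a hypothetical helper, and `SAMPLE_HTML` is invented markup standing in for a downloaded page, so the real page structure may well differ.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page; the real markup may differ.
SAMPLE_HTML = """
<div class="item-list">
  <table><tr>
    <td><a href="/profile/1"><h2>Alice Example</h2></a></td>
    <td><a href="/profile/2"><h2>Bob Sample</h2></a></td>
  </tr></table>
</div>
"""


def page_contains_name(html, name):
    """Return True if name appears (ignoring case) in the page's text."""
    page = BeautifulSoup(html, 'html.parser')
    # Join the text nodes with spaces so adjacent names do not run together.
    text = page.get_text(' ', strip=True)
    return name.casefold() in text.casefold()


print(page_contains_name(SAMPLE_HTML, 'bob sample'))
print(page_contains_name(SAMPLE_HTML, 'Carol'))
```

To search the real list, the downloaded HTML of each page could be passed in place of `SAMPLE_HTML`, incrementing the page number in the URL until the function returns True.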

Source

2017-05-29 00:06:08

What am I doing wrong? http://i.imgur.com/1jY9HYx.png
