python
  • web
  • beautifulsoup
  • screen-scraping
  • 2017-09-16 88 views 2 likes 
    2

    How do I get specific content in Python using BeautifulSoup?

    I'm new to Python and am writing a small scraper with BeautifulSoup to get addresses from a web page. This is the part of the page I care about:

    </div> 
        </div> 
        <div data-integration-name="redux-container" data-payload='{"name":"LocationsMapList","props":{"locations":[{"id":17305,"company_id":106906,"description":"","city":"New York","country":"United States","address":"5 Crosby St 3rd Floor","state":"New York","region":"","latitude":40.719753,"longitude":-74.0001954,"hq":true,"created_at":"2015-01-19T01:32:16.317Z","updated_at":"2016-05-05T07:57:19.282Z","zip_code":"10013","country_code":"US","full_address":"5 Crosby St 3rd Floor, New York, 10013, New York, USA","dirty":false,"to_params":"new-york-us"}]},"storeName":null}' data-rwr-element="true"> 
    

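    I fetched the whole page content with BeautifulSoup, but I don't know how to extract the value of "full_address". I can see that it sits inside a "div", but I don't know what to do next.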

    links = soup.find_all('div')

    Thanks a lot!

    +3

    (Please add your code as text, not as an image) – PRMoureu

    +1

    I've added it. Thanks! – Laura

    +0

    The `data-payload` attribute is JSON, so use `json.loads` –
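    To illustrate the comment above: the attribute value is a plain JSON string, so `json.loads` turns it into nested Python dicts and lists. A minimal sketch, assuming a shortened copy of the payload from the question (the variable names are illustrative, not from the original post):

```python
import json

# Shortened copy of the data-payload value shown in the question
payload = '{"name":"LocationsMapList","props":{"locations":[{"city":"New York","hq":true,"full_address":"5 Crosby St 3rd Floor, New York, 10013, New York, USA"}]},"storeName":null}'

parsed = json.loads(payload)  # JSON text -> dict
first_location = parsed["props"]["locations"][0]

print(first_location["full_address"])
# JSON true/null are converted to Python True/None
print(first_location["hq"], parsed["storeName"])
```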

    Answers

    2

    You can parse the data with the json module:

    #!/usr/bin/env python 
    
    from bs4 import BeautifulSoup 
    import json 
    
    data = ''' 
    </div> 
        </div> 
        <div data-integration-name="redux-container" data-payload='{"name":"LocationsMapList","props":{"locations":[{"id":17305,"company_id":106906,"description":"","city":"New York","country":"United States","address":"5 Crosby St 3rd Floor","state":"New York","region":"","latitude":40.719753,"longitude":-74.0001954,"hq":true,"created_at":"2015-01-19T01:32:16.317Z","updated_at":"2016-05-05T07:57:19.282Z","zip_code":"10013","country_code":"US","full_address":"5 Crosby St 3rd Floor, New York, 10013, New York, USA","dirty":false,"to_params":"new-york-us"}]},"storeName":null}' data-rwr-element="true"> 
    ''' 
    
    soup = BeautifulSoup(data, 'html.parser') 
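    # Only look at divs that carry the JSON payload
    for div in soup.find_all('div', attrs={'data-integration-name': 'redux-container'}):
        info = json.loads(div.get('data-payload'))  # the attribute value is a JSON string
        for location in info['props']['locations']:
            # use location['full_address'] for the complete address string
            print(location['address'])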
    
    +0

    It says: KeyError: 'locations' – Laura

    +1

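    @Laura For my solution to work, I assumed that the data you are parsing is exactly the same as the data in your post. Is it? Judging by the error, 'locations' is not present in your data. To see which keys are available, you can `print(info['props'].keys())` – coder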

    +0

    Maybe limit it to `div`s that have the `data-integration-name="redux-container"` attribute. – wwii
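    The KeyError discussed in these comments can be avoided by skipping any payload that has no `locations` list. A minimal Python 3 sketch, assuming the real page may contain several `redux-container` divs with different payloads; the helper name `extract_full_addresses` and the first sample div are hypothetical and not part of the original post:

```python
import json

from bs4 import BeautifulSoup


def extract_full_addresses(html):
    """Return every full_address found in redux-container payloads."""
    soup = BeautifulSoup(html, "html.parser")
    addresses = []
    for div in soup.find_all("div", attrs={"data-integration-name": "redux-container"}):
        raw = div.get("data-payload")
        if not raw:
            continue  # div without a payload attribute
        try:
            payload = json.loads(raw)
        except ValueError:
            continue  # attribute value is not valid JSON
        if not isinstance(payload, dict):
            continue
        # .get() with fallbacks avoids KeyError when "props" or "locations" is missing
        locations = (payload.get("props") or {}).get("locations") or []
        for location in locations:
            if isinstance(location, dict) and "full_address" in location:
                addresses.append(location["full_address"])
    return addresses


# Hypothetical sample: the first div has no "locations", the second one does
sample = '''
<div data-integration-name="redux-container" data-payload='{"name":"Header","props":{}}'></div>
<div data-integration-name="redux-container" data-payload='{"name":"LocationsMapList","props":{"locations":[{"full_address":"5 Crosby St 3rd Floor, New York, 10013, New York, USA"}]}}'></div>
'''
print(extract_full_addresses(sample))
```

    If the result is still empty on the real page, printing each payload's keys should show where the structure differs from the snippet in the question.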
