python
  • web-scraping
  • beautifulsoup
  • python-requests
  • 2016-03-13 55 views 1 likes 
    1

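    Scraping with BeautifulSoup: object has no attribute

    I want to extract data such as company names and addresses from a website using BeautifulSoup. However, I get the following error: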

    Calgary's Notary Public 
    Traceback (most recent call last): 
        File "test.py", line 16, in <module> 
        print item.find_all(class_='jsMapBubbleAddress').text 
    AttributeError: 'ResultSet' object has no attribute 'text' 
    

    Here is the HTML snippet. I want to extract all the text information and convert it to a CSV file. Can anyone please help me?

    <div class="listing__right article hasIcon"> 
        <h3 class="listing__name jsMapBubbleName" itemprop="name"><a data-analytics='{"lk_listing_id":"100971374","lk_non-ad-rollup":"0","lk_page_num":"1","lk_pos":"in_listing","lk_proximity":"14.5","lk_directory_heading":[{"085100":[{"00910600":"1"},{"00911000":"1"}]}],"lk_geo_tier":"in","lk_area":"left_1","lk_relevancy":"1","lk_name":"busname","lk_pos_num":"1","lk_se_id":"e292d1d2-f130-463d-8f0c-7dd66800dead_Tm90YXJ5_Q2FsZ2FyeSwgQUI_56","lk_ev":"link","lk_product":"l2"}' href="/bus/Alberta/Calgary/Calgary-s-Notary-Public/100971374.html?what=Notary&amp;where=Calgary%2C+AB&amp;useContext=true" title="See detailed information for Calgary's Notary Public">Calgary's Notary Public</a> </h3> 
        <div class="listing__address address mainLocal"> 
         <em class="itemCounter">1</em> 
         <span class="listing__address--full" itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress"> 
         <span class="jsMapBubbleAddress" itemprop="streetAddress">340-600 Crowfoot Cres NW</span>, <span class="jsMapBubbleAddress" itemprop="addressLocality">Calgary</span>, <span class="jsMapBubbleAddress" itemprop="addressRegion">AB</span> <span class="jsMapBubbleAddress" itemprop="postalCode">T3G 0B4</span></span> 
         <a class="listing__direction" data-analytics='{"lk_listing_id":"100971374","lk_non-ad-rollup":"0","lk_page_num":"1","lk_pos":"in_listing","lk_proximity":"14.5","lk_directory_heading":[{"085100":[{"00910600":"1"},{"00911000":"1"}]}],"lk_geo_tier":"in","lk_area":"left_1a","lk_relevancy":"1","lk_name":"directions-step1","lk_pos_num":"1","lk_se_id":"e292d1d2-f130-463d-8f0c-7dd66800dead_Tm90YXJ5_Q2FsZ2FyeSwgQUI_56","lk_ev":"link","lk_product":"l2"}' href="/merchant/directions/100971374?what=Notary&amp;where=Calgary%2C+AB&amp;useContext=true" rel="nofollow" title="Get direction to Calgary's Notary Public">Get directions »</a> 
        </div> 
        <div class="listing__details"> 
         <p class="listing__details__teaser" itemprop="description">We offer you a convenient, quick and affordable solution for your Notary Public or Commissioner for Oaths in Calgary needs.</p> 
        </div> 
        <div class="listing__ratings--root"> 
         <div class="listing__ratings ratingWarp" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating"> 
         <meta content="5" itemprop="ratingValue"/> 
         <meta content="1" itemprop="ratingCount"/> 
         <span class="ypStars" data-analytics-group="stars" data-clicksent="false" data-rating="rating5" title="Ratings: 5 out of 5 stars"> 
         <span class="star1" data-analytics-name="stars" data-label="Optional : Why did you hate it?" title="I hated it"></span> 
         <span class="star2" data-analytics-name="stars" data-label="Optional : Why didn't you like it?" title="I didn't like it"></span> 
         <span class="star3" data-analytics-name="stars" data-label="Optional : Why did you like it?" title="I liked it"></span> 
         <span class="star4" data-analytics-name="stars" data-label="Optional : Why did you really like it?" title="I really liked it"></span> 
         <span class="star5" data-analytics-name="stars" data-label="Optional : Why did you love it?" title="I loved it"></span> 
         </span><a class="listing__ratings__count" data-analytics='{"lk_listing_id":"100971374","lk_non-ad-rollup":"0","lk_page_num":"1","lk_pos":"in_listing","lk_proximity":"14.5","lk_directory_heading":[{"085100":[{"00910600":"1"},{"00911000":"1"}]}],"lk_geo_tier":"in","lk_area":"left_1","lk_relevancy":"1","lk_name":"read_yp_reviews","lk_pos_num":"1","lk_se_id":"e292d1d2-f130-463d-8f0c-7dd66800dead_Tm90YXJ5_Q2FsZ2FyeSwgQUI_56","lk_ev":"link","lk_product":"l2"}' href="/bus/Alberta/Calgary/Calgary-s-Notary-Public/100971374.html?what=Notary&amp;where=Calgary%2C+AB&amp;useContext=true#ypgReviewsHeader" rel="nofollow" title="1 of Review for Calgary's Notary Public">1<span class="hidden-phone"> YP review</span></a> 
         </div> 
        </div> 
        <div class="listing__details detailsWrap"> 
         <ul> 
         <li><a href="/search/si/1/Notaries/Calgary%2C+AB" title="Notaries">Notaries</a> 
          , 
         </li> 
         <li><a href="/search/si/1/Notaries+Public/Calgary%2C+AB" title="Notaries Public">Notaries Public</a></li> 
         </ul> 
        </div> 
    </div> 
    

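    The page contains many divs with the class listing__right article hasIcon. I am using a for loop to extract the information from each one.

    The Python code I have written so far is: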

    import requests 
    from bs4 import BeautifulSoup 
    
    url = 'http://www.yellowpages.ca/search/si-rat/1/Notary/Calgary%2C+AB' 
    response = requests.get(url) 
    content = response.content 
    
    soup = BeautifulSoup(content) 
    g_data=soup.find_all('div', attrs={'class': 'listing__right article hasIcon'}) 
    
    for item in g_data: 
        print item.find('h3').text 
        #print item.contents[2].find_all('em', attrs={'class': 'itemCounter'})[1].text 
        print item.find_all(class_='jsMapBubbleAddress').text 
    

    find_all returns a list, and lists in Python have no text attribute. Try iterating over the list returned on the last line of your code. – MrPyCharm


    I only want the first matching element –


    print item.find_all(class_='jsMapBubbleAddress')[0].text –
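
    For reference, the difference between the two calls discussed in these comments can be sketched as follows. This is a minimal, self-contained example that assumes Python 3 and the built-in html.parser; the HTML string is a shortened copy of the snippet from the question, and first_text is a hypothetical helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Shortened copy of the address markup from the question.
HTML = """
<div class="listing__right article hasIcon">
  <span class="jsMapBubbleAddress" itemprop="streetAddress">340-600 Crowfoot Cres NW</span>,
  <span class="jsMapBubbleAddress" itemprop="addressLocality">Calgary</span>
</div>
"""


def first_text(parent, css_class):
    """Return the text of the first descendant with css_class, or None."""
    tag = parent.find(class_=css_class)  # find() returns one Tag or None
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(HTML, "html.parser")
item = soup.find("div", class_="listing__right")

# find_all() returns a ResultSet (a list subclass), so .text is not available on it.
matches = item.find_all(class_="jsMapBubbleAddress")
print(len(matches))                             # 2
print(first_text(item, "jsMapBubbleAddress"))   # 340-600 Crowfoot Cres NW
print(first_text(item, "noSuchClass"))          # None
```

    Indexing with find_all(...)[0] also gives the first match, but it raises IndexError when nothing matches, whereas find returns None, which is easier to check inside a loop over many listings.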

    Answers

    1

    find_all returns a list, which has no text attribute, so you get an error. I am not sure what output you are looking for, but this code seems to work:

    import requests 
    from bs4 import BeautifulSoup 
    
    url = 'http://www.yellowpages.ca/search/si-rat/1/Notary/Calgary%2C+AB' 
    response = requests.get(url) 
    content = response.content 
    
    soup = BeautifulSoup(content,"lxml") 
    g_data=soup.find_all('div', attrs={'class': 'listing__right article hasIcon'}) 
    
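    for item in g_data:
        # Business name from the <h3> heading
        print(item.find('h3').text)
        #print item.contents[2].find_all('em', attrs={'class': 'itemCounter'})[1].text
        # find_all returns a list of tags, so print the text of each one
        for address_part in item.find_all(class_='jsMapBubbleAddress'):
            print(address_part.text)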
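
    Since the question also asks for a CSV file, one possible way to go from the listings to CSV rows is sketched below. It is not part of the original answer: it assumes Python 3, the built-in html.parser, and that every field of interest carries an itemprop attribute as in the snippet from the question. FIELDS, listing_to_row and listings_to_csv are hypothetical names chosen for this illustration, and the HTML string is a shortened copy of one listing.

```python
import csv
import io

from bs4 import BeautifulSoup

# Shortened copy of one listing from the question; a real page has many.
HTML = """
<div class="listing__right article hasIcon">
  <h3 class="listing__name jsMapBubbleName" itemprop="name"><a href="#">Calgary's Notary Public</a> </h3>
  <span class="jsMapBubbleAddress" itemprop="streetAddress">340-600 Crowfoot Cres NW</span>,
  <span class="jsMapBubbleAddress" itemprop="addressLocality">Calgary</span>,
  <span class="jsMapBubbleAddress" itemprop="addressRegion">AB</span>
  <span class="jsMapBubbleAddress" itemprop="postalCode">T3G 0B4</span>
</div>
"""

# Column order for the CSV; these are itemprop values seen in the snippet.
FIELDS = ["name", "streetAddress", "addressLocality", "addressRegion", "postalCode"]


def listing_to_row(listing):
    """Build one CSV row; a missing field becomes an empty string."""
    row = {}
    for field in FIELDS:
        tag = listing.find(attrs={"itemprop": field})
        row[field] = tag.get_text(strip=True) if tag is not None else ""
    return row


def listings_to_csv(html, out):
    """Write one row per listing div to the file-like object out."""
    soup = BeautifulSoup(html, "html.parser")
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    # select() matches divs carrying both classes, regardless of class order.
    for listing in soup.select("div.listing__right.hasIcon"):
        writer.writerow(listing_to_row(listing))


buffer = io.StringIO()
listings_to_csv(HTML, buffer)
print(buffer.getvalue())
```

    To write to disk instead, a file opened with open("listings.csv", "w", newline="") could be passed in place of buffer, and response.content from the requests call above could be passed as html. Whether the live page still uses this markup is an assumption.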
    
    Related questions