2016-12-02 147 views

How do I get all links from a page's DOM? I'm using BeautifulSoup in Python to get all the links:

links = soup.select('.cover > .card-click-target')
print(links)

But it gives me a list of elements and string values.

My HTML code:

<div class="cover"> 
    <div class="cover-image-container"> 
    <div class="cover-outer-align"> 
     <div class="cover-inner-align"> 
     <img alt="Kate Mobile Lite" class="cover-image" data-cover-large="" data-cover-small="" src="" aria-hidden="true"> 
     </div> 
    </div> 
    </div> 
    <a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite  "> 
    <span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none"> 
     <span class="preordered-label">Предзаказ</span> 
    </span> 
    <span class="preview-overlay-container"> </span> 
    </a> 
</div> 

<div class="cover"> 
    <div class="cover-image-container"> 
    <div class="cover-outer-align"> 
     <div class="cover-inner-align"> 
     <img alt="Kate Mobile Lite" class="cover-image" data-cover-large="" data-cover-small="" src="" aria-hidden="true"> 
     </div> 
    </div> 
    </div> 
    <a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite  "> 
    <span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none"> 
     <span class="preordered-label">Предзаказ</span> 
    </span> 
    <span class="preview-overlay-container"> 
    </span> 
    </a> 
</div> 

It's really hard to help without seeing the page's actual source, but if you're looking for links (which are `a` tags), you should use `find_all('a')`. – Dekel
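Following that suggestion, a minimal sketch of collecting every link on a page with `find_all('a')`. The sample markup is hypothetical (it stands in for the real page source), and passing `href=True` keeps only anchors that actually carry an `href` attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page source.
html = """
<div class="cover"><a class="card-click-target" href="/s/kate_new_6">Kate</a></div>
<p><a href="/about">About</a> <a name="top">no href here</a></p>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all("a") returns Tag objects; href=True skips <a> tags without an href.
all_links = [a["href"] for a in soup.find_all("a", href=True)]
print(all_links)  # ['/s/kate_new_6', '/about']
```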


Please look at the question again; I've made changes. – MisterPi


I don't see any changes. – Dekel

Answers

link_tags = soup.find_all('a', class_="card-click-target") 
links = [i.get('href') for i in link_tags] 

Output:

['/s/kate_new_6', '/s/kate_new_6'] 
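The output contains the same href twice because both cards point to the same app, and the hrefs are relative. A minimal sketch of deduplicating them (keeping first-seen order, which `dict` guarantees from Python 3.7 on) and turning them into absolute URLs with `urljoin`; `BASE_URL` is a hypothetical placeholder for the address the page was actually fetched from.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Trimmed version of the question's HTML: two cards with the same link.
html = """
<div class="cover"><a class="card-click-target" href="/s/kate_new_6"></a></div>
<div class="cover"><a class="card-click-target" href="/s/kate_new_6"></a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Hypothetical base URL: replace with the real page address.
BASE_URL = "https://example.com"

hrefs = [a.get("href") for a in soup.find_all("a", class_="card-click-target")]
# dict.fromkeys drops duplicates while preserving insertion order.
unique_links = [urljoin(BASE_URL, h) for h in dict.fromkeys(hrefs)]
print(unique_links)  # ['https://example.com/s/kate_new_6']
```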

`select` version:

link_tags = soup.select('.cover .card-click-target')
links = [i.get('href') for i in link_tags]

Thanks, but how do I specify the parent, i.e. '.cover > .card-click-target'? – MisterPi
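In CSS, the space in `.cover .card-click-target` is the descendant combinator (any depth below `.cover`), while `>` is the child combinator (`.cover` must be the direct parent). A small sketch contrasting the two, assuming BeautifulSoup 4.7 or later, where `select` is backed by the soupsieve package; the nested markup is hypothetical and only there to make the difference visible.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one link is a direct child of .cover, one is nested deeper.
html = """
<div class="cover">
  <a class="card-click-target" href="/direct"></a>
  <div class="wrapper"><a class="card-click-target" href="/nested"></a></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Descendant combinator (space): matches at any depth below .cover.
descendant = [a["href"] for a in soup.select(".cover .card-click-target")]
# Child combinator (>): matches only elements whose parent is .cover.
child = [a["href"] for a in soup.select(".cover > .card-click-target")]

print(descendant)  # ['/direct', '/nested']
print(child)       # ['/direct']
```

In the question's HTML the `a.card-click-target` elements are direct children of `div.cover`, so both selectors return the same two links there.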


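I wouldn't fully trust BeautifulSoup's CSS selectors. A quick search turns up this answer here, which says that updating BeautifulSoup fixed the poster's problem.

I'd strongly suggest you write a function to do the job: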

links = soup.find_all(lambda tag: tag.parent.get('class', None) == ['cover']
                      and tag.get('class', None) == ['card-click-target'])

The anonymous lambda function finds all tags with the class card-click-target and makes sure each of them has a parent with the class cover.
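One caveat: BeautifulSoup stores `class` as a list, so comparing it with `== ['cover']` fails when the parent has extra classes (e.g. `class="cover featured"`). A minimal variant of the same filter-function idea using membership tests instead; `is_card_link` is a hypothetical helper name and the sample markup is illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<div class="cover">
  <a class="card-click-target" href="/s/kate_new_6"></a>
</div>
<div class="cover featured">
  <a class="card-click-target" href="/s/other"></a>
</div>
<div class="other"><a class="card-click-target" href="/ignored"></a></div>
"""
soup = BeautifulSoup(html, "html.parser")


def is_card_link(tag):
    """Hypothetical helper: True for a tag with class card-click-target
    whose direct parent has class cover (extra classes are allowed)."""
    parent = tag.parent
    return (
        "card-click-target" in tag.get("class", [])
        and parent is not None
        and "cover" in parent.get("class", [])
    )


# find_all calls the function on every tag and keeps those that return True.
links = [tag.get("href") for tag in soup.find_all(is_card_link)]
print(links)  # ['/s/kate_new_6', '/s/other']
```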


Check this example:

>>> s = """ <div class="cover"> 
     <div class="cover-image-container"> 
     <div class="cover-outer-align"> 
      <div class="cover-inner-align"> 
      <img alt="Kate Mobile Lite" class="cover-image" data-cover-large="" data-cover-small="" src="" aria-hidden="true"> 
      </div> 
     </div> 
     </div> 
     <a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite  "> 
     <span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none"> 
      <span class="preordered-label">Предзаказ</span> 
     </span> 
     <span class="preview-overlay-container"> </span> 
     </a> 
    </div> 

    <div class="cover"> 
     <div class="cover-image-container"> 
     <div class="cover-outer-align"> 
      <div class="cover-inner-align"> 
      <img alt="Kate Mobile Lite" class="cover-image" data-cover-large="" data-cover-small="" src="" aria-hidden="true"> 
      </div> 
     </div> 
     </div> 
     <a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite  "> 
     <span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none"> 
      <span class="preordered-label">Предзаказ</span> 
     </span> 
     <span class="preview-overlay-container"> 
     </span> 
     </a> 
    </div>""" 
>>> from bs4 import BeautifulSoup
>>> sp = BeautifulSoup(s, "html.parser")
>>> sp.select(".cover > a.card-click-target") 
[<a aria-label=" Kate Mobile Lite  " class="card-click-target" href="/s/kate_new_6"> 
<span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none"> 
<span class="preordered-label">Предзаказ</span>
</span> 
<span class="preview-overlay-container"> </span> 
</a>, 
<a aria-label=" Kate Mobile Lite  " class="card-click-target" href="/s/kate_new_6"> 
<span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none"> 
<span class="preordered-label">Предзаказ</span>
</span> 
<span class="preview-overlay-container"> 
</span> 
</a>] 

>>> len(sp.select(".cover > a.card-click-target")) 
2 

I still get zero: len(sp.select(".cover > a.card-click-target")) – MisterPi


With exactly this complete code? Or are you only running the len(sp... part? – Dekel


Yes, I get the full HTML of the page and apply the selector to it. – MisterPi
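When a selector works on a snippet but returns 0 on the full page, one way to narrow it down is to count matches for each part of the selector separately. If `.cover` and `.card-click-target` each match but the combined selector does not, the real page nests them differently than the snippet; if `.card-click-target` alone matches nothing, the links are probably not in the fetched HTML at all (for example if they are inserted by JavaScript, which is only a guess since the page source isn't shown). `diagnose` is a hypothetical helper and the sample input is illustrative.

```python
import bs4
from bs4 import BeautifulSoup


def diagnose(html):
    """Hypothetical helper: report match counts for each part of the selector."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "bs4_version": bs4.__version__,  # very old versions had weaker CSS support
        ".cover": len(soup.select(".cover")),
        ".card-click-target": len(soup.select(".card-click-target")),
        ".cover > a.card-click-target": len(soup.select(".cover > a.card-click-target")),
    }


# Hypothetical input: the link exists but is not a direct child of .cover.
sample = '<div class="cover"><div><a class="card-click-target" href="/x"></a></div></div>'
print(diagnose(sample))
```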
