BeautifulSoup: order of tags

Consider the following scenario:

tag1 = soup.find(**data_attrs) 
tag2 = soup.find(**delim_attrs)

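Is there a way to find out which of these tags occurs "first" on the page?

Clarification:

For my purposes, the order is the same as the one used by BeautifulSoup's findNext method. (I am currently relying on that fact to "solve" my problem, although it is messy.)
The goal is basically to accumulate the tags that are not separated by a "delimiter tag". Maybe there is a better way to do this?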

2014-12-28 Khodeir

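No, BeautifulSoup tags do not track their position in the page. You would have to loop over all tags again and find your two tags in that list.

Using the standard sample BeautifulSoup tree: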

>>> tag1 = soup.find(id='link1') 
>>> tag2 = soup.find(id='link2') 
>>> tag1, tag2 
(<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>) 
>>> all_tags = soup.find_all(True) 
>>> all_tags.index(tag1) 
6 
>>> all_tags.index(tag2) 
7
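A self-contained sketch of the same lookup, for illustration only. The markup is a hypothetical stand-in, not the sample tree above, so the positions differ, and `document_position` is an illustrative helper name. It compares by identity rather than calling `list.index()`, because `index()` uses equality and two tags with identical markup compare equal.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = """
<div>
  <p id="first">one</p>
  <hr class="delim">
  <p id="second">two</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def document_position(soup, tag):
    """Return the index of `tag` among all tags in document order."""
    for position, candidate in enumerate(soup.find_all(True)):
        if candidate is tag:  # identity, not equality
            return position
    raise ValueError("tag is not part of this tree")


data_tag = soup.find(id="second")
delim_tag = soup.find(class_="delim")

# The tag with the smaller position occurs first on the page.
first = min(data_tag, delim_tag, key=lambda t: document_position(soup, t))
print(first.name)
```

Each call walks the whole tree, so for many lookups it is probably cheaper to build one `{id(tag): position}` dictionary up front.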

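Instead, I would use a single tag.find_all() call with a function that matches both tag types; that way you get one list of tags and can see their relative order: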

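tag_match = lambda el: (
    getattr(el, 'name', None) in ('tagname1', 'tagname2') and
    el.attrs.get('attributename') == 'something' and
    'classname' in el.attrs.get('class', [])  # default avoids a TypeError when class is absent
)
tags = soup.find_all(tag_match)  # find_all returns every match, in document order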
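For the stated goal of accumulating tags that are not separated by a delimiter, a single ordered pass might look like the sketch below. The markup, the `data` class, the choice of `<hr>` as delimiter, and the helper names `is_data` and `is_delimiter` are all assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: <p class="data"> tags are the data, <hr> is the delimiter.
html = """
<div>
  <p class="data">a</p>
  <p class="data">b</p>
  <hr>
  <p class="data">c</p>
  <hr>
  <hr>
  <p class="data">d</p>
  <p class="data">e</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def is_data(tag):
    return tag.name == "p" and "data" in tag.get("class", [])


def is_delimiter(tag):
    return tag.name == "hr"


groups = [[]]
# One pass over both kinds of tag; find_all yields them in document order.
for tag in soup.find_all(lambda t: is_data(t) or is_delimiter(t)):
    if is_delimiter(tag):
        if groups[-1]:  # consecutive delimiters do not create empty runs
            groups.append([])
    else:
        groups[-1].append(tag.get_text())

if not groups[-1]:
    groups.pop()  # drop an empty trailing run

print(groups)
```

Because both kinds of tag come back in one list, no separate position lookup is needed.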

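Alternatively, you could use the .next_siblings iterator to walk over all elements under the same parent and see whether a delimiter comes next, and so on.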
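A minimal sketch of the sibling-based approach, assuming the data tags and the delimiter share one parent. The markup and the choice of `<br/>` as delimiter are hypothetical.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: <a> tags are the data, <br/> is the delimiter.
html = '<div><a id="x1">1</a><a id="x2">2</a><br/><a id="x3">3</a></div>'

soup = BeautifulSoup(html, "html.parser")
start = soup.find(id="x1")

run = [start.get_text()]
for sibling in start.next_siblings:
    if not isinstance(sibling, Tag):
        continue  # next_siblings also yields text nodes; skip them
    if sibling.name == "br":
        break  # stop at the first delimiter
    run.append(sibling.get_text())

print(run)
```

This only sees elements under the same parent, so it does not help when the delimiter sits at a different nesting level.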

2014-12-28 11:23:26
