Find all tags of a certain class only after a tag with certain text

I have a very long table in HTML, so the tags are not nested within each other. It looks like this:

<tr> 
    <td>A</td> 
</tr> 
<tr> 
    <td class="x">...</td> 
    <td class="x">...</td> 
    <td class="x">...</td> 
    <td class="x">...</td> 
</tr> 
<tr> 
    <td class ="y">...</td> 
    <td class ="y">...</td> 
    <td class ="y">...</td> 
    <td class ="y">...</td> 
</tr> 
<tr> 
    <td>B</td> 
</tr> 
<tr> 
    <td class="x">...</td> 
    <td class="x">...</td> 
    <td class="x">...</td> 
    <td class="x">...</td> 
</tr> 
<tr> 
    <td class ="y">I want this</td> 
    <td class ="y">and this</td> 
    <td class ="y">and this</td> 
    <td class ="y">and this</td> 
</tr>

So first I want to search the tree to find "B". Then I want to grab the text of each td tag after B but before the next table row that starts with "C".

I've tried this:

results = soup.find_all('td')
for result in results:
    if result.string == "B":
        print(result.string)

This gets me the string B that I want. But now I'm trying to find everything after it, and I'm not getting what I want.

for results in soup.find_all('td'):
    if results.string == 'B':
        a = results.find_next('td', class_='y')

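This gives me the next td after "B", which is what I want, but I can only seem to get the first td tag. I want to grab all tags with class y that come after 'B' but before 'C' (C isn't shown in the HTML, but it follows the same pattern), and put them into a list.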

My resulting list would be:

[['I want this'],['and this'],['and this'],['and this']]
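find_next() returns only the first match by design, which is why the attempt above stops after one cell. A minimal sketch of the original idea, assuming bs4 with the built-in "html.parser" backend: walk forward in document order with find_all_next("td"), keep the cells whose class includes "y", and stop at the first td whose text is the stop marker. The helper name y_cells_between and the sample HTML (including the "C" row) are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample shaped like the question's table; the "C" row is assumed.
html = """
<tr><td>A</td></tr>
<tr><td class="x">a1</td><td class="x">a2</td></tr>
<tr><td class="y">a3</td><td class="y">a4</td></tr>
<tr><td>B</td></tr>
<tr><td class="x">...</td><td class="x">...</td></tr>
<tr><td class="y">I want this</td><td class="y">and this</td><td class="y">and this</td><td class="y">and this</td></tr>
<tr><td>C</td></tr>
<tr><td class="y">not this</td></tr>
"""

soup = BeautifulSoup(html, "html.parser")


def y_cells_between(soup, start_text, stop_text):
    """Return [[text], ...] for every td with class "y" after the td whose
    text is start_text, stopping at the first td whose text is stop_text."""
    start = soup.find("td", string=start_text)
    result = []
    # find_all_next walks every following <td> in document order,
    # unlike find_next, which returns only the first match.
    for td in start.find_all_next("td"):
        if td.get_text(strip=True) == stop_text:
            break
        # td.get("class") is a list such as ["y"], or None when absent.
        if "y" in (td.get("class") or []):
            result.append([td.get_text(strip=True)])
    return result


print(y_cells_between(soup, "B", "C"))
```

Filtering by class inside the loop, rather than passing class_="y" to find_all_next, keeps the unclassed "C" cell visible so the loop can stop on it.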


2015-10-02 strahanstoothgap

Basically, you need to find the element containing the text B. That is your starting point.

Then, check every tr sibling of this element using find_next_siblings():

start = soup.find("td", text="B").parent
for tr in start.find_next_siblings("tr"):
    # exit if reached C
    if tr.find("td", text="C"):
        break

    # get all tds with a desired class
    tds = tr.find_all("td", class_="y")
    for td in tds:
        print(td.get_text())

Tested on your data, it prints:

I want this 
and this 
and this 
and this
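To get the list-of-lists shape asked for in the question instead of printed lines, the same loop can append to a list. A minimal sketch, assuming bs4 with the built-in "html.parser" backend; the sample HTML and its "C" row are hypothetical. It uses string=, the name Beautiful Soup has used since 4.4.0 for the argument the answer passes as text=.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mirroring the question's rows; the "C" row is assumed.
html = """
<tr><td>B</td></tr>
<tr><td class="x">...</td><td class="x">...</td><td class="x">...</td><td class="x">...</td></tr>
<tr><td class="y">I want this</td><td class="y">and this</td><td class="y">and this</td><td class="y">and this</td></tr>
<tr><td>C</td></tr>
<tr><td class="y">not this</td></tr>
"""

soup = BeautifulSoup(html, "html.parser")

# Same traversal as the answer, collecting instead of printing.
start = soup.find("td", string="B").parent  # the <tr> holding "B"
results = []
for tr in start.find_next_siblings("tr"):
    if tr.find("td", string="C"):  # reached the next section
        break
    for td in tr.find_all("td", class_="y"):
        results.append([td.get_text()])  # one single-item list per cell

print(results)
```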


2015-10-02 01:36:30 alecxe

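Thanks for the reply. This worked for me. However, I got lucky, because what I need is the last of the siblings every time. Since I don't know what 'C' will be, and would rather have it be dynamic, how could I make this more robust so that it doesn't matter? That is, instead of breaking out of the loop if the text is 'C', how could I check that it is not equal to 'B'? – strahanstoothgap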
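One way to avoid hard-coding "C" is to stop at the next row that looks like a section header, whatever its text. A minimal sketch, assuming (as in the question's sample) that header rows contain only td cells without a class attribute, while data rows carry class "x" or "y"; if real header rows are marked differently, is_header_row must change. The helper names is_header_row and section_cells and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the heading after "B" is unknown text, not "C".
html = """
<tr><td>A</td></tr>
<tr><td class="y">skip</td></tr>
<tr><td>B</td></tr>
<tr><td class="x">...</td><td class="x">...</td></tr>
<tr><td class="y">I want this</td><td class="y">and this</td></tr>
<tr><td>Some unknown heading</td></tr>
<tr><td class="y">not this</td></tr>
"""


def is_header_row(tr):
    """Assumption: a section header is a row whose <td> cells have no class."""
    cells = tr.find_all("td")
    return bool(cells) and all(td.get("class") is None for td in cells)


def section_cells(soup, heading, cls="y"):
    """Collect [[text], ...] for td.cls cells between `heading` and the next header row."""
    start = soup.find("td", string=heading).parent
    results = []
    for tr in start.find_next_siblings("tr"):
        if is_header_row(tr):  # any new heading ends the section
            break
        results.extend([td.get_text()] for td in tr.find_all("td", class_=cls))
    return results


soup = BeautifulSoup(html, "html.parser")
print(section_cells(soup, "B"))
```

Because the stop test looks at the row's structure rather than its text, the same call works for every section, including the last one, where the loop simply runs out of siblings.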
