Keep certain text and discard the rest using a selector

From the HTML elements below, how can I keep the text Hi there!! and discard the other text, Cat, using a CSS selector? Also, with .text or .text.strip() I don't get the result, but with .text_content() I do get the text.

from lxml.html import fromstring 

html=""" 
<div id="item_type" data-attribute="item_type" class="ms-crm-Inline" aria-describe="item_type_c"> 
    <div> 
     <label for="item_type_outer" id="Type_outer"> 
      <div class="NotVisible">Cat</div> 
     Hi there!! 
      <div class="GradientMask"></div> 
     </label> 
    </div> 
</div> 
""" 
root = fromstring(html) 
for item in root.cssselect("#Type_outer"): 
    print(item.text) # doesn't work 
    print(item.text.strip()) # doesn't work 
    print(item.text_content()) # working one

Result:

Cat 
Hi there!!

However, the result I want is only Hi there!!, and this is what I tried:

root.cssselect("#Type_outer:not(.NotVisible)") #it doesn't work either

To restate my questions:

Why does .text_content() work while .text or .text.strip() doesn't?
How can I get only Hi there!! using a CSS selector?

2017-10-14 SIM

In the lxml tree model, the text you want is in the tail of the div with class "NotVisible":

>>> root = fromstring(html) 
>>> for item in root.cssselect("#Type_outer > div.NotVisible"): 
...  print(item.tail.strip()) 
... 
Hi there!!

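So, to answer the first question: only a text node that is not preceded by an element ends up in the parent's text attribute. A text node that has a preceding sibling element (like the one in this question) is stored in that element's tail attribute instead.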
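To make the text/tail distinction concrete, here is a minimal sketch that inspects the three places the text can live, using a trimmed copy of the question's HTML. The variable names (label, hidden_div, label_text, tail_text, all_text) are only illustrative, and the `or ""` guards are an assumption that .text or .tail may be None when the parser drops whitespace-only text.

```python
from lxml.html import fromstring

html = """
<div id="item_type">
  <div>
    <label for="item_type_outer" id="Type_outer">
      <div class="NotVisible">Cat</div>
      Hi there!!
      <div class="GradientMask"></div>
    </label>
  </div>
</div>
"""

root = fromstring(html)
label = root.get_element_by_id("Type_outer")
hidden_div = label[0]  # first child element: <div class="NotVisible">

# Text before the first child element belongs to the label's .text;
# here that is at most whitespace (or None, depending on the parser).
label_text = (label.text or "").strip()

# Text after </div> belongs to that div's .tail, not to the label's .text.
tail_text = (hidden_div.tail or "").strip()

# text_content() concatenates every descendant text node, "Cat" included.
all_text = label.text_content()

print(repr(label_text))
print(repr(tail_text))
print(all_text.split())
```

This is why .text looks empty in the question while .text_content() returns both Cat and Hi there!!.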

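Another way to get the text "Hi there!!" is to query for the non-empty text nodes that are direct children of the label. An XPath expression can query at this level of detail: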

for item in root.cssselect("#Type_outer"): 
    print(item.xpath("text()[normalize-space()]")[0].strip())

2017-10-14 08:36:58 har07

No way! You've been very helpful. One last thing: could you tell me why 'root.cssselect("#Type_outer:not(.NotVisible)")' fails? Forgive my ignorance. Thanks again. – SIM

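That expression selects *the element with ID "Type_outer" that does not have the class "NotVisible"*, so in this case it returns essentially the same element as #Type_outer, because the label with that ID doesn't have the class "NotVisible" anyway. – har07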
