Filter an XML file to remove lines that contain certain text?

For example, suppose I have:

<div class="info"><p><b>Orange</b>, <b>One</b>, ... 
<div class="info"><p><b>Blue</b>, <b>Two</b>, ... 
<div class="info"><p><b>Red</b>, <b>Three</b>, ... 
<div class="info"><p><b>Yellow</b>, <b>Four</b>, ...

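I want to remove every line that contains a word from a list, so that I only run XPath on the lines that fit my criteria. For example, I could use the list ['Orange', 'Red'] to mark the unwanted lines, so in the example above I would only want to use lines 2 and 4 for further processing.

How can I do this?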


2011-07-03 roni

Good question, +1. See my answer for a complete yet short one-liner XPath expression solution.

Use:

//div 
    [not(p/b[contains('|Orange|Red|', 
        concat('|', ., '|') 
        ) 
      ] 
     ) 
    ]

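This selects any div element in the XML document that has no p child with a b child whose string value is one of the strings in the pipe-delimited list used as the filter.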
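To make the selection rule concrete, here is a minimal sketch of the same logic in plain Python, using only the standard library's xml.etree.ElementTree rather than an XPath engine. The `<root>` wrapper, the closed `</p></div>` tags, and the helper name `keep_divs` are assumptions added for illustration; they are not part of the question or the answer.

```python
import xml.etree.ElementTree as ET

# Sample data modeled on the question; the <root> wrapper and the closing
# tags are assumptions, added so the snippet is well-formed XML.
DOC = """\
<root>
<div class="info"><p><b>Orange</b>, <b>One</b></p></div>
<div class="info"><p><b>Blue</b>, <b>Two</b></p></div>
<div class="info"><p><b>Red</b>, <b>Three</b></p></div>
<div class="info"><p><b>Yellow</b>, <b>Four</b></p></div>
</root>
"""


def keep_divs(root, unwanted):
    """Hypothetical helper mirroring the XPath predicate in plain Python."""
    # Same trick as the XPath: wrap every value in pipes so that only
    # whole words match, e.g. '|Red|' is found inside '|Orange|Red|'.
    haystack = "|" + "|".join(unwanted) + "|"
    kept = []
    for div in root.iter("div"):
        # String value of every b that is a child of a p child of this div.
        values = ("".join(b.itertext()) for b in div.findall("p/b"))
        if not any("|" + value + "|" in haystack for value in values):
            kept.append(div)
    return kept


root = ET.fromstring(DOC)
kept = keep_divs(root, ["Orange", "Red"])
first_labels = [div.find("p/b").text for div in kept]
print(first_labels)
```

Because the test looks at every b under p, not just the first one, a div would also be dropped if its second b held an unwanted word.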

This approach is extensible: just add new filter values to the pipe-delimited list, without changing anything else in the XPath expression.
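Since only the pipe-delimited literal changes, the expression can be generated from a list. The following is a sketch under stated assumptions: `build_filter_xpath` is a hypothetical helper name, and the rejection of words containing `|` or `'` is a precaution added here (such words would break the delimiter scheme or the XPath string literal), not something the answer specifies.

```python
def build_filter_xpath(unwanted):
    """Hypothetical helper: build the XPath expression from a word list."""
    for word in unwanted:
        # '|' is the delimiter and "'" would end the XPath string literal,
        # so words containing either cannot be embedded safely.
        if "|" in word or "'" in word:
            raise ValueError("unsupported character in filter word: %r" % word)
    joined = "|" + "|".join(unwanted) + "|"
    return "//div[not(p/b[contains('{0}', concat('|', ., '|'))])]".format(joined)


expr = build_filter_xpath(["Orange", "Red"])
print(expr)
```

The resulting string can then be handed to an XPath 1.0 engine, for example lxml's `tree.xpath(expr)`.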

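Note: when the structure of the XML document is statically known, always avoid the // XPath pseudo-operator, because it leads to significant inefficiency (slowdown).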
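The difference in scope between an explicit path and a descendant search can be seen with the XPath subset in the standard library's xml.etree.ElementTree. This sketch only shows which elements each form visits; it does not measure speed, and the nested `<section>` document is a made-up example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one div directly under the root, one nested deeper.
root = ET.fromstring(
    "<root>"
    "<div><p><b>Blue</b></p></div>"
    "<section><div><p><b>Red</b></p></div></section>"
    "</root>"
)

# Explicit path: looks only at the children of the root element.
direct = root.findall("div")

# Descendant search: has to walk the whole subtree to find every div.
anywhere = root.findall(".//div")

print(len(direct), len(anywhere))
```

When every div is known to sit directly under the root, the explicit form expresses that and spares the engine a full traversal.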


2011-07-03 20:52:39

import lxml.html as lh

# http://lxml.de/xpathxslt.html
# http://exslt.org/regexp/functions/match/index.html
content = '''\
<table>
<div class="info"><p><b>Orange</b>, <b>One</b></p></div>
<div class="info"><p><b>Blue</b>, <b>Two</b></p></div>
<div class="info"><p><b>Red</b>, <b>Three</b></p></div>
<div class="info"><p><b>Yellow</b>, <b>Four</b></p></div>
</table>
'''
# EXSLT regular-expressions namespace, bound to the "re" prefix below
NS = 'http://exslt.org/regular-expressions'
tree = lh.fromstring(content)
exclude = ['Orange', 'Red']
# Keep only the divs whose first <b> text does not match any excluded word
for elt in tree.xpath(
        "//div[not(re:test(p/b[1]/text(), '{0}'))]".format('|'.join(exclude)),
        namespaces={'re': NS}):
    # encoding='unicode' makes tostring return str instead of bytes on Python 3
    print(lh.tostring(elt, encoding='unicode'))
    print('-' * 80)

which produces

<div class="info"><p><b>Blue</b>, <b>Two</b></p></div> 

-------------------------------------------------------------------------------- 
<div class="info"><p><b>Yellow</b>, <b>Four</b></p></div> 

--------------------------------------------------------------------------------
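One thing to keep in mind with the regular-expression approach: a pattern such as 'Orange|Red' is unanchored, so it also matches longer words that merely contain an excluded word. The sketch below uses Python's own re module to show the effect, on the assumption that re:test behaves like an unanchored search; the `strict` pattern and the sample labels are illustrative additions, not part of the answer.

```python
import re

exclude = ['Orange', 'Red']

# Pattern as built in the answer: matches anywhere inside the text.
loose = '|'.join(exclude)

# Assumed stricter variant: escape each word and anchor the alternation
# so that only an exact, whole-string match counts.
strict = '^(?:' + '|'.join(re.escape(word) for word in exclude) + ')$'

results = {}
for label in ['Red', 'Redwood', 'Blue']:
    results[label] = (bool(re.search(loose, label)),
                      bool(re.search(strict, label)))
    print(label, results[label])
```

With the loose pattern a label like 'Redwood' would be filtered out as well; whether that is desirable depends on the data.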


2011-07-03 21:04:09 unutbu
