2013-10-11

Removing xref tags in Python

I am trying to parse some .nxml files in Python using ElementTree. They are structured something like this:

<body> 
    <sec> 
     <title>INTRODUCTION</title> 
     <p>Experimentation with substances usually takes place during adolescence [<xref ref-type="bibr" rid="b1">1</xref>]. Adolescents are highly vulnerable to social influences [<xref ref-type="bibr" rid="b2">2</xref>], have lower tolerance levels and become dependent at lower doses than adults [<xref ref-type="bibr" rid="b3">3</xref>]. Adolescent-onset substance abuse is characterized by more rapid development of multiple drug dependencies and more severe psychopathology [<xref ref-type="bibr" rid="b4">4</xref>]. However, the majority of adolescents who experiment with substances do not become problem users. A better understanding is needed of the factors underlying initiation of substance use in adolescence versus heavy use and problem use. Specifically, if the liability to progress to heavier substance use is influenced by processes other than those that influence initiation, then primary prevention/intervention programmes can be only partly effective. It may be more successful, in terms of both cost and impact, to target those factors implicated in the progression to heavy/problem use. However, if the underlying liabilities to initiation and progression were strongly related, interventions could be tailored to both behaviours.</p> 

Specifically, I am trying to extract the text between the <p> </p> tags.

However, the [<xref> </xref>] elements in the body text break the parsing.

I have been using:

for sec in body:
    for p in sec:
        for e in p:
            e.remove(xref)

This attempt fails: the element is not recognized. Any ideas?


What do you mean by "not recognized"? – aIKid

Answers


This would be easier to work with:

for xref in body.findall('xref'): 
    body.remove(xref) 

More in line with your latest update, try:

for sec in body.findall('sec'):
    for p in sec.findall('p'):
        for e in p.findall('xref'):
            p.remove(e)
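
Two caveats may be worth keeping in mind with the loops above. In ElementTree, findall('xref') with a bare tag name matches only direct children, so calling it on body will not reach an xref nested inside sec/p. Also, remove() discards the removed element's tail, which here is the sentence text that follows each citation. The sketch below is one possible way to drop the xref elements while keeping that text. The SAMPLE string and the drop_xrefs helper are hypothetical names made up for illustration, not part of the original files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the structure shown in the question.
SAMPLE = (
    "<body><sec><title>INTRODUCTION</title>"
    '<p>Adolescence [<xref ref-type="bibr" rid="b1">1</xref>]. '
    'Social influences [<xref ref-type="bibr" rid="b2">2</xref>].</p>'
    "</sec></body>"
)


def drop_xrefs(root):
    """Remove every <xref> under root, keeping the text that follows each one."""
    # Snapshot the elements first so the tree is not mutated while iterating.
    for parent in list(root.iter()):
        prev = None  # last child of this parent that was kept
        for child in list(parent):
            if child.tag != "xref":
                prev = child
                continue
            tail = child.tail or ""
            if prev is None:
                # No kept sibling before it: append the tail to the parent's text.
                parent.text = (parent.text or "") + tail
            else:
                # Otherwise append the tail to the previous kept sibling's tail.
                prev.tail = (prev.tail or "") + tail
            parent.remove(child)


body = ET.fromstring(SAMPLE)
drop_xrefs(body)
paragraph = body.find("sec/p")
print(paragraph.text)
```

The citation numbers are gone but the surrounding square brackets remain, because they are ordinary text in the paragraph rather than part of the xref element.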

Actually, I scrapped all of this and used BeautifulSoup to strip all the tags. Worked a treat. I can't believe I was such an idiot.
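
The BeautifulSoup code is not shown here. For comparison only, a similar "strip all the tags" result can probably be had from the standard library with Element.itertext(), which yields the text of an element and all of its descendants in document order. This is a minimal sketch, not the approach used above. SAMPLE, paragraph_texts and the keep_citations flag are hypothetical names, and the regular expression assumes citations always look like a bracketed number.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the structure shown in the question.
SAMPLE = (
    "<body><sec><title>INTRODUCTION</title>"
    '<p>Adolescence [<xref ref-type="bibr" rid="b1">1</xref>]. '
    'Social influences [<xref ref-type="bibr" rid="b2">2</xref>].</p>'
    "</sec></body>"
)


def paragraph_texts(root, keep_citations=True):
    """Return the plain text of every <p> under root, with inline tags flattened."""
    texts = []
    for p in root.iter("p"):
        # itertext() walks p and its descendants, including each xref's text and tail.
        text = "".join(p.itertext())
        if not keep_citations:
            # Assumption: citation markers are bracketed numbers such as " [1]".
            text = re.sub(r"\s*\[\d+\]", "", text)
        texts.append(text)
    return texts


body = ET.fromstring(SAMPLE)
with_citations = paragraph_texts(body)
without_citations = paragraph_texts(body, keep_citations=False)
print(with_citations)
print(without_citations)
```

Because nothing is removed from the tree, no tail text is lost with this approach.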


Why not show us how you did it? As it stands, this is not a very good answer. – mzjn