2014-01-27

Eliminating duplicates and changing labels with scala.xml.transform.RuleTransformer

I have the following XML:

<tree> 
    <leaf id="1"/> 
    <leaf id="1"/> 
</tree> 

What I want to do is get rid of the duplicate <leaf/> elements (across the whole XML document) and replace them with a single <new-leaf/>, like this:

<tree> 
    <new-leaf id="1"/> 
</tree> 

I have written the following RewriteRule, which I believed would accomplish this (forgive the statefulness):

import scala.xml._
import scala.xml.transform._

class UniqueLeaves extends RewriteRule {

  var leafIds = Set.empty[String]

  override def transform(node: Node): Seq[Node] = node match {
    case e: Elem if ((e.label == "leaf") && !leafIds.contains((e \\ "@id").text)) => {
      leafIds += (e \\ "@id").text
      <new-leaf id={(e \\ "@id")} />
    }
    case e: Elem if (e.label == "leaf") => Seq.empty
    case _ => node
  }

}

Unfortunately, using it with a RuleTransformer gives me the following:

scala> val tree = <tree><leaf id="1"/><leaf id="1"/></tree> 
scala> println(new RuleTransformer(new UniqueLeaves).transform(tree)) 
<tree/> 

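I assume this is because RuleTransformer calls transform on the RewriteRule multiple times, and uses the output of a call other than the first for the <new-leaf> node; by then the id is already in the set, so my match returns an empty Seq.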
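One way to check that assumption is to count how often the rule is invoked. The following is a minimal diagnostic sketch: `CountingRule` and `CountCalls` are hypothetical names introduced here for illustration, and the rule leaves every node unchanged so that only the number of calls is observed.

```scala
import scala.xml._
import scala.xml.transform._

// Hypothetical diagnostic rule: returns each node unchanged and
// records how many times transform was called per element label.
class CountingRule extends RewriteRule {
  var calls = Map.empty[String, Int]

  override def transform(node: Node): Seq[Node] = {
    calls = calls.updated(node.label, calls.getOrElse(node.label, 0) + 1)
    node
  }
}

object CountCalls {
  // Runs the counting rule over the tree and returns the per-label call counts.
  def count(tree: Node): Map[String, Int] = {
    val rule = new CountingRule
    new RuleTransformer(rule).transform(tree)
    rule.calls
  }
}
```

Calling `CountCalls.count(<tree><leaf id="1"/><leaf id="1"/></tree>)` returns the counts for that tree. If the count for "leaf" is higher than the number of <leaf/> elements, the rule is being invoked more than once per node, which would explain why a stateful rule misbehaves.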

I would appreciate any tips on making this work (and on making it stateless).

Answers


For anyone with a similar problem, I have found the following solution:

import scala.xml._

def removeDuplicates(tree: Node): Node = {
  // ids of the leaves that have already been emitted
  var ids = Set.empty[String]

  def recurse(node: Node): Seq[Node] = node match {
    case e: Elem if e.label == "leaf" =>
      val id = (e \\ "@id").text
      if (ids.contains(id)) Seq.empty // duplicate: drop it
      else {
        ids = ids + id
        <new-leaf id={id}/> // first occurrence: relabel it
      }
    // any other element: rebuild it from its transformed children
    case e: Elem => e.copy(child = e.nonEmptyChildren.flatMap(recurse))
    case _ => node
  }

  recurse(tree).head
}

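This works because it handles the traversal manually and does not use RuleTransformer#transform, so it never visits the same node more than once (although it is still stateful, unfortunately).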
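The remaining statefulness can be removed by passing the set of seen ids through the recursion instead of keeping it in a `var`. The following is a minimal sketch of that idea, not a tested drop-in replacement: `StatelessDedup` and `dedup` are hypothetical names, and it reads the id with `\ "@id"` (the element's own attribute) rather than `\\ "@id"`, which is an assumption about the intended behavior.

```scala
import scala.xml._

object StatelessDedup {

  // Returns the rewritten node(s) together with the set of ids seen so far.
  private def dedup(node: Node, seen: Set[String]): (Seq[Node], Set[String]) =
    node match {
      case e: Elem if e.label == "leaf" =>
        val id = (e \ "@id").text
        if (seen.contains(id)) (Seq.empty, seen)    // duplicate: drop it
        else (Seq(<new-leaf id={id}/>), seen + id)  // first occurrence: relabel it
      case e: Elem =>
        // Thread the set through the children in document order.
        val (children, seenAfter) =
          e.child.foldLeft((Vector.empty[Node], seen)) { case ((acc, s), c) =>
            val (out, s2) = dedup(c, s)
            (acc ++ out, s2)
          }
        (Seq(e.copy(child = children)), seenAfter)
      case other => (Seq(other), seen)
    }

  def removeDuplicates(tree: Node): Node =
    dedup(tree, Set.empty)._1.headOption.getOrElse(tree)
}
```

For example, `StatelessDedup.removeDuplicates(<tree><leaf id="1"/><leaf id="1"/></tree>)` should yield a <tree> containing a single <new-leaf id="1"/>. The cost of avoiding mutation is that every recursive call has to return the updated set alongside its nodes.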