Filter out namespaces from XML in Spark Scala

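I am trying to filter out the namespace information from XML that is being read in through Spark Streaming. Below is the code I am trying, followed by a sample XML. It should strip all the "ns0:", "ns1:", ... prefixes from the XML. Because the data comes from Spark Streaming, the XML is read in as an RDD.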

// Attempt: filter the RDD of XML lines with a namespace-prefix regex
val message_filter = message.filter(x => x.matches("([n][s][0-9]:)+"))
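One likely reason the attempt above does not produce the desired output: `filter` only keeps or drops whole elements and never rewrites them, and Java's `String.matches` succeeds only when the *entire* string matches the pattern. The sketch below uses a plain Scala `Seq` as a stand-in for the RDD (assumption: element-wise `filter`/`map` behave the same way on an `RDD[String]`) and the sample lines are taken from the question's XML. It contrasts `filter` with `map` plus `replaceAll`.

```scala
// A plain Scala Seq stands in for the RDD/DStream here; RDD.filter and RDD.map
// apply the same function to each element, so the behaviour carries over.
object FilterVsMap {
  // Sample lines taken from the question's XML.
  val lines: Seq[String] = Seq(
    "<ns0:FirstName>Brrzzz</ns0:FirstName>",
    "<State>TX</State>"
  )

  // String.matches succeeds only if the WHOLE string matches the pattern,
  // so no full XML line matches "(ns[0-9]:)+" and filter keeps nothing.
  val filtered: Seq[String] = lines.filter(x => x.matches("(ns[0-9]:)+"))

  // map + replaceAll rewrites each line instead, deleting every
  // "ns<digits>:" occurrence it finds.
  val stripped: Seq[String] = lines.map(x => x.replaceAll("ns[0-9]+:", ""))

  def main(args: Array[String]): Unit = {
    println(filtered) // List()
    println(stripped) // List(<FirstName>Brrzzz</FirstName>, <State>TX</State>)
  }
}
```

In other words, a transformation that changes each element calls for `map`, not `filter`.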

<?xml version="1.0"?> 
<Period> 
    <AllContacts> 
    <Entry> 
     <ns0:entity-Person> 
     <ns0:CellPhone>3095550101</ns0:CellPhone> 
     <ns0:FirstName>Brrzzz</ns0:FirstName> 
     <ns0:LastName>Grbbs</ns0:LastName> 
     </ns0:entity-Person> 
     <ns0:PrimaryPhone>mobile</ns0:PrimaryPhone> 
    </Entry> 
    </AllContacts> 
    <State>TX</State> 
</Period>

Desired format:

<?xml version="1.0"?> 
<Period> 
    <AllContacts> 
    <Entry> 
     <entity-Person> 
     <CellPhone>3095550101</CellPhone> 
     <FirstName>Brrzzz</FirstName> 
     <LastName>Grbbs</LastName> 
     </entity-Person> 
     <PrimaryPhone>mobile</PrimaryPhone> 
    </Entry> 
    </AllContacts> 
    <State>TX</State> 
</Period>

Source

2016-06-09 Defcon

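If x.matches accepts a regular expression, then your regex should look like this: /ns\d+:([\w-]+)/g (this is regex101's /pattern/flags notation; as a Scala string literal the pattern is "ns\\d+:([\\w-]+)"). Here is an example on regex101.com.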
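A minimal sketch of applying the answer's pattern in Scala, assuming the goal is to rewrite each `nsN:localName` to `localName`. The `/.../g` delimiters are dropped because `Regex.replaceAllIn` already replaces every match; capture group 1 is the local element name. The object and method names (`StripNamespaces`, `strip`) are hypothetical.

```scala
import scala.util.matching.Regex

object StripNamespaces {
  // The answer's pattern without regex101's /.../g delimiters.
  // Group 1 captures the local name, e.g. "entity-Person".
  val NsPrefixed: Regex = """ns\d+:([\w-]+)""".r

  // Replace each "nsN:localName" with just "localName".
  // quoteReplacement guards against '$' or '\' being read as replacement syntax.
  def strip(xml: String): String =
    NsPrefixed.replaceAllIn(xml, m => Regex.quoteReplacement(m.group(1)))

  def main(args: Array[String]): Unit = {
    val sample =
      """<Entry>
        |<ns0:entity-Person>
        |<ns0:FirstName>Brrzzz</ns0:FirstName>
        |</ns0:entity-Person>
        |<ns0:PrimaryPhone>mobile</ns0:PrimaryPhone>
        |</Entry>""".stripMargin
    println(strip(sample))
  }
}
```

Assuming `message` is an `RDD[String]` (or a `DStream[String]`), the same function could be applied with `message.map(StripNamespaces.strip)`. Because this works on raw text, it would also rewrite any `nsN:` that happens to appear in text content or attribute values; an XML-aware parser is the safer choice if that can occur.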

Source

2016-06-09 17:03:48
