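Filtering namespaces out of XML in Spark Scala

I want to strip the namespace prefixes from XML that is being read in through Spark Streaming. This is the code I am trying, followed by a sample XML. It should filter every "ns0:", "ns1:", ... prefix out of the XML. Because of Spark Streaming, the XML is read in as an RDD.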
val message_filter = message.filter(x => x.matches("([n][s][0-9]:)+"))
<?xml version="1.0"?>
<Period>
<AllContacts>
<Entry>
<ns0:entity-Person>
<ns0:CellPhone>3095550101</ns0:CellPhone>
<ns0:FirstName>Brrzzz</ns0:FirstName>
<ns0:LastName>Grbbs</ns0:LastName>
</ns0:entity-Person>
<ns0:PrimaryPhone>mobile</ns0:PrimaryPhone>
</Entry>
</AllContacts>
<State>TX</State>
</Period>
Desired format:
<?xml version="1.0"?>
<Period>
<AllContacts>
<Entry>
<entity-Person>
<CellPhone>3095550101</CellPhone>
<FirstName>Brrzzz</FirstName>
<LastName>Grbbs</LastName>
</entity-Person>
<PrimaryPhone>mobile</PrimaryPhone>
</Entry>
</AllContacts>
<State>TX</State>
</Period>
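For reference, the transformation from the sample input to the desired format could be sketched as below. This is only a minimal sketch under assumptions: the object and method names (NamespaceStripper, stripNsPrefixes) are hypothetical, each RDD element is assumed to be a String, and prefixes are assumed to always look like "ns" followed by digits and a colon directly after "<" or "</". Note that filter with matches keeps or drops whole elements and does not rewrite them, so the sketch uses a regex replacement instead.

```scala
// Hypothetical helper; the names here are illustrative and not from the original post.
object NamespaceStripper {
  // Matches the start of an opening or closing tag followed by a prefix
  // such as "ns0:" or "ns12:". Group 1 captures "<" or "</".
  private val NsPrefix = "(</?)ns[0-9]+:".r

  // Removes the prefix and keeps the tag opener captured in group 1.
  def stripNsPrefixes(xml: String): String =
    NsPrefix.replaceAllIn(xml, "$1")
}

// Assumed usage, if `message` is an RDD[String] or DStream[String]:
// val message_filter = message.map(NamespaceStripper.stripNsPrefixes)
```

The pattern is anchored to the tag opener so that text content which happens to contain something like "ns0:" is left alone. It does not touch xmlns declarations or prefixed attributes, which the sample does not contain.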