2014-03-30 55 views

Everything I know about parsing XML with xml-conduit comes from http://hackage.haskell.org/package/xml-conduit-1.1.0.9/docs/Text-XML-Stream-Parse.html

Here is what the modified XML looks like:

<?xml version="1.0" encoding="utf-8"?> 
<population xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://example.com"> 
    <success>true</success> 
    <row_count>2</row_count> 
    <summary> 
    <bananas>0</bananas> 
    </summary> 
    <people> 
     <person> 
      <firstname>Michael</firstname> 
      <age>25</age> 
     </person> 
     <person> 
      <firstname>Eliezer</firstname> 
      <age>2</age> 
     </person> 
    </people> 
</population> 

How do I get a list of the firstname and age of each person?

My goal is to download this XML with http-conduit and then parse it, but I am looking for a way to parse elements that have no attributes (using tagNoAttr?).

Here is what I have tried so far. I have put my questions in the Haskell comments:

{-# LANGUAGE OverloadedStrings #-} 
import Control.Monad.Trans.Resource 
import Data.Conduit (($$)) 
import Data.Text (Text, unpack) 
import Text.XML.Stream.Parse 
import Control.Applicative ((<*)) 

data Person = Person Int Text 
     deriving Show 

-- Do I need to change the lambda function \age to something else to get both name and age? 
parsePerson = tagNoAttr "person" $ \age -> do 
     name <- content -- How do I get age from the content? "unpack" is for attributes 
     return $ Person age name 

parsePeople = tagNoAttr "people" $ many parsePerson 

-- This doesn't ignore the xmlns attributes 
parsePopulation = tagName "population" (optionalAttr "xmlns" <* ignoreAttrs) $ parsePeople 

main = do 
     people <- runResourceT $ 
      parseFile def "people2.xml" $$ parsePopulation 
     print people 

Edited to add what I have tried so far, with comments. – Lionel

Answer


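First: the streaming parser combinators in xml-conduit have not been updated in a long time and are showing their age. I recommend that most people use the DOM or cursor interface instead. That said, let's look at your example. Your code has two problems: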

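  • It does not handle XML namespaces correctly. All of the element names are in the http://example.com namespace, and your code needs to reflect that.
  • The parsing combinators require you to consume all of the elements. They will not automatically skip elements for you.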
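
To make the namespace point concrete, here is a small sketch of how a qualified name is spelled. With OverloadedStrings, a string literal of the form "{namespace}local" is read as a Name carrying that namespace. The helper `ex` is hypothetical (it is not part of xml-conduit) and only saves retyping the namespace from the question's document.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text      (Text)
import Data.XML.Types (Name (..))

-- Hypothetical helper, not part of xml-conduit: qualify a local element
-- name with the namespace used by the XML in the question.
ex :: Text -> Name
ex local = Name
    { nameLocalName = local
    , nameNamespace = Just "http://example.com"
    , namePrefix    = Nothing
    }

-- The same name written as a literal in "{namespace}local" form.
personName :: Name
personName = "{http://example.com}person"

-- A bare literal has no namespace, so it does not match the elements
-- in the question's document.
unqualifiedPerson :: Name
unqualifiedPerson = "person"
```

Here `ex "person"` and `personName` compare equal, while `unqualifiedPerson` does not, which is why the parsers in the question never matched.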

So here is an implementation using the streaming API that produces the desired result:

{-# LANGUAGE OverloadedStrings #-} 
import   Control.Monad.Trans.Resource (MonadThrow, runResourceT) 
import   Data.Conduit     (Consumer, ($$)) 
import   Data.Text     (Text) 
import   Data.Text.Read    (decimal) 
import   Data.XML.Types    (Event) 
import   Text.XML.Stream.Parse 

data Person = Person Int Text 
     deriving Show 

-- A <person> must contain <firstname> followed by <age>; the age must be a plain decimal. 
parsePerson :: MonadThrow m => Consumer Event m (Maybe Person) 
parsePerson = tagNoAttr "{http://example.com}person" $ do 
     name <- force "firstname tag missing" $ tagNoAttr "{http://example.com}firstname" content 
     ageText <- force "age tag missing" $ tagNoAttr "{http://example.com}age" content 
     case decimal ageText of 
      Right (age, "") -> return $ Person age name 
      _ -> force "invalid age value" $ return Nothing 

parsePeople :: MonadThrow m => Consumer Event m [Person] 
parsePeople = force "no people tag" $ do 
    _ <- tagNoAttr "{http://example.com}success" content 
    _ <- tagNoAttr "{http://example.com}row_count" content 
    _ <- tagNoAttr "{http://example.com}summary" $ 
     tagNoAttr "{http://example.com}bananas" content 
    tagNoAttr "{http://example.com}people" $ many parsePerson 

-- Any attributes on <population> are ignored; its children are handled by parsePeople. 
parsePopulation :: MonadThrow m => Consumer Event m [Person] 
parsePopulation = force "population tag missing" $ 
    tagName "{http://example.com}population" ignoreAttrs $ \() -> parsePeople 

main :: IO() 
main = do 
     people <- runResourceT $ 
      parseFile def "people2.xml" $$ parsePopulation 
     print people 

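Here is an example using the cursor API. Note that it has different error-handling characteristics, but it should produce the same result for well-formed input.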

{-# LANGUAGE OverloadedStrings #-} 
import Text.XML 
import Text.XML.Cursor 
import Data.Text (Text) 
import Data.Text.Read (decimal) 
import Data.Monoid (mconcat) 

main :: IO() 
main = do 
    doc <- Text.XML.readFile def "people2.xml" 
    let cursor = fromDocument doc 
    print $ cursor $// element "{http://example.com}person" >=> parsePerson 

data Person = Person Int Text 
     deriving Show 

parsePerson :: Cursor -> [Person] 
parsePerson c = do 
    let name = c $/ element "{http://example.com}firstname" &/ content 
        ageText = c $/ element "{http://example.com}age" &/ content 
    case decimal $ mconcat ageText of 
     Right (age, "") -> [Person age $ mconcat name] 
     _ -> [] 

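Thanks for showing both approaches! The cursor API looks simpler. If I use http-conduit to make a POST (which is how I get the XML), do I need to keep using xml-conduit, or can I use the cursor API? I am using httpLbs (lazy ByteString) from http-conduit. – Lionel

Well, you would still be using xml-conduit, since the cursor API is part of it. The most efficient approach is to use 'sinkDoc' together with 'http'. You can also take the simpler route and use 'httpLbs' if you prefer. –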
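
As a rough sketch of the simpler 'httpLbs' route: once the whole response body is in hand as a lazy ByteString, `parseLBS` from Text.XML can turn it into a Document for the cursor API. The names `peopleFromLBS` and `personFrom` are made up for this illustration, and the HTTP request itself is left out on purpose; the assumption is only that the body comes from something like `responseBody` applied to the result of `httpLbs`.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Exception    (SomeException)
import qualified Data.ByteString.Lazy as L
import           Data.Text            (Text)
import qualified Data.Text            as T
import           Data.Text.Read       (decimal)
import           Text.XML             (def, parseLBS)
import           Text.XML.Cursor

data Person = Person Int Text
    deriving (Show, Eq)

-- Hypothetical helper: read one <person> element. Yields [] when the
-- age is not a plain decimal number.
personFrom :: Cursor -> [Person]
personFrom c =
    case decimal (T.concat ageText) of
        Right (age, rest) | T.null rest -> [Person age (T.concat name)]
        _                               -> []
  where
    name    = c $/ element "{http://example.com}firstname" &/ content
    ageText = c $/ element "{http://example.com}age" &/ content

-- Hypothetical helper: parse a complete response body (assumed to be
-- what httpLbs returned) and collect every person in it. A malformed
-- document is reported as a Left value rather than thrown.
peopleFromLBS :: L.ByteString -> Either SomeException [Person]
peopleFromLBS body = do
    doc <- parseLBS def body
    return $ fromDocument doc
        $// element "{http://example.com}person" >=> personFrom
```

This reads the whole document into memory before any person is extracted, which is the trade-off of taking the simpler route over 'sinkDoc' with 'http'.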


Does the cursor API still avoid holding the entire XML structure in memory at once? Never mind, that seems to be answered here: http://stackoverflow.com/questions/29454267/how-to-use-the-xml-conduit-cursor-interface-for-information-extraction-from-a-la?rq=1 – unhammer