Parsing the result obtained from mochiweb_html

I want to extract some content from an HTML document (not XML).

I fetch the page and parse it with mochiweb_html:

1> inets:start().
2> {ok, {Status, Headers, Body}} = httpc:request("http://www.google.com").
3> {Tag, Attributes, Children} = mochiweb_html:parse(Body).

The result looks something like this:

{<<"html">>, 
[{<<"itemscope">>,<<"itemscope">>}, 
    {<<"itemtype">>,<<"http://schema.org/WebPage">>}], 
[{<<"head">>,[], 
    [{<<"meta">>, 
    [{<<"itemprop">>,<<"image">>}, 
     {<<"content">>,<<"/images/google_favicon_128.png">>}], 
    []}, 
    {<<"title">>,[],[<<"Google">>]}, 
....

What is the best way to retrieve, from the structure returned by mochiweb_html, all elements in the page that match a specific tag (for example, <span id="footer">)?

2013-04-22 user601836

You can use mochiweb_xpath:

> mochiweb_xpath:execute("//span[@id='footer']", 
    mochiweb_html:parse(
     "<html><body><span>not this one</span><span id='footer'>but this one</span></body></html>")). 
[{<<"span">>, 
    [{<<"id">>,<<"footer">>}], 
    [<<"but this one">>]}]

2013-04-23 09:57:10 legoscia

It depends on your performance requirements.

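The mochiweb result is a tree of 3-tuples of the form {Tag, Attributes, Children}, which should be fairly easy to convert into input suitable for xmerl. Most of the work would be converting the tag and attribute names to atoms. After that you could use xmerl_xpath to run some very flexible queries.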
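As a rough illustration of that conversion, here is a minimal sketch that turns a mochiweb_html tree into xmerl's "simple form" (atom tag, atom attribute names, string content). The module name html_simple and the function to_simple/1 are made up for this example. The sketch assumes the binaries are valid UTF-8, drops comments and other non-element nodes, and does not show whatever further step is needed to hand the result to xmerl_xpath.

```erlang
-module(html_simple).
-export([to_simple/1]).

%% Convert a mochiweb_html element {Tag, Attrs, Children} (binaries) into
%% xmerl simple form {Tag, Attrs, Content} (atoms and strings).
%% Caution: atoms are never garbage collected, so converting names from
%% arbitrary untrusted HTML can fill the atom table.
to_simple({Tag, Attrs, Children})
  when is_binary(Tag), is_list(Attrs), is_list(Children) ->
    {binary_to_atom(Tag, utf8),
     [{binary_to_atom(Name, utf8), unicode:characters_to_list(Value)}
      || {Name, Value} <- Attrs],
     lists:flatmap(fun child/1, Children)}.

%% Text nodes become strings, elements are converted recursively, and
%% anything else (for example {comment, _}) is dropped.
child(Text) when is_binary(Text) ->
    [unicode:characters_to_list(Text)];
child({Tag, _, _} = Element) when is_binary(Tag) ->
    [to_simple(Element)];
child(_Other) ->
    [].
```

For example, html_simple:to_simple({<<"span">>, [{<<"id">>,<<"footer">>}], [<<"but this one">>]}) gives {span, [{id, "footer"}], ["but this one"]}.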

Otherwise, you could write some less flexible (but probably faster) code to walk the tree yourself.
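Walking the tree by hand might look something like the sketch below. It collects every element with a given tag whose attribute has a given value, in document order. The module name html_walk and the function find/4 are hypothetical names chosen for this example, not part of mochiweb; all three search arguments are binaries, matching what mochiweb_html:parse/1 returns.

```erlang
-module(html_walk).
-export([find/4]).

%% Return all elements named Tag whose attribute AttrName equals AttrValue.
%% A matching element is listed before any matches found among its
%% descendants.
find(Tag, AttrName, AttrValue, {NodeTag, Attrs, Children} = Node)
  when is_binary(NodeTag), is_list(Attrs), is_list(Children) ->
    %% Search the children first so nested matches are not missed.
    Rest = lists:flatmap(
             fun(Child) -> find(Tag, AttrName, AttrValue, Child) end,
             Children),
    %% The bound variables in the pattern make this an equality check.
    case NodeTag =:= Tag andalso lists:keyfind(AttrName, 1, Attrs) of
        {AttrName, AttrValue} -> [Node | Rest];
        _ -> Rest
    end;
find(_Tag, _AttrName, _AttrValue, _Other) ->
    %% Text nodes (binaries), {comment, _} and other non-element nodes.
    [].
```

With the tree from the question, the call would be html_walk:find(<<"span">>, <<"id">>, <<"footer">>, mochiweb_html:parse(Body)). This only covers the "tag plus one attribute" case; anything more general is where mochiweb_xpath or xmerl_xpath becomes the better fit.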

2013-04-22 19:40:32 EdF

Right. Just walk the tree and you can get what you need. – 2013-04-23 06:06:30

Could you give an example? I'm stuck :( – user601836 2013-04-23 08:16:26

I prefer the solution @legoscia provided; I didn't know about mochiweb_xpath. – EdF 2013-04-23 16:17:31
