2011-10-24 58 views

Extracting HTML content using YQL?

Let's say I want to extract data from a web page with the following markup:

<table>
    <tr>
        <td><a href="Link 1">Column 1 Text</a></td>
        <td>Column 2 Text</td>
        <td>Column 3 Text</td>
    </tr>
    <tr>
        <td><a href="Link 2">Column 1 Text</a></td>
        <td>Column 2 Text</td>
        <td>Column 3 Text</td>
    </tr>
    ...
</table>

into JSON of this form:

[ 
    { 
    link: 'Link 1', 
    text: 'Column 1 Text', 
    data: 'Column 3 Text' 
    }, 
    { 
    link: 'Link 2', 
    text: 'Column 1 Text', 
    data: 'Column 3 Text' 
    } 
] 

Can this be done with YQL? If so, please give me an example query.

Any help would be greatly appreciated!

Answer


Here is a query that makes a good starting point. It uses the html table together with an XPath expression (see Extracting HTML Content With XPath for more detail on this technique):

select * from html where url="http://cantoni.org/test/table.html" and xpath='//table/tr'

This produces JSON results like the following:

{
    "query": {
        "count": 2,
        "created": "2012-01-06T20:16:46Z",
        "lang": "en-US",
        "results": {
            "tr": [
                {
                    "td": [
                        {
                            "a": {
                                "href": "Link%201",
                                "content": "Column 1 Text"
                            }
                        },
                        {
                            "p": "Column 2 Text"
                        },
                        {
                            "p": "Column 3 Text"
                        }
                    ]
                },
                {
                    "td": [
                        {
                            "a": {
                                "href": "Link%202",
                                "content": "Column 1 Text"
                            }
                        },
                        {
                            "p": "Column 2 Text"
                        },
                        {
                            "p": "Column 3 Text"
                        }
                    ]
                }
            ]
        }
    }
}
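
The result above is not yet in the link/text/data shape the question asks for, so a small client-side step can finish the job. Below is a minimal sketch in Python, assuming the response has already been parsed into a dict with exactly the structure shown above. The names cell_text and rows_to_records are hypothetical helpers made up for this illustration and are not part of YQL. The sketch also assumes that percent-decoding the href (Link%201 back to Link 1) is what you want.

```python
from urllib.parse import unquote


def cell_text(td):
    """Return the text of one parsed <td> (hypothetical helper).

    Handles the shapes seen in the sample result: an anchor with a
    "content" key, or text wrapped in a "p" key. A bare string is
    accepted too, as a defensive assumption.
    """
    if isinstance(td, str):
        return td
    if "a" in td:
        return td["a"].get("content", "")
    return td.get("p") or td.get("content", "")


def rows_to_records(response):
    """Reshape the parsed response into link/text/data records."""
    rows = response["query"]["results"]["tr"]
    if isinstance(rows, dict):
        # Defensive assumption: a lone row might not be wrapped in a list.
        rows = [rows]

    records = []
    for row in rows:
        cells = row["td"]
        anchor = cells[0]["a"]  # first column holds the link
        records.append({
            "link": unquote(anchor["href"]),  # undo the percent-encoding
            "text": anchor["content"],
            "data": cell_text(cells[2]),      # third column
        })
    return records
```

Calling rows_to_records on the parsed response and passing the result to json.dumps should give a list in the format from the question. Rows that do not have a link in the first cell or that have fewer than three cells would raise an error here, so real pages may need extra checks.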