Using Symfony's DomCrawler component

I am scraping website data with Symfony's DomCrawler component, and I need to extract the values marked "CR" on this page:

http://webapps.nyc.gov:8084/cics/f704/f403001i?BBL=1-00259-0071

Unfortunately, I could not find a way to do this with the DomCrawler filter methods:

http://symfony.com/doc/current/components/dom_crawler.html

Can any experienced Symfony user help me, or give me some advice?

This is the XPath approach I am using:

$crawler->filterXPath("//div/center/table/tbody/tr/td[contains(., 'CR')]")->text();

Update: I managed to grab all of the CR entries using:

//td/font[contains(., 'CR')]

but what I need are the numbers.

Thanks

Source

2015-09-14 Park Broom

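SO is not a place for people to get developers to do their work for free. Post your code and what you have tried so far. I just did this quite easily with curl and a regex. – tftd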

I am not familiar with XPath and have no experience with it. This is the XPath approach I used: $crawler->filterXPath("//div/center/table/tbody/tr/td[contains(., 'CR')]")->text(); –

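The Crawler is similar to SimpleXML and jQuery. If you are not familiar with them, you will have a hard time figuring out how to get at the content. You do not have to use XPath explicitly to get the content. You can do it with filter(), which takes jQuery-like CSS selectors, e.g. filter('body > .my_class').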
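As a rough illustration of the filter() approach, here is a minimal sketch. It assumes a Composer install that includes both symfony/dom-crawler and symfony/css-selector (filter() needs the latter), and the inline HTML is a made-up stand-in, not the real page's markup.

```php
<?php
// Assumption: dependencies were installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup for illustration only; the real page may differ.
$html = '<html><body><table><tr>'
      . '<td><font>100.00 CR</font></td>'
      . '<td><font>20.00</font></td>'
      . '</tr></table></body></html>';

$crawler = new Crawler($html);

// CSS selector instead of XPath: every <font> that is a direct child of a <td>.
$texts = $crawler->filter('td > font')->each(function (Crawler $node) {
    return trim($node->text());
});

// CSS has no "contains text" selector, so filter the strings in PHP.
$crTexts = array_values(array_filter($texts, function ($text) {
    return strpos($text, 'CR') !== false;
}));

print_r($crTexts);
```

Because CSS selectors cannot match on text content, the XPath contains() form is probably the more direct fit when the only distinguishing feature is the "CR" label.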

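use Symfony\Component\DomCrawler\Crawler;

$url = '...';

// Fetch the page and load its HTML into the crawler
$crawler = new Crawler(file_get_contents($url));

// Visit every <font> inside a <td> whose text contains " CR"
$crawler->filterXPath("//td/font[contains(., ' CR')]")->each(function (Crawler $node, $i) {
    // FILTER_SANITIZE_URL is used here only to strip whitespace from the parent cell's text
    $string = filter_var($node->parents()->first()->text(), FILTER_SANITIZE_URL);
    // Put the space back in front of the "CR" suffix
    $string = str_replace('CR', ' CR', $string);
    var_dump($string);
});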
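Since the question asks for the numbers themselves, one possible follow-up is to pull the numeric part out of each matched string with a regular expression. The sketch below uses only built-in PHP functions. The helper name extractCrAmount and the sample strings are hypothetical, and it assumes each amount appears directly before the "CR" label, as the snippet above suggests.

```php
<?php
/**
 * Hypothetical helper: returns the amount that precedes "CR" in $text,
 * or null when the text carries no "CR" label.
 */
function extractCrAmount(string $text): ?float
{
    // Digits with optional thousands separators and decimals, then optional whitespace, then "CR".
    if (preg_match('/(\d[\d,]*(?:\.\d+)?)\s*CR\b/', $text, $matches) !== 1) {
        return null;
    }

    // Drop thousands separators before casting to float.
    return (float) str_replace(',', '', $matches[1]);
}

// Made-up sample strings standing in for the cell text.
$samples = ['  1,250.00 CR ', '300.00', "75.50\n CR"];

foreach ($samples as $sample) {
    var_dump(extractCrAmount($sample));
}
```

Inside the each() callback, something like `return extractCrAmount($node->parents()->first()->text());` would then collect the amounts into the array that each() returns, with null for any cell that does not match.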

Source

2015-09-14 17:11:24 tftd
