PhantomJS: recursive page parsing, memory usage problem
I'm trying to parse web pages recursively with PhantomJS.
For example:
WebPage:
link1,
link2,
link3,
link4,
link5
nextPage
Here is what I do with this page:
var parsePage = function(links) {
    // parse every link on the page
    for (var i = 0; i < links.length; i++) {
        parsePost(links[i]);
    }
};
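parsePost - on each linked page I extract some information, e.g. all emails and phone numbers via regular expressions, and this takes a lot of time.
But PhantomJS (JavaScript) is asynchronous and doesn't wait: it starts parsing every link and then moves straight on to the next page. So what actually happens looks more like this: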
- parsing page1
- parsing link1
- parsing link2
....
- parsing link5
- parsing page2
- parsing link1
....
- parsing link5
-> and only now do the results for the parsed page1 -> link1 reach the console
.....
- parsing page3
So within 3 minutes it takes up all 6 GB of my PC's memory :DDD
How can I solve this?
What I've tried:
1. Maybe limit the program's memory use? (So that it waits until some processes have finished and then continues parsing other pages?)
2. I tried something like:
> page.open(link, function(... here is the page parser (which parses every link))
and then page.close()
but the page parser takes a lot of time, so when I call page.close() it stops the page parser before it has finished.
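One direction that might help with attempt 2 is to process the links strictly one at a time and call page.close() only from inside the page.open callback, after the parsing for that link is done. Below is a minimal sketch of that idea, not a tested PhantomJS script. The names processSequentially, makePhantomVisitor and extractInfo are made up for illustration; only require('webpage').create(), page.open, page.content and page.close come from PhantomJS.

```javascript
// Sketch: handle links one after another instead of starting them all at once.
// `visit` is a hypothetical function (link, done) that does the slow work for
// one link and calls done() when it has finished.
function processSequentially(links, visit, onAllDone) {
  var index = 0;
  function next() {
    if (index >= links.length) {
      if (onAllDone) onAllDone();
      return;
    }
    var link = links[index++];
    visit(link, next); // the next link starts only after this one calls done()
  }
  next();
}

// Hypothetical PhantomJS visitor: one page object per link, closed as soon
// as the parsing for that link is finished. `extractInfo` is an assumed
// helper that takes the page HTML and runs the slow regex matching.
function makePhantomVisitor(extractInfo) {
  return function visit(link, done) {
    var page = require('webpage').create(); // PhantomJS-only module
    page.open(link, function (status) {
      if (status === 'success') {
        extractInfo(page.content); // synchronous, so it completes before close
      }
      page.close(); // release the page's memory before moving on
      done();
    });
  };
}
```

Under these assumptions, parsePage could call processSequentially(links, makePhantomVisitor(extractInfo), goToNextPage), where goToNextPage is whatever loads nextPage, so only one page is open at any time.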
Did you solve it? – quento