2016-11-04

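npm urllib ResponseTimeoutError - how do I increase the timeout?

I want to use urllib to request many different URLs, get their HTML, and parse it. I have a loop that should run urllib for about 5000 iterations: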

urllib.request('a url here', options=[timeout=50000]).then(function (result) {
  // data is Buffer instance
  var $ = cheerio.load(result.data);
  $('dt').each(function () {
    var news_html = cheerio.load($(this).html());
    if (news_html('span.timestamp').html() != null) {
      var date = news_html('span.timestamp').html();
      var description = news_html('.story_title').html();
      var link = news_html('a').attr('href');
      var post = {description: description, date: date, link: link};
      pool.query('INSERT INTO db SET ?', post, function (err, result) {
        if (err) {
          console.log(err);
        }
      });
    }
  });
}).catch(function (err) {
  console.error(err);
});

After about 100 iterations, I get this error:

{ ResponseTimeoutError: Response timeout for 5000ms, GET http://www.streetinsider.com/stock_lookup_news.php?q=CREG&type=major_news -1 (connected: true, keepalive socket: false) 
headers: {} 
    at Timeout._onTimeout (/Users/max/projects/stock-news-angular/node_modules/urllib/lib/urllib.js:715:15) 
    at tryOnTimeout (timers.js:232:11) 
    at Timer.listOnTimeout (timers.js:202:5) 
    name: 'ResponseTimeoutError', 
    requestId: 595, 
    data: undefined, 
    path: '/stock_lookup_news.php?q=CREG&type=major_news', 
    status: -1, 
    headers: {}, 
    res: 
    { status: -1, 
    statusCode: -1, 
    headers: {}, 
    size: 0, 
    aborted: false, 
    rt: 10030, 
    keepAliveSocket: false, 
    data: undefined, 
    requestUrls: [ 'http://www.streetinsider.com/stock_lookup_news.php?q=CREG&type=major_news' ], 
    timing: null, 
    remoteAddress: '162.242.133.50', 
    remotePort: 80 } } 

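How do I increase the timeout so that the loop can finish and all the data I need gets inserted into my MySQL database? I suspect I'm not setting the timeout correctly, because npm urllib does have an option for it.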

Answer

I think adding the timeout option to the arguments of the request() function should solve your problem.

From the API documentation:

timeout Number | Array - Request timeout in milliseconds for connecting phase and response receiving phase. Defaults to exports.TIMEOUT, both are 5s. You can use timeout: 5000 to tell urllib to use the same timeout for both phases or set them separately such as timeout: [3000, 5000], which will set connecting timeout to 3s and response 5s.
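
To make the two documented forms concrete, here is a small illustrative sketch. normalizeTimeout and DEFAULT_TIMEOUT are hypothetical names, not part of urllib; the function only mirrors the behavior the documentation describes (one number for both phases, or a [connect, response] pair, with 5 s as the default), not urllib's actual internals.

```javascript
'use strict';

// Hypothetical helper (not part of urllib): expands the documented `timeout`
// option into [connectTimeout, responseTimeout], both in milliseconds.
const DEFAULT_TIMEOUT = 5000; // documented default for both phases

function normalizeTimeout(timeout) {
  if (timeout === undefined) {
    return [DEFAULT_TIMEOUT, DEFAULT_TIMEOUT];
  }
  if (typeof timeout === 'number') {
    // A single number applies to both the connecting and the response phase.
    return [timeout, timeout];
  }
  if (Array.isArray(timeout) && timeout.length === 2) {
    // [connect, response], e.g. [3000, 5000]
    return [timeout[0], timeout[1]];
  }
  throw new TypeError('timeout must be a number or a [connect, response] array');
}

// With urllib itself, the value goes inside the options object, e.g.
//   urllib.request(url, { timeout: [5000, 50000] })
console.log(normalizeTimeout(50000));        // [ 50000, 50000 ]
console.log(normalizeTimeout([3000, 5000])); // [ 3000, 5000 ]
```

The array form is useful here because the error in the question is a response timeout, so the response phase can be lengthened without also waiting longer for connections that never open.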

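Yes, but I don't quite know how I should set it. The method seems to take three parameters: 'http.request(url[, options][, callback])', but I'm not very familiar with JS syntax and don't know how to add the timeout. I tried 'urllib.request({url: [url], options: [timeout = 50000]})', which didn't work either. – needhelpwithR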

@needhelpwithR Try it like this: 'http.request('http://example.com', { method: 'GET', data: { 'a': 'hello', 'b': 'world' }, timeout: 10000 // 10s });' – philipjkim
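
For the syntax question, the key point is that options is an ordinary object literal passed as the second argument, after the URL string. The sketch uses a hypothetical stand-in, fakeRequest, instead of urllib so it runs without the network. It only reports which timeout it would pick and assumes the documented 5000 ms default when none is given. The error log in the question reports "Response timeout for 5000ms", which is consistent with the timeout never reaching urllib.

```javascript
'use strict';

// Hypothetical stand-in for urllib.request(url[, options][, callback]).
// It makes no network request; it only reports the timeout it would use,
// falling back to the documented 5000 ms default when none is given.
function fakeRequest(url, options = {}) {
  const timeout = options.timeout !== undefined ? options.timeout : 5000;
  return { url: url, timeout: timeout };
}

const url = 'http://example.com';

// Correct shape: the URL string first, then a plain options object
// whose timeout key is in milliseconds.
const ok = fakeRequest(url, { method: 'GET', timeout: 10000 });
console.log(ok.timeout); // 10000

// options=[timeout=50000] from the question evaluates to the array [50000]
// (and, outside strict mode, also creates global variables named options
// and timeout), so there is no timeout key to read and the default applies.
const fromArray = fakeRequest(url, [50000]);
console.log(fromArray.timeout); // 5000
```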

When I add it, it still doesn't work. I get an ECONNRESET error: 'Error: read ECONNRESET at exports._errnoException (util.js:1026:11) at TCP.onread (net.js:564:26) ECONNRESET', syscall: 'read', name: 'ResponseError',' – needhelpwithR
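
The thread does not establish why the requests time out or are reset. One possibility, not confirmed here, is that the loop starts all ~5000 requests at once, so the server sees a burst of concurrent connections. Assuming that is the cause, the minimal sketch below caps how many requests are pending at a time. mapWithConcurrency and fakeFetch are hypothetical helpers, the limit of 3 is arbitrary, and the urllib call in the comment only shows where a real request would go.

```javascript
'use strict';

// Runs worker(item) for every item, with at most `limit` calls pending at once.
// Resolves with one { value } or { error } entry per item, in input order,
// so a single failed request does not stop the remaining ones.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to start (safe: JS is single-threaded)

  async function runner() {
    while (next < items.length) {
      const i = next++;
      try {
        results[i] = { value: await worker(items[i]) };
      } catch (error) {
        results[i] = { error };
      }
    }
  }

  const runners = [];
  for (let r = 0; r < Math.min(limit, items.length); r++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}

// Hypothetical fake worker standing in for a real request; it records how
// many calls are pending so the cap can be checked.
let inFlight = 0;
let maxInFlight = 0;
function fakeFetch(n) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  return new Promise((resolve) => {
    setTimeout(() => {
      inFlight--;
      resolve(n * 2);
    }, 5);
  });
}

// With urllib the call might look like (urls is a hypothetical array):
//   mapWithConcurrency(urls, 10, (url) => urllib.request(url, { timeout: 50000 }))
const done = mapWithConcurrency([1, 2, 3, 4, 5, 6, 7], 3, fakeFetch).then((results) => {
  console.log(results.map((r) => r.value).join(', ')); // 2, 4, 6, 8, 10, 12, 14
  console.log(maxInFlight); // 3
  return results;
});
```

Each runner starts a new request only after its previous one settles. The number of open connections therefore stays at the limit no matter how long the URL list is, and each request's own timeout covers only that request rather than time spent queued behind thousands of others.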