Exceeding the ImportXML limit on Google Spreadsheets

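I'm currently stuck on a scraping problem. Specifically, I want to extract the author names of web pages into a Google spreadsheet. The function =IMPORTXML(A2, "//span[@class='author vcard meta-item']") actually works, but once I increased the number of links, the cells just keep loading.

So I did some research and found that the problem is caused by a limit that Google imposes.

Does anyone know a way to get past the limit, or a script that I can "easily copy"? I really have no knack for coding.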

Source

2016-08-18 rookie4

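Please check out [ask]. –

There is no script that can exceed the limit. Because the code runs on Google's machines (servers), you cannot cheat. Some limits are bound to your spreadsheet, so you could try using multiple spreadsheets, if that helps.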
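One way to act on the multiple-spreadsheet idea is to split the list of URLs into fixed-size groups and paste each group into its own spreadsheet. The sketch below is plain JavaScript; the `chunkUrls` helper, the example URLs, and the batch size of 50 are all assumptions made for illustration, since the exact limit is not known here.

```javascript
// Hypothetical helper: split an array of URLs into groups of at most batchSize.
function chunkUrls(urls, batchSize) {
  if (batchSize < 1) {
    throw new Error('batchSize must be at least 1');
  }
  var batches = [];
  for (var i = 0; i < urls.length; i += batchSize) {
    batches.push(urls.slice(i, i + batchSize));
  }
  return batches;
}

// Made-up input: 120 placeholder URLs.
var urls = [];
for (var n = 1; n <= 120; n++) {
  urls.push('https://example.com/post/' + n);
}

// With an assumed batch size of 50 this yields groups of 50, 50 and 20 URLs.
var batches = chunkUrls(urls, 50);
```

The batch size would have to be found by trial and error, for example by lowering it until the cells stop hanging.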

Source

2016-08-18 11:30:01 michaelsinner

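Thanks, that's a good idea, but the problem is that I don't know the exact limit at which I would need to split the data across different spreadsheets. Also, does the importxml function take a long time to extract the "span class" I'm looking for? – rookie4

I created a custom import function that overcomes all the limits of IMPORTXML. I have a sheet that uses it in about 800 cells, and it works well.

It uses Google Sheets' custom scripts (Tools > Script editor...) and searches the content with regular expressions instead of XPath.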

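function importRegex(url, regexInput) {
    var output = '';
    // muteHttpExceptions: return the response on HTTP errors instead of throwing
    var fetchedUrl = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
    if (fetchedUrl) {
        var html = fetchedUrl.getContentText();
        if (html.length && regexInput.length) {
            // match() returns null when nothing matches, so guard before reading group 1
            var found = html.match(new RegExp(regexInput, 'i'));
            if (found && found[1] !== undefined) {
                output = found[1];
            }
        }
    }
    // Grace period to not overload
    Utilities.sleep(1000);
    return unescapeHTML(output);
}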

You can then use this function like any other function.

=importRegex("https://example.com", "<title>(.*)<\/title>")

Of course, you can also reference a cell.

=importRegex(A2, "<title>(.*)<\/title>")
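For the author span from the question, the XPath `//span[@class='author vcard meta-item']` could be approximated with a regular expression. The sketch below is plain JavaScript run against a made-up HTML snippet; the `extractFirstGroup` helper and the `sampleHtml` string are hypothetical and only mirror the matching step of `importRegex`. It assumes the class attribute appears exactly as written and that the span contains no nested span.

```javascript
// Hypothetical helper: returns capture group 1 of the first match, or '' if none.
function extractFirstGroup(html, pattern) {
  var found = html.match(new RegExp(pattern, 'i'));
  return found && found[1] !== undefined ? found[1] : '';
}

// Assumed pattern for <span class="author vcard meta-item">...</span>.
// [\s\S]*? matches lazily across line breaks; '.' would stop at a newline.
var authorPattern =
  '<span[^>]*class=["\']author vcard meta-item["\'][^>]*>([\\s\\S]*?)</span>';

// Made-up markup for illustration only.
var sampleHtml =
  '<div><span class="author vcard meta-item">Jane Doe</span></div>';

var author = extractFirstGroup(sampleHtml, authorPattern);
```

In a sheet, the same pattern would presumably be written with doubled double quotes, as Sheets string literals require: =importRegex(A2, "<span[^>]*class=[""']author vcard meta-item[""'][^>]*>([\s\S]*?)</span>"). If the span wraps the name in a link, the captured text will include that markup.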

If you don't want to see HTML entities in the output, you can use this function.

// Named HTML entities and the characters they stand for
var htmlEntities = {
    nbsp: ' ',
    cent: '¢',
    pound: '£',
    yen: '¥',
    euro: '€',
    copy: '©',
    reg: '®',
    lt: '<',
    gt: '>',
    mdash: '\u2014', // em dash
    ndash: '\u2013', // en dash
    quot: '"',
    amp: '&',
    apos: '\''
};

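function unescapeHTML(str) {
    // Match only well-formed entities such as &amp;, &#169; or &#xA9;
    return str.replace(/&(#?\w+);/g, function (entity, entityCode) {
        var match;

        if (Object.prototype.hasOwnProperty.call(htmlEntities, entityCode)) {
            // Named entity from the table above
            return htmlEntities[entityCode];
        } else if (match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
            // Hexadecimal character reference
            return String.fromCharCode(parseInt(match[1], 16));
        } else if (match = entityCode.match(/^#(\d+)$/)) {
            // Decimal character reference
            return String.fromCharCode(parseInt(match[1], 10));
        } else {
            // Unknown entity: leave it untouched
            return entity;
        }
    });
}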

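All together...

function importRegex(url, regexInput) {
    var output = '';
    // muteHttpExceptions: return the response on HTTP errors instead of throwing
    var fetchedUrl = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
    if (fetchedUrl) {
        var html = fetchedUrl.getContentText();
        if (html.length && regexInput.length) {
            // match() returns null when nothing matches, so guard before reading group 1
            var found = html.match(new RegExp(regexInput, 'i'));
            if (found && found[1] !== undefined) {
                output = found[1];
            }
        }
    }
    // Grace period to not overload
    Utilities.sleep(1000);
    return unescapeHTML(output);
}

// Named HTML entities and the characters they stand for
var htmlEntities = {
    nbsp: ' ',
    cent: '¢',
    pound: '£',
    yen: '¥',
    euro: '€',
    copy: '©',
    reg: '®',
    lt: '<',
    gt: '>',
    mdash: '\u2014', // em dash
    ndash: '\u2013', // en dash
    quot: '"',
    amp: '&',
    apos: '\''
};

function unescapeHTML(str) {
    // Match only well-formed entities such as &amp;, &#169; or &#xA9;
    return str.replace(/&(#?\w+);/g, function (entity, entityCode) {
        var match;

        if (Object.prototype.hasOwnProperty.call(htmlEntities, entityCode)) {
            // Named entity from the table above
            return htmlEntities[entityCode];
        } else if (match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
            // Hexadecimal character reference
            return String.fromCharCode(parseInt(match[1], 16));
        } else if (match = entityCode.match(/^#(\d+)$/)) {
            // Decimal character reference
            return String.fromCharCode(parseInt(match[1], 10));
        } else {
            // Unknown entity: leave it untouched
            return entity;
        }
    });
}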

Source

2018-01-28 06:17:59 Blanknewkid
