2012-09-27

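Google Apps Script: How do I trim a string after parsing HTML?

What I'm trying to do is parse a web page and extract the movie title from it, without all the HTML gunk, so that it can eventually be saved to a spreadsheet. My code: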

function myFunction() { 
    var url = UrlFetchApp.fetch("http://boxofficemojo.com/movies/?id=clashofthetitans2.htm") 
    var doc = url.getContentText() 
    var patt1 = doc.match(/<font face\=\"Verdana\"\ssize\=\"6\"><b>.*?<\/b>/i); 

     //var cleaned = patt1.replace(/^<font face\=\"Verdana\" size\=\"6\"><b>/,""); 
     //Logger.log(cleaned); Didn't work, get "cannot find function in object" error. 
     //so tried making a function below: 

    String.trim = function() { 
    return this.replace(/^\W<font face\=\"Verdana\"\ssize\=\"6\"><b>/,""); } 
    Logger.log(patt1.trim()); 
} 

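I'm very new to all of this (programming and Google scripting in general). I've been referencing the JavaScript section on w3school.com, but a lot of things just don't work in Google Apps Script. I'm not sure what is missing here: is my RegEx wrong? Is there a better or faster way to extract this data than RegEx? Any help would be great, and thank you for reading!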


Try the Xml service: https://developers.google.com/apps-script/service_xml – Srik

Answers


Parsing information out of HTML that is not under your control is always a bit of a challenge, but there is a way to make the job easier.

I noticed that the title element of each movie page also contains the movie title, like this:

<title>Wrath of the Titans (2012) - Box Office Mojo</title> 

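You will probably have more success parsing the title out of this element, because it is likely to be more stable than the page body markup.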

var url = UrlFetchApp.fetch("http://boxofficemojo.com/movies/?id=clashofthetitans2.htm");
var doc = url.getContentText();
// Capture everything between <title> and the " (YYYY) -" suffix.
var match = doc.match(/<title>(.+) \([0-9]{4}\) -/);
Logger.log("Movie title is " + match[1]);
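
One thing to watch for with the snippet above: match() returns null when the pattern is not found, so indexing the result directly would throw on a page with a different title format. A minimal sketch of a guarded version follows, assuming the title format shown earlier. The helper name extractMovieTitle and the sample page string are made up for illustration and are not part of the original answer.

```javascript
// Hypothetical helper: pulls the movie title out of a page's <title> element.
// Returns null instead of throwing when the pattern is not found.
function extractMovieTitle(html) {
  var match = html.match(/<title>(.+) \([0-9]{4}\) -/);
  // match() yields null on failure, so check before reading the capture group.
  return match ? match[1] : null;
}

// Sample input built from the <title> line quoted above (not fetched from the site).
var page = "<html><head><title>Wrath of the Titans (2012) - Box Office Mojo</title></head></html>";
var title = extractMovieTitle(page);

// In Apps Script the input would come from UrlFetchApp instead, for example:
// var html = UrlFetchApp.fetch(url).getContentText();
// Logger.log(extractMovieTitle(html));
```

Keeping the regular expression in a function that takes a plain string also makes it easy to try against saved HTML before pointing it at a live page.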

Thank you very much! I am actually trying to parse pages generated by a vBulletin board system. I just happened to use that site in my example because it offered a straightforward .htm page that I thought would work for me as a beginner. It took a lot of RegEx experimentation, and I still don't know how I landed on what I wanted, but you definitely helped. Out of curiosity, shouldn't it be var match = doc.match(/ …
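
On the "cannot find function" error in the question itself: match() returns an array (or null), not a string, so the result has no replace() method. Assigning to String.trim also adds a property to the String constructor rather than to string values, so patt1.trim() fails for the same reason. A capture group avoids the trimming step altogether. The sketch below assumes the page markup looks like the pattern the question searches for. The helper name extractHeading and the sample string are hypothetical.

```javascript
// Hypothetical helper: returns the text inside
// <font face="Verdana" size="6"><b>...</b>, or null when that markup is absent.
function extractHeading(html) {
  // The parentheses form a capture group, so the title lands in match[1]
  // and the surrounding tags never have to be stripped afterwards.
  var match = html.match(/<font face="Verdana"\s+size="6"><b>(.*?)<\/b>/i);
  // match() returns an array or null, never a string; read the group from it.
  return match ? match[1] : null;
}

// Sample markup assumed from the question's pattern, not copied from the real page.
var sample = '<td><font face="Verdana" size="6"><b>Wrath of the Titans</b></font></td>';
var heading = extractHeading(sample);
```

If the full matched string is ever needed instead, it is available as match[0], and replace() can be called on that element.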