2013-07-24 167 views

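Extracting specific elements/values from HTML

I am trying to do either a) or b) below. I would prefer a), if I can work it out. Please refer to the HTML at the end.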

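a) Extract the values of the following items. The names in quotes are static, but the associated values will change. I only want to extract the values.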

"locality" = Paris 
"region" = Paris 
"country-name" = France 
"latitude" = 48.85534 
"longitude" = 2.35048 

b) Simply extract the entire element <div class="vcard">...</div>

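I tried to reuse someone else's code and adapt it to do what I want, but I am having trouble writing it. I managed to extract some values, but it is messy. I feel the code could be much better: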

VBA

' Load the URL held in C1 and wait until the page has finished loading
Sheet1.WebBrowser1.Navigate (Sheet1.Range("C1"))

Do
    DoEvents
Loop Until Sheet1.WebBrowser1.ReadyState = READYSTATE_COMPLETE

the_html_code = Sheet1.WebBrowser1.Document.Body.InnerHTML

the_output_row = 2
' Find "locality", then skip a fixed 39 characters to reach the value
start_of_item = InStr(the_html_code, "locality")
the_value = Mid(the_html_code, start_of_item + 39, Len(the_html_code))
the_html_code = Mid(the_html_code, start_of_item + 8, Len(the_html_code))
' Cut the value off at the next ">" (Chr(62))
the_value = Mid(the_value, 1, InStr(the_value, Chr(62)) - 1)
Sheet1.Range("L" & the_output_row) = the_value

HTML

<script> 
     if (typeof (aadSponsoredLinksObj) != 'undefined' && aadSponsoredLinksObj.type == 'google' && aadSponsoredLinksObj.show_links == true) { 
      document.write('<scr' + 'ipt src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></scr' + 'ipt>'); 
     } else if (typeof (aadSponsoredLinksObj) == 'undefined') { 
      jQuery('#ad-links').remove(); 
     } 
    </script> 
<div id="tracking-pixels"></div> 

</div> 
<!-- /#wrap --> 

    <div class="vcard"> 
     <span class="adr"> 
      <span class="locality"> 
       <span class="value-title" title="Paris" ></span> 
      </span> 
      <abbr class="region" title="Paris"> 
       <span class="value-title" title="75" ></span> 
      </abbr> 
      <abbr class="country-name" title="France"> 
       <span class="value-title" title="FR" ></span> 
      </abbr> 
     </span> 
     <span class="geo"> 
      <span class="latitude"> 
       <span class="value-title" title="48.85534" ></span> 
      </span> 
      <span class="longitude"> 
       <span class="value-title" title="2.35048"></span> 
      </span> 
     </span> 
    </div> 

    <script type="text/javascript"> 
     var _qoptions = { qacct: 'p-4b4gl_1fWISuU' }; 
     if (typeof (apgPageInfoObj) != 'undefined' && apgPageInfoObj.crumb_trail) { 
      _qoptions.labels = apgPageInfoObj.crumb_trail.join('.'); 

Why not use a proper DOM parser? That would be about 1000% more efficient than trying to parse HTML with string functions.
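
As a rough illustration of the DOM approach, the sketch below walks the document already loaded in the question's Sheet1.WebBrowser1 control instead of counting characters. It covers both a) and b). The names GetTitleByClass and DumpVcard are hypothetical, the output cells simply follow the question's use of column L, and the code assumes the page has finished loading and that the markup matches the HTML shown in the question. It uses getElementsByTagName rather than getElementsByClassName because the latter may not be available in the document mode the WebBrowser control uses.

```vba
Option Explicit

' Hypothetical helper: returns the title of the first element under root
' whose class attribute equals cls.
Function GetTitleByClass(ByVal root As Object, ByVal cls As String) As String
    Dim el As Object
    Dim child As Object

    For Each el In root.getElementsByTagName("*")
        If el.className = cls Then
            If Len(el.Title & "") > 0 Then
                ' e.g. <abbr class="region" title="Paris"> holds the value itself
                GetTitleByClass = el.Title
            Else
                ' e.g. <span class="locality"> keeps it on the nested value-title span
                For Each child In el.getElementsByTagName("span")
                    If child.className = "value-title" Then
                        GetTitleByClass = child.Title
                        Exit For
                    End If
                Next child
            End If
            Exit Function
        End If
    Next el
End Function

' Hypothetical entry point: run after the page has finished loading.
Sub DumpVcard()
    Dim doc As Object
    Dim node As Object
    Dim card As Object
    Dim fieldNames As Variant
    Dim i As Long

    Set doc = Sheet1.WebBrowser1.Document

    ' Locate <div class="vcard">
    For Each node In doc.getElementsByTagName("div")
        If node.className = "vcard" Then
            Set card = node
            Exit For
        End If
    Next node
    If card Is Nothing Then Exit Sub

    ' b) the whole element as markup
    Debug.Print card.outerHTML

    ' a) the individual values, written down column L starting at row 2 (an assumption)
    fieldNames = Array("locality", "region", "country-name", "latitude", "longitude")
    For i = LBound(fieldNames) To UBound(fieldNames)
        Sheet1.Range("L" & (2 + i - LBound(fieldNames))).Value = _
            GetTitleByClass(card, CStr(fieldNames(i)))
    Next i
End Sub
```

Preferring the element's own title and falling back to the nested value-title span is a choice made to match the values listed under a), where region and country-name come from the abbr elements and the others from the inner spans.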

Answer


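As David Zemens suggested, you can use the MSXML DOM parser. Add a reference to Microsoft XML in the VBA References dialog (preferably the latest version, v6.0). The library has an online reference here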
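
A minimal sketch of what this could look like follows. It is not necessarily the exact code the answer had in mind. It assumes the "Microsoft XML, v6.0" reference has been added, and it parses an inline copy of part of the question's vcard markup. LoadXML only accepts well-formed XML, so the sketch feeds it just the vcard fragment rather than the whole page, which is unlikely to parse as XML. ParseVcardWithMsxml is a hypothetical name.

```vba
Option Explicit

' Requires a reference to "Microsoft XML, v6.0" (Tools > References).
Sub ParseVcardWithMsxml()
    Dim xmlDoc As MSXML2.DOMDocument60
    Dim node As MSXML2.IXMLDOMNode
    Dim fragment As String

    ' Inline copy of part of the question's markup (assumption: in practice the
    ' fragment would come from the loaded page and must be well-formed XML).
    fragment = "<div class=""vcard"">" & _
        "<span class=""locality""><span class=""value-title"" title=""Paris""></span></span>" & _
        "<abbr class=""region"" title=""Paris""><span class=""value-title"" title=""75""></span></abbr>" & _
        "<span class=""latitude""><span class=""value-title"" title=""48.85534""></span></span>" & _
        "</div>"

    Set xmlDoc = New MSXML2.DOMDocument60
    xmlDoc.async = False
    If Not xmlDoc.LoadXML(fragment) Then
        Debug.Print "Parse error: " & xmlDoc.parseError.reason
        Exit Sub
    End If

    ' The value sits on the nested span for locality and latitude...
    Set node = xmlDoc.SelectSingleNode("//span[@class='locality']/span/@title")
    If Not node Is Nothing Then Debug.Print "locality = " & node.Text

    Set node = xmlDoc.SelectSingleNode("//span[@class='latitude']/span/@title")
    If Not node Is Nothing Then Debug.Print "latitude = " & node.Text

    ' ...and on the abbr element itself for region.
    Set node = xmlDoc.SelectSingleNode("//abbr[@class='region']/@title")
    If Not node Is Nothing Then Debug.Print "region = " & node.Text
End Sub
```

One caveat: markup taken from the browser control (for example via outerHTML) is not guaranteed to be well-formed XML, so checking the return value of LoadXML, as above, is worthwhile before querying.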