2013-01-24 33 views
0

Getting specific data from a particular <script type="text/javascript"> tag

I have an HTML page with several JavaScript tags, and I want to extract data from one specific tag:

<head> 
... 
</head> 
<body> 
... 
<script type="text/javascript"> 

    $j(document).ready(function() { 

     if (!($j.cookie("ios"))) { 
      new $c.free.widgets.FreeAdvDialog().open(); 
      $j.cookie("ios", "seen", { path: '/', expires: 10000}); 
     }; 

     ajax_keys = ["d24349f205e3deb7f1015f42d3a14da7205b62e4", "0ae78c4797d47745ebd44e2754367da10c6f56a4", "567b2bfb6fd1aee784115da54e5e116a280ee225", "fc5cd251be46ff101c471553d52c07bf08c9aa65"]; 
     var is_dm = false; 

     /* async chart loader */ 
     var chart = new $c.free.widgets.Chart({ 
      target: $j('#graph'), 
      width: 990, 
      height: 275, 
      site: "911.com", 
      source_panel: 'us' 
     }); 

     var chart_view = new $c.free.widgets.ChartView({ 
      chart: chart, 
      csv_button: 'csv-export', 
      save_button: 'graph-image', 
      embed_button: 'embed-graph', 
      key: ajax_keys[1] 
     }); 
     chart_view.render(); 

     /* zoom info initialization */ 
     var zoom_info = new $c.free.widgets.ZoomInfo({ 
      site: "911.com", 
      el: '#zoominfo', 
      key: ajax_keys[3] 
     }); 
     zoom_info.load(); 


     /* compete numbers initialization */ 
     var compete_numbers = new $c.free.widgets.CompeteNumbers({ 
      site: "911.com", 
      key: ajax_keys[0], 
      el: '#compete_numbers' 
     }); 
     compete_numbers.load(); 

     /* DM Marketing widget init */ 
     new $c.free.widgets.DMSignupMessage({ 
      is_dm: is_dm, 
      compete_numbers: compete_numbers 
     }); 

     /* personalization initialization */ 


      var logged_in_as = null; 


     var d = { 
      site_name: "911.com", 
      logged_in_as: logged_in_as, 
      current_source_panel: {"display_abbreviation": "us", "panel_name": "us", "image_url": "http://media.compete.com/site_media/images/icons/flag_us.gif", "id": 1, "display_name": "United States"} 
     }; 

     var auth_model = new $c.free.widgets.FreeLoginModel(d); 
     var links_opts = { model: auth_model }; 
     var links_view = new $c.free.widgets.FreeAccountLinksView(links_opts); 

     var sites_view = new $c.free.widgets.FollowSiteButtonView(links_opts); 
     var manage_view = new $c.free.widgets.ManageSitesListButtonView(links_opts); 

     var sites = new $c.free.widgets.SimilarSitesCollection([], { 
      site: "911.com", 
      source_panel: 'us', 
      key: ajax_keys[2], 
      auth: auth_model 
     }); 
     var graph = new $c.free.widgets.BarGraph({ 
      el: $j('#similar-sites'), 
      collection: sites 
     }); 

     // tell KISSMetrics where we are 
     // also identify user so KM console can refer to them by email 
     if(logged_in_as != null) { 
      _kmq.push(['identify', logged_in_as]); 
     } 
     _kmq.push(['record', 'Viewed Free Site Analytics Report (M)']); 
    }); 

...

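How can I get ajax_keys (e.g. "d24349f205e3deb7f1015f42d3a14da7205b62e4") out of that specific tag on the page?

P.S. I tried using regular expressions in a Python script, but I could not retrieve the elements I need from the tag.

Thanks for your help.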

Answers

2

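If you use a library like BeautifulSoup, you can grab the specific script tag and then run a regular expression against that tag's contents instead of the whole document.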
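As a rough sketch of that "pick the tag first, then apply the regex" idea, here is a version that uses the standard library's html.parser in place of BeautifulSoup, so it runs without extra installs (Python 3 assumed). The names ScriptCollector and extract_ajax_keys and the miniature sample page are made up for illustration; they are not part of the original page or of any library. Because the script tags have no id, the sketch simply takes the first script whose text assigns ajax_keys.

```python
import json
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append to the last entry
        if self._in_script:
            self.scripts[-1] += data


def extract_ajax_keys(html):
    """Return ajax_keys from the first script that assigns it, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    # Assumes the array is a flat list of double-quoted strings, as in the question
    pattern = re.compile(r"ajax_keys\s*=\s*(\[[^\]]*\])")
    for script in collector.scripts:
        found = pattern.search(script)
        if found:
            return json.loads(found.group(1))
    return None


# Hypothetical miniature page, much shorter than the real one
sample = """
<html><body>
<script type="text/javascript">var unrelated = 1;</script>
<script type="text/javascript">
    $j(document).ready(function() {
        ajax_keys = ["d24349f205e3deb7f1015f42d3a14da7205b62e4", "0ae78c4797d47745ebd44e2754367da10c6f56a4"];
        var is_dm = false;
    });
</script>
</body></html>
"""

keys = extract_ajax_keys(sample)
print(keys[0])
```

With BeautifulSoup the collection step would be replaced by iterating over the script tags it finds; the regex and json.loads steps stay the same.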

That said, it looks like a single regular expression will work, assuming ajax_keys is assigned only once in the page:

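import json
import re

# `source` is assumed to hold the page's HTML as a string
ajaxre = re.compile(r"^\s+ajax_keys = ([^;]+)", re.MULTILINE)
# use search(), not match(): match() only tries the very start of the string
ajax_string = ajaxre.search(source).group(1)

# to get it as a Python list
ajax_keys = json.loads(ajax_string)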

Edit: thanks to @Karl Knechtel for json.loads

+1

"Be careful doing this in general" - there's no reason to; you want ast.literal_eval. Or possibly even json.loads. –

+0

eval is evil. Good call on json.loads, answer updated. – agoebel
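To illustrate the point made in the two comments above, here is a small sketch comparing json.loads and ast.literal_eval on the kind of string the regex captures. The helper name parse_keys and the shortened key values are made up for the example.

```python
import ast
import json


def parse_keys(text, use_json=True):
    """Parse a captured array literal without ever calling eval() (hypothetical helper)."""
    return json.loads(text) if use_json else ast.literal_eval(text)


# Shortened, made-up keys standing in for the captured group
ajax_string = '["d24349f2", "0ae78c47"]'

keys_from_json = parse_keys(ajax_string, use_json=True)
keys_from_ast = parse_keys(ajax_string, use_json=False)
print(keys_from_json == keys_from_ast)

# literal_eval only accepts plain literals, so an expression such as a
# function call is rejected instead of being executed
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

One practical difference: json.loads requires double-quoted strings, whereas ast.literal_eval also accepts single-quoted ones, which can matter if the JavaScript array is written with single quotes.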

+0

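Thanks! Another question: I have multiple <script type="text/javascript"> tags with no tag id. I tried something like this: from bs4 import BeautifulSoup; import re; import urllib2; data = urllib2.urlopen('URL').read(); soup = BeautifulSoup(data); to_extract = soup.findAll('script'); for item in to_extract: item.extract() - but this prints out everything from the page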
