Parsing a .js page with Python

I have a web page, http://timetable.ait.ie/js/filter.js, that I really need to parse. For the past few days I have been using BeautifulSoup to parse HTML pages, and I know what I'm doing there, but this .js file is killing me.

At the moment I am using the following code:

import urllib  # Python 2; in Python 3, use urllib.request.urlopen instead

page = urllib.urlopen("http://timetable.ait.ie/js/filter.js")
pageInfo = page.read()  # the whole file as one string

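It returns the whole file, 18,283 lines of code, as a single string. The staff names I am trying to get are near the bottom of that code, in an array: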

staffarray[373][0] = "BRADY, DAMIEN"; 
staffarray[373][1] = "SCI"; 
staffarray[373][2] = "BRADY001608";

I need the values at [0] and [1], and then I want to build a database from those values that I can refer to later.

I tried using a regular expression to find staffarray, but I got completely frustrated trying to extract this information. Can anyone help me?


2016-11-12 Matthew Swart

urllib and requests only read data from the server. BeautifulSoup lets you find tags in HTML, even a tag that contains JavaScript code.

You can write a regular expression with capture groups:

import re

# Matches lines such as: staffarray[373][0] = "BRADY, DAMIEN";
pattern = re.compile(
    r'staffarray\[(?P<first_index>\d+)\]\s*\[(?P<second_index>\d+)\] = "(?P<name>.+)"'
)

with open('filter.js') as file:
    for line in file:
        match = pattern.search(line)
        if match:
            first_index, second_index, name = match.groups()
            # do something with data
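
To get from individual matches to the database the question asks for, one possible approach is to group the values by their first index and then store the [0] and [1] values with sqlite3. This is only a rough sketch under a few assumptions: the sample text stands in for the real file, the `staff` table and its column names are made up for illustration, and treating [1] ("SCI") as a department code is a guess that the question does not confirm.

```python
import re
import sqlite3

# Sample lines in the same shape as the staffarray entries in filter.js.
SAMPLE_JS = '''
staffarray[373][0] = "BRADY, DAMIEN";
staffarray[373][1] = "SCI";
staffarray[373][2] = "BRADY001608";
'''

PATTERN = re.compile(r'staffarray\[(\d+)\]\s*\[(\d+)\]\s*=\s*"([^"]*)"')


def parse_staff(js_text):
    """Group staffarray values by their first index: {373: {0: ..., 1: ...}}."""
    staff = {}
    for first, second, value in PATTERN.findall(js_text):
        staff.setdefault(int(first), {})[int(second)] = value
    return staff


def save_staff(staff, connection):
    """Store the [0] and [1] values in a hypothetical 'staff' table."""
    # Column names are assumptions; [1] is only guessed to be a department.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS staff "
        "(id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
    )
    rows = [(idx, fields.get(0), fields.get(1)) for idx, fields in staff.items()]
    connection.executemany("INSERT OR REPLACE INTO staff VALUES (?, ?, ?)", rows)
    connection.commit()


staff = parse_staff(SAMPLE_JS)
connection = sqlite3.connect(":memory:")  # pass a file path to keep the data
save_staff(staff, connection)
rows = connection.execute("SELECT id, name, department FROM staff").fetchall()
print(rows)
```

For the real file, you would pass the downloaded text (or the contents of filter.js read from disk) to `parse_staff` instead of `SAMPLE_JS`.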


2016-11-12 01:44:16 Stonecold

Thank you for the answer; this took me a while.
