2013-09-30 22 views

Regular expression to capture an HTML hidden input

I'm trying to capture the Joomla token with Python and pycurl. I wrote this function:

import urllib, urllib2, sys, re 
import cStringIO 
import pycurl 

def CaptureToken(cURL): 
    buf = cStringIO.StringIO() 
    c = pycurl.Curl() 
    c.setopt(c.URL, cURL) 
    c.setopt(c.WRITEFUNCTION, buf.write) 
    c.setopt(c.CONNECTTIMEOUT, 30) 
    c.setopt(c.TIMEOUT, 30) 
    c.perform() 
    html = buf.getvalue() 
    buf.close() 
    results = re.match(r"(type=\"hidden\" name=\"([0-9a-f]{32})\")", html).group(1) 
    print results 

CaptureToken('http://www.proregionisbono.org.pl/administrator/index.php') 

This expression works in Notepad++, but it does not work in Python :( Can someone please help me?

Answer


re.match only matches at the start of the string. You probably want re.search, which looks for a match anywhere in the string.

Python docs
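
To make the difference concrete, here is a small sketch that needs no network access. The HTML fragment and the token value are made up for illustration; only re.match and re.search come from the discussion above.

```python
import re

# Hypothetical HTML fragment; the 32-character hex name is invented for this example.
html = '<form><input type="hidden" name="0123456789abcdef0123456789abcdef" value="1" /></form>'
pattern = r'type="hidden" name="([0-9a-f]{32})"'

# re.match only tries the pattern at position 0. The string starts with "<form>",
# so there is no match and the result is None.
match_result = re.match(pattern, html)

# re.search scans forward through the string and finds the attribute pair.
search_result = re.search(pattern, html)
token = search_result.group(1) if search_result else None

print(match_result)
print(token)
```

This is why the original code fails with an AttributeError: re.match returns None, and None has no group method.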

This version of the code works for me:

import urllib, urllib2, sys, re 
import cStringIO 
import pycurl 

def CaptureToken(cURL): 
    buf = cStringIO.StringIO() 
    c = pycurl.Curl() 
    c.setopt(c.URL, cURL) 
    c.setopt(c.WRITEFUNCTION, buf.write) 
    c.setopt(c.CONNECTTIMEOUT, 30) 
    c.setopt(c.TIMEOUT, 30) 
    c.perform() 
    html = buf.getvalue() 
    buf.close() 
    results = re.search(r'(type="hidden" name="([0-9a-f]{32})")', html).group(2) 
    print results 

CaptureToken('http://www.proregionisbono.org.pl/administrator/index.php') 
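
One caveat with the function above: if the page contains no matching input, re.search returns None and the .group(2) call raises an AttributeError. A minimal sketch of a more defensive variant follows. It assumes the HTML has already been downloaded into a string, and the names TOKEN_RE and extract_token are hypothetical helpers, not part of the original code.

```python
import re

# Same pattern as above, with a single capture group for the token name.
TOKEN_RE = re.compile(r'type="hidden" name="([0-9a-f]{32})"')

def extract_token(html):
    """Return the 32-character hex token name, or None if the page has none."""
    found = TOKEN_RE.search(html)
    return found.group(1) if found else None

# Made-up sample input for illustration.
sample_page = '<input type="hidden" name="' + 'a1' * 16 + '" value="1" />'

print(extract_token(sample_page))
print(extract_token('<p>no form here</p>'))
```

Separating the extraction from the download also makes the regular expression easy to check against saved HTML without hitting the server. Note that the pattern still depends on the attributes appearing in exactly this order and with this spacing.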

*And* a nice switch from enclosing the re pattern in "s to enclosing it in 's, so that '\"' is no longer needed. – PaulMcG
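
The quoting point in this comment can be checked directly. The sketch below uses a made-up sample string and compares the two ways of writing the pattern. In a raw string the backslashes stay in the pattern text, but the regex engine treats an escaped double quote as a literal double quote, so both patterns match the same input.

```python
import re

# Pattern from the question: double-quoted raw string, so each inner quote is escaped.
escaped = r"type=\"hidden\" name=\"([0-9a-f]{32})\""

# Pattern from the answer: single-quoted raw string, no escapes needed.
plain = r'type="hidden" name="([0-9a-f]{32})"'

# Made-up sample input for illustration.
sample = 'type="hidden" name="' + 'ab' * 16 + '"'

# The pattern strings differ (one contains backslashes), yet they match identically.
print(escaped == plain)
print(re.search(escaped, sample).group(1) == re.search(plain, sample).group(1))
```

So the change of quotes is purely about readability. The fix that makes the code work is the move from re.match to re.search.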