2015-10-26

How to capture optional URI parameters with a regular expression

The following method parses a log line from nginx:

import re

def test_parse_line2(self):
    groups = ['ip', 'timestamp', 'offset', 'command', 'path', 'protocol', 'status', 'bytes', 'client']
    line = '1.2.3.4 - - [22/Oct/2015:12:01:49 -0500] "GET /mypath/?param1=value1&param2=value2 HTTP/1.1" 200 51 "-" "SomeRandomClient"'
    # Adjacent raw strings are concatenated into one pattern.
    pattern = (r'(?P<ip>[^ ]+) - - \[(?P<timestamp>[^ ]+) (?P<offset>[-+][0-9]{4})\] "'
               r'(?P<command>[A-Z]+) /(?P<path>[^ ]+) (?P<protocol>[^"]+)" (?P<status>[0-9]+) (?P<bytes>[0-9]+) (?:[^ ]+)'
               r' "(?P<client>[^"]+)')
    match = re.search(pattern, line)
    if match:
        # Print each named group captured from the log line.
        for group_name in groups:
            print(group_name, match.group(group_name))

Is there a way to modify it so that I capture the required path, mypath, separately from the optional parameters, param1=value1&param2=value2?

Answers

You need to replace the pattern that matches the path with two separate groups: (?P<mypath>[^?]+)\?(?P<myargs>[^ ]+)
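As a rough sketch of how the two groups from this answer might slot into the pattern from the question (the names LINE and PATTERN are only illustrative, and this version still assumes a query string is always present):

```python
import re

# Sample log line taken from the question.
LINE = ('1.2.3.4 - - [22/Oct/2015:12:01:49 -0500] '
        '"GET /mypath/?param1=value1&param2=value2 HTTP/1.1" '
        '200 51 "-" "SomeRandomClient"')

# The question's pattern with the single path group replaced by
# the answer's mypath and myargs groups, split on the literal "?".
PATTERN = re.compile(
    r'(?P<ip>[^ ]+) - - \[(?P<timestamp>[^ ]+) (?P<offset>[-+][0-9]{4})\] "'
    r'(?P<command>[A-Z]+) /(?P<mypath>[^?]+)\?(?P<myargs>[^ ]+) (?P<protocol>[^"]+)" '
    r'(?P<status>[0-9]+) (?P<bytes>[0-9]+) (?:[^ ]+)'
    r' "(?P<client>[^"]+)'
)

match = PATTERN.search(LINE)
if match:
    print(match.group('mypath'))  # part before the "?"
    print(match.group('myargs'))  # part after the "?"
```

Note that [^?]+ also accepts spaces, so on a line without a "?" in the request this pattern may fail to match or capture more than the path.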

This works when parameters are present; for the case with no parameters I changed the \? part. – AlexC

That did not work with \??. – AlexC
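For log lines that may or may not carry a query string, one possible approach (not taken from the thread itself) is to wrap the "?" and the parameters together in an optional non-capturing group, and to exclude both spaces and "?" from the path. The sketch below assumes the same log layout as the question; parse_line is a hypothetical helper name.

```python
import re

# mypath stops at a space or "?"; the whole "?args" part is optional,
# so myargs is None when the request has no query string.
PATTERN = re.compile(
    r'(?P<ip>[^ ]+) - - \[(?P<timestamp>[^ ]+) (?P<offset>[-+][0-9]{4})\] "'
    r'(?P<command>[A-Z]+) /(?P<mypath>[^ ?]+)(?:\?(?P<myargs>[^ ]+))? (?P<protocol>[^"]+)" '
    r'(?P<status>[0-9]+) (?P<bytes>[0-9]+) (?:[^ ]+)'
    r' "(?P<client>[^"]+)'
)

def parse_line(line):
    """Return a dict of named groups, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None

with_args = parse_line(
    '1.2.3.4 - - [22/Oct/2015:12:01:49 -0500] '
    '"GET /mypath/?param1=value1&param2=value2 HTTP/1.1" 200 51 "-" "SomeRandomClient"'
)
without_args = parse_line(
    '1.2.3.4 - - [22/Oct/2015:12:01:49 -0500] '
    '"GET /mypath/ HTTP/1.1" 200 51 "-" "SomeRandomClient"'
)
```

Because mypath uses +, a request for the bare root path "/" would not match this sketch; changing it to * would be needed to cover that case.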