Python: splitting a string with re.search or re.sub

I always start with a file name that ends with its file extension, such as:

filename = 'photo_v_01_20415.jpg'

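From this file name I need to extract the file extension together with the last number, which sits right before the file extension itself. After the split I should have two strings: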

original_string = 'photo_v_01_20415.jpg' 

string_result_01 = `photo_v_01_` (first half of the file name) 

string_result_02 = `20415.jpg` (second half of the file name).

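The problem is that the incoming file names will be inconsistent. The last number may be separated by an underscore "_", by a space " ", by a period ".", or by anything else. Examples of possible file names: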

photo_v_01_20415.jpg 
photo_v_01.20415.jpg 
photo_v_01 20415.jpg 
photo_v_01____20415.jpg

It looks like I need a regular expression with re.search or re.sub. I would appreciate any suggestions!


2013-09-25 alphanumeric

Use re.match rather than re.search, so that the pattern is matched from the start of the string. For example:

import re

def split_name(filename):
    # Lazy prefix, then the last run of digits, a period and the extension.
    # The trailing $ keeps a name like 'photo_v_01.20415.jpg' from matching early.
    match = re.match(r'(.*?)(\d+\.[^.]+)$', filename)
    if match:
        return match.groups()
    else:
        return None, None

for name in ['foo123.jpg', 'bar;)234.png', 'baz^_^456.JPEG', 'notanumber.bmp']:
    prefix, suffix = split_name(name)
    print("prefix = %r, suffix = %r" % (prefix, suffix))

This prints:

prefix = 'foo', suffix = '123.jpg' 
prefix = 'bar;)', suffix = '234.png' 
prefix = 'baz^_^', suffix = '456.JPEG' 
prefix = None, suffix = None

This works with any extension; if the file name does not fit the pattern, the match fails and the function returns None, None.
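
Since the question mentions the number and the extension as separate pieces, the same idea can also be sketched with named groups. This is only an illustration: the names NAME_PATTERN and split_parts are hypothetical, and re.fullmatch is used here so that the whole file name has to fit the pattern.

```python
import re

# Hypothetical pattern name; three named groups instead of two positional ones.
NAME_PATTERN = re.compile(r'(?P<prefix>.*?)(?P<number>\d+)\.(?P<ext>[^.]+)')

def split_parts(filename):
    """Return (prefix, number, ext), or None if the name does not fit."""
    match = NAME_PATTERN.fullmatch(filename)  # the whole name must match
    if match is None:
        return None
    return match.group('prefix', 'number', 'ext')

for name in ['photo_v_01_20415.jpg', 'photo_v_01.20415.jpg',
             'photo_v_01 20415.jpg', 'photo_v_01____20415.jpg']:
    print(split_parts(name))
```

Joining the number and the extension back together with a period gives the second string asked for in the question.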


2013-09-25 20:59:30

import re

# The backslash after the opening quotes suppresses the leading empty line
names = '''\
photo_v_01_20415.jpg
photo_v_01.20415.jpg
photo_v_01 20415.jpg
photo_v_01____20415.jpg'''.splitlines()

for name in names:
    prefix, suffix = re.match(r'(.+?[_. ])(\d+\.[^.]+)$', name).groups()
    print('{} --> {}\t{}'.format(name, prefix, suffix))

which produces

photo_v_01_20415.jpg --> photo_v_01_ 20415.jpg 
photo_v_01.20415.jpg --> photo_v_01. 20415.jpg 
photo_v_01 20415.jpg --> photo_v_01  20415.jpg 
photo_v_01____20415.jpg --> photo_v_01____ 20415.jpg

The regular expression pattern r'(.+?[_. ])(\d+\.[^.]+)$' means:

r'    define a raw string 
(    with first group 
    .+?   non-greedily match 1-or-more of any character 
    [_. ]   followed by a literal underscore, period or space 
)    end first group 
(    followed by second group 
    \d+   1-or-more digits in [0-9] 
    \.   literal period 
    [^.]+   1-or-more of anything but a period 
)    end second group 
$    match the end of the string 
'    end raw string
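
The breakdown above can also live inside the pattern itself. The following is a minimal sketch using the re.VERBOSE flag, which ignores whitespace outside character classes and allows # comments; the name PATTERN is only illustrative.

```python
import re

PATTERN = re.compile(r'''
    (               # first group: the prefix
        .+?         #   non-greedily match 1-or-more of any character
        [_. ]       #   then a literal underscore, period or space
    )
    (               # second group: number plus extension
        \d+         #   1-or-more digits
        \.          #   literal period
        [^.]+       #   1-or-more of anything but a period
    )
    $               # end of the string
''', re.VERBOSE)

print(PATTERN.match('photo_v_01____20415.jpg').groups())
```

The space inside [_. ] is kept because verbose mode does not strip whitespace within a character class.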


2013-09-25 20:56:28 unutbu

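Thank you! You guys are great! – alphanumeric

I have corrected my answer using parts of Antti Haapala's solution; apologies to Antti Haapala, but I could not bear to leave my answer wrong. I am keeping my answer mainly because it explains what the regular expression means. – unutbu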

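import re

filename = 'photo_v_01_20415.jpg'

# Raw string so that \d reaches the regex engine intact; \. matches a literal period
matcher = re.compile(r'(.*[._ ])(\d+\.jpg)')
result = matcher.match(filename)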

Add further separator characters to [._ ] as needed.
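
One way to make the set of separators easy to extend is to build the character class from a string. This is a rough sketch, not part of the original answer: build_matcher and its separators parameter are hypothetical names, and the string is assumed to be non-empty.

```python
import re

def build_matcher(separators='._ '):
    """Compile the pattern with a configurable set of separator characters."""
    # re.escape keeps characters such as '-' or '^' literal inside the class.
    char_class = '[' + re.escape(separators) + ']'
    return re.compile(r'(.*' + char_class + r')(\d+\.jpg)')

matcher = build_matcher('._ -')   # also accept a hyphen before the number
result = matcher.match('photo_v_01-20415.jpg')
print(result.groups() if result else None)
```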


2013-09-25 21:03:49

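This solution works well: prefix, suffix = re.search(r'(.+?[_. ])(\d+.jpg)$', seq_name).groups(). However, the file extension will not always be 'jpg'. How can I adjust the expression so that it also works for non-JPG file formats? – alphanumeric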
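
One possible way to stay independent of the extension is to peel it off first with os.path.splitext from the standard library and then look for the trailing digits in what remains. This is an alternative sketch rather than the approach from the answers; the function name split_before_number is hypothetical.

```python
import os
import re

def split_before_number(filename):
    """Split so that the trailing number and the extension form the second part."""
    stem, ext = os.path.splitext(filename)   # e.g. 'photo_v_01_20415', '.jpg'
    match = re.search(r'\d+$', stem)          # digits at the very end of the stem
    if match is None or not ext:
        return None, None
    return stem[:match.start()], stem[match.start():] + ext

for name in ['photo_v_01_20415.jpg', 'photo_v_01.20415.png', 'photo_v_01 20415.tiff']:
    print(split_before_number(name))
```

Because the extension is handled outside the regular expression, the same code covers any extension as well as any separator in front of the number.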
