About regular expressions in Python

I have a couple of questions. I would like to do some text conversion, for example reading from a text file:

CONTENTS 
1. INTRODUCTION 
1.1 The Linear Programming Problem 2 
1.2 Examples of Linear Problems 7

and writing to another text file:

("CONTENTS" "#") 
("1. INTRODUCTION" "#") 
("1.1 The Linear Programming Problem 2" "#11") 
("1.2 Examples of Linear Problems 7" "#16")

The Python code I currently use for this conversion is:

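import re

# infilename and outfilename are defined elsewhere
infile = open(infilename)
outfile = open(outfilename, "w")

# Matches only lines that end in a number (plus optional trailing spaces)
pat = re.compile(r'^(.+?(\d+)) *$', re.M)
def zaa(mat):
    return '("%s" "#%s")' % (mat.group(1), str(int(mat.group(2)) + 9))

outfile.write('(bookmarks \n')
for line in infile:
    outfile.write(pat.sub(zaa, line))
outfile.write(')')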

It converts the original text to
```
CONTENTS 
1. INTRODUCTION 
("1.1 The Linear Programming Problem 2" "#11") 
("1.2 Examples of Linear Problems 7" "#16") 
```
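The last two lines are correct, but the first two are not. So I would like to know how to handle the first two lines as well, either by modifying the current code or by using different code.
I did not write this code myself, but I would like to understand how re.sub() is used here. This is what I read on a Python website: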

re.sub(regex, replacement, subject) performs a search-and-replace across subject, replacing all matches of regex in subject with replacement. The result is returned by the sub() function. The subject string you pass is not modified.

But in my code its usage is `pat.sub(zaa, line)`, which does not seem consistent with the quoted description. How should I understand the usage in my code?

Thanks!

Source

2011-04-03 Tim

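Is this the real code? You are adding 11, but 2 + 11 = 13, not 11. – Mikel 2011-04-03 02:55:09

@Mikel: Thanks for pointing that out. My typo. Just corrected. – Tim 2011-04-03 02:56:15

I was also confused by the `re.sub()` thing. It turns out there are _two_ sub functions: `re.sub(pattern, repl, string[, count])`, and another one for compiled regex objects: `RegexObject.sub(repl, string[, count=0])`. This code uses the latter syntax. – ridgerunner 2011-04-03 03:17:28

This test script produces the desired output: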

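import re

infilename = "infile.txt"
outfilename = "outfile.txt"

# (\d*) makes the trailing page number optional, so lines without one still match
pat = re.compile(r'^(.+?(\d*)) *$', re.M)

def zaa(mat):
    if mat.group(2):
        # The line ends with a page number: add the offset of 9
        return '("%s" "#%s")' % (mat.group(1), str(int(mat.group(2)) + 9))
    else:
        # No page number: emit an empty target
        return '("%s" "#")' % (mat.group(1))

# The with block closes both files when the conversion is done
with open(infilename) as infile, open(outfilename, "w") as outfile:
    outfile.write('(bookmarks \n')
    for line in infile:
        outfile.write(pat.sub(zaa, line))
    outfile.write(')')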
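
As a rough illustration of why switching from `\d+` to `\d*` helps, the sketch below runs both patterns over the sample text from the question, held in memory instead of read from a file. The names `SAMPLE`, `OFFSET` and `convert` are made up for this example. With `\d+`, a line without a trailing number does not match at all, so `sub` passes it through unchanged.

```python
import re

# Sample input copied from the question (in memory, not from a file).
SAMPLE = (
    "CONTENTS\n"
    "1. INTRODUCTION\n"
    "1.1 The Linear Programming Problem 2\n"
    "1.2 Examples of Linear Problems 7\n"
)

OFFSET = 9  # page offset used in the question


def convert(pattern, text):
    """Apply the bookmark substitution using the given compiled pattern."""
    def repl(mat):
        if mat.group(2):
            return '("%s" "#%d")' % (mat.group(1), int(mat.group(2)) + OFFSET)
        return '("%s" "#")' % mat.group(1)
    return pattern.sub(repl, text)


strict = re.compile(r'^(.+?(\d+)) *$', re.M)   # trailing number required
relaxed = re.compile(r'^(.+?(\d*)) *$', re.M)  # trailing number optional

strict_out = convert(strict, SAMPLE)
relaxed_out = convert(relaxed, SAMPLE)

print(strict_out)
print(relaxed_out)
```

Only the relaxed pattern rewrites the two heading lines; both patterns treat the numbered lines the same way.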

Source

2011-04-03 03:24:56 ridgerunner

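Thanks! Works like a charm! I was wondering whether ".+" means one character repeated one or more times, or a sequence of one or more characters that are not necessarily the same? Whichever it means, what is the regex for the other meaning? – Tim 2011-04-03 03:33:23

`.+` means one or more (possibly different) characters. `(.)\1+` means at least two identical characters. – Mikel 2011-04-03 03:48:32

The dot matches any single character (except a newline, unless the 's' modifier is set, in which case the dot matches any character including newlines). The plus is a quantifier that can be added to any token and means _one or more_ of the preceding token. The star is similar, but it means _zero or more_ of the preceding token. – ridgerunner 2011-04-03 04:03:46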
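
The distinctions made in these comments can be checked with a small sketch. The test strings and variable names here are made up for illustration; `re.S` (also spelled `re.DOTALL`) is Python's name for the 's' modifier.

```python
import re

any_run = re.compile(r'.+')       # one or more characters, not necessarily equal
same_run = re.compile(r'(.)\1+')  # a character followed by one or more copies of itself

first_any = any_run.search("abc").group(0)       # the whole string matches
first_same = same_run.search("abbbc").group(0)   # only the run of b's matches
no_same = same_run.search("abc")                 # no repeated character: None

# By default the dot does not match a newline; with re.S it does.
dot_default = re.search(r'a.b', "a\nb")
dot_all = re.search(r'a.b', "a\nb", re.S)

# Plus needs at least one occurrence; star also accepts zero.
plus_empty = re.fullmatch(r'\d+', "")
star_empty = re.fullmatch(r'\d*', "")

print(first_any, first_same, no_same)
print(dot_default, dot_all is not None)
print(plus_empty, star_empty is not None)
```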

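With your regex, you are looking for a line that ends with a number (and maybe trailing whitespace). You can make that number optional: ^(.+?(\d+)?) *$ and make sure that the group 2 reference in zaa can handle an empty value.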

def zaa(mat): 
    return '("%s" "#%s")' % (mat.group(1), (str(int(mat.group(2))+9) if mat.group(2) else ""))

With this, you should get "#" when mat.group(2) is empty, and what you get now when it is not empty.
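
One detail worth checking: with `(\d+)?`, a group that does not take part in the match yields `None` from `mat.group(2)` rather than an empty string, and the truthiness test covers both cases. A minimal sketch, assuming the same pattern and the offset of 9 from the question (the test strings and the names `page`, `target`, `no_number` and `with_number` are made up for illustration):

```python
import re

pat = re.compile(r'^(.+?(\d+)?) *$', re.M)


def zaa(mat):
    page = mat.group(2)  # None when the optional group did not match
    target = str(int(page) + 9) if page else ""
    return '("%s" "#%s")' % (mat.group(1), target)


no_number = pat.match("CONTENTS ")
with_number = pat.match("1.2 Examples of Linear Problems 7")

print(no_number.group(2), zaa(no_number))
print(with_number.group(2), zaa(with_number))
```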

Source

2011-04-03 03:03:45 BudgieInWA

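Thanks! I was wondering how to make sure that the group 2 reference in zaa can handle an empty string? – Tim 2011-04-03 03:06:05

@Tim, I edited my answer to make it clearer. My copy of `zaa` should gracefully handle `mat.group(2)` being empty. – BudgieInWA 2011-04-03 03:18:18

Thanks! I got "SyntaxError: invalid syntax" at "?". – Tim 2011-04-03 03:21:44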

But in my code, its usage is pat.sub(zaa,line) , which seems to me not consistent to the quoted description.

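The difference is in how sub is called: the documentation you quoted is for the re.sub function, but what is being used here is the sub method of a compiled regular expression object. The initial pattern argument of re.sub() is replaced by the regular expression object that the sub method is bound to. So, in other words,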

pat.sub(zaa, line)

is equivalent to

re.sub(pat, zaa, line)

Horrible variable names, by the way.
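
A small sketch of that equivalence, using a simplified pattern and a hypothetical replacement function `add_nine` in place of the code from the question. It also shows that the replacement argument may be a function that receives each match object:

```python
import re

pat = re.compile(r'(\d+)$')  # simplified: just the trailing number


def add_nine(mat):
    # Called once per match; the returned string replaces the matched text.
    return str(int(mat.group(1)) + 9)


line = "1.2 Examples of Linear Problems 7"

via_method = pat.sub(add_nine, line)            # method of the compiled pattern
via_function = re.sub(pat, add_nine, line)      # module function, compiled pattern first
via_string = re.sub(r'(\d+)$', add_nine, line)  # module function, pattern as a string

print(via_method)
print(via_method == via_function == via_string)
```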

Source

2011-04-03 03:10:38 senderle

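Oh, to clarify, I know they aren't your variable names! Just saying... – senderle 2011-04-03 03:18:22

Thanks! Is there a description of the sub method of regular expression objects on the official Python website? – Tim 2011-04-03 03:18:58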

http://docs.python.org/library/re.html#re.RegexObject.sub – Mikel 2011-04-03 03:26:26
