How to match a newline character in a Python raw string

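I'm a bit confused about Python raw strings. I know that a raw string treats '\' as an ordinary backslash (for example, r'\n' is the two characters '\' and 'n'). However, I'd like to know what to do if I want to match a newline character with a raw string. I tried r'\n', but it didn't work. Does anybody have a good idea about this?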


2013-02-04 wei

What kind of matching are we talking about here? Are you talking about a regular expression match, or just an 'if ... in my_raw_string'? – mgilson

Sorry for the confusion. I'm talking about a regular expression. – wei

In a regular expression, you need to specify that you're in multiline mode:

>>> import re 
>>> s = """cat 
... dog""" 
>>> 
>>> re.match(r'cat\ndog',s,re.M) 
<_sre.SRE_Match object at 0xcb7c8>

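Note that re translates the \n (in the raw string) into a newline. As you pointed out in your comment, you don't actually need re.M for the pattern to match, but it does help $ and ^ match more intuitively: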

>>> re.match(r'^cat\ndog',s).group(0)
'cat\ndog' 
>>> re.match(r'^cat$\ndog',s).group(0) #doesn't match 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
AttributeError: 'NoneType' object has no attribute 'group' 
>>> re.match(r'^cat$\ndog',s,re.M).group(0) #matches. 
'cat\ndog'
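
As a small self-contained check of the points above (a sketch, with variable names chosen only for illustration): the raw string r"\n" holds two characters, yet the re module itself interprets that sequence as a newline, so both spellings of the pattern match. The `$` anchor only matches before the inner newline when re.M is given.

```python
import re

s = "cat\ndog"

# A raw string keeps the backslash: two characters, '\' and 'n'.
assert len(r"\n") == 2
# A normal string literal turns the escape into one newline character.
assert len("\n") == 1

# Both patterns match: the regex engine interprets the two-character
# sequence \n as a newline, and a literal newline matches itself.
raw_match = re.match(r"cat\ndog", s)
plain_match = re.match("cat\ndog", s)

# Without re.M, '$' matches only at the end of the string (or just before
# a trailing newline), so it cannot match between 'cat' and the newline.
no_flag = re.match(r"^cat$\ndog", s)
with_flag = re.match(r"^cat$\ndog", s, re.M)

print(raw_match is not None, plain_match is not None)
print(no_flag, with_flag is not None)
```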


2013-02-04 15:22:51 mgilson

Thanks for your answer, @mgilson! I'm also wondering why we need to specify multiline mode. I tried matching without it, like "re.match(r'cat\ndog', s)", and it still works. – wei

@user1783403 - You're right. I should have read the documentation more carefully. Specifying 're.M' makes '^' and '$' match more intuitively. – mgilson

Is there any way to make '$' match "less intuitively", i.e. match *only* at the very end of the string? I don't want it to match before a '\n'. –
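
The thread does not answer this comment. One option in Python's re module, offered here as a sketch and not as something a participant proposed, is the `\Z` anchor, which matches only at the very end of the string:

```python
import re

s = "dog\n"

# '$' (without re.M) also matches just before a trailing newline.
dollar = re.search(r"dog$", s)

# '\Z' matches only at the very end of the string, so the trailing
# newline prevents a match here.
strict = re.search(r"dog\Z", s)

# With no trailing newline, '\Z' matches as expected.
strict_ok = re.search(r"dog\Z", "dog")

print(dollar is not None, strict is None, strict_ok is not None)
```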

The simplest answer is just not to use a raw string. You can escape backslashes by writing \\.

If you have a huge number of backslashes in some segments, you can concatenate raw strings and normal strings as needed:

r"some string \ with \ backslashes" "\n"

(Python automatically concatenates string literals that have only whitespace between them.)
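
A short sketch of that concatenation, reusing the literal from the answer (the name `joined` is only illustrative): the raw part keeps its backslashes, the normal part contributes a real newline, and the result equals the fully escaped spelling.

```python
# Adjacent string literals are joined into a single string; the raw part
# keeps its backslashes and the normal part contributes a real newline.
joined = r"some string \ with \ backslashes" "\n"

# The string ends with an actual newline character.
assert joined.endswith("\n")
# Both backslashes from the raw part are still present.
assert joined.count("\\") == 2
# Escaping the backslashes in a normal string gives the same result.
assert joined == "some string \\ with \\ backslashes\n"

print(repr(joined))
```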

Remember, if you're working with paths on Windows, the easiest option is to just use forward slashes - it will still work fine.


2013-02-04 15:06:24

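@mgilson I just checked that it works with raw and normal strings together, since it's not something I had ever done. Edited accordingly. It's actually a little better, as I believe this concatenation is done at parse time, not at execution time. –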

Yeah, I had never actually done it before now either :) – mgilson

Any comment on why this got a -1? –

def clean_with_puncutation(text):
    from string import punctuation
    import re

    # Map each punctuation character and special marker to a placeholder token.
    punctuation_token = {p: '<PUNC_' + p + '>' for p in punctuation}
    punctuation_token['<br/>'] = "<TOKEN_BL>"
    punctuation_token['\n'] = "<TOKEN_NL>"
    punctuation_token['<EOF>'] = '<TOKEN_EOF>'
    punctuation_token['<SOF>'] = '<TOKEN_SOF>'

    # Always put the multi-character tokens at the front to avoid overlapping
    # results. Inside the raw string, the regex engine reads \n as a newline.
    regex = (r"(<br/>)|(<EOF>)|(<SOF>)|[\n\!\@\#\$\%\^\&\*\(\)\[\]"
             r"\{\}\;\:\,\.\/\?\|\`\_\\+\\\=\~\-\<\>]")

    text_ = ""
    index = 0
    for match in re.finditer(regex, text):
        # Copy the text before the match, then the token padded with spaces.
        text_ = (text_ + text[index:match.start()] + " "
                 + punctuation_token[match.group()] + " ")
        index = match.end()
    # Append whatever follows the last match (the original dropped this tail).
    return text_ + text[index:]
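
As an alternative sketch (not the approach posted in this answer), the same kind of token substitution can be written with re.sub and a pattern built by re.escape. That avoids hand-escaping each character and sidesteps the raw-string question for "\n" entirely. The names `TOKENS`, `PATTERN`, and `tokenize` are hypothetical, and only a subset of the answer's markers is included.

```python
import re
from string import punctuation

# Placeholder tokens for punctuation, newlines, and an HTML line break.
TOKENS = {p: "<PUNC_" + p + ">" for p in punctuation}
TOKENS["\n"] = "<TOKEN_NL>"
TOKENS["<br/>"] = "<TOKEN_BL>"

# Longest keys first, so '<br/>' is tried before the single character '<'.
# re.escape takes care of every character that is special in a regex.
PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(TOKENS, key=len, reverse=True))
)


def tokenize(text):
    """Replace each matched marker with its token, padded by spaces."""
    return PATTERN.sub(lambda m: " " + TOKENS[m.group()] + " ", text)


result = tokenize("a,b\nc<br/>")
print(repr(result))
```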


2017-12-15 16:09:22
