如何使用Python中的正則表達式使用特殊字符串刪除字符？

我試圖清理日誌，我想刪除一些特殊字符串如何使用Python中的正則表達式使用特殊字符串刪除字符？

例子：

%/h > %/h Current value over threshold value 
Pg/S > Pg/S Current value over threshold value 
Pg/S > Pg/S No. of pages paged in exceeds threshold 
MB < MB min. avg. value over threshold value

我曾嘗試使用一些模式，但它似乎沒有工作。

re.sub(r'\w\w\/\s>\s\w','',text)

有沒有什麼好主意讓我去除特殊模式？

我想刪除了.../...> .../...

我希望我的輸出只包含有用的話。

Current value over threshold value 
    No. of pages paged in exceeds threshold 
    min. avg. value over threshold value

謝謝你的任何想法！

來源

2016-11-10 zihan meng

是內容之前和之後的'>'總是一樣的？匹配'^（[^ \ s>] *）\ s +> \ s + \ 1'會是我的想法。 –

它總是以這種方式分開。換句話說，感興趣的字符串總是會在第三個空格之後出現嗎？ – idjaw

假設文件的結構是：

[特殊字符串] [<或>] [特殊字符串] [消息]

那麼這應該工作：

>>> rgx = re.compile(r'^[^<>]+[<>] +\S+ +', re.M) 
>>> 
>>> s = """ 
... %/h > %/h Current value over threshold value 
... Pg/S > Pg/S Current value over threshold value 
... Pg/S > Pg/S No. of pages paged in exceeds threshold 
... MB < MB min. avg. value over threshold value 
... """ 
>>> 
>>> print(rgx.sub('', s)) 
Current value over threshold value 
Current value over threshold value 
No. of pages paged in exceeds threshold 
min. avg. value over threshold value

來源

2016-11-10 02:47:58 ekhumoro

非常感謝！ –

我可以問你爲什麼在開始時使用^是否表示模式開始的初始位置？ –

@zihanmeng。是的 - 這意味着「匹配一行的開始」。這也是爲什麼需要're.M'標誌（即多行匹配）的原因。 – ekhumoro

根據你試圖匹配的模式，似乎你總是知道字符串的位置。你可以在沒有正則表達式的情況下做到這一點，只需要使用split和切片來獲得感興趣的部分。最後，使用join重新帶回字符串，以獲得最終結果。

下面的結果將做到以下幾點：

s.split() - 分裂的空間創建一個列表，其中每個字都會出現在列表中

[3:]的條目 - 從第四位置採取一切片名單（0索引）

' '.join() - 將轉換回爲一個字符串，將每個元件之間的空間從列表

演示：

s = "%/h > %/h Current value over threshold value" 
res = ' '.join(s.split()[3:])

輸出：

Current value over threshold value

來源

2016-11-10 01:46:07 idjaw

這是一個比較長的正則表達式，但它能夠完成任務。

[%\w][\/\w]\/?[\/\s\w]\s?\<?\>?\s\s[\w%]\/?[a-zA-Z%]\/?[\w]?\s\s?\s?

演示：https://regex101.com/r/ayh19b/4

或者，你可以這樣做：

^[\s\S]*?(?=\w\w(?:\w|\.))

演示：https://regex101.com/r/ayh19b/6

來源

2016-11-10 01:56:30 Ibrahim

如何使用Python中的正則表達式使用特殊字符串刪除字符？

回答

相關問題