Python regex across commented code

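I want to match open-source license types in the comment block at the top of most files. However, I'm having trouble with cases where the expected string (e.g. Lesser General Public License) spans two lines. For example, see the license in the code below.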

* Copyright (c) Codice Foundation 
* <p/> 
* This is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser 
* General Public License as published by the Free Software Foundation, either version 3 of the 
* License, or any later version. 
* <p/> 
* This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without 
* even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 
* Lesser General Public License for more details. A copy of the GNU Lesser General Public License 
* is distributed along with this program and can be found at 
* <http://www.gnu.org/licenses/lgpl.html>. 
*/

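A regex lookbehind is not an option, because the number of spaces in the commented code is unknown and the comment characters differ between languages. Examples of my current regular expressions: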

# Raw strings keep \s and \D from being read as (invalid) string escapes.
self._cr_license_re['GNU'] = re.compile(r'\sGNU\D')
self._cr_license_re['MIT License'] = re.compile(r'MIT License|Licensed MIT|\sMIT\D')
self._cr_license_re['OpenSceneGraph Public License'] = re.compile(r'OpenSceneGraph Public License', re.IGNORECASE)
self._cr_license_re['Artistic License'] = re.compile(r'Artistic License', re.IGNORECASE)
self._cr_license_re['LGPL'] = re.compile(r'\sLGPL\s|Lesser General Public License', re.IGNORECASE)
self._cr_license_re['BSD'] = re.compile(r'\sBSD\D')
self._cr_license_re['Unspecified OS'] = re.compile(r'free of charge', re.IGNORECASE)
self._cr_license_re['GPL'] = re.compile(r'\sGPL\D|(?<!Lesser)\sGeneral Public License', re.IGNORECASE)
self._cr_license_re['Apache License'] = re.compile(r'Apache License', re.IGNORECASE)
self._cr_license_re['Creative Commons'] = re.compile(r'\sCC\D')

I welcome any suggestions on how to solve this problem with regular expressions in Python.

2016-11-17 lmum27

"If only there were a way to glue the lines together into a single long string"? – usr2564301

What is the problem? Replace every literal space in your OpenSceneGraph Public License pattern (and everywhere else) with '\s+', and that's it. –
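The suggestion in the comment above could be sketched roughly as follows. The helper name `flexible`, its `sep` parameter, and the sample strings are hypothetical and not from the thread. Note that `\s+` on its own only bridges whitespace; if the wrapped line starts with a comment character such as `*`, a wider separator is needed, and the character class used for that below is an assumption, not something the commenter proposed.

```python
import re

def flexible(phrase, sep=r'\s+', flags=re.IGNORECASE):
    """Compile `phrase` so that each literal space matches the `sep` pattern."""
    words = [re.escape(word) for word in phrase.split()]
    return re.compile(sep.join(words), flags)

# Invented sample text: the phrase is wrapped, but only whitespace intervenes.
osg_re = flexible('OpenSceneGraph Public License')
wrapped = "distributed under the OpenSceneGraph Public\n    License"
matched_plain = osg_re.search(wrapped) is not None

# Wrapped inside a C-style comment: a '*' leader sits between the words,
# so plain \s+ is not enough.
commented = "terms of the GNU Lesser \n * General Public License as published"
strict_re = flexible('Lesser General Public License')
matched_strict = strict_re.search(commented) is not None

# Assumed wider separator: whitespace plus a few common comment characters.
loose_re = flexible('Lesser General Public License', sep=r'[\s*#/]+')
matched_loose = loose_re.search(commented) is not None
```

The wider separator trades precision for coverage: it would also accept stray `*`, `#`, or `/` characters between the words, which may or may not be acceptable for license detection.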

You can use this regex and replace each match with a space:

\s*\*\s*\/?

This replacement should put the multi-line comment on a single line, and then you can find the license in it.
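A minimal sketch of this normalize-then-match idea, using the regex from this answer and the LGPL pattern from the question. The names `COMMENT_LEADER_RE`, `LGPL_RE`, and `flatten_comment`, and the shortened sample header, are only illustrative.

```python
import re

# Optional whitespace, a '*' comment leader, optional whitespace, optional '/'.
COMMENT_LEADER_RE = re.compile(r'\s*\*\s*\/?')

# LGPL pattern taken from the question.
LGPL_RE = re.compile(r'\sLGPL\s|Lesser General Public License', re.IGNORECASE)

def flatten_comment(text):
    """Replace every comment-leader match with a single space."""
    return COMMENT_LEADER_RE.sub(' ', text)

# Shortened excerpt of the license header from the question.
header = (
    " * This is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser \n"
    " * General Public License as published by the Free Software Foundation, either version 3 of the \n"
    " * License, or any later version. \n"
    " */\n"
)

flat = flatten_comment(header)
found = LGPL_RE.search(flat) is not None
```

Two caveats with this sketch: the substitution removes every `*` in the text, not only comment leaders, so it is best applied to the header comment alone; and a newline that is not followed by a `*` leader is left in place, so such files would still need separate newline handling.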

2016-11-17 21:26:14

Good suggestion. However, the regex above did not remove the newline ('\n') characters. What finally worked was: text = fid.read().replace('\n', '') followed by fin_text = re.sub('\s*\*\s*\/?', '', text) – lmum27
