2013-07-02 40 views
2
import regex,re 


sequence = 'aaaaaaaaaaaabbbbbbbbbbbbcccccccccccc' #being searched 
query = 'aaabbbbbbbbbbbbccc' #100% coverage 
query_1 = 'aaaabbbbbbbbcbbbcccc' #95% coverage 
query_2 = 'aaabbbbcbbbbbcbccc' #90% coverage 

threshold = .95 
error = len(query_1) - (len(query_1)*threshold) #for query_1 errors must be <= 1 

print regex.search(query_1 + '{e<={}}'.format(error),sequence).group(0) 

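I am trying to add an extra parameter to a regex search so that it only matches when a certain percentage of the query is found in the sequence being searched. How do I add a variable error count to a regex fuzzy search in Python?

For example, if I want at least 95% coverage, the search should match for query_1 but not for query_2.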

+2

The fuzzy matching feature of the [regex module](https://pypi.python.org/pypi/regex) may be what you are looking for. –

Answers

1

This works using the regex module:

import regex  # third-party module from PyPI, not the standard-library re
sequence = 'aaaaaaaaaaaabbbbbbbbbbbbcccccccccccc' #being searched 
query = 'aaabbbbbbbbbbbbccc' #100% coverage 
query_1 = 'aaaabbbbbbbbcbbbcccc' #95% coverage 
query_2 = 'aaabbbbcbbbbbcbccc' #90% coverage 
threshold = 0.97 
queries = (query, query_1, query_2) 
for q in queries: 
    # allowed errors as a whole number; int() truncates the fractional part
    error = int(len(q) - (len(q)*threshold)) 
    # old-style % formatting leaves the literal braces of {e<=N} untouched
    m = regex.search(r'(%s){e<=%d}'%(q,error), sequence) 
    print('match' if m else 'nomatch')
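The error budget and the pattern construction above can be pulled into two small helpers. This is only a sketch: the names `allowed_errors` and `fuzzy_pattern` are made up for illustration, and the call to `re.escape` is an added assumption (the queries here contain only letters, so it changes nothing for them) to keep any metacharacters in a query literal.

```python
import re


def allowed_errors(query, threshold):
    """Whole number of errors permitted for `query` at the given coverage.

    Uses the same arithmetic as the answer: len - len * threshold,
    truncated to an integer.
    """
    return int(len(query) - len(query) * threshold)


def fuzzy_pattern(query, threshold):
    """Build a pattern string with a fuzzy constraint such as (?:abc){e<=1}.

    Hypothetical helper: the query is escaped so it is matched literally,
    and the error limit is appended in the regex module's {e<=N} syntax.
    """
    return '(?:%s){e<=%d}' % (re.escape(query), allowed_errors(query, threshold))


if __name__ == '__main__':
    for q in ('aaaabbbbbbbbcbbbcccc', 'aaabbbbcbbbbbcbccc'):
        print(fuzzy_pattern(q, 0.95))
```

The resulting string is meant to be passed to `regex.search(pattern, sequence)`. The standard-library `re` module does not support the `{e<=N}` fuzzy syntax, so `re` is used here only for escaping.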
+0

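What is it called when you add (%s) and (%d) followed by % (variable1, variable2)? I would like to read the documentation for it, because I have seen it before. @perreal –

+0

It is called old-style string formatting: http://docs.python.org/2/library/stdtypes.html#string-formatting – perreal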
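For comparison, here is a small sketch of the same pattern built with old-style `%` formatting and with `str.format`. The helper names are hypothetical. The point is that `str.format` treats `{` and `}` as field delimiters, so the literal braces of `{e<=N}` have to be doubled, which is likely why the `'{e<={}}'.format(error)` line in the question fails. The error count is also cast to `int`, on the assumption that the fuzzy limit should be a whole number.

```python
def with_percent(query, max_errors):
    # Old-style formatting: braces have no special meaning here.
    return '(%s){e<=%d}' % (query, int(max_errors))


def with_format(query, max_errors):
    # str.format: literal braces are written as {{ and }}.
    return '({}){{e<={}}}'.format(query, int(max_errors))


def format_fails(template, value):
    """Return True if template.format(value) raises a formatting error."""
    try:
        template.format(value)
    except (KeyError, ValueError, IndexError):
        return True
    return False


if __name__ == '__main__':
    print(with_percent('abc', 1.0))
    print(with_format('abc', 1.0))
    print(format_fails('{e<={}}', 1))
```

Both helpers produce the same pattern string, so the choice between them is a matter of style.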