高效的數據結構來保存與通配符

字符串這個問題的Efficient data structure for word lookup with wildcards 高效的數據結構來保存與通配符

幾乎相反的假設，我們有urls

http://aaa.com/ 
http://bbb.com/ 
http://ccc.com/ 
....

數據庫中找到，如果一個url在名單我可以做一個上binary-search並在O(log n)時間獲得結果，n是列表的大小。

這種結構也提供了很多年，但現在我想有通配符數據庫中的條目，如：

http://*aaa.com/* 
http://*bbb.com/* 
http://*ccc.com/ 
....

而天真的搜索將導致與O(n)時間尋找一個完整的掃描。

哪個數據結構可以找到小於O(n)？

來源

2014-12-23 ppaulojr

你仍然可以做的二進制搜索，但保持知道網址的排序的列表用繩子從背後 – sashas

開始查詢網址：HTTP'：// test.ccc.com /''結果TRUE' – ppaulojr

是http ：//sasccc.com一個有效的查詢，即沒有點分隔符？ – sashas

如果事先知道所有的url，那麼你可以建立一個有限自動機，它將解決你在O（url長度）查詢中遇到的問題。

這有限自動機可以建成一個正則表達式：

http://(.*aaa\.com/.*|.*bbb\.com/.*|.*ccc\.com/)$

下面是一些Python代碼。在re.compile（）之後，每個查詢都非常快。

import re 

urls = re.compile("http://(.*aaa\.com/.*|.*bbb\.com/.*|.*ccc\.com/)$") 

print urls.match("http://testaaa.com/") is not None 
> True 
print urls.match("http://somethingbbb.com/dir") is not None 
> True 
print urls.match("http://ccc.com/") is not None 
> True 
print urls.match("http://testccc.com/") is not None 
> True 
print urls.match("http://testccc.com/ddd") is not None 
> False 
print urls.match("http://ddd.com/") is not None 
> False

來源

2014-12-23 18:31:47 Ricbit

我想你不能're.compile'一個非常大的字符串:) – ppaulojr

如果正則表達式實現不符合任務，您可以隨時構建自動機。這將使您更好地控制使用多少內存。 – Ricbit

感謝您的數據結構。 – ppaulojr

高效的數據結構來保存與通配符

回答

相關問題