2016-05-30

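Searching strings for certain words or phrases

I have more than 1000 strings and a fixed array of [sub]strings. I want to know which of my strings contain any of the substrings. (Again, the substrings are constant.) I also want to make sure that whole words are matched, not arbitrary parts of strings.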

What is the most efficient way to do this? Can I do better than running indexOf() 1000 times, once for every substring?

let str1 = "During the winter holiday I'll go skiing."
let str2 = "Do knock on the door or chime the bell"
let fixedSearchStrings = ["ring the", "chime the bell", "knock on the door", "knock on the window"]
// Desired behavior (pseudocode: there is no indexOf that takes an array of strings):
str1.indexOf(fixedSearchStrings) // nil: "During" is not the word "ring"
str2.indexOf(fixedSearchStrings) // 2: "knock on the door" is found, so there is no need to check the rest of the sentence
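For reference, the behavior described above (whole words only, first match while scanning the sentence left to right) can be written as a naive baseline. This is a sketch: `words(in:)` and `firstMatchingPhraseIndex(in:phrases:)` are hypothetical helpers, and treating every character other than a letter, digit or apostrophe as a word separator is an assumption. Its cost grows with (words in the text) × (number of phrases), which is what the answers try to improve on.

```swift
// Hypothetical helper (not a standard API): lowercases the text and splits it into words,
// treating every character that is not a letter, digit or apostrophe as a separator.
func words(in text: String) -> [String] {
    text.lowercased()
        .split(whereSeparator: { !($0.isLetter || $0.isNumber || $0 == "'") })
        .map(String.init)
}

// Naive baseline: scans the text's words left to right and, at each position,
// tries every phrase. Returns the index (in `phrases`) of the first phrase found, or nil.
func firstMatchingPhraseIndex(in text: String, phrases: [String]) -> Int? {
    let textWords = words(in: text)
    let phraseWords = phrases.map { words(in: $0) }
    for start in textWords.indices {
        for (i, pw) in phraseWords.enumerated() where !pw.isEmpty {
            let end = start + pw.count
            // Compare whole words only, so "During" never matches "ring".
            if end <= textWords.count && Array(textWords[start..<end]) == pw {
                return i
            }
        }
    }
    return nil
}
```

With the strings from the question, `firstMatchingPhraseIndex(in: str1, phrases: fixedSearchStrings)` returns nil and the same call on `str2` returns 2.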

Convert each search string into a regular expression so that only whitespace or punctuation can appear before and after it. – Sulthan
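A minimal sketch of that suggestion, assuming Foundation's `NSRegularExpression` and using `\b` (word boundary) to express "only whitespace or punctuation before and after". The names `wordBoundedRegexes(for:)` and `containsAnyPhrase(_:regexes:)` are hypothetical. The regexes are compiled once and reused for every input string.

```swift
import Foundation

// Hypothetical helper: compiles one case-insensitive regex per phrase.
// escapedPattern(for:) escapes regex metacharacters in the phrase, and \b on both
// sides requires a word boundary, so "ring the" does not match inside "During the".
func wordBoundedRegexes(for phrases: [String]) throws -> [NSRegularExpression] {
    try phrases.map { phrase in
        try NSRegularExpression(
            pattern: "\\b" + NSRegularExpression.escapedPattern(for: phrase) + "\\b",
            options: [.caseInsensitive])
    }
}

// Returns true if any of the precompiled regexes matches somewhere in `text`.
func containsAnyPhrase(_ text: String, regexes: [NSRegularExpression]) -> Bool {
    // NSRange is measured in UTF-16 code units.
    let range = NSRange(location: 0, length: text.utf16.count)
    return regexes.contains { $0.firstMatch(in: text, options: [], range: range) != nil }
}
```

Note that this still performs one search per phrase; it fixes the whole-word requirement rather than the number of passes.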

Answers

1

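Consider the following. Because fixedSearchStrings is known in advance, you only need to build the index (a trie keyed by words) once, and can then reuse it efficiently for every string you search.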

// Updated for Swift 5: the original used Swift 2 APIs (characters, split(" "),
// unlabeled first arguments) that no longer compile.
class Index {
    var indexes: [String: Index] = [:]   // child nodes, keyed by the next word
    var terminated = false               // true if a search phrase ends at this node

    // Returns the phrase that starts at the first word of `keywords`, or nil.
    func searchFor(_ keywords: [String]) -> String? {
        var ws = keywords
        guard !ws.isEmpty else { return nil }

        let word = ws.removeFirst()
        guard let i = indexes[word] else { return nil }

        if i.terminated {
            return word
        }
        if let rval = i.searchFor(ws) {
            return "\(word) \(rval)"
        }
        return nil
    }

    // Inserts one phrase, already split into words.
    func add(_ words: [String]) {
        var ws = words
        guard !ws.isEmpty else { return }

        let word = ws.removeFirst()
        let index: Index
        if let i = indexes[word] {
            index = i
        } else {
            index = Index()
            indexes[word] = index
        }
        index.add(ws)
        index.terminated = ws.isEmpty || index.terminated
    }
}

class SearchEngine {

    var index: Index!

    // Builds the word trie once from the fixed search phrases.
    func buildIndex(_ keywords: [String]) {
        index = Index()
        for keyword in keywords {
            let words = keyword.split(separator: " ").map(String.init)
            index.add(words)
        }
    }

    // Tries every word position in `string` as the start of a phrase.
    func firstEntryIn(_ string: String) -> String? {
        var strArr = string.split(separator: " ").map(String.init)
        while !strArr.isEmpty {
            if let r = index.searchFor(strArr) {
                return r
            }
            strArr.removeFirst()
        }
        return nil
    }
}

let str1 = "During the winter holiday I'll go skiing." 
let str2 = "Do knock on the door or chime the bell" 
let fixedSearchStrings = ["ring the", "chime the bell", "knock on the door", "knock on the window"] 

let se = SearchEngine() 
se.buildIndex(fixedSearchStrings) 
se.firstEntryIn(str1) 
se.firstEntryIn(str2) 

Result:

nil 
"knock on the door" 
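The trie above splits on single spaces and compares words case-sensitively, so "Chime the bell." would not match "chime the bell". A minimal sketch of the same word-trie idea with normalization, assuming that lowercasing and splitting on anything other than letters, digits and apostrophes is acceptable; `PhraseTrie` and its methods are hypothetical names, not the answer's code.

```swift
// Hypothetical word trie: build it once from the fixed phrases, then query many strings.
final class PhraseTrie {
    private final class Node {
        var children: [String: Node] = [:]
        var phrase: String?   // set when a search phrase ends at this node
    }
    private let root = Node()

    // Normalization (an assumption): lowercase, split on non letter/digit/apostrophe.
    static func tokens(_ text: String) -> [String] {
        text.lowercased()
            .split(whereSeparator: { !($0.isLetter || $0.isNumber || $0 == "'") })
            .map(String.init)
    }

    init(phrases: [String]) {
        for phrase in phrases {
            var node = root
            for word in PhraseTrie.tokens(phrase) {
                if let next = node.children[word] {
                    node = next
                } else {
                    let next = Node()
                    node.children[word] = next
                    node = next
                }
            }
            // Keep the first spelling if the same phrase is added twice.
            if node !== root { node.phrase = node.phrase ?? phrase }
        }
    }

    // Scans the text left to right and returns the first phrase found, or nil.
    func firstMatch(in text: String) -> String? {
        let words = PhraseTrie.tokens(text)
        for start in words.indices {
            var node = root
            for word in words[start...] {
                guard let next = node.children[word] else { break }
                node = next
                if let phrase = node.phrase { return phrase }
            }
        }
        return nil
    }
}
```

For the "which of my 1000+ strings match" question, the trie is built once and reused, e.g. `strings.filter { trie.firstMatch(in: $0) != nil }`.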
0
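import Foundation

// Returns true if `str` contains any of the strings in `array` (case-insensitive).
// Updated for Swift 5; the for-in loop also avoids the original repeat/while
// crashing on an empty array.
func foundSubString(_ str: String, array: [String]) -> Bool {
    let lowered = str.lowercased()
    for (count, candidate) in array.enumerated() {
        print("count : \(count)")
        if lowered.range(of: candidate.lowercased()) != nil {
            print("found")
            return true
        }
    }
    return false
}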

Usage:

let str1 = "During the winter holiday I'll go skiing." 
let str2 = "Do knock on the door or chime the bell" 
let fixedSearchStrings = ["ring the", "chime the bell", "knock on the door", "knock on the window"] 
let exist: Bool = foundSubString(str2,array: fixedSearchStrings) 


If you need more detail, for example which search strings were found and their positions in the array:

import Foundation

// Returns whether any string in `array` occurs in `str` (case-insensitive),
// plus each matching string together with its index in `array`.
func foundSubString2(_ str: String, array: [String]) -> (Bool, [(String, Int)]) {
    let lowered = str.lowercased()
    var matched = [(String, Int)]()

    for (count, candidate) in array.enumerated() {
        if lowered.range(of: candidate.lowercased()) != nil {
            matched.append((candidate, count))
        }
    }

    if matched.count > 0 {
        return (true, matched)
    }
    return (false, [("", 0)])
}

Usage:

let str1 = "During the winter holiday I'll go skiing." 
let str2 = "Do knock on the door or chime the bell" 
let fixedSearchStrings = ["ring the", "chime the bell", "knock on the door", "knock on the window", "knock on the door"] 
let (exist,matched) = foundSubString2(str2,array: fixedSearchStrings) 
if exist { print (matched) } 

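Both functions in this answer match raw substrings, so for `str1` the lowercased text "during the winter..." contains "ring the" and is reported as a match, which the question wants to avoid. One way to keep the `range(of:)` approach while matching whole words is to normalize both sides to space-separated words and pad them with spaces. This is a sketch; `padded(_:)` and `foundSubString2Words(_:array:)` are hypothetical names, and the word-separator rule is an assumption.

```swift
import Foundation

// Hypothetical helper: lowercases, collapses every run of characters that are not
// letters, digits or apostrophes into a single space, and pads both ends with a space.
func padded(_ s: String) -> String {
    let words = s.lowercased()
        .split(whereSeparator: { !($0.isLetter || $0.isNumber || $0 == "'") })
    return " " + words.joined(separator: " ") + " "
}

// Whole-word variant of foundSubString2: a phrase only matches when it is surrounded
// by spaces in the padded text, so " ring the " is not found in " during the ... ".
// Returns each matching phrase with its index in `array`.
func foundSubString2Words(_ str: String, array: [String]) -> [(String, Int)] {
    let text = padded(str)
    var matched = [(String, Int)]()
    for (index, phrase) in array.enumerated() where text.range(of: padded(phrase)) != nil {
        matched.append((phrase, index))
    }
    return matched
}
```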

0

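Use a regular expression. It will be roughly 1000 times faster than indexOf or similar methods. Internally, the regular expression engine builds a state machine that can match all of the strings you need in a single pass.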
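A minimal sketch of the single-regex idea, assuming Foundation's `NSRegularExpression`: all phrases are joined into one alternation wrapped in `\b` word boundaries. `combinedRegex(for:)` and `firstPhrase(in:using:)` are hypothetical names. `NSRegularExpression` is based on ICU, which is a backtracking engine, so whether this actually beats a loop of `range(of:)` calls depends on the engine and the data and is worth measuring.

```swift
import Foundation

// Builds ONE case-insensitive regex of the form \b(?:p1|p2|...)\b.
// Each phrase is escaped so characters such as "." or "?" are matched literally.
func combinedRegex(for phrases: [String]) throws -> NSRegularExpression {
    let alternatives = phrases.map { NSRegularExpression.escapedPattern(for: $0) }
    let pattern = "\\b(?:" + alternatives.joined(separator: "|") + ")\\b"
    return try NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
}

// Returns the leftmost phrase occurrence in `text` (as written in `text`), or nil.
func firstPhrase(in text: String, using regex: NSRegularExpression) -> String? {
    let nsText = NSString(string: text)
    let range = NSRange(location: 0, length: nsText.length)
    guard let match = regex.firstMatch(in: text, options: [], range: range) else {
        return nil
    }
    return nsText.substring(with: match.range)
}
```

Build the regex once (e.g. `let regex = try combinedRegex(for: fixedSearchStrings)`) and reuse it for every string.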


Can you provide sample code? – Daniel


The regular expression should look like '^ | (ring the)|(chime the bell)| ... |(last string to match)| $'. For how to use regular expressions in Swift, see http://stackoverflow.com/questions/28776945/swift-regex-matching – Sorin
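One reading of the pattern in that comment is "start of string or a space, then one of the phrases, then a space or end of string". A sketch of that reading, assuming Foundation's `NSRegularExpression`; `allPhraseMatches(in:phrases:)` is a hypothetical name. It returns every match with its UTF-16 offset, and because only spaces count as separators, "bell." would not match "bell".

```swift
import Foundation

// Pattern: (?:^| )(p1|p2|...)(?= |$). The lookahead does not consume the trailing
// space, so two adjacent phrases can share one separating space.
func allPhraseMatches(in text: String, phrases: [String]) throws -> [(phrase: String, location: Int)] {
    let alternatives = phrases.map { NSRegularExpression.escapedPattern(for: $0) }
    let pattern = "(?:^| )(" + alternatives.joined(separator: "|") + ")(?= |$)"
    let regex = try NSRegularExpression(pattern: pattern, options: [])
    let nsText = NSString(string: text)
    let range = NSRange(location: 0, length: nsText.length)
    return regex.matches(in: text, options: [], range: range)
        .map { (match) -> (phrase: String, location: Int) in
            let r = match.range(at: 1)   // capture group 1 is the phrase without the space
            return (phrase: nsText.substring(with: r), location: r.location)   // UTF-16 offset
        }
}
```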