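Finding a person referenced in a document

Let's say I have:
- a database with 13,000 person entries, each containing first name, name, birthday, street, zip code, city
- a long text that includes the personal data of one specific person. Because it was processed by OCR, it may contain spelling errors.

Here is the text: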
Harry Potter, born 25.03.1995, resident at Jahnstreet 43, London is a series of seven fantasy novels written by British author J. K. Rowling. The series chronicles the adventures of a young wizard, Harry Potter, the titular character, and his friends Ronald Weasley and Hermione Granger, all of whom are students at Hogwarts School of Witchcraft and Wizardry. The main story arc concerns Harry's quest to defeat the Dark wizard Lord Voldemort, who aims to become immortal, conquer the wizarding world, subjugate non-magical people, and destroy all those who stand in his way, especially Harry Potter. Since the release of the first novel, Harry Potter and the Philosopher's Stone, on 30 June 1997, the books have gained immense popularity, critical acclaim and commercial success worldwide.[2] The series has also had some share of criticism, including concern about the increasingly dark tone as the series progressed. As of May 2015, the books have sold more than 450 million copies worldwide, making the series the best-selling book series in history, and have been translated into 73 languages.[3][4] The last four books consecutively set records as the fastest-selling books in history, with the final installment selling roughly 11 million copies in the United States within the first 24 hours of its release. A series of many genres, including fantasy, coming of age and the British school story (with elements of mystery, thriller, adventureand romance), it has many cultural meanings and references.[5] According to Rowling, the main theme is death.[6] There are also many other themes in the series, such as prejudice and corruption.[7]
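Now I want to find the database entry for the person who is referenced in this document.
I have different ideas about how to do this, but I don't know which one gives the best results. Which approach would you prefer or recommend? Thanks.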
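I split the text into an array and go through each birthday in the database, looking for it with JavaScript's text.search('25.03.1995'). When there is a hit, I check the next field, e.g. text.search('Harry'). If several fields hit, I have found the right record.
- Pros: easy to implement, no database commands, pure JavaScript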
- Cons: if the OCR makes a mistake and reads e.g. Harly instead of Harry, I cannot recognize it. The same happens if the date format differs.
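A minimal sketch of this first idea, under some assumptions: the `people` array, its field names, and the second record are hypothetical stand-ins for the real database rows. It uses `includes` rather than `search`, because `search` converts a string argument into a regular expression, so the dots in '25.03.1995' would match any character.

```javascript
// Hypothetical sample rows; the real table has about 13,000 of them.
const people = [
  { firstName: 'Harry', name: 'Potter', birthday: '25.03.1995', street: 'Jahnstreet 43', city: 'London' },
  { firstName: 'Hermione', name: 'Granger', birthday: '19.09.1979', street: 'Example Road 1', city: 'London' },
];

// Count how many of the remaining fields occur verbatim in the text.
function countExactHits(text, person) {
  const fields = [person.firstName, person.name, person.street, person.city];
  return fields.filter((value) => value && text.includes(value)).length;
}

// Use the birthday as a cheap first filter, then keep the record
// with the most additional field hits (at least one is required).
function findPersonExact(text, people) {
  let best = null;
  let bestHits = 0;
  for (const person of people) {
    if (!text.includes(person.birthday)) continue; // literal match only
    const hits = countExactHits(text, person);
    if (hits > bestHits) {
      best = person;
      bestHits = hits;
    }
  }
  return best;
}
```

This shows the weakness described above: a single misread character in the birthday makes the function return null, no matter how well the other fields match.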
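First I index the text with the help of a database. Then I take an approach similar to the first one and go through each column in the database, but this time using the database's CONTAINS.
- Pros: faster, better results?
- Cons: I need a database with good full-text search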
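I split the text and search the database columns for every single word with SQL LIKE.
- Pros: I don't have to index the document; better than CONTAINS?
- Cons: not as fast as a text index?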
Thanks for your help with this.
Maybe some kind of fuzzy search can help you overcome the OCR errors. Try this example: http://glench.github.io/fuzzyset.js/
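A rough sketch of what fuzzy matching could look like here. It does not use fuzzyset.js itself: a plain Levenshtein edit distance stands in for the library's scoring. The `samplePeople` rows, the field names, and the threshold of one edit are all assumptions for illustration, and only single-word fields are compared.

```javascript
// Classic dynamic-programming Levenshtein distance, one row at a time.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // distance for (i-1, j-1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // distance for (i-1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// True if some word of the text is within maxEdits of the wanted value.
function fuzzyContains(words, value, maxEdits) {
  const wanted = value.toLowerCase();
  return words.some((word) => levenshtein(word, wanted) <= maxEdits);
}

// Score each record by how many of its fields have a near match.
function findPersonFuzzy(text, people, maxEdits = 1) {
  const words = text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  let best = null;
  let bestScore = 0;
  for (const person of people) {
    const score = [person.firstName, person.name, person.city]
      .filter((value) => fuzzyContains(words, value, maxEdits)).length;
    if (score > bestScore) {
      best = person;
      bestScore = score;
    }
  }
  return best;
}

// Hypothetical sample rows, not the real schema.
const samplePeople = [
  { firstName: 'Harry', name: 'Potter', city: 'London' },
  { firstName: 'Hermione', name: 'Granger', city: 'London' },
];
```

With this, `levenshtein('Harly', 'Harry')` is 1, so a text that starts with "Harly Potter" still scores a hit on the first name. Short names can match unrelated words at this threshold, so the limit probably needs tuning against real data.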