加速emacs中的字符串匹配

我想在emacs中實現vim commandT plugin。此代碼大多是matcher的譯文。加速emacs中的字符串匹配

我在這裏有一些elisp，在我的上網本上使用還是太慢 - 我該如何加快速度？

(eval-when-compile (require 'cl)) 
(defun commandT-fuzzy-match (choices search-string) 
    (sort (loop for choice in choices 
       for score = (commandT-fuzzy-score choice search-string (commandT-max-score-per-char choice search-string)) 
       if (> score 0.0) collect (list score choice)) 
     #'(lambda (a b) (> (first a) (first b))) 
     )) 

(defun* commandT-fuzzy-score (choice search-string &optional (score-per-char (commandT-max-score-per-char choice search-string)) (choice-pointer 0) (last-found nil)) 
    (condition-case error 
     (loop for search-char across search-string 
      sum (loop until (char-equal search-char (elt choice choice-pointer)) 
         do (incf choice-pointer) 
         finally return (let ((factor (cond (last-found (* 0.75 (/ 1.0 (- choice-pointer last-found)))) 
                 (t 1.0)))) 
             (setq last-found choice-pointer) 
             (max (commandT-fuzzy-score choice search-string score-per-char (1+ choice-pointer) last-found) 
              (* factor score-per-char))))) 
    (args-out-of-range 0.0) ; end of string hit without match found. 
    )) 

(defun commandT-max-score-per-char (choice search-string) 
    (/ (+ (/ 1.0 (length choice)) (/ 1.0 (length search-string))) 2))

確保編譯該部分，因爲這已經有很大幫助。而基準：

(let ((choices (split-string (shell-command-to-string "curl http://sprunge.us/FcEL") "\n"))) 
    (benchmark-run-compiled 10 
     (commandT-fuzzy-match choices "az")))

來源

2012-10-26 Reactormonk

一些解釋代碼應該這樣做，或者一些評論會有幫助。 – Thomas

聽起來很像「iswitchb」和/或「ido」提供的東西。如果Textmate從Emacs那裏得到它，我不會感到驚訝...... http://emacswiki.org/emacs/IswitchBuffers – tripleee

[Here]（http://www.emacswiki.org/emacs/Icicles_-_Fuzzy_Completion）是一個描述Emacs Lisp中的各種模糊匹配，並引用了實現代碼。 – Drew

這裏有一些微的優化，你可以嘗試：

使用car-less-than-car，而不是你的lambda表達式。這沒有明顯的效果，因爲時間不在sort中，而是在commandT-fuzzy-score中。
使用defun而不是defun*：那些具有非零缺省值的可選參數具有不可忽略的隱藏成本。這將GC成本降低了近一半（並且您在GC中花費的時間超過了10％）。
（* 0.75（/1.0XXX））等於（/ 0.75XXX）。
使用eq而不是char-equal（將行爲更改爲始終區分大小寫，tho）。這會造成相當大的差異。使用aref代替elt。
我不明白你爲什麼在遞歸調用中通過last-found，所以我顯然不完全理解你的算法在做什麼。但假設這是一個錯誤，你可以把它變成一個局部變量而不是作爲參數傳遞它。這可以節省您的時間。
我不明白你爲什麼要發現你發現的每個search-char的遞歸調用，而不是隻爲第一個。另一種看待這個問題的方法是，你的max比較「單個字符評分」和「整個搜索字符串評分」，這看起來很奇怪。如果您更改代碼以在(1+ first-found)的遞歸調用下在兩個loop之外執行max，那麼在我的測試案例中，這會加快4倍。
乘以score-per-char可以移到循環之外（對於原始算法，這似乎不是真的）。

此外，作爲的elisp在Emacs實現是相當緩慢的，所以你往往最好使用「大元」，以花更少的時間解釋的elisp（字節級）代碼和運行更多的時間C代碼。下面是例如替代實現，用正則表達式模式MACHING做內環（不是你原來的算法，但一個我移動max環外後得到的）：什麼是

(defun commandT-fuzzy-match-re (choices search-string) 
    (let ((search-re (regexp-quote (substring search-string 0 1))) 
     (i 1)) 
    (while (< i (length search-string)) 
     (setq search-re (concat search-re 
           (let ((c (aref search-string i))) 
           (format "[^%c]*\\(%s\\)" 
             c (regexp-quote (string c)))))) 
     (setq i (1+ i))) 

    (sort 
    (delq nil 
      (mapcar (lambda (choice) 
        (let ((start 0) 
          (best 0.0)) 
         (while (string-match search-re choice start) 
         (let ((last-found (match-beginning 0))) 
          (setq start (1+ last-found)) 
          (let ((score 1.0) 
           (i 1) 
           (choice-pointer nil)) 
          (while (setq choice-pointer (match-beginning i)) 
           (setq i (1+ i)) 
           (setq score (+ score (/ 0.75 (- choice-pointer last-found)))) 
           (setq last-found choice-pointer)) 
          (setq best (max best score))))) 
         (when (> best 0.0) 
         (list (* (commandT-max-score-per-char 
            choice search-string) 
            best) 
           choice)))) 
        choices)) 
    #'car-less-than-car)))

來源

2012-10-26 15:01:24 Stefan

「last-found」用於計算與前一個匹配的距離 - 兩個匹配之間的距離越長，得分越小。 – Reactormonk

事實上，你使用它的方式，它是與先前找到的*字符*的距離，而不是先前的匹配（因爲在完成匹配之前你會回溯）。這說，我不明白你爲什麼會在意與前一場比賽的距離。 – Stefan

因爲[參考實現]（https://github.com/wincent/Command-T/blob/master/ruby/command-t/match.c#L97-99）。 – Reactormonk

加速emacs中的字符串匹配

回答

相關問題