2012-12-01 26 views
2

作爲找到字符串中相同字符的運行的my earlier question的後續,我還想找到一個函數算法來查找長度大於2的所有子字符串,或字母串([Char])中降序的字母或數字序列(例如「defgh」,「34567」,「XYZ」,「fedcba」,「NMLK」,9876「等)我正在考慮的子字符串爲A..Z,a..z,0..9,以及它們的降序。返回值應該是一個(從零開始的偏移量,長度)對的列表。我正在翻譯JavaScript中的「zxcvbn」密碼強度算法(包含命令式代碼)到Scala。我想盡可能保持我的代碼爲純粹的功能,對於所有的你在函數式編程風格中給出的寫作原因。我的代碼是用Scala編寫的,但我可以在Clojure,F#,Haskell或者僞代碼中任何一種中轉換算法。例如:qweABCD13987將返回[(3,4),(9,3)]所有子字符串是使用函數式編程的字符序列

我寫了一個相當怪異的函數,當我再次訪問我的工作計算機時,我會發布,但我確信存在更優雅的解決方案。

再一次,謝謝。

回答

0

我想這個問題的一個很好的解決方案比起初看起來真的更復雜。 我不是Scala Pro,所以我的解決方案肯定不是最佳和好的,但它可能會給你一些想法。

基本的想法是計算兩個連續字符之間的差異,之後它不幸地變得有點混亂。問我是否有些代碼不清楚!

object Sequences { 

    val s = "qweABCD13987"       

    val pairs = (s zip s.tail) toList // if s might be empty, add a check here    
    // = List((q,w), (w,e), (e,A), (A,B), (B,C), (C,D), (D,1), (1,3), (3,9), (9,8), (8,7)) 

    // assuming all characters are either letters or digits 
    val diff = pairs map {case (t1, t2) => 
    if (t1.isLetter^t2.isLetter) 0 else t1 - t2} // xor could also be replaced by != 
    // = List(-6, 18, 36, -1, -1, -1, 19, -2, -6, 1, 1) 

    /** 
    * 
    * @param xs A list indicating the differences between consecutive characters 
    * @param current triple: (start index of the current sequence; 
    *       number of current elements in the sequence; 
    *       number indicating the direction i.e. -1 = downwards, 1 = upwards, 0 = doesn't matter) 
    * @return A list of triples similar to the argument 
    */ 
    def sequences(xs: Seq[Int], current: (Int, Int, Int) = (0, 1, 0)): List[(Int, Int, Int)] = xs match { 
    case Nil => current :: Nil 
    case (1 :: ys) => 
     if (current._3 != -1) 
     sequences(ys, (current._1, current._2 + 1, 1)) 
     else 
     current :: sequences(ys, (current._1 + current._2 - 1, 2, 1)) // "recompute" the current index 
    case (-1 :: ys) => 
     if (current._3 != 1) 
     sequences(ys, (current._1, current._2 + 1, -1)) 
     else 
     current :: sequences(ys, (current._1 + current._2 - 1, 2, -1)) 
    case (_ :: ys) => 
     current :: sequences(ys, (current._1 + current._2, 1, 0)) 
    }            

    sequences(diff) filter (_._2 > 1) map (t => (t._1, t._2)) 
} 
+0

該算法需要一個小的改變,只允許字母和數字序列:在'diff'方法中,如果t._1或t_.2不是字母或數字,則返回0。 – Ralph

+0

的確如此,感謝評論。我相應地調整了方法! – SHildebrandt

0

將問題分成幾個較小的子問題總是最好的。我在Haskell中寫了一個解決方案,對我來說更簡單。它使用懶惰列表,但我想你可以使用流或通過使主函數尾遞歸併將中間結果作爲參數傳遞給Scala。

-- Mark all subsequences whose adjacent elements satisfy 
-- the given predicate. Includes subsequences of length 1. 
sequences :: (Eq a) => (a -> a -> Bool) -> [a] -> [(Int,Int)] 
sequences p [] = [] 
sequences p (x:xs) = seq x xs 0 0 
    where 
    -- arguments: previous char, current tail sequence, 
    -- last asc. start offset of a valid subsequence, current offset 
    seq _ [] lastOffs curOffs = [(lastOffs, curOffs - lastOffs)] 
    seq x (x':xs) lastOffs curOffs 
     | p x x' -- predicate matches - we're extending current subsequence 
      = seq x' xs lastOffs curOffs' 
     | otherwise -- output the currently marked subsequence and start a new one 
      = (lastOffs, curOffs - lastOffs) : seq x' xs curOffs curOffs' 
     where 
     curOffs' = curOffs + 1 

-- Marks ascending subsequences. 
asc :: (Enum a, Eq a) => [a] -> [(Int,Int)] 
asc = sequences (\x y -> succ x == y) 

-- Marks descending subsequences. 
desc :: (Enum a, Eq a) => [a] -> [(Int,Int)] 
desc = sequences (\x y -> pred x == y) 

-- Returns True for subsequences of length at least 2. 
validRange :: (Int, Int) -> Bool 
validRange (offs, len) = len >= 2 

-- Find all both ascending and descending subsequences of the 
-- proper length. 
combined :: (Enum a, Eq a) => [a] -> [(Int,Int)] 
combined xs = filter validRange (asc xs) ++ filter validRange (desc xs) 

-- test: 
main = print $ combined "qweABCD13987" 
+0

該算法需要一個小的改變,只允許字母和數字序列:在'asc'和'desc'函數中,如果其中任何一個參數不是字母或數字,則返回False。 – Ralph

0

這是我用Clojure近似:

我們可以將輸入的字符串,所以我們可以將您的previous algorithm找到一個解決方案。算術不會是最高性能的,但我認爲你會有一個更抽象和可讀的代碼。

的例子字符串可以通過以下方式轉變:

user => (find-serials "qweABCD13987") 
(0 1 2 # # # # 7 8 # # #) 

重用previous function "find-runs"

user => (find-runs (find-serials "qweABCD13987")) 
([3 4] [9 3]) 

最後的代碼如下所示:

(defn find-runs [s] 
    (let [ls (map count (partition-by identity s))] 
    (filter #(>= (% 1) 3) 
      (map vector (reductions + 0 ls) ls)))) 

(def pad "#") 

(defn inc-or-dec? [a b] 
    (= (Math/abs (- (int a) (int b))) 1)) 

(defn serial? [a b c] 
    (or (inc-or-dec? a b) (inc-or-dec? b c))) 

(defn find-serials [s] 
    (map-indexed (fn [x [a b c]] (if (serial? a b c) pad x)) 
     (partition 3 1 (concat pad s pad)))) 

find-serials創建一個3單元滑動窗口並應用serial?來檢測作爲序列開始/中間/結束的單元格。該字符串可方便地填充,因此窗口始終居中放置在原始字符上方。

相關問題