You have to normalize all of the values first. Regular expressions will do it:
(def months ["JAN" "FEB" "MAR" "APR"
             "MAY" "JUN" "JUL" "AUG"
             "SEP" "OCT" "NOV" "DEC"])

;; "XX_2.5_10/23/2015" -> ["XX" 2.5 "OCT" "2015"]
;; The day of the month is matched but not captured.
(defn normalize-underscored [value]
  (let [[_ text val month year]
        (re-matches #"(.+?)_([\d.]+)_(\d+)/\d+/(\d{4})" value)]
    [text
     (Float/parseFloat val)
     (months (dec (Long/parseLong month)))
     year]))

;; "XXX 1.000 OCT15" -> ["XXX" 1.0 "OCT" "2015"]
;; A two-digit year is assumed to belong to the 2000s.
(defn normalize-spaced [value]
  (let [[_ text val month year]
        (re-matches #"(.+?)\s([\d.]+)\s(\w{3})(\d{2,4})" value)]
    [text (Float/parseFloat val) month
     (if (== 2 (count year)) (str "20" year) year)]))
This is how the normalization behaves:
user> (normalize-underscored "XX_2.5_10/23/2015")
["XX" 2.5 "OCT" "2015"]
user> (normalize-spaced "XXX 1.000 OCT15")
["XXX" 1.0 "OCT" "2015"]
user> (normalize-spaced "ZZZ 3.500 JAN2016")
["ZZZ" 3.5 "JAN" "2016"]
Then you simply compare the normalized versions:
(def underscored '("XXX_1_10/22/2015" "YYY_1.5_11/22/2015"
                   "XX_2.5_10/23/2015" "YY_5_11/26/2015"))

(def spaced #{"XXX 1.000 OCT15" "XX 2.500 OCT2015"
              "ZZZ 3.500 JAN2016"})

(for [uv (map normalize-underscored underscored)
      s spaced
      :when (= uv (normalize-spaced s))]
  s)
Output:
("XXX 1.000 OCT15" "XX 2.500 OCT2015")
Or, better, format the results into a more consistent form, like this:
(map (partial apply format "%s %.3f %s%s")
     (keep (set (map normalize-spaced spaced))
           (map normalize-underscored underscored)))
Output:
("XXX 1.000 OCT2015" "XX 2.500 OCT2015")
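One caveat: when a string matches neither pattern, `re-matches` returns nil, so the normalizers above end up calling `Float/parseFloat` on nil and throw. If the input may contain other formats, a nil-safe variant along these lines might help. This is only a sketch: `normalize-any` and `month-names` are hypothetical names that are not part of the answer above, and the block repeats the month table and both patterns so that it runs on its own.

```clojure
;; Hypothetical helper, not part of the original answer.
(def month-names ["JAN" "FEB" "MAR" "APR" "MAY" "JUN"
                  "JUL" "AUG" "SEP" "OCT" "NOV" "DEC"])

(defn normalize-any
  "Returns [text value month year] for either input format,
  or nil when the string matches neither pattern."
  [value]
  (if-let [[_ text val month year]
           (re-matches #"(.+?)_([\d.]+)_(\d+)/\d+/(\d{4})" value)]
    ;; Underscored form: the numeric month becomes its three-letter name.
    [text
     (Float/parseFloat val)
     (month-names (dec (Long/parseLong month)))
     year]
    (when-let [[_ text val month year]
               (re-matches #"(.+?)\s([\d.]+)\s(\w{3})(\d{2,4})" value)]
      ;; Spaced form: a two-digit year is assumed to be in the 2000s.
      [text
       (Float/parseFloat val)
       month
       (if (= 2 (count year)) (str "20" year) year)])))
```

With this, `(keep normalize-any coll)` skips entries that match neither pattern instead of failing on them.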
Your second data format, '{...}', is a *map*. Surely it should be '#{}', a *set*. – Thumbnail
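I can't change the format in the data structure. I tested comparing the two with (keep #set list), and it returns the common values. But formatting the dates so that they look alike is the problem I'm facing. – Sri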
Feel free to add that code. It makes it easier for people who are willing to answer to build on it. – cfrick