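Check whether a string has consecutive characters in Ruby on Rails
I am trying to validate a string to determine whether it contains any run of 3 or more consecutive (sequential) characters.
Example:
"11abcd$4567" => ['abcd', '4567']
I tried to do this with a regular expression, but it turns into very long code:
(?!abc|bcd|cde|.....)
Is there a simple way to check for sequential characters, either with a regex or in plain Ruby?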
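Regular expressions are not appropriate here. They are not flexible enough to express the general case, and Unicode is huge: a regex that responds to any ascending character sequence would have to list each of tens or hundreds of thousands of cases. Such a regex could be generated programmatically, but that would take time and would be very expensive, memory-wise.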
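To get a feel for the scale, here is a small sketch that builds such an alternation programmatically. It assumes printable ASCII only (code points 32 to 126), purely to keep the example small; the names `runs` and `pattern` are illustrative.

```ruby
# Every 3-character ascending run within printable ASCII (assumed range 32..126).
runs = (32.chr..126.chr).each_cons(3).map(&:join)

# Regexp.union escapes metacharacters and joins the strings with "|".
pattern = Regexp.union(runs)

puts runs.size                   # 93 alternatives for 95 characters
p "11abcd$4567".scan(pattern)    # => ["abc", "456"]
```

Note that this pattern only reports fixed-length, non-overlapping matches ("abc" rather than "abcd"). Covering longer runs or a larger character set multiplies the number of alternatives.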
def find_streaks(string, min_length=3)
  string                                  # "xabcy"
    .each_char                            # ['x', 'a', 'b', 'c', 'y']
    .chunk_while { |a, b| a.succ == b }   # [['x'], ['a', 'b', 'c'], ['y']]
    .select { |c| c.size >= min_length }  # [['a', 'b', 'c']]
    .map(&:join)                          # ['abc']
end
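As a quick check, the method can be run against the input from the question. The sketch below repeats the method so that it runs on its own (Ruby 2.3+ is assumed, for `chunk_while`). The second call is an illustrative input that shows a property of `String#succ` worth knowing: "z".succ is "aa", so a run never continues past "z" into "{".

```ruby
# Same method as above, repeated so this snippet is self-contained.
def find_streaks(string, min_length = 3)
  string.each_char
        .chunk_while { |a, b| a.succ == b }
        .select { |c| c.size >= min_length }
        .map(&:join)
end

p find_streaks("11abcd$4567")   # => ["abcd", "4567"]
p find_streaks("xyz{|")         # => ["xyz"], because "z".succ is "aa", not "{"
```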
For Rubies without chunk_while, I think this might work as a polyfill... give it a try?
# skip this thing on Ruby 2.3+, unneeded
unless Enumerable.instance_methods.include?(:chunk_while)
  module Enumerable
    def chunk_while                         # let's polyfill!
      streak = nil                          # twofold purpose: init `streak` outside
                                            # the block, and `nil` as flag to spot
                                            # the first element.
      Enumerator.new do |y|                 # `chunk_while` returns an `Enumerator`.
        each do |element|                   # go through all the elements.
          if streak                         # except on first element:
            if yield streak[-1], element    # give the previous element and current
                                            # one to the comparator block.
                                            # `streak` will always have an element.
              streak << element             # if the two elements are "similar",
                                            # add this one to the streak;
            else                            # otherwise
              y.yield streak                # output the current streak and
              streak = [element]            # start a new one with the current element.
            end
          else                              # for the first element, nothing to compare
            streak = [element]              # so just start the streak.
          end
        end
        y.yield streak if streak            # output the last streak;
                                            # but if `streak` is `nil`, there were
                                            # no elements, so no output.
      end
    end
  end
end
Well, derp. Here I go writing all of this by hand... when it would have been as easy as this:
unless Enumerable.instance_methods.include?(:chunk_while)
  module Enumerable
    def chunk_while
      slice_when { |a, b| !yield a, b }
    end
  end
end
Yeah, chunk_while is just the opposite of slice_when. It could even replace it in the original code, as .slice_when { |a, b| a.succ != b }. Sometimes I'm slow.
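The equivalence is easy to check directly. This is a minimal sketch, assuming Ruby 2.3+ so that both methods exist; `by_chunk` and `by_slice` are illustrative names.

```ruby
chars = "11abcd$4567".chars

# chunk_while keeps elements together while the block is true;
# slice_when splits where the block is true, so the condition is negated.
by_chunk = chars.chunk_while { |a, b| a.succ == b }.to_a
by_slice = chars.slice_when  { |a, b| a.succ != b }.to_a

p by_chunk == by_slice                                  # => true
p by_slice.select { |run| run.size >= 3 }.map(&:join)   # => ["abcd", "4567"]
```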
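Here is a sort-of solution I came up with. I tried to take advantage of Ruby's object references in a slightly odd way. Pseudocode:
Use the ord method to convert a character to its ASCII code.
If the mem variable is empty, or the last character of mem, converted to ASCII, equals the current character's ASCII code minus 1, append the character to mem.
If the mem variable holds 3 consecutive characters, store it in the arr variable.
Note: this is where it gets tricky. As you may have noticed, your condition explicitly says "get 3 or more consecutive characters", which could be a problem. The "<<" operator, when used, mutates the string object that we pushed into the arr variable. As long as the mem variable is not reinitialized with the assignment operator, =, it keeps mutating the string object that was pushed into the array.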
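str = "11abcdefgh$4567"
arr = []
mem = String.new             # mutable empty string
str.each_char do |s|
  if mem.empty? || (mem[-1].ord == s.ord - 1)
    mem << s                 # extend the current run in place
  else
    mem = s.dup              # start a new run with the current character
                             # (resetting to "" would drop it)
  end
  arr << mem if mem.size == 3  # push once; later `<<` calls keep growing it
end
puts arr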
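The reference behaviour this answer relies on can be seen in isolation. The following is a minimal sketch with made-up values; it only illustrates that `<<` mutates the object already stored in the array, while `=` rebinds the variable to a different object.

```ruby
arr = []
mem = String.new("abc")   # a mutable string
arr << mem                # arr now holds a reference to the same object
mem << "d"                # `<<` mutates that object in place
p arr                     # => ["abcd"]

mem = String.new("x")     # `=` rebinds mem to a new object
mem << "y"
p arr                     # => ["abcd"] (the stored string is unaffected)
p arr.first.equal?(mem)   # => false
```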
string.each_char.with_object([]) do |e, acc|
  if acc.last && acc.last[-1] && e == acc.last[-1].succ
    acc.last << e
  else
    acc << e
  end
end.reject { |e| e.length < 3 }
This version can easily be adapted to work with any alphabet:
"11абвгнabcd$4567".codepoints.each_with_object([]) do |e, acc|
  e = e.chr(Encoding::UTF_8)
  acc.last && acc.last[-1] && e == acc.last[-1].succ ? \
    acc.last << e : acc << e
end.reject { |e| e.length < 3 }
#⇒ [
# [0] "абвг",
# [1] "abcd",
# [2] "4567"
# ]
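For reuse, the same accumulation can be wrapped in a method. This is only a sketch: the method name `consecutive_runs` and the `min_length` parameter are assumptions made here, not part of the answer above.

```ruby
# Hypothetical wrapper around the accumulation shown above.
def consecutive_runs(string, min_length = 3)
  string.each_char.with_object([]) do |char, acc|
    if acc.last && char == acc.last[-1].succ
      acc.last << char     # extend the current run in place
    else
      acc << char.dup      # start a new run
    end
  end.reject { |run| run.length < min_length }
end

p consecutive_runs("11abcd$4567")      # => ["abcd", "4567"]
p consecutive_runs("11abcd$4567", 5)   # => []
```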
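The title and the first sentence of the question state that the task is to determine whether a given string contains at least three consecutive characters (ASCII, I assume) in sequence (for example, "def" and '!"#'), even though that seems to conflict with the example. A quick way to answer that question (probably less efficient than other approaches) is as follows.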
Code
def at_least_so_many_consecutive(str, min_run_size)
  (32.chr..126.chr).each_cons(min_run_size).
    map(&:join).
    any? { |s| str.include?(s) }
end
Example
str = "xabc$fghrtuvwx3!"
at_least_so_many_consecutive(str, 3)
#=> true
at_least_so_many_consecutive(str, 5)
#=> true
at_least_so_many_consecutive(str, 6)
#=> false
Note:
(32.chr..126.chr).each_cons(min_run_size).map(&:join)
#=> [" !\"", "!\"#", "\"\#$", "\#$%", "$%&", "%&'", "&'(", "'()", "()*",
# ")*+", "*+,", "+,-", ",-.", "-./", "./0", "/01", "012", "123", "234",
# ...
# "QRS", "RST", "STU", "TUV", "UVW", "VWX", "WXY", "XYZ", "YZ[", "Z[\\",
# "[\\]", "\\]^", "]^_", "^_`", "_`a", "`ab", "abc", "bcd", "cde", "def",
# ...
# "opq", "pqr", "qrs", "rst", "stu", "tuv", "uvw", "vwx", "wxy", "xyz",
# "yz{", "z{|", "{|}", "|}~"]
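An alternative that avoids building the table of candidate substrings is to slide a window over the string itself and compare code points. This is a sketch of a different technique rather than the method above; the name `ascending_run?` is hypothetical, and unlike the table-based version it is not limited to code points 32 to 126.

```ruby
# Returns true if str contains min_run_size characters whose code points
# each increase by exactly 1.
def ascending_run?(str, min_run_size)
  str.each_char.each_cons(min_run_size).any? do |window|
    window.each_cons(2).all? { |a, b| b.ord == a.ord + 1 }
  end
end

str = "xabc$fghrtuvwx3!"
p ascending_run?(str, 3)   # => true
p ascending_run?(str, 5)   # => true
p ascending_run?(str, 6)   # => false
```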
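Alternative interpretation
My initial interpretation was that all substrings of three or more characters that meet the criterion were to be returned (even though that is inconsistent with the example). I will leave my solution to that problem (below) anyway.
Code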
def runs_of_min_size_or_more(str, min_run_size)
  arr = str.chars
  (arr.size-min_run_size+1).times.with_object([]) do |_,a|
    run = arr.lazy.slice_when { |x,y| y != x.next }.first
    a << run.join if run.size >= min_run_size
    arr.shift
  end
end
Example
str = "xabc$fghrtuvwx3!"
runs_of_min_size_or_more str, 1
#=> ["x", "abc", "bc", "c", "$", "fgh", "gh", "h", "r", "tuvwx",
# "uvwx", "vwx", "wx", "x", "3", "!"]
runs_of_min_size_or_more str, 2
#=> ["abc", "bc", "fgh", "gh", "tuvwx", "uvwx", "vwx", "wx"]
runs_of_min_size_or_more str, 3
# => ["abc", "fgh", "tuvwx", "uvwx", "vwx"]
runs_of_min_size_or_more str, 4
#=> ["tuvwx", "uvwx"]
runs_of_min_size_or_more str, 5
#=> ["tuvwx"]
runs_of_min_size_or_more str, 6
#=> []
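Explanation
See Enumerable#slice_when, which made its first appearance in Ruby v2.2. I made slice_when a lazy enumerator so that, with .first chained after its block, slicing stops as soon as the first slice has been obtained.
The easiest way to see what is going on is to insert some puts statements into the code and then run it. I have also broken
run = arr.lazy.slice_when { |x,y| y != x.next }.first
into
slice = arr.lazy.slice_when { |x,y| y != x.next }
run = slice.first
but since slice is an enumerator, I print slice.to_a, which is the array of elements that the enumerator will generate.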
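The early termination can be observed by counting how often the block is called. This is a small sketch with an illustrative input; `comparisons` is a counter added here for demonstration only.

```ruby
arr = "abc$tuv".chars
comparisons = 0

# Count block calls to see how far slicing proceeds before .first returns.
run = arr.lazy.slice_when { |x, y| comparisons += 1; y != x.next }.first

p run                          # => ["a", "b", "c"]
p comparisons                  # number of adjacent pairs actually compared
p comparisons < arr.size - 1   # => true: not all 6 adjacent pairs were examined
```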
def runs_of_min_size_or_more(str, min_run_size)
  arr = str.chars
  rv = (arr.size-min_run_size+1).times.with_object([]) do |_,a|
    puts "arr=#{arr}"
    puts " a=#{a}"
    slice = arr.lazy.slice_when { |x,y| y != x.next }
    puts " slice.to_a=#{slice.to_a}"
    run = slice.first
    puts " run=#{run}"
    puts " reject '#{run.join}' because run.size=#{run.size} < #{min_run_size}" \
      if run.size < min_run_size
    a << run.join if run.size >= min_run_size
    puts " run.join=#{run.join}" if run.size >= min_run_size
    arr.shift
  end
  puts "arr=#{arr}"
  rv
end
runs_of_min_size_or_more "xabc$rtuv3!", 2
arr=["x", "a", "b", "c", "$", "r", "t", "u", "v", "3", "!"]
a=[]
slice.to_a=[["x"], ["a", "b", "c"], ["$"], ["r"], ["t", "u", "v"], ["3"], ["!"]]
run=["x"]
reject 'x' because run.size=1 < 2
arr=["a", "b", "c", "$", "r", "t", "u", "v", "3", "!"]
a=[]
slice.to_a=[["a", "b", "c"], ["$"], ["r"], ["t", "u", "v"], ["3"], ["!"]]
run=["a", "b", "c"]
run.join=abc
arr=["b", "c", "$", "r", "t", "u", "v", "3", "!"]
a=["abc"]
slice.to_a=[["b", "c"], ["$"], ["r"], ["t", "u", "v"], ["3"], ["!"]]
run=["b", "c"]
run.join=bc
arr=["c", "$", "r", "t", "u", "v", "3", "!"]
a=["abc", "bc"]
slice.to_a=[["c"], ["$"], ["r"], ["t", "u", "v"], ["3"], ["!"]]
run=["c"]
reject 'c' because run.size=1 < 2
arr=["$", "r", "t", "u", "v", "3", "!"]
a=["abc", "bc"]
slice.to_a=[["$"], ["r"], ["t", "u", "v"], ["3"], ["!"]]
run=["$"]
reject '$' because run.size=1 < 2
arr=["r", "t", "u", "v", "3", "!"]
a=["abc", "bc"]
slice.to_a=[["r"], ["t", "u", "v"], ["3"], ["!"]]
run=["r"]
reject 'r' because run.size=1 < 2
arr=["t", "u", "v", "3", "!"]
a=["abc", "bc"]
slice.to_a=[["t", "u", "v"], ["3"], ["!"]]
run=["t", "u", "v"]
run.join=tuv
arr=["u", "v", "3", "!"]
a=["abc", "bc", "tuv"]
slice.to_a=[["u", "v"], ["3"], ["!"]]
run=["u", "v"]
run.join=uv
arr=["v", "3", "!"]
a=["abc", "bc", "tuv", "uv"]
slice.to_a=[["v"], ["3"], ["!"]]
run=["v"]
reject 'v' because run.size=1 < 2
arr=["3", "!"]
a=["abc", "bc", "tuv", "uv"]
slice.to_a=[["3"], ["!"]]
run=["3"]
reject '3' because run.size=1 < 2
arr=["!"]
#=> ["abc", "bc", "tuv", "uv"]
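I kept wondering whether this could be done with a regular expression. I found a way, but it does require some pre-processing. (It also only works for short strings, as pointed out in the comments. Oh well, some readers may find the approach of interest.)
Code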
def runs_of_min_size_or_more(str, min_run_size)
  arr = []
  str.each_char.with_index.map { |c,i| (c.ord-i).chr }.
      join.
      scan(/(.)(?=(\1{#{min_run_size-1},}))/) do |m|
        offset = Regexp.last_match.begin(0)-1
        arr << m.join.gsub(/./) do |c|
          offset += 1
          (c.ord + offset).chr
        end
      end
  arr
end
Example
str = "xabc$fghrtuvwx3!"
min_run_size = 3
runs_of_min_size_or_more(str, min_run_size)
#=> ["abc", "fgh", "tuvwx", "uvwx", "vwx"]
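Explanation
Note that the return value of String#scan is not used; scan's only function is to build the array arr.
For the values of str and min_run_size given in the example, the steps are as follows. First comes the pre-processing step.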
arr = []
a = str.each_char.with_index.map { |c,i| (c.ord-i).chr }
#=> ["x", "`", "`", "`", " ", "a", "a", "a", "j", "k", "k", "k", "k", "k",
# "%", "\x12"]
b = a.join
#=> "x``` aaajkkkkk%\x12"
The regular expression passed to scan, written here in free-spacing mode, does the following:
r = /
    (.)                       # match any character in capture group 1
    (?=                       # begin a positive lookahead
      (                       # begin capture group 2
        \1                    # match the content of capture group 1...
        {#{min_run_size-1},}  # at least min_run_size-1 times
      )                       # end capture group 2
    )                         # end positive lookahead
    /x                        # free-spacing regex definition mode
#=> /
    (.)                       # match any character in capture group 1
    (?=                       # begin a positive lookahead
      (                       # begin capture group 2
        \1                    # match the content of capture group 1...
        {2,}                  # at least min_run_size-1 times
      )                       # end capture group 2
    )                         # end positive lookahead
    /x
b.scan(r) do |m|
  offset = Regexp.last_match.begin(0)-1
  arr << m.join.gsub(/./) do |c|
    offset += 1
    (c.ord + offset).chr
  end
end
#=> "x``` aaajkkkkk%\x12"
arr
#=> ["abc", "fgh", "tuvwx", "uvwx", "vwx"]
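The pre-processing step can be checked in isolation. The sketch below uses a shortened, illustrative input; it shows why an ascending run collapses into a repeated character, and why longer strings fail: once a character's index exceeds its code point, Integer#chr receives a negative argument and raises RangeError.

```ruby
str = "xabc$fgh"

# Subtract each character's index from its code point: an ascending run
# such as "abc" collapses into a repeated character.
shifted = str.each_char.with_index.map { |c, i| (c.ord - i).chr }.join
p shifted                    # => "x``` aaa"

# Adding the index back restores the original string.
restored = shifted.each_char.with_index.map { |c, i| (c.ord + i).chr }.join
p restored == str            # => true

# "0" has code point 48, so at index 49 the shifted value is -1.
begin
  ("0" * 50).each_char.with_index.map { |c, i| (c.ord - i).chr }
rescue RangeError => e
  puts "RangeError: #{e.message}"
end
```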
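`runs_of_min_size_or_more("0" * 50, 3)` will kill it. `runs_of_min_size_or_more("123456", 3)` finds four runs. But those are minor issues and fixable. My main problem is that the amount of pre-processing and post-processing is equal to or greater than the solution itself, so you are using the wrong tool for the job. It reminds me of the joke: "How do you measure the height of a building with a voltmeter and a stopwatch? Drop the voltmeter from the roof and time how long it takes to hit the ground." :P Sure, you can waste a voltmeter that way, or you can just take a piece of string. – Amadan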
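@Amadan, it would kill it even for a not-so-long string (by raising an exception because the argument to `chr` is negative). That should have occurred to me and, as you say, it is fixable (probably in an ugly way). I intended the second example to return `["123456", "23456", "3456", "456"]`. We may be interpreting the question differently. As I hope you guessed, I only did this to see how close I could get to a regex solution. I think it was a ham sandwich and a stopwatch. –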
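No interpretation is needed: the OP has a test case, with input and expected output. I have never heard the ham sandwich version, but I think the voltmeter version provides a better garden path... – Amadan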
Use the include? method: '11abcd$4567'.include? 'abc'. https://ruby-doc.org/core-2.2.0/String.html#method-i-include-3F –
The sequence can be any string, not just 'abc'. It could be some other run of characters, and so on. – Arif
Umm, then the select should look like this: a = 'abc'.chars; a.select { |b| str.include?(b) } –