Breaking a word into letters with Ruby

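In my language there are compound letters made up of several characters, e.g. "ty", "ny", and even "tty" and "nny". I would like to write a Ruby method (spell) that tokenizes a word into letters according to this alphabet: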

abc=[*%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly q w r t z p l k j h g f d s x c v b n m y}.map{|z| [z,"c"]},*"eéuioöüóőúűáía".split(//).map{|z| [z,"v"]}].to_h

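The keys of the resulting hash are the letters and compound letters of the alphabet, and the values show which letters are consonants ("c") and which are vowels ("v"), because later I want to use this hash to break words into syllables. Cases where a compound letter appears only by accident, at the boundary between the parts of a compound word, will of course not be resolved.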
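To make the shape of that hash concrete, here is a small sketch of how its values could later be used. The constant names and the `cv_pattern` helper are assumptions introduced for illustration; they are not part of the question.

```ruby
# Alphabet from the question: compound letters first, then single letters.
CONSONANTS = %w[tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly
                q w r t z p l k j h g f d s x c v b n m y].freeze
VOWELS = "eéuioöüóőúűáía".chars.freeze

# Same shape as the question's hash: letter => "c" (consonant) or "v" (vowel).
ABC = (CONSONANTS.map { |l| [l, "c"] } + VOWELS.map { |l| [l, "v"] }).to_h.freeze

# Hypothetical helper: map already-split letters to a consonant/vowel pattern.
# Hash#fetch raises KeyError for anything that is not in the alphabet.
def cv_pattern(letters, abc = ABC)
  letters.map { |l| abc.fetch(l) }.join
end

p ABC["tty"]                     # => "c"
p ABC["ó"]                       # => "v"
p cv_pattern(%w[cs o b o ly ó])  # => "cvcvcv"
```

A pattern string like this is one possible starting point for the syllable splitting mentioned above; the splitting rules themselves are outside the scope of this sketch.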

Examples:

spell("csobolyó") => [ "cs", "o", "b", "o", "ly", "ó" ] 
spell("nyirettyű") => [ "ny", "i", "r", "e", "tty", "ű" ] 
spell("dzsesszmuzsikus") => [ "dzs", "e", "ssz", "m", "u", "zs", "i", "k", "u", "s" ]

2017-09-20 Konstantin

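What have you tried so far? This is going to be quite complex, so if you can narrow it down to a specific area where you need help, I think you will have better luck here. As it stands, there are a lot of edge cases that those of us who don't speak your language (and possibly those who do) cannot resolve. For example, if I see 'dzs', should it be '["dzs"]' or '["d", "zs"]' or '["dz", "s"]' or '["d", "z", "s"]'? Without a dictionary (or a lot of knowledge about the language), I don't think we can determine which one is correct. –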

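That is why I ordered the letters in the alphabet: if a letter appears earlier, it should be recognized in preference to the simple letters. When a word contains "dzs", it should be treated as "dzs" and not as "d" and "zs". In rare cases this will give false results, but most decompositions will work. I just don't know how to do this efficiently. Maybe there is a built-in string tokenizer or something similar. – Konstantin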

You might be able to get started by looking at String#scan, which seems to give decent results for your examples:

"csobolyó".scan(Regexp.union(abc.keys)) 
# => ["cs", "o", "b", "o", "ly", "ó"] 
"nyirettyű".scan(Regexp.union(abc.keys)) 
# => ["ny", "i", "r", "e", "tty", "ű"] 
"dzsesszmuzsikus".scan(Regexp.union(abc.keys)) 
# => ["dzs", "e", "ssz", "m", "u", "zs", "i", "k", "u", "s"]

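The last case doesn't match your expected output, but it does match your statement in the comments:

I ordered the letters in the alphabet: if a letter appears earlier, it should be recognized in preference to the simple letters. When a word contains "dzs", it should be treated as "dzs" and not as "d" and "zs".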
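The reason the ordering matters with String#scan is that regex alternation in Ruby is ordered: at each position the first alternative that matches is taken, not the longest one. The sketch below shows this with a cut-down letter list; the `letter_regex` helper is a hypothetical name, added only to show one way of making the order explicit.

```ruby
# Alternation is ordered: the first alternative that matches at a position wins.
short_first = Regexp.union(%w[d zs dzs])
long_first  = Regexp.union(%w[dzs d zs])

p "dzs".scan(short_first)  # => ["d", "zs"]
p "dzs".scan(long_first)   # => ["dzs"]

# Hypothetical helper: sort longest-first so the result does not depend on
# the order in which the alphabet happens to be written down.
def letter_regex(letters)
  Regexp.union(letters.sort_by { |l| -l.length })
end

p "nyirettyű".scan(letter_regex(%w[t y ty tty n ny i r e ű]))
# => ["ny", "i", "r", "e", "tty", "ű"]
```

Neither ordering helps with the compound-word boundary cases mentioned in the question. As far as I know, "község" (köz + ség) is such a word: a purely letter-based scan would read its "z" and "s" as the single letter "zs".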

2017-09-21 00:31:01

In general 'Regexp.union' is safer than 'join("|")', but in this case it probably doesn't matter, since we are only dealing with letters. –

Ah, yes, good point. I don't deal with dynamic regular expressions much and completely forgot that 'union' exists. Updated. –

Yes, it works as expected. I had typed the wrong result in the example; I have fixed it now. – Konstantin
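The remark above about 'Regexp.union' being safer than 'join("|")' can be illustrated with a made-up token list that contains a regex metacharacter. The tokens here are invented for the demonstration and have nothing to do with the alphabet in the question.

```ruby
tokens = ["a.c", "b"]  # made-up tokens; "." is a regex metacharacter

joined = Regexp.new(tokens.join("|"))  # "." stays a wildcard
united = Regexp.union(tokens)          # "." is escaped to a literal dot

p "abc".scan(joined)  # => ["abc"]
p "abc".scan(united)  # => ["b"]
```

With plain letters, as in the question, both forms should behave the same, which matches what the comment says.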

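I did not use the preference given by your ordering; instead, a longer sequence of characters gets a higher preference than a shorter one.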

def spell word
    abc=[*%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly q w r t z p l k j h g f d s x c v b n m y}.map{|z| [z,"c"]},*"eéuioöüóőúűáía".split(//).map{|z| [z,"v"]}].to_h
    current_position = 0
    maximum_current_position = 2
    maximum_possible_position = word.length
    split_word = []
    while current_position < maximum_possible_position do
        # Try the three-character slice first.
        current_word = set_current_word word, current_position, maximum_current_position
        if abc[current_word] != nil
            current_position, maximum_current_position = update_current_position_and_max_current_position current_position, maximum_current_position
            split_word.push(current_word)
        else
            # No match: fall back to the two-character slice.
            maximum_current_position = update_max_current_position maximum_current_position
            current_word = set_current_word word, current_position, maximum_current_position
            if abc[current_word] != nil
                current_position, maximum_current_position = update_current_position_and_max_current_position current_position, maximum_current_position
                split_word.push(current_word)
            else
                # Still no match: fall back to a single character.
                maximum_current_position = update_max_current_position maximum_current_position
                current_word = set_current_word word, current_position, maximum_current_position
                if abc[current_word] != nil
                    current_position, maximum_current_position = update_current_position_and_max_current_position current_position, maximum_current_position
                    split_word.push(current_word)
                else
                    puts 'This word cannot be formed in the current language'
                    break
                end
            end
        end
    end
    split_word
end

# Shrink the slice by one character.
def update_max_current_position max_current_position
    max_current_position = max_current_position - 1
end

# Move past the letter just matched and open a new three-character window.
def update_current_position_and_max_current_position current_position,max_current_position
    current_position = max_current_position + 1
    max_current_position = current_position + 2
    return current_position, max_current_position
end

def set_current_word word, current_position, max_current_position
    word[current_position..max_current_position]
end

puts "csobolyó => #{spell("csobolyó")}"
puts "nyirettyű => #{spell("nyirettyű")}"
puts "dzsesszmuzsikus => #{spell("dzsesszmuzsikus")}"

Output

csobolyó => ["cs", "o", "b", "o", "ly", "ó"] 
nyirettyű => ["ny", "i", "r", "e", "tty", "ű"] 
dzsesszmuzsikus => ["dzs", "e", "ssz", "m", "u", "zs", "i", "k", "u", "s"]
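The same longest-match-first idea can be written more compactly. The following is only a sketch under the assumption that no letter is longer than three characters; `spell_longest`, its `max_len` parameter and the `ABC` constant are hypothetical names, and unlike the code above it raises an error instead of printing a message when a character is not in the alphabet.

```ruby
# Alphabet hash in the same shape as in the question: letter => "c" or "v".
ABC = [
  *%w[tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly
      q w r t z p l k j h g f d s x c v b n m y].map { |l| [l, "c"] },
  *"eéuioöüóőúűáía".chars.map { |l| [l, "v"] }
].to_h.freeze

# Greedy longest match: at each position try the longest slice first,
# then progressively shorter ones.
def spell_longest(word, letters = ABC, max_len: 3)
  result = []
  pos = 0
  while pos < word.length
    candidate = max_len.downto(1)
                       .map { |n| word[pos, n] }
                       .find { |slice| letters.key?(slice) }
    raise ArgumentError, "cannot split #{word.inspect} at index #{pos}" unless candidate

    result << candidate
    pos += candidate.length
  end
  result
end

p spell_longest("csobolyó")
# => ["cs", "o", "b", "o", "ly", "ó"]
p spell_longest("dzsesszmuzsikus")
# => ["dzs", "e", "ssz", "m", "u", "zs", "i", "k", "u", "s"]
```

Because the preference comes from the slice length rather than from the order of the keys, the result does not depend on how the alphabet is sorted.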

2017-09-21 00:31:08

In the meantime I managed to write a method that works, but it is 5 times slower than String#scan:

abc=[*%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly q w r t z p l k j h g f d s x c v b n m y}.map{|z| [z,"c"]},*"eéuioöüóőúűáía".split(//).map{|z| [z,"v"]}].to_h 

def spell(w,abc)
    s=w.split(//)
    p=""
    t=[]

    for i in 0..s.size-1 do
        p << s[i]
        if i>=s.size-2 then
            if abc[p]!=nil then
                t.push p
                p=""
            elsif abc[p[0..-2]]!=nil then
                t.push p[0..-2]
                p=p[-1]
            elsif abc[p[0]]!=nil then
                t.push p[0]
                p=p[1..-1]
            end
        elsif p.size==3 then
            if abc[p]!=nil then
                t.push p
                p=""
            elsif abc[p[0..-2]]!=nil then
                t.push p[0..-2]
                p=p[-1]
            elsif abc[p[0]]!=nil then
                t.push p[0]
                p=p[1..-1]
            end
        end
    end

    if p.size>0 then
        if abc[p]!=nil then
            t.push p
            p=""
        elsif abc[p[0..-2]]!=nil then
            t.push p[0..-2]
            p=p[-1]
        end
    end

    if p.size>0 then
        t.push p
    end
    return t
end
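A timing comparison such as "5 times slower" depends on how it is measured. The sketch below shows one way to set up such a measurement with Ruby's standard Benchmark module. The labels, the `runs` count and the two candidates are assumptions for illustration; the method above could be added as a third lambda. No timings are claimed here, since they vary by machine and Ruby version.

```ruby
require "benchmark"

LETTERS = (%w[tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly
              q w r t z p l k j h g f d s x c v b n m y] +
           "eéuioöüóőúűáía".chars).freeze
LETTER_RE = Regexp.union(LETTERS) # built once, reused for every call

# Candidates to time; add your own implementation as another lambda.
candidates = {
  "scan, regex rebuilt per call" => ->(w) { w.scan(Regexp.union(LETTERS)) },
  "scan, precompiled regex"      => ->(w) { w.scan(LETTER_RE) }
}

words = %w[csobolyó nyirettyű dzsesszmuzsikus]
runs = 2_000 # arbitrary repeat count for the sketch

candidates.each do |label, fn|
  seconds = Benchmark.realtime { runs.times { words.each { |w| fn.call(w) } } }
  puts format("%-30s %.4f s", label, seconds)
end
```

Whichever implementations are compared, it is worth checking first that they return the same letters for the same words, so that the timings compare like with like.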

2017-09-21 17:22:46 Konstantin
