2012-01-06

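My code creates an inline diff of a string using HTML tags (on a per-word basis), so that CSS can hide or show what was deleted or added. In my tests I use () for additions and {} for deletions. While doing some string manipulation, I ran into some strange encoding.

Here is my text. Input: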

"e&nbsp;<b><u>Zerg</u></b>&nbsp;a" 
"e Zerg a" 

Output:

"e(?)(\240){&nbsp;<b>}{<u>}Zerg(?)(\240){</u>}{</b>}{&nbsp;}a" 

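Now, I'm not doing anything to change the encoding, so... I'm really confused as to how a question mark and \240 got in there. o.o

What kind of encoding is this?

I'm using Ruby 1.8.7.

Found the root of the problem. It happens when I convert my new string into an array for Diff::LCS to use.

The code: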

def self.convert_html_string_to_html_array(str) 
=begin 
    Things like &nbsp (and other char codes), and tags need to be considered the same element 
    also handles the decision to diff per char or per word 

    also need to take into consideration javascript and css that might be in the middle of a selection 
=end 
    result = Array.new 
    compare_words = str.has_at_least_one_word? 
    i = 0 
    while i < str.length do
        cur_char = str[i..i]
        case cur_char
        when "&"
            # for this we have two situations, a stray char code, and a char code preceding a tag
            next_index = str.index(";", i)
            case str[next_index + 1..next_index + 1] # check to see if tag
            when "<"
                next_index = str.index(">", i)
            end
            result << str[i..next_index]
            i = next_index
        when "<"
            next_index = str.index(">", i)
            result << str[i..next_index]
            i = next_index
        when " "
            result << cur_char
        else
            if compare_words
                # in here we need to check the above rules again, cause tags can be touching regular text
                next_index = i + 1
                next_index = str.index(" ", next_index)
                next_index = str.length if next_index.nil?
                next_index -= 1

                if i < str.length and str[i..next_index].include?("<") # beginning of a tag
                    next_index = str.index(">", i)
                end

                result << str[i..next_index] # don't want to include the space
                i = next_index
            else
                result << cur_char
            end
        end
        i += 1
    end

    return result # removes the trailing empty string
end

To clarify, this:

'e Zerg a' 

gets turned into this:

[ 
    [0] "e", 
    [1] "\302", 
    [2] "\240", 
    [3] "Z", 
    [4] "e", 
    [5] "r", 
    [6] "g", 
    [7] "\302", 
    [8] "\240", 
    [9] "a" 
] 
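
The "\302" and "\240" entries are octal escapes for the bytes 0xC2 and 0xA0, which together form the UTF-8 encoding of U+00A0 (NO-BREAK SPACE), the character that `&nbsp;` stands for. Ruby 1.8 indexes strings by byte, so `str[i..i]` returns half of that character at a time. The "?" in the diff output is presumably the lone 0xC2 byte being shown as unprintable. A minimal sketch of the byte arithmetic, with variable names chosen only for illustration:

```ruby
# U+00A0 (NO-BREAK SPACE) is the character that &nbsp; represents.
# Building it from its code point keeps the example independent of source encoding.
nbsp = [0x00A0].pack("U")

# Its UTF-8 encoding is two bytes long.
nbsp_bytes = nbsp.unpack("C*")

# Format each byte the way Ruby 1.8's inspect shows it: a backslash plus octal digits.
nbsp_octal = nbsp_bytes.map { |byte| "\\%o" % byte }

puts nbsp_bytes.inspect
puts nbsp_octal.join(" ")
```

The two octal values printed match the two array elements that appear where each non-breaking space was.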

Answers

Update to 1.9.2 or above (I recommend using RVM); 1.8.7 has some weird stuff going on with strings...
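
Ruby 1.9 and later index strings by character, which is why the upgrade avoids the split. If upgrading is not an option, one possible workaround is to break the string into whole characters before walking it. The following is only a sketch under the assumption that the input is valid UTF-8. `utf8_chars` is a hypothetical helper, not part of the code in the question or of Diff::LCS.

```ruby
# Hypothetical helper: split a UTF-8 string into whole characters, not bytes.
def utf8_chars(str)
  # unpack("U*") decodes the UTF-8 bytes into code points;
  # pack("U") re-encodes each code point as its own one-character string.
  str.unpack("U*").map { |code_point| [code_point].pack("U") }
end

# "e", NO-BREAK SPACE, "Z": three characters stored in four bytes.
text  = [0x65, 0xA0, 0x5A].pack("U*")
chars = utf8_chars(text)

puts chars.length            # number of characters
puts text.unpack("C*").length # number of bytes
```

Iterating over the array returned by `utf8_chars` instead of indexing `str[i..i]` keeps each non-breaking space together as a single element.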

lol http://stackoverflow.com/questions/8761092/trying-to-upgrade-from-from-ruby-1-8-7-to-1-9-2-while-still-using-rails-2-3-8 workin on it =p – NullVoxPopuli 2012-01-06 17:18:23

I'm just assuming 1.9.2 fixes this, since the problem is specific to Unicode characters wider than 8 bits. – NullVoxPopuli 2012-01-06 17:45:15