2017-02-28

Answer


The bytes 239, 191, 191, decoded as UTF-8, give the Unicode code point U+FFFF:

iex(1)> <<x::utf8>> = <<239, 191, 191>> 
<<239, 191, 191>> 
iex(2)> x 
65535 
iex(3)> x == 0xFFFF 
true 
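
To see where 65535 comes from, the three-byte UTF-8 layout can be unpacked by hand. This is only an illustrative sketch; the variable names a, b, c and code_point are made up here and are not part of any library.

```elixir
# A three-byte UTF-8 sequence has the bit layout 1110aaaa 10bbbbbb 10cccccc.
# The fixed marker bits are matched as literals; the payload bits are bound.
<<0b1110::4, a::4, 0b10::2, b::6, 0b10::2, c::6>> = <<239, 191, 191>>

# Reassemble the 16 payload bits (aaaa bbbbbb cccccc) into one integer.
code_point = a * 4096 + b * 64 + c

IO.inspect({a, b, c})   # {15, 63, 63}
IO.inspect(code_point)  # 65535
```

All payload bits are set, which is why the result is the highest 16-bit value, 0xFFFF.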

U+FFFF is a Unicode noncharacter. String.valid?/1 has a list of all such characters and returns false when it encounters any of them.
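
For reference, the Unicode Standard fixes the set of noncharacters at 66 code points: U+FDD0 through U+FDEF, plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, and so on up to U+10FFFF). The sketch below shows one way such a check could be written. The module and function names are hypothetical, and this is not a claim about how String.valid?/1 is implemented internally.

```elixir
defmodule NoncharacterSketch do
  # Hypothetical helper, not part of Elixir's standard library.

  # The contiguous block U+FDD0..U+FDEF (32 code points).
  def noncharacter?(cp) when cp in 0xFDD0..0xFDEF, do: true

  # The last two code points of every plane: the low 16 bits are FFFE or FFFF.
  def noncharacter?(cp) when cp in 0..0x10FFFF and rem(cp, 0x10000) >= 0xFFFE, do: true

  def noncharacter?(_), do: false

  # Walks a binary one UTF-8 code point at a time.
  # Returns false as soon as the remaining bytes no longer decode as UTF-8,
  # so it answers "was a noncharacter seen?", not "is this valid UTF-8?".
  def contains_noncharacter?(<<cp::utf8, rest::binary>>) do
    noncharacter?(cp) or contains_noncharacter?(rest)
  end

  def contains_noncharacter?(_), do: false
end

IO.inspect(NoncharacterSketch.noncharacter?(0xFFFF))                      # true
IO.inspect(NoncharacterSketch.contains_noncharacter?(<<239, 191, 191>>))  # true
IO.inspect(NoncharacterSketch.contains_noncharacter?("abc"))              # false
```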


I couldn't find any function in Elixir that checks only UTF-8 validity and skips the noncharacter check, but it is trivial to write one:

defmodule A do 
    def valid_utf8?(<<_::utf8, rest::binary>>), do: valid_utf8?(rest) 
    def valid_utf8?(<<>>), do: true 
    def valid_utf8?(_), do: false 
end 

for binary <- [<<0>>, <<239, 191, 191>>, <<128>>] do 
    IO.inspect {binary, String.valid?(binary), A.valid_utf8?(binary)} 
end 

Output:

{<<0>>, true, true} 
{<<239, 191, 191>>, false, true} 
{<<128>>, false, false}