Remove non-ANSI characters from a UTF string and keep the rest

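We have a Java library that takes a UTF-8 string as input. But if the input contains a non-ANSI character, the library may crash. So we want to remove all non-ANSI characters from the string. How can this be done in Java?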

Thanks,


2013-06-24 Meilun Sheng

What have you tried so far? –

Fix your library. That would help a lot. – Jayan

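Take a look at String.codePointAt(index). It gives you the Unicode code point of the character at a given position, and from there you can remove the code points that fall outside your range.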
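A minimal sketch of that approach, assuming "your range" is a single contiguous block of code points. The class name CodePointFilter, the method filterCodePoints and the example range 0x00 to 0x7F are illustrative only, not part of the library in question. The loop advances by Character.charCount, so a surrogate pair (for example an emoji) is tested and dropped as one character rather than as two halves.

```java
public class CodePointFilter {

    // Hypothetical helper: keeps only code points in [min, max] and drops everything else.
    static String filterCodePoints(String input, int min, int max) {
        StringBuilder sb = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);   // full code point, even for a surrogate pair
            if (cp >= min && cp <= max) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);    // advance by 1 or 2 chars
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String in = "Caf\u00e9 \u4e2d\u6587 ok \uD83D\uDE00";
        // Example range: 7-bit ASCII only
        System.out.println("[" + filterCodePoints(in, 0x00, 0x7F) + "]"); // prints [Caf  ok ]
    }
}
```

Note that removed characters simply disappear, so surrounding whitespace is left behind as-is.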

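How you handle the fact that characters have been removed is up to you, but keep in mind that the string you send to the library will not necessarily be the same as the one the client supplied. That may or may not cause problems.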

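I'm not sure what you mean here. Do you mean the Windows-1252 character encoding that people usually call "ANSI"? That is not ASCII, nor is it ISO-8859-1, so make sure you are targeting the right code page.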
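If the target really is Windows-1252, one way to avoid hand-coding ranges is to ask a CharsetEncoder whether each code point is representable. The sketch below (hypothetical class CharsetFilter, method keepEncodable) filters the same input against US-ASCII, ISO-8859-1 and windows-1252 to show that the three are not interchangeable: the euro sign U+20AC, for instance, exists in Windows-1252 but not in ISO-8859-1 or ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class CharsetFilter {

    // Keeps only the code points that the given charset can encode.
    static String keepEncodable(String input, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder sb = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            // Test the whole code point so surrogate pairs are checked together
            if (encoder.canEncode(new String(Character.toChars(cp)))) {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Euro sign, "5 caf", e-acute, space, one CJK ideograph
        String in = "\u20ac5 caf\u00e9 \u4e2d";
        for (String name : new String[] {"US-ASCII", "ISO-8859-1", "windows-1252"}) {
            System.out.println(name + ": [" + keepEncodable(in, Charset.forName(name)) + "]");
        }
        // Expected string contents (console display depends on its own encoding):
        // US-ASCII: [5 caf ]   ISO-8859-1: [5 caf<e-acute> ]   windows-1252: [<euro>5 caf<e-acute> ]
    }
}
```

windows-1252 is not one of the charsets every Java platform is required to support (those are US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE and UTF-16), although mainstream JDKs include it; Charset.isSupported("windows-1252") can confirm this before relying on it.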


2013-06-24 12:44:52

Try this. I took it from here, so I haven't tested it.

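import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Create an encoder and decoder for the target character encoding
Charset charset = Charset.forName("US-ASCII");
CharsetDecoder decoder = charset.newDecoder();
CharsetEncoder encoder = charset.newEncoder();

// This line is the key to removing "unmappable" characters:
// instead of throwing, the encoder silently skips them.
encoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
String result = inString;

try {
    // Convert the string to bytes in a ByteBuffer
    ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(inString));

    // Convert the bytes back to a CharBuffer and then to a String
    CharBuffer cbuf = decoder.decode(bbuf);
    result = cbuf.toString();
} catch (CharacterCodingException cce) {
    // Malformed input (e.g. an unpaired surrogate) is still reported, so result stays unfiltered
    System.err.println("Exception during character encoding/decoding: " + cce.getMessage());
    cce.printStackTrace();
}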
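The snippet above leaves the malformed-input action at its default (REPORT), so a string containing an unpaired surrogate makes encode throw and result silently stays equal to the unfiltered inString. A self-contained variant that also ignores malformed input might look like this; the class EncoderStrip and method strip are illustrative names, and whether silently dropping broken surrogates is acceptable for your library is an assumption you should check.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncoderStrip {

    // Encodes the string in the target charset, silently dropping anything it cannot
    // represent, then decodes the resulting bytes back into a Java String.
    static String strip(String input, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder()
                .onUnmappableCharacter(CodingErrorAction.IGNORE) // characters with no mapping
                .onMalformedInput(CodingErrorAction.IGNORE);     // e.g. unpaired surrogates
        try {
            return charset.newDecoder()
                    .decode(encoder.encode(CharBuffer.wrap(input)))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not expected here: the encoder ignores errors, and for single-byte charsets
            // such as US-ASCII its output decodes cleanly. The API still declares it.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(strip("Caf\u00e9 \uD83D\uDE00 ok", StandardCharsets.US_ASCII)); // Caf  ok
        System.out.println(strip("a\uD800b", StandardCharsets.US_ASCII));                  // ab
    }
}
```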


2013-06-24 12:46:01
