2014-10-29

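C++ - Removing invalid characters when the user pastes into a grid

Here is my situation. I need to filter out invalid characters that users may paste in from Word or Excel documents.

Here is what I am doing.

First, I convert any Unicode characters to ASCII: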

extern "C" COMMON_STRING_FUNCTIONS long ConvertUnicodeToAscii(wchar_t * pwcUnicodeString, char* &pszAsciiString) 
{ 
    int nBufLen = WideCharToMultiByte(CP_ACP, 0, pwcUnicodeString, -1, NULL, 0, NULL, NULL)+1; 
    pszAsciiString = new char[nBufLen]; 
    WideCharToMultiByte(CP_ACP, 0, pwcUnicodeString, -1, pszAsciiString, nBufLen, NULL, NULL); 
    return nBufLen; 
} 

Next, I filter out any character whose value is not between 31 and 127:

String __fastcall TMainForm::filterInput(String l_sConversion) 
{ 
    // Used to store every character that was stripped out. 
    String filterChars = ""; 

    // Not Used. We never received the whitelist 
    String l_SWhiteList = ""; 

    // Our String without the invalid characters. 
    AnsiString l_stempString; 

    // convert the string into an array of chars 
    wchar_t* outputChars = l_sConversion.w_str(); 
    char * pszOutputString = NULL; 

    //convert any unicode characters to ASCII 
    ConvertUnicodeToAscii(outputChars, pszOutputString); 

    l_stempString = (AnsiString)pszOutputString; 

    //We're going backwards since we are removing characters which changes the length and position. 
    for (int i = l_stempString.Length(); i > 0; i--) 
    { 
     char l_sCurrentChar = l_stempString[i]; 

     //If we don't have a valid character, filter it out of the string. 
     if (((unsigned int)l_sCurrentChar < 31) ||((unsigned int)l_sCurrentChar > 127)) 
     { 
      String l_sSecondHalf = ""; 
      String l_sFirstHalf = ""; 
      l_sSecondHalf = l_stempString.SubString(i + 1, l_stempString.Length() - i); 
      l_sFirstHalf = l_stempString.SubString(0, i - 1); 
      l_stempString = l_sFirstHalf + l_sSecondHalf; 
      filterChars += "\'" + ((String)(unsigned int)(l_sCurrentChar)) + "\' "; 
     } 
    } 

    if (filterChars.Length() > 0) 
    { 
     LogInformation(__LINE__, __FUNC__, Utilities::LOG_CATEGORY_GENERAL, "The Following ASCII Values were filtered from the string: " + filterChars); 
    } 

    // Delete the char* to avoid memory leaks. 
    delete [] pszOutputString; 
    return l_stempString; 
} 

This seems to work, except when you copy and paste bullet points from a Word document, such as:

o Bullet1:
▪subbullet1。

You get something like this:

oBullet1?subbullet1。

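My filter function is called from the OnChange event.

The bullets are replaced with the letter o and a question mark.

What am I doing wrong, and is there a better way to do this?

I am using C++Builder XE5, so please no Visual C++ solutions.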

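'CP_ACP' does not mean ASCII. It represents the operating system's current locale, which could be any language. ASCII itself is codepage 20127. Defining your own conversion function is also redundant when you can simply use 'AnsiStringT<20127>' and let the RTL handle the conversion for you. – 2014-10-29 17:02:46

Answer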

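When you perform the conversion to ASCII (which is not actually a conversion to ASCII, by the way), Unicode characters that the target codepage does not support are lost: they are dropped, replaced with ?, or replaced with a close approximation. As a result, they are no longer available to your scanning loop. You should not do the conversion at all. Scan the source Unicode data as-is instead.

Try something more like this: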
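To illustrate why the lost characters slip past the filter, here is a minimal portable sketch in standard C++ (not C++Builder code). `narrowLossy` is a hypothetical stand-in for a codepage conversion that substitutes `?` for anything it cannot map. The real `WideCharToMultiByte` result depends on the codepage and flags. `passesRangeCheck` is likewise an illustrative helper that mirrors the question's 31 to 127 test.

```cpp
#include <string>

// Hypothetical stand-in for a lossy narrowing conversion: any UTF-16 code unit
// outside 7-bit ASCII becomes '?', roughly what a codepage conversion may do
// when the target codepage has no mapping for a character.
std::string narrowLossy(const std::u16string& in)
{
    std::string out;
    for (char16_t unit : in)
        out += (unit < 0x80) ? static_cast<char>(unit) : '?';
    return out;
}

// Mirrors the range test from the question: a character survives the filter
// when its value lies between 31 and 127 inclusive.
bool passesRangeCheck(char c)
{
    const unsigned char u = static_cast<unsigned char>(c);
    return u >= 31 && u <= 127;
}
```

With the input `u"\u25AAsubbullet1."`, `narrowLossy` returns `?subbullet1.`, and `passesRangeCheck('?')` is true because `?` has the value 63. The bullet has already been replaced by a valid character before the filter sees it.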

#include <System.Character.hpp> 

String __fastcall TMainForm::filterInput(String l_sConversion) 
{ 
    // Used to store every character sequence that was stripped out. 
    String filterChars; 

    // Not Used. We never received the whitelist 
    String l_SWhiteList; 

    // Our String without the invalid sequences. 
    String l_stempString; 

    int numChars; 
    for (int i = 1; i <= l_sConversion.Length(); i += numChars) 
    { 
     UCS4Char ch = TCharacter::ConvertToUtf32(l_sConversion, i, numChars); 
     String seq = l_sConversion.SubString(i, numChars); 

     //If we don't have a valid codepoint, filter it out of the string. 
     if ((ch <= 31) || (ch >= 127)) 
      filterChars += (_D("\'") + seq + _D("\' ")); 
     else 
      l_stempString += seq; 
    } 

    if (!filterChars.IsEmpty()) 
    { 
     LogInformation(__LINE__, __FUNC__, Utilities::LOG_CATEGORY_GENERAL, _D("The Following Values were filtered from the string: ") + filterChars); 
    } 

    return l_stempString; 
} 
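For readers without C++Builder at hand, the same scan can be sketched in standard C++. This is an illustrative approximation, not the RTL implementation: `decodeAt` and `filterPrintableAscii` are hypothetical names, `std::u16string` stands in for `String`, and indexing is 0-based rather than 1-based.

```cpp
#include <cstddef>
#include <string>

// Decode one code point from UTF-16 starting at index i. 'units' receives the
// number of code units consumed (2 for a valid surrogate pair, otherwise 1).
char32_t decodeAt(const std::u16string& s, std::size_t i, std::size_t& units)
{
    const char16_t hi = s[i];
    if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < s.size())
    {
        const char16_t lo = s[i + 1];
        if (lo >= 0xDC00 && lo <= 0xDFFF)
        {
            units = 2;
            return 0x10000 + ((char32_t(hi) - 0xD800) << 10) + (char32_t(lo) - 0xDC00);
        }
    }
    units = 1;
    return hi;   // BMP character, or an unpaired surrogate passed through as-is
}

// Keep only printable ASCII (32..126). Every removed sequence is appended to
// 'removed' so the caller can log it.
std::u16string filterPrintableAscii(const std::u16string& in, std::u16string& removed)
{
    std::u16string kept;
    std::size_t units = 1;
    for (std::size_t i = 0; i < in.size(); i += units)
    {
        const char32_t cp = decodeAt(in, i, units);
        if (cp <= 31 || cp >= 127)
            removed.append(in, i, units);
        else
            kept.append(in, i, units);
    }
    return kept;
}
```

Calling `filterPrintableAscii(u"o Bullet1:\u25AAsubbullet1.", removed)` yields `o Bullet1:subbullet1.` and leaves U+25AA in `removed`. Note that a plain letter `o` (U+006F) is printable ASCII, so a range check like this cannot tell it apart from text the user typed.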

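Thanks for the help. It looks like what you wrote strips out most bullets, but the clear (hollow) bullets are still converted to "o". That is just Windows copy/paste behavior. – themaniac27 2014-10-30 11:45:16

This code should strip out all bullets, since there are no bullets in ASCII. – 2014-10-30 14:34:17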