Remove Unicode characters from a string


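Below is my code snippet. I have managed to remove some escape characters, but I can't remove the Unicode characters from the string NewOutput that ParseLine() produces. I also want to count the number of lines that contain Unicode.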

For example, the string NewOutput has these 3 lines:

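@KayKay121 dragged me to the library. Now I have to be productive \udc3d \udc94 https://t.co/HjZR3d5QaQ (Timestamp: Thu Oct 29 17:51:50 +0000 2015)

6A decided to postpone the final vote until the executive committee hears the appeal. Looks like it's set: 7 districts. (Timestamp: Thu Oct 29 17:51:51 +0000 2015)

@i_am_sknapp Thanks for following us, Seth. (Timestamp: Thu Oct 29 18:10:49 +0000 2015)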

Any help would be appreciated :) Thanks!

// Fragment: readtweetfile (std::ifstream), writetweetfile (std::ofstream),
// output and NewOutput (std::string) are declared elsewhere; ParseLine() is
// the asker's own function (not shown).
if (readtweetfile.is_open())
{
    // Loop on getline() itself: testing eof() before reading runs the body
    // one extra time after the last successful read.
    while (std::getline(readtweetfile, output))
    {
        ParseLine(output, NewOutput);

        if (NewOutput != " ")
        {
            for (std::string::size_type i = 0; i < NewOutput.size(); ++i)
            {
                char c = NewOutput[i];

                // Ordinary characters, and a backslash at the very end, are copied as-is.
                if (c != '\\' || i + 1 == NewOutput.size())
                {
                    writetweetfile << c;
                    continue;
                }

                char next = NewOutput[++i];  // consume the escaped character
                switch (next)
                {
                case '"': case '/': case '\'': case '\\':
                    writetweetfile << next;  // \" \/ \' \\ -> the character itself
                    break;
                case 'n': case 't': case ' ':
                    writetweetfile << ' ';   // \n, \t and "\ " -> a space
                    break;
                case 'u':
                    // The problem case: "\u" becomes "unicode", but the four
                    // hex digits that follow are still copied to the output.
                    writetweetfile << "unicode";
                    break;
                default:
                    writetweetfile << c << next;  // unknown escape: keep both characters
                    break;
                }
            }
        }
        writetweetfile << std::endl;
    }
}
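The loop above replaces the two characters `\u` with the word "unicode" but leaves the four hex digits in place, and it does not count anything. A minimal sketch of one way to do both, assuming the file contains literal six-character `\uXXXX` escapes (as in the sample lines) and that dropping them entirely is acceptable. `strip_unicode_escapes` and `clean_and_count` are hypothetical helper names, not part of the question's code, and the sketch handles only `\uXXXX`, not the other escapes.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Returns `line` with every "\uXXXX" escape (backslash, 'u', four hex digits)
// removed. Sets `found` to true if at least one escape was removed.
std::string strip_unicode_escapes(const std::string& line, bool& found)
{
    std::string out;
    found = false;
    std::string::size_type i = 0;
    while (i < line.size())
    {
        // Need six characters: '\\', 'u', then four hex digits.
        bool is_escape = i + 6 <= line.size() && line[i] == '\\' && line[i + 1] == 'u';
        for (std::string::size_type k = i + 2; is_escape && k < i + 6; ++k)
            is_escape = std::isxdigit(static_cast<unsigned char>(line[k])) != 0;

        if (is_escape)
        {
            found = true;
            i += 6;  // skip the whole escape, hex digits included
        }
        else
        {
            out += line[i];
            ++i;
        }
    }
    return out;
}

// Copies every line of `in` to `out` with \uXXXX escapes removed and
// returns how many lines contained at least one such escape.
int clean_and_count(std::istream& in, std::ostream& out)
{
    int lines_with_unicode = 0;
    std::string line;
    while (std::getline(in, line))
    {
        bool found = false;
        out << strip_unicode_escapes(line, found) << '\n';
        if (found)
            ++lines_with_unicode;
    }
    return lines_with_unicode;
}
```

With the question's streams this would be called as `int n = clean_and_count(readtweetfile, writetweetfile);`. For the three sample lines it should report 1, since only the first tweet contains `\u` escapes.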


2015-11-08 shahganesh

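Where are you getting these strings from? Is the file saved in some file format? E.g. if the file is JSON, just use a JSON parser; it will decode these escapes. Secondly, '\ud83d\udc94' is a surrogate pair for a single character (probably an emoji). – roeland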
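To make the surrogate-pair point concrete: a JSON decoder combines `\ud83d\udc94` into one code point, U+1F494 (BROKEN HEART), and typically stores it as UTF-8. A minimal sketch of that decoding step, assuming the two 4-digit hex values have already been parsed out of the text; `combine_surrogates` and `to_utf8` are hypothetical helpers, not from any library.

```cpp
#include <cassert>
#include <string>

// Combines a UTF-16 surrogate pair (high 0xD800-0xDBFF, low 0xDC00-0xDFFF)
// into a single Unicode code point.
char32_t combine_surrogates(char16_t high, char16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF && low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}

// Encodes one code point as a UTF-8 byte sequence (1 to 4 bytes).
std::string to_utf8(char32_t cp)
{
    std::string s;
    if (cp < 0x80)
    {
        s += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        s += static_cast<char>(0xC0 | (cp >> 6));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        s += static_cast<char>(0xE0 | (cp >> 12));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        s += static_cast<char>(0xF0 | (cp >> 18));
        s += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return s;
}
```

Here `combine_surrogates(0xD83D, 0xDC94)` gives 0x1F494, whose UTF-8 form is the four bytes F0 9F 92 94. Once text is decoded this way, "contains Unicode" can be tested as "contains a byte >= 0x80" instead of searching for `\u`.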

Well, not really knowing what output you want for your 3 samples, I came up with this:

\\(u|U)[a-zA-Z0-9]{4}|\\|\t|\n

This will find the Unicode escapes and the escape characters.
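A minimal sketch of applying that pattern from C++ with `<regex>` (ECMAScript grammar, the default), assuming the input is the escaped text shown in the question. `remove_escapes` and `has_unicode_escape` are hypothetical names. Note that `[a-zA-Z0-9]{4}` also accepts non-hex letters; `[0-9a-fA-F]{4}` is the stricter class for `\uXXXX`.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The pattern from the answer: a \u or \U escape followed by four
// alphanumerics, a lone backslash, a tab, or a newline.
const std::regex escape_pattern(R"(\\(u|U)[a-zA-Z0-9]{4}|\\|\t|\n)");

// Removes every match of the pattern from `line`.
std::string remove_escapes(const std::string& line)
{
    return std::regex_replace(line, escape_pattern, "");
}

// True if `line` contains at least one \uXXXX or \UXXXX escape.
bool has_unicode_escape(const std::string& line)
{
    static const std::regex unicode_only(R"(\\(u|U)[a-zA-Z0-9]{4})");
    return std::regex_search(line, unicode_only);
}
```

To count lines containing Unicode, call `has_unicode_escape(line)` on each line before `remove_escapes(line)` and increment a counter when it returns true. Because the lone-backslash alternative also matches, `\/` becomes `/` rather than being kept.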

If you need something different, edit the question with more examples and, more importantly, the output you want to end up with.


2015-11-09 21:19:09 Nefariis
