Decoding in Java

How do I correctly decode the following string in Java?

http%3A//www.google.ru/search%3Fhl%3Dru%26q%3Dla+mer+powder%26btnG%3D%u0420%A0%u0421%u045F%u0420%A0%u0421%u2022%u0420%A0%u0421%u2018%u0420%u040E%u0420%u0453%u0420%A0%u0421%u201D+%u0420%A0%u0420%u2020+Google%26lr%3D%26rlz%3D1I7SKPT_ru

When I use URLDecoder.decode() I get the following error:

java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "u0"
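The failure can be reproduced with a short sketch. The input is a hypothetical, shortened version of the string above that keeps standard escapes (%3A, %3F, %3D) and one non-standard escape (%u0420); the class and method names are illustrative only. URLDecoder expects exactly two hex digits after every %, so it rejects the input when it reaches "u0".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeRepro {
    // Hypothetical shortened input: %3A, %3F and %3D are standard escapes, %u0420 is not
    static final String INPUT = "http%3A//www.google.ru/search%3Fq%3D%u0420";

    // Returns true when URLDecoder rejects INPUT with an IllegalArgumentException
    public static boolean rejectsInput() throws UnsupportedEncodingException {
        try {
            URLDecoder.decode(INPUT, "UTF-8");
            return false;
        } catch (IllegalArgumentException e) {
            // The exact message text differs between JDK versions
            System.out.println(e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println("rejected: " + rejectsInput());
    }
}
```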

Thanks, Dave

2011-03-23 Dave

That URL isn't properly encoded to begin with. – 2011-03-23 16:32:32

@Johan It could be, if it is part of a larger URL (like http://foo.com/?url=<the string above>); otherwise, agreed. – 2011-03-23 16:35:17

@Johan, why not? @Daniel, exactly what I was thinking: http://www.google.com/search?q=http%3A//www.google.ru/search%3Fhl%3Dru%26q%3Dla+mer+powder%26btnG%3D%u0420%A0%u0421%u045F%u0420%A0%u0421%u2022%u0420%A0%u0421%u2018%u0420%u040E%u0420%u0453%u0420%A0%u0421%u201D+%u0420%A0%u0420%u2020+Google%26lr%3D%26rlz%3D1I7SKPT_ru – OscarRyz 2011-03-23 16:35:35

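According to Wikipedia, "there exists a non-standard encoding for Unicode characters: %uxxxx, where xxxx is a Unicode value". It continues: "This behavior is not specified by any RFC and has been rejected by the W3C."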

Your URL contains these tokens, and Java's URLDecoder implementation does not support them.
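To make the notation concrete, here is a small sketch that decodes a single %uxxxx token by reading its four hex digits as one UTF-16 code unit, which in Java is exactly one char. The class and the decodeToken helper are hypothetical, and the helper deliberately handles one token only, not a whole URL.

```java
public class PercentUToken {
    // Hypothetical helper: "%u0420" -> the char U+0420.
    // The four hex digits are read as a single UTF-16 code unit (one Java char).
    public static char decodeToken(String token) {
        if (!token.matches("%u[0-9A-Fa-f]{4}")) {
            throw new IllegalArgumentException("Expected %uXXXX, got: " + token);
        }
        return (char) Integer.parseInt(token.substring(2), 16);
    }

    public static void main(String[] args) {
        // Print the code point in hex so the output does not depend on the console encoding
        System.out.printf("U+%04X%n", (int) decodeToken("%u0420")); // prints U+0420
    }
}
```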

2011-03-23 16:36:45

The %uXXXX encoding is non-standard and was in fact rejected by the W3C, so naturally URLDecoder doesn't understand it.

You could write a small function that fixes it by replacing %uXXYY with %XX%YY in your encoded string. Then you can process and decode the fixed string normally.
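A minimal sketch of that idea, assuming the input contains only %uXXYY escapes: a regular expression rewrites each one as the two byte escapes %XX%YY, and the result is decoded as UTF-16BE so that every pair of bytes becomes one char. The class and method names are hypothetical. Plain %XX escapes, such as the %3A and %A0 in the question's string, would end up as lone bytes and be decoded incorrectly, so they need extra handling.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

public class PercentUFix {
    // Matches a non-standard %uXXYY escape, capturing the high byte XX and the low byte YY
    private static final Pattern PERCENT_U =
            Pattern.compile("%u([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})");

    // Hypothetical helper: rewrites every %uXXYY as %XX%YY, then decodes the bytes as UTF-16BE.
    // Assumes the input has no plain %XX escapes; those would be decoded incorrectly.
    public static String decodePercentU(String encoded) throws UnsupportedEncodingException {
        String fixed = PERCENT_U.matcher(encoded).replaceAll("%$1%$2");
        return URLDecoder.decode(fixed, "UTF-16BE");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Print code points in hex so the output does not depend on the console encoding
        decodePercentU("la+mer%u0420").chars().forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();
    }
}
```

As with URLDecoder in general, a '+' outside the escapes is turned into a space.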

2011-03-23 16:39:00 vartec

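We started from vartec's solution but found additional problems. This solution works with UTF-16, but it can be changed to return UTF-8. All the replacements were left in for clarity. You can read more at http://www.cogniteam.com/wiki/index.php?title=DecodeEncodeJavaScript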

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Wrapper class added so the snippet compiles; the class name is arbitrary
public class JsUnescape {

    public static String unescape(String escaped) throws UnsupportedEncodingException {
        // Widen every standard %XY escape to %u00XY so that, after the split below,
        // every character is two bytes and the UTF-16 input won't be malformed.
        // Note: only upper-case hex digits are handled.
        String str = escaped.replaceAll("%0", "%u000");
        str = str.replaceAll("%1", "%u001");
        str = str.replaceAll("%2", "%u002");
        str = str.replaceAll("%3", "%u003");
        str = str.replaceAll("%4", "%u004");
        str = str.replaceAll("%5", "%u005");
        str = str.replaceAll("%6", "%u006");
        str = str.replaceAll("%7", "%u007");
        str = str.replaceAll("%8", "%u008");
        str = str.replaceAll("%9", "%u009");
        str = str.replaceAll("%A", "%u00A");
        str = str.replaceAll("%B", "%u00B");
        str = str.replaceAll("%C", "%u00C");
        str = str.replaceAll("%D", "%u00D");
        str = str.replaceAll("%E", "%u00E");
        str = str.replaceAll("%F", "%u00F");

        // Split each 4-hex-digit (2-byte) %uXXYY escape into two 1-byte escapes %XX%YY,
        // so that decode won't fail
        String[] arr = str.split("%u");
        StringBuilder sb = new StringBuilder(arr[0]);
        for (int i = 1; i < arr.length; i++) {
            if (!arr[i].isEmpty()) {
                // arr[i] starts with the 4 hex digits and may be followed by literal text
                sb.append('%').append(arr[i], 0, 2);
                sb.append('%').append(arr[i].substring(2));
            }
        }

        // Decode the byte escapes; "UTF-16" reads big-endian when there is no byte-order mark
        return URLDecoder.decode(sb.toString(), "UTF-16");
    }
}

2011-07-25 09:22:59 ariy

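After looking at the solution proposed by @ariy, I created a Java-based solution that is also resilient to encoded characters that have been chopped in two (i.e., half of the encoded character is missing). This happens in my use case, where I need to decode long URLs that are sometimes chopped at 2000 characters. See What is the maximum length of a URL in different browsers?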

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

public class Utils {

    // A complete standard escape %XX
    private static final Pattern validStandard = Pattern.compile("%([0-9A-Fa-f]{2})");
    // "%" or "%X" at the very end of the input: a standard escape cut in half
    private static final Pattern choppedStandard = Pattern.compile("%[0-9A-Fa-f]{0,1}$");
    // A complete non-standard escape %uXXYY, capturing the two bytes XX and YY
    private static final Pattern validNonStandard = Pattern.compile("%u([0-9A-Fa-f][0-9A-Fa-f])([0-9A-Fa-f][0-9A-Fa-f])");
    // "%u" followed by fewer than 4 hex digits at the very end of the input
    private static final Pattern choppedNonStandard = Pattern.compile("%u[0-9A-Fa-f]{0,3}$");

    public static String resilientUrlDecode(String input) {
        String cookedInput = input;

        if (cookedInput.indexOf('%') > -1) {
            // Widen every standard %XX escape to %00%XX so it becomes one UTF-16 code unit
            cookedInput = validStandard.matcher(cookedInput).replaceAll("%00%$1");

            // Discard a chopped encoded char at the end of the line (there is no way to know what it was)
            cookedInput = choppedStandard.matcher(cookedInput).replaceAll("");

            // Handle the non-standard encoding (rejected by the W3C) that some clients use anyway
            // See: https://stackoverflow.com/a/5408655/114196
            if (cookedInput.contains("%u")) {
                // Transform every non-standard %uXXYY into the two byte escapes %XX%YY
                cookedInput = validNonStandard.matcher(cookedInput).replaceAll("%$1%$2");

                // Discard a chopped encoded char at the end of the line
                cookedInput = choppedNonStandard.matcher(cookedInput).replaceAll("");
            }
        }

        try {
            return URLDecoder.decode(cookedInput, "UTF-16");
        } catch (UnsupportedEncodingException e) {
            // Cannot happen: every Java platform is required to support UTF-16
            throw new AssertionError(e);
        }
    }
}

2014-05-02 12:40:20
