Java Pattern/Matcher

Here is a sample text: \1f\1e\1d\020028. I cannot modify the input text; I am reading a long string of text from a file.

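I want to extract the following: \1f, \1e, \1d, \02

For this I wrote the following regex pattern: "\\[a-fA-F0-9]"

I am using the Pattern and Matcher classes, but my matcher cannot find anything with the regex above. I tested the same regex on a few online regex sites and, surprisingly, it works there.

Where am I going wrong?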

Original code:

public static void main(String[] args) { 
    String inputText = "\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d"; 
    inputText  = inputText.replace("\\", "\\\\"); 

    String regex  = "\\\\[a-fA-F0-9]{2}"; 

    Pattern p = Pattern.compile(regex); 
    Matcher m = p.matcher(inputText); 

    while (m.find()) { 
     System.out.println(m.group()); 
    } 
}

Output: nothing is printed.

Source

2014-11-05 bullzeye

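My guess is that some of your backslashes are escaping something you don't intend. You'll have to show us your actual code, though. – azurefrog 2014-11-05 22:03:12

'\\[a-fA-F0-9]' looks for a backslash followed by one letter or digit. I think you want to look for a backslash followed by two letters or digits. I suspect you can figure out how to fix that. – ajb 2014-11-05 22:03:50

Are you entering the string in the right format? It should be '\\1f\\1e\\1d\\020028', I think. – kaos 2014-11-05 22:04:13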

Try adding a . at the end, like:

\\[a-fA-F0-9].

Source

2014-11-05 22:02:34

You need to read the file properly and replace each '\' character with '\\'. Suppose your project contains a file named test_file.txt with this content:

\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d

Here is code that reads the file and extracts the values:

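import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) throws IOException {
        new Test().test();
    }

    public void test() throws IOException {
        // Compile once: a literal backslash followed by two hex digits.
        Pattern pattern = Pattern.compile("\\\\[a-fA-F0-9]{2}");

        // test_file.txt is loaded from the root of the classpath.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                getClass().getResourceAsStream("/test_file.txt"), "UTF-8"))) {
            String inputText;
            while ((inputText = br.readLine()) != null) {
                // Double every backslash in the line that was read.
                inputText = inputText.replace("\\", "\\\\");

                Matcher match = pattern.matcher(inputText);
                while (match.find()) {
                    System.out.println(match.group());
                }
            }
        }
    }
}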
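
As a side note, text read from a file already contains literal backslashes, so the pattern should be able to match it directly. The sketch below is only an illustration under assumptions: FileEscapeDemo, extract and the temporary file are hypothetical stand-ins for the real input file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileEscapeDemo {
    // One literal backslash followed by exactly two hex digits.
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\[a-fA-F0-9]{2}");

    // Collects every match of HEX_ESCAPE in the given line.
    public static List<String> extract(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = HEX_ESCAPE.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file standing in for the real input file.
        Path file = Files.createTempFile("test_file", ".txt");
        // In Java source, "\\1f" writes the three characters \ 1 f to the file.
        Files.write(file, "\\1f\\1e\\1d\\020028".getBytes(StandardCharsets.UTF_8));

        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            System.out.println(extract(line)); // prints [\1f, \1e, \1d, \02]
        }
        Files.delete(file);
    }
}
```

As far as I can tell, the doubling step in the code above is harmless rather than required: after doubling, the pattern simply matches the second backslash of each pair.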

Source

2014-11-05 22:05:23 kaos

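Your code does work. But when I did something similar, as you can see above, it doesn't work. – bullzeye 2014-11-06 07:06:40

The problem is escaping the input string. Check the update. I used StringEscapeUtils from Apache Commons Lang. – kaos 2014-11-06 10:11:02

@bullzeye To explain: 'escapeJava' returns the Unicode representation instead of the octal one, so instead of '\1' or '\0' you get '\u0001' or '\u0000'. That is why 'replace("\\u000", "\\")' is needed (to turn '\u0001' into '\1', as in your string). – Pshemo 2014-11-06 14:42:07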

(Answer updated after the OP added more details)

Your string

String inputText = "\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d";

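does not actually contain any \ literal. According to section 3.10.6, Escape Sequences for Character and String Literals, of the Java Language Specification, \xxx is interpreted as the character whose index in the Unicode table is the octal (base/radix 8) value given by xxx.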

Example: \123 = 1 * 8^2 + 2 * 8^1 + 3 * 8^0 = 1 * 64 + 2 * 8 + 3 * 1 = 64 + 16 + 3 = 83, which represents the character S.
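
To make this concrete, here is a minimal sketch that prints the character codes the compiler actually stores for each literal. The class name OctalEscapeDemo and the helper charCodes are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class OctalEscapeDemo {
    // Returns the numeric code of every char in s, making control characters visible.
    public static List<Integer> charCodes(String s) {
        List<Integer> codes = new ArrayList<>();
        for (char c : s.toCharArray()) {
            codes.add((int) c);
        }
        return codes;
    }

    public static void main(String[] args) {
        // "\123" is a single character: octal 123 = decimal 83 = 'S'.
        System.out.println("\123".equals("S"));     // prints true
        // "\1f" is two characters: U+0001 followed by 'f' (102). No backslash.
        System.out.println(charCodes("\1f"));       // prints [1, 102]
        // "\\1f" is three characters: '\' (92), '1' (49), 'f' (102).
        System.out.println(charCodes("\\1f"));      // prints [92, 49, 102]
        // So a regex looking for a backslash has nothing to find in "\1f".
        System.out.println("\1f".contains("\\"));  // prints false
    }
}
```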

If the string given in your question appears in the text file exactly as shown, then in Java source you should write it as

String inputText = "\\1f\\1e\\1d\\02002868BF03030000000000000000S023\\1f\\1e\\1d\\03\\0d";

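(each escaped \ now represents a literal backslash).

(Old version of my answer)

It is hard to say exactly what you did wrong without seeing your code. You should at least be able to find \1, \1, \1, \0, since your regex matches a \ followed by one hexadecimal character.

Anyway, this is how you can find the results mentioned in your question: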

String text = "\\1f\\1e\\1d\\020028"; 
Pattern p = Pattern.compile("\\\\[a-fA-F0-9]{2}"); 
// {2} means: we want to find exactly two hexadecimal
// characters after the \
Matcher m = p.matcher(text); 
while (m.find()) 
    System.out.println(m.group());

Output:

\1f 
\1e 
\1d 
\02
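
For comparison, the sketch below runs both the original one-character pattern and the {2} version over the same text. QuantifierDemo and findAll are illustrative names, not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Returns every match of regex in text, in order of appearance.
    public static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Literal backslashes, written as \\ in Java source.
        String text = "\\1f\\1e\\1d\\020028";
        // Without a quantifier, only one hex character follows the backslash.
        System.out.println(findAll("\\\\[a-fA-F0-9]", text));    // prints [\1, \1, \1, \0]
        // With {2}, exactly two hex characters follow the backslash.
        System.out.println(findAll("\\\\[a-fA-F0-9]{2}", text)); // prints [\1f, \1e, \1d, \02]
    }
}
```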

Source

2014-11-05 22:06:07 Pshemo

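The code you posted works. But when I did something similar, as you can see above, it doesn't work. – bullzeye 2014-11-06 07:17:27

@bullzeye Check my updated answer. – Pshemo 2014-11-06 14:42:59

If you don't want to modify the input string, you could try something like: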

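public static void main(String[] argv) {
    // The compiler has already turned \1 and \020 into control characters.
    String s = "\1f\1e\1d\020028";
    // One control character (U+0000 to U+001F) followed by one hex digit.
    Pattern regex = Pattern.compile("[\\x00-\\x1f][0-9A-Fa-f]");
    Matcher match = regex.matcher(s);
    while (match.find()) {
        char[] c = match.group().toCharArray();
        // Print a backslash, the control character's decimal code, then the hex digit.
        System.out.println(String.format("\\%d%s", c[0] + 0, c[1]));
    }
}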

Yes, it's not perfect, but you get the idea.

Source

2014-11-05 22:55:05 belwood

Thanks! The solution works partially. For the input string I mentioned in my revised post, the output is: '\1F \1E \1D \160 \1F \1E \1D \0D' – bullzeye 2014-11-06 07:14:28
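
A likely explanation for the \160 in that output: the compiler reads \020 as a single octal escape (character 16), so the pattern pairs character 16 with the following '0'. The sketch below reproduces this on the short sample string. ControlCharDemo and extract are hypothetical names, and the matching logic is copied from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ControlCharDemo {
    // Same idea as the answer above: a control character followed by one hex digit.
    public static List<String> extract(String s) {
        List<String> found = new ArrayList<>();
        Matcher match = Pattern.compile("[\\x00-\\x1f][0-9A-Fa-f]").matcher(s);
        while (match.find()) {
            char[] c = match.group().toCharArray();
            // Backslash, decimal code of the control character, then the hex digit.
            found.add(String.format("\\%d%s", (int) c[0], c[1]));
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "\1f\1e\1d\020028";
        // \020 was compiled into one character with code 16, not into \0, 2, 0.
        System.out.println((int) s.charAt(6)); // prints 16
        System.out.println(extract(s));        // prints [\1f, \1e, \1d, \160]
    }
}
```

Once the literal has been compiled, the original \02 can no longer be told apart from \020, which is why this approach can only be approximate.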
