2014-02-11
0

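Split a string on commas, but ignore escaped commas and escaped backslashes

I want to split a string on commas ",". The string contains escaped commas "\," and escaped backslashes "\\". Commas at the beginning and at the end, as well as several consecutive commas, should produce empty strings.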

So ",,\,\\,," should become "", "", "\,\\", "", "".

Note that my examples show the strings with single backslashes "\". In Java string literals each backslash has to be doubled.
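To make the doubling concrete, here is a minimal sketch of the sample input as a Java literal. The class name `EscapedLiteralDemo` and the constant `SAMPLE` are made up for illustration; the point is only that the source code contains more backslashes than the string itself.

```java
class EscapedLiteralDemo {
    // The sample input ,,\,\\,, written as a Java string literal:
    // every backslash in the data is typed twice in the source code.
    static final String SAMPLE = ",,\\,\\\\,,";

    public static void main(String[] args) {
        // The string itself holds 8 characters: five commas and three backslashes.
        System.out.println(SAMPLE);
        System.out.println(SAMPLE.length());
    }
}
```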

I have tried several packages, without success. My last resort would be to write my own parser.

+2

Why not use OpenCSV? – fge

+0

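Here is [my answer](http://stackoverflow.com/questions/21535811/regex-include-and-exclude-escape-sequences/21536109#21536109) to another question with a similar requirement. It handles the case of multiple consecutive backslashes in a row. However, as fge suggested, using a library is probably better, because my code was written without knowledge of the corner cases of the CSV format. – nhahtdh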

+0

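Thanks for the suggestion, I will have a look at it. Still, I want my project to depend on as few other artifacts as possible (Guava and Apache Commons are fine). This is probably the only problem that would need this library, so I would rather not use it. –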

Answers

0

While a dedicated library is certainly a good idea, the following works:

public static String[] splitValues(final String input) {
    final ArrayList<String> result = new ArrayList<String>();
    // (?:\\\\)* matches any number of \-pairs
    // (?<!\\) ensures that the \-pairs aren't preceded by a single \
    final Pattern pattern = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");
    final Matcher matcher = pattern.matcher(input);
    int previous = 0;
    while (matcher.find()) {
        result.add(input.substring(previous, matcher.end() - 1));
        previous = matcher.end();
    }
    result.add(input.substring(previous, input.length()));
    return result.toArray(new String[result.size()]);
}

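The idea is to find every "," that is preceded by zero or an even number of "\" (i.e. an unescaped ","). Since the "," is the last part of the pattern, the value is cut off at end()-1, just before the ",".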
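To see which commas the pattern treats as separators, here is a small sketch that reuses the same regular expression and reports the index of each unescaped comma. The names `DelimiterPositions` and `delimiterIndexes` are made up for illustration and are not part of the answer's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterPositions {
    // Same pattern as in splitValues: a comma preceded by zero or an even
    // number of backslashes, where that run is not preceded by another backslash.
    private static final Pattern DELIMITER =
            Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");

    // Returns the index of every unescaped comma in the input.
    static List<Integer> delimiterIndexes(String input) {
        List<Integer> indexes = new ArrayList<Integer>();
        Matcher matcher = DELIMITER.matcher(input);
        while (matcher.find()) {
            // The comma is the last character of each match.
            indexes.add(matcher.end() - 1);
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(delimiterIndexes(",,\\,\\\\,,")); // [0, 1, 6, 7]
        System.out.println(delimiterIndexes("a\\,b"));       // []
        System.out.println(delimiterIndexes("a\\\\,b"));     // [3]
    }
}
```

In the sample input, the comma at index 3 follows a single backslash and is skipped, while the comma at index 6 follows a pair of backslashes and counts as a separator.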

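The function has been tested against most of the cases I could think of, except for null input. If you prefer working with a List<String>, you can of course change the return type; I merely took the pattern implemented in split() and adapted it to handle escapes.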
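For readers who prefer the List<String> form, a minimal sketch could look like the following. The names `SplitValuesList` and `splitValuesToList` are hypothetical, and returning an empty list for null input is an assumption made here, not something the answer specifies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitValuesList {
    // Compiled once and reused; same expression as in splitValues.
    private static final Pattern DELIMITER =
            Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");

    // Hypothetical variant of splitValues that returns a List.
    // Assumption: null input is treated as "no values" and yields an empty list.
    static List<String> splitValuesToList(String input) {
        if (input == null) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<String>();
        Matcher matcher = DELIMITER.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            // Cut just before the delimiting comma.
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        // Whatever follows the last delimiter is the final value.
        result.add(input.substring(previous));
        return result;
    }
}
```

Calling `SplitValuesList.splitValuesToList(",,\\,\\\\,,")` should give the five values from the question, with the escape sequences left in place.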

Example class utilizing this function:

import java.util.ArrayList; 
import java.util.regex.Matcher; 
import java.util.regex.Pattern; 

public class Print {
    public static void main(final String[] args) {
        String input = ",,\\,\\\\,,";
        final String[] strings = splitValues(input);
        System.out.print("\"" + input + "\" => ");
        printQuoted(strings);
    }

    public static String[] splitValues(final String input) {
        final ArrayList<String> result = new ArrayList<String>();
        // (?:\\\\)* matches any number of \-pairs
        // (?<!\\) ensures that the \-pairs aren't preceded by a single \
        final Pattern pattern = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");
        final Matcher matcher = pattern.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        result.add(input.substring(previous, input.length()));
        return result.toArray(new String[result.size()]);
    }

    public static void printQuoted(final String[] strings) {
        if (strings.length > 0) {
            System.out.print("[\"");
            System.out.print(strings[0]);
            for (int i = 1; i < strings.length; i++) {
                System.out.print("\", \"");
                System.out.print(strings[i]);
            }
            System.out.println("\"]");
        } else {
            System.out.println("[]");
        }
    }
}
0

A custom function sounds good to me in this case. Try this:

public String[] splitEscapedString(String s) {
    // Separator character that won't appear in the string.
    // If you are reading lines, '\n' should work fine since it will never appear.
    String c = "\n";
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); ++i) {
        char ch = s.charAt(i);
        if (ch == '\\' && i + 1 < s.length()) {
            // Escaped character: drop the backslash and copy the next character
            // literally. The bounds check avoids an exception if the string is
            // not well formatted and ends with a single '\'.
            sb.append(s.charAt(++i));
        } else if (ch == ',') {
            // Unescaped comma: mark the boundary between two values.
            sb.append(c);
        } else {
            sb.append(ch);
        }
    }
    // The negative limit keeps trailing empty strings, which split(c) would drop.
    return sb.toString().split(c, -1);
}
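Note that the function above removes the backslashes from the values and relies on a placeholder character. If the escape sequences should stay in the values, as in the question's expected output, a hand-written scanner along these lines might do. This is only a sketch; `EscapeAwareSplitter` and its `split` method are hypothetical names, not an existing API.

```java
import java.util.ArrayList;
import java.util.List;

class EscapeAwareSplitter {
    // Splits on unescaped commas and keeps escape sequences such as \, and \\
    // unchanged in the values. No placeholder character is needed.
    static List<String> split(String s) {
        List<String> fields = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '\\' && i + 1 < s.length()) {
                // Copy the backslash together with the character it escapes.
                current.append(ch).append(s.charAt(++i));
            } else if (ch == ',') {
                // Unescaped comma: the current value is complete.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                // Ordinary character (or a lone trailing backslash).
                current.append(ch);
            }
        }
        // The text after the last comma is the final value, possibly empty.
        fields.add(current.toString());
        return fields;
    }
}
```

Because each value is collected directly into the list, leading, trailing and consecutive commas all produce empty strings without any special handling.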
0

Don't use .split(); instead, find all matches between (unescaped) commas:

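List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile(
    "(              # Group 1: one value\n" +
    "  (?:          # Start of group\n" +
    "    \\\\.      # Match either an escaped character\n" +
    "  |            # or\n" +
    "    [^\\\\,]++ # Match one or more characters except comma/backslash\n" +
    "  )*           # Do this any number of times\n" +
    ")\n" +
    "(,|\\z)        # Group 2: the delimiting comma or the end of the input",
    Pattern.COMMENTS);
// subjectString is the input to split.
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group(1));
    // An empty group 2 means the end of the input was reached. Stopping here
    // avoids the extra zero-length match find() would otherwise report.
    if (regexMatcher.group(2).isEmpty()) {
        break;
    }
}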

Result: ["", "", "\\,\\\\", "", ""]

I used a possessive quantifier (++) to avoid excessive backtracking caused by the nested quantifiers.
