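Split a string on commas, but ignore escaped commas and backslashes

I want to split a string on commas ",". The string contains escaped commas "\," and escaped backslashes "\\". Commas at the beginning and end, as well as several consecutive commas, should produce empty strings.
So ",,\,\\,," should become "", "", "\,\\", "", "".
Note that my examples show the strings with single backslashes "\". In Java string literals they have to be doubled.
I tried several packages without success. My last resort is to write my own parser.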
While a dedicated library is certainly a good idea, the following works:
public static String[] splitValues(final String input) {
    final ArrayList<String> result = new ArrayList<String>();
    // (?:\\\\)* matches any number of \-pairs
    // (?<!\\) ensures that the \-pairs aren't preceded by a single \
    final Pattern pattern = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");
    final Matcher matcher = pattern.matcher(input);
    int previous = 0;
    while (matcher.find()) {
        result.add(input.substring(previous, matcher.end() - 1));
        previous = matcher.end();
    }
    result.add(input.substring(previous, input.length()));
    return result.toArray(new String[result.size()]);
}
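The idea is to find every "," that is preceded by no "\" or by an even number of "\" (that is, a "," that is not escaped). Because the "," is the last part of the pattern, cutting the substring at end()-1 leaves the comma itself out of the result.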
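To make the even/odd rule concrete, here is a minimal sketch that reuses the same pattern and only reports which commas it treats as separators. The class name SeparatorDemo and the method separatorIndexes are made up for this illustration and are not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatorDemo {
    // Same pattern as in the answer: an even number (possibly zero) of
    // backslashes, not preceded by another backslash, followed by a comma.
    static final Pattern SEPARATOR = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");

    // Hypothetical helper: returns the index of every comma that separates fields.
    static List<Integer> separatorIndexes(String input) {
        List<Integer> indexes = new ArrayList<>();
        Matcher m = SEPARATOR.matcher(input);
        while (m.find()) {
            indexes.add(m.end() - 1); // the comma is the last matched character
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(separatorIndexes("a,b"));       // no backslash before the comma
        System.out.println(separatorIndexes("a\\,b"));     // one backslash: escaped comma
        System.out.println(separatorIndexes("a\\\\,b"));   // two backslashes: escaped backslash, then separator
        System.out.println(separatorIndexes("a\\\\\\,b")); // three backslashes: escaped backslash, escaped comma
    }
}
```

With zero or two backslashes the comma is reported as a separator; with one or three it is not.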
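The function has been tested against most cases I could think of, except for null input. If you would rather work with a List<String>, you can of course change the return type; I simply took the pattern that split() implements and adapted it to handle escapes.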
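As a rough sketch of those two points, the variant below returns a List<String> and handles null. Returning an empty list for null is an assumption made here, not something the answer specifies, and the names EscapedSplit and splitToList are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedSplit {
    // Compiled once; same expression as in splitValues.
    private static final Pattern SEPARATOR = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");

    // Hypothetical variant of splitValues that returns a List.
    static List<String> splitToList(String input) {
        if (input == null) {
            return Collections.emptyList(); // assumption: null means "no values"
        }
        List<String> result = new ArrayList<>();
        Matcher matcher = SEPARATOR.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            // Everything up to, but not including, the separating comma.
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        result.add(input.substring(previous)); // remainder after the last separator
        return result;
    }
}
```

Whether null should instead throw an exception or yield a list containing one empty string depends on the caller, so treat that branch as a placeholder.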
Example class utilizing this function:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Print {
    public static void main(final String[] args) {
        String input = ",,\\,\\\\,,";
        final String[] strings = splitValues(input);
        System.out.print("\""+input+"\" => ");
        printQuoted(strings);
    }

    public static String[] splitValues(final String input) {
        final ArrayList<String> result = new ArrayList<String>();
        // (?:\\\\)* matches any number of \-pairs
        // (?<!\\) ensures that the \-pairs aren't preceded by a single \
        final Pattern pattern = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");
        final Matcher matcher = pattern.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        result.add(input.substring(previous, input.length()));
        return result.toArray(new String[result.size()]);
    }

    public static void printQuoted(final String[] strings) {
        if (strings.length > 0) {
            System.out.print("[\"");
            System.out.print(strings[0]);
            for(int i = 1; i < strings.length; i++) {
                System.out.print("\", \"");
                System.out.print(strings[i]);
            }
            System.out.println("\"]");
        } else {
            System.out.println("[]");
        }
    }
}
In this case, a custom function sounds good to me. Try this:
public String[] splitEscapedString(String s) {
    // Separator that won't appear in the string.
    // If you are reading lines, "\n" should work fine since it will never appear.
    String c = "\n";
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); ++i) {
        if (s.charAt(i) == '\\' && i + 1 < s.length()) {
            // Escaped character: drop the backslash and keep the next character
            // literally. Note that this unescapes the values.
            sb.append(s.charAt(++i));
        } else if (s.charAt(i) == ',') {
            // Unescaped comma: mark the field boundary.
            sb.append(c);
        } else {
            // Ordinary character (or a lone backslash at the very end).
            sb.append(s.charAt(i));
        }
    }
    // A limit of -1 keeps trailing empty strings, which split(c) would discard.
    return sb.toString().split(c, -1);
}
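The function above returns the values with their escapes removed, whereas the question expects the escape sequences to stay in place ("\,\\" as the third value). A minimal sketch of a character scan that keeps them, and that needs no placeholder separator, could look like this; EscapeAwareScanner and splitKeepingEscapes are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class EscapeAwareScanner {
    // Hypothetical helper: splits on unescaped commas and keeps escape sequences as-is.
    static List<String> splitKeepingEscapes(String s) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '\\' && i + 1 < s.length()) {
                // Copy the backslash and the character it escapes unchanged.
                current.append(ch).append(s.charAt(++i));
            } else if (ch == ',') {
                // Unescaped comma: close the current field, even if it is empty.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        fields.add(current.toString()); // the last field has no trailing comma
        return fields;
    }
}
```

Collecting the fields directly also preserves leading, trailing and consecutive empty values without relying on the limit argument of split().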
Don't use .split(); instead, find all matches between the (unescaped) commas:
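// Requires java.util.ArrayList, java.util.List, java.util.regex.Matcher, java.util.regex.Pattern
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile(
    "(?:          # Start of group\n" +
    " \\\\.       # Match either an escaped character\n" +
    "|            # or\n" +
    " [^\\\\,]++  # Match one or more characters except comma/backslash\n" +
    ")*           # Do this any number of times",
    Pattern.COMMENTS);
Matcher regexMatcher = regex.matcher(subjectString);
int lastEnd = -1; // end index of the most recent non-empty match
while (regexMatcher.find()) {
    boolean empty = regexMatcher.group().isEmpty();
    // The pattern can match the empty string, so find() also reports an empty
    // match directly after every non-empty field; skip it.
    if (empty && regexMatcher.start() == lastEnd) {
        continue;
    }
    if (!empty) {
        lastEnd = regexMatcher.end();
    }
    matchList.add(regexMatcher.group());
}
Result: ["", "", "\\,\\\\", "", ""]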
I used a possessive quantifier (++) to avoid the excessive backtracking that nested quantifiers can cause.
Why not use OpenCSV? – fge
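This is [my answer](http://stackoverflow.com/questions/21535811/regex-include-and-exclude-escape-sequences/21536109#21536109) from another question with similar requirements. It handles the case of multiple backslashes in a row. However, as fge suggested, using a library may be better, because my code was written without knowing the corner cases of the CSV format. – nhahtdh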
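Thanks for the suggestion, I will take a look at it. Still, I want my project to depend on as few other artifacts as possible (Guava and Apache Commons are fine). This is probably the only problem that would need such a library, so I would rather not use it.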