2015-08-03 27 views
3

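Java: parsing information sequentially from a file

Let's say I have a file with the following structure:

Line 0:

354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED

Line 1:

543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED

and so on...

Now I want to extract the information that is marked in my example (see the grey background). The sequence AA is always present (and can be used as a break to skip to the next line), while the information strings vary in length.

What is the best way to parse this information? With if/then/else on a buffered reader, or is there some kind of parser that you can tell: read a number of length XYZ, then read everything as a String until you find AA, then skip the rest of the line?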

+3

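What you want is called a [regular expression](https://en.wikipedia.org/wiki/Regular_expression). – m0skit0

+0

That's what I was looking for, thanks! – Flatron

+0

Are you sure that "AA" cannot occur inside "Some String That Is Important"? –

Answers

1

I would read the file line by line and match each line against a regular expression. I hope the comments in the code below are detailed enough.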

// Needs java.io.BufferedReader, java.io.FileReader,
// java.util.regex.Matcher and java.util.regex.Pattern

// The pattern to use
Pattern p = Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");

// Read file line by line; try-with-resources closes the reader
try (BufferedReader br = new BufferedReader(new FileReader(myFile))) {
    String line;
    while ((line = br.readLine()) != null) {
        // Match line against our pattern
        Matcher m = p.matcher(line);
        if (m.find()) {
            // Line is valid, process it however you want
            // m.group(1) contains the number
            // m.group(2) contains the text between number and AA
            //            (including any whitespace right before AA)
        } else {
            // Line has invalid format (pattern does not match)
        }
    }
}

Explanation of the regular expression (pattern) I used:

^([0-9]+)\s+(([^A]|A[^A])+)AA 

^               matches the start of the line
([0-9]+)        matches one or more digits (the leading number)
\s+             matches one or more whitespace characters
(([^A]|A[^A])+) matches one or more units, each either a character that is not A
                or an A followed by a character that is not A (so never AA)
AA              matches the terminating AA
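
To see these groups in action, here is a minimal, self-contained sketch that applies the same pattern to the two sample lines from the question. The class name `LinePatternDemo` and the helper `parse` are made up for illustration, and the `trim()` call is an addition: group 2 also captures the whitespace in front of AA.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinePatternDemo {
    // Same pattern as above, written as a Java string literal.
    static final Pattern LINE = Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");

    /** Returns {number, text} for a matching line, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Group 2 keeps the whitespace in front of "AA", so trim it (illustrative choice).
        return new String[] { m.group(1), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] lines = {
            "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED",
            "543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED"
        };
        for (String line : lines) {
            String[] parts = parse(line);
            System.out.println(parts == null ? "no match: " + line : parts[0] + " -> " + parts[1]);
        }
        // prints:
        // 354858 -> Some String That Is Important
        // 543788 -> Another String That Is Important
    }
}
```

Note that the pattern only works if the important text itself never contains AA, which is the concern raised in the comments on the question.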

Update in reply to a comment:

If each line has a leading | character, the expression looks like this:

^\|([0-9]+)\s+(([^A]|A[^A])+)AA 

In a Java string literal, every backslash has to be escaped, like this:

"^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA" 

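The | character has a special meaning in regular expressions (alternation), so it must be escaped to match a literal |.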
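
The two levels of escaping can be checked with a small sketch like the one below. The class name `EscapingDemo` and the helper `numberOf` are hypothetical, and the sample line with a leading | is an assumption based on the comment. Writing a single backslash such as `\s` inside the string literal would not compile, because javac reports an invalid escape sequence.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapingDemo {
    // Regex as the regex engine sees it:  ^\|([0-9]+)\s+(([^A]|A[^A])+)AA
    // In the Java literal each backslash is doubled: \| becomes \\| and \s becomes \\s.
    static final Pattern PIPE_LINE = Pattern.compile("^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA");

    /** Returns the leading number of a line that starts with '|', or null if there is no match. */
    static String numberOf(String line) {
        Matcher m = PIPE_LINE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Line with the leading '|': prints 354858
        System.out.println(numberOf("|354858 Some String That Is Important AA OTHER STUFF"));
        // Line without the leading '|': prints null
        System.out.println(numberOf("354858 Some String That Is Important AA OTHER STUFF"));
    }
}
```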

+0

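Thanks for the example, I need to look into regular expressions now. – Flatron

+1

@Flatron You're welcome, I updated my answer and added an explanation of the expression. –

+0

I have a problem. I really don't want to just copy and paste solutions, but it helps for learning and testing. When I copy your code I get the error `"Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )"` for the pattern `"^([0-9]+)\s+(([^A]|A[^A])+)AA"`. Am I missing something? I imported `java.util.regex.Pattern;` but that didn't help. Is something missing after the AA? – Flatron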

1

Without more information, it is impossible to say which approach is best for your problem.

One solution could be:

String s = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED"; 
String[] split = s.substring(0, s.indexOf(" AA")).split(" ", 2); 
System.out.println("split = " + Arrays.toString(split)); 

Output:

split = [354858, Some String That Is Important] 
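
One thing to keep in mind with this approach: if a line does not contain " AA", `indexOf` returns -1 and `substring(0, -1)` throws a `StringIndexOutOfBoundsException`. A guarded variant might look like the sketch below; the class name `SplitDemo`, the helper `parse`, and the choice to return an empty array for malformed lines are assumptions for illustration only.

```java
import java.util.Arrays;

final class SplitDemo {
    /** Returns [number, text], or an empty array if the line lacks " AA" or a separating space. */
    static String[] parse(String line) {
        int end = line.indexOf(" AA");
        if (end < 0) {
            return new String[0]; // no marker found; substring(0, -1) would throw
        }
        // Split the part in front of " AA" into number and text at the first space.
        String[] parts = line.substring(0, end).split(" ", 2);
        return parts.length == 2 ? parts : new String[0];
    }

    public static void main(String[] args) {
        String ok = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
        System.out.println("split = " + Arrays.toString(parse(ok)));
        // prints: split = [354858, Some String That Is Important]
        System.out.println("split = " + Arrays.toString(parse("a line without the marker")));
        // prints: split = []
    }
}
```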
0

Here is a solution:

// Needs java.io.*, java.nio.charset.StandardCharsets,
// java.text.ParseException, java.util.regex.Matcher and java.util.regex.Pattern
public static void main(String[] args) {
    // Select a text source (should be a FileInputStream for a real file)
    String fileContent = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED\n"
            + "543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
    InputStream source = new ByteArrayInputStream(fileContent.getBytes(StandardCharsets.UTF_8));

    Pattern pattern = Pattern.compile("^([0-9]+) (.*?) AA .*$");
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(source, StandardCharsets.UTF_8))) {
        String line;
        while ((line = reader.readLine()) != null) {
            Matcher matcher = pattern.matcher(line);
            if (matcher.matches()) {
                String someNumber = matcher.group(1);
                String someText = matcher.group(2);
                // do something with someNumber and someText
            } else {
                // Report the offending line; the error offset 0 is a placeholder
                throw new ParseException(line, 0);
            }
        }
    } catch (IOException | ParseException e) {
        e.printStackTrace(); // TODO ...
    }
}
0

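You could use a regular expression, but if you know that every line contains AA and you want the content up to AA, you can simply use substring(int, int) to get the part of the line before AA: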

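// Needs java.nio.file.Files, java.nio.file.Path, java.util.List,
// java.util.stream.Collectors and java.util.stream.Stream
public List<String> read(Path path) throws IOException {
    // Files.lines keeps the file open, so close the stream when done
    try (Stream<String> lines = Files.lines(path)) {
        return lines.map(this::parseLine)
                    .collect(Collectors.toList());
    }
}

public String parseLine(String line) {
    // index is -1 (and substring throws) if the line contains no AA
    int index = line.indexOf("AA");
    return line.substring(0, index);
}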

Here is a version of read that uses a BufferedReader instead of a stream:

public List<String> read(Path path) throws IOException {
    List<String> content = new ArrayList<>(); 

    try(BufferedReader reader = new BufferedReader(new FileReader(path.toFile()))){ 
     String line; 
     while((line = reader.readLine()) != null){ 
      content.add(parseLine(line)); 
     } 
    } 

    return content; 
} 
1

Non-Java 8 version: you can read the file line by line and drop the part of each line that starts at the character sequence AA:

final String charSequence = "AA"; 
String line; 
BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream("yourfilename"))); 
try { 
    while ((line = r.readLine()) != null) { 
     int pos = line.indexOf(charSequence); 
     if (pos > 0) { 
      String myImportantStuff = line.substring(0, pos); 
      //do something with your useful string 
     } 
    } 
} finally { 
    r.close(); 
}