2016-07-26 81 views
1

Using Scanner to tokenize a file

I need to tokenize a text file, where a token is defined as "[a-zA-Z]+". The following works:

Pattern WORD = Pattern.compile("[a-zA-Z]+"); 

File f = new File(...); 
FileInputStream inputStream = new FileInputStream(f); 
Scanner scanner = new Scanner(inputStream); 

String word = null; 

while((word = scanner.findWithinHorizon(WORD, (int)f.length())) != null) { 
    // process the word 
} 

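The problem is that findWithinHorizon takes an int as the horizon, while the file length is of type long, so the cast to int is not safe for large files.

What is a reasonable way to tokenize a large file using Scanner?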

Answer

3

Use a delimiter that is the negation of the pattern you want to match:

Scanner s = new Scanner(f).useDelimiter("[^a-zA-Z]+"); 
while(s.hasNext()) { 
    String token = s.next(); 
    // do something with "token" 
}
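
For reference, here is a minimal self-contained sketch of the same idea. The class name WordTokenizer and the helper forEachWord are made up for illustration and are not part of the original answer. It takes a Readable instead of a File, so it can be tried on a string first. Because the tokens are pulled with hasNext()/next(), no horizon (and therefore no file length) is needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.function.Consumer;

// Hypothetical helper class, for illustration only.
class WordTokenizer {
    // Delimiter is the complement of the token pattern [a-zA-Z]+.
    private static final String NON_LETTERS = "[^a-zA-Z]+";

    // Passes each word to the callback as the Scanner reads the input incrementally.
    static void forEachWord(Readable source, Consumer<String> action) {
        // Closing the Scanner also closes the source if it is Closeable.
        try (Scanner scanner = new Scanner(source).useDelimiter(NON_LETTERS)) {
            while (scanner.hasNext()) {
                action.accept(scanner.next());
            }
        }
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        forEachWord(new StringReader("123 hello, world! 42 foo"), words::add);
        System.out.println(words); // prints [hello, world, foo]
    }
}
```

For a real file, something like Files.newBufferedReader(path) could be passed as the source; the path itself is up to you. Note that Scanner skips leading delimiters, so input that starts with non-letters does not produce an empty first token.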