Skip the first few words of a string

I want to count the number of words in a text file that has the following format:

TITEL####URL####ABSTRACT\n 
TITEL####URL####ABSTRACT\n 
TITEL####URL####ABSTRACT\n

For example:

Available line####http://en.wikipedia.org/wiki/Available_line####In voice, 
Marwan al-Shehhi####http://en.wikipedia.org/wiki/Marwan_al-Shehhi####Marwan etc. 
Theodore Beza####http://en.wikipedia.org/wiki/Theodore_Beza####Theodore Beza etc.

My code for counting the words looks like this:

public static int countTotalWords() { 
    totalWords = 0; 

    try { 
     FileInputStream fis; 
     fis = new FileInputStream(fileName); 


     Scanner scan = new Scanner(fis); 

     while (scan.hasNext()) { 
      totalWords++; 
      scan.next(); 
     } 
    } catch (FileNotFoundException ex) { 
     Logger.getLogger(Opgave1.class.getName()).log(Level.SEVERE, null, ex); 
    } 
    return totalWords; 
}

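I assume it works...

I only want to count the words in the abstract, so the title and the URL should be ignored. I'm guessing the #### could be used to skip the first part of each line, but for the life of me I can't figure out how. Any help is appreciated!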

Source

2013-05-30 GeorgeWChubby

You can split the string:

String s = "TITEL####URL####ABSTRACT\n"; 
String[] tokens = s.split("#+"); 
String abstractText = tokens[2];

Then, to count the words, you can split the abstract again:

int count = abstractText.split("\\s+").length;

Note: if you are using Java 7+ and your file is not too large, you can also read all the lines at once:

List<String> lines = Files.readAllLines(file, charset);
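
Putting the pieces of this answer together, a minimal sketch might look like the following. The class name AbstractWordCounter, the method names, the UTF-8 charset, and the placeholder file name abstracts.txt are assumptions made for illustration, not part of the original answer. The sketch also splits on the literal "####" with a limit of 3 rather than on "#+", so that a stray # inside the abstract does not cut it short.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class AbstractWordCounter {

    // Counts the words in the third "####"-separated field of one line.
    static int countAbstractWords(String line) {
        // A limit of 3 keeps any later "####" inside the abstract itself.
        String[] fields = line.split("####", 3);
        if (fields.length < 3) {
            return 0; // blank or malformed line: no abstract to count
        }
        String abstractText = fields[2].trim();
        // Splitting an empty string would still yield one (empty) token.
        return abstractText.isEmpty() ? 0 : abstractText.split("\\s+").length;
    }

    // Sums the abstract word counts over every line of the file.
    static int countTotalWords(Path file) throws IOException {
        // Assumption: the file is UTF-8 encoded and small enough to fit in memory.
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        int total = 0;
        for (String line : lines) {
            total += countAbstractWords(line);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // "abstracts.txt" is a placeholder file name.
        Path file = Paths.get(args.length > 0 ? args[0] : "abstracts.txt");
        System.out.println(countTotalWords(file));
    }
}
```

Trimming the abstract before splitting matters here: the sample lines end with a trailing space, and an abstract with leading whitespace would otherwise produce an empty first token and inflate the count by one.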

Source

2013-05-30 23:12:14 assylias

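You can use lastIndexOf to find the last ####.

Given a line, that lets you skip the first two fields.

Have you tried your code? I am not familiar with Scanner (I would assume it allows line-by-line consumption), but it looks like it is just counting lines.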
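
As a rough sketch of the lastIndexOf idea: the class name LastIndexWordCounter and its method names are made up for illustration. One detail worth knowing is that Scanner.next() returns one whitespace-delimited token at a time, so the sketch uses hasNextLine() and nextLine() to consume whole lines instead. For brevity it reads from an in-memory string rather than a file.

```java
import java.util.Scanner;

class LastIndexWordCounter {

    private static final String SEPARATOR = "####";

    // Returns the text after the last "####", or "" if the line has none.
    static String abstractOf(String line) {
        int idx = line.lastIndexOf(SEPARATOR);
        return idx < 0 ? "" : line.substring(idx + SEPARATOR.length()).trim();
    }

    // Reads the input line by line and sums the words of each abstract.
    static int countTotalWords(Scanner scan) {
        int total = 0;
        while (scan.hasNextLine()) {
            String text = abstractOf(scan.nextLine());
            if (!text.isEmpty()) {
                total += text.split("\\s+").length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String sample =
                "Theodore Beza####http://en.wikipedia.org/wiki/Theodore_Beza####Theodore Beza etc.\n"
              + "Marwan al-Shehhi####http://en.wikipedia.org/wiki/Marwan_al-Shehhi####Marwan etc.\n";
        System.out.println(countTotalWords(new Scanner(sample))); // prints 5
    }
}
```

Note that lastIndexOf only keeps what follows the final separator, so if an abstract itself contained "####", everything before that occurrence would be dropped.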

Source

2013-05-30 23:12:33 Guvante

Assuming the fields are always separated by exactly four hashes, you can use this code to count the number of words:

public static int countTotalWords() {
    totalWords = 0;

    // try-with-resources closes the scanner (and the underlying stream) when done
    try (Scanner scan = new Scanner(new FileInputStream(fileName))) {
        // read whole lines; scan.next() would return a single whitespace-delimited token
        while (scan.hasNextLine()) {
            String str = scan.nextLine();
            int idx = str.lastIndexOf("####");
            if (idx < 0) {
                continue; // blank or malformed line: no abstract
            }
            // everything after the last "####" is the abstract
            String wordsString = str.substring(idx + 4).trim();
            if (!wordsString.isEmpty()) {
                totalWords += wordsString.split("\\s+").length;
            }
        }
    } catch (FileNotFoundException ex) {
        Logger.getLogger(Opgave1.class.getName()).log(Level.SEVERE, null, ex);
    }
    return totalWords;
}

Source

2013-05-30 23:18:27
