String.contains is not working

I have to check whether the words in File1 occur in File2 and count the occurrences. The data in the two files is shown below.

The words in File1 look like this:

發表
發愁
發達
發抖
發揮

The data in File2 looks like this:

這篇論文是什麼時候發表的？
91。數據刪掉被馬工程師了
92。駕駛酒後很大危害
93。客觀地要他人評價
94 。我不小心水壺打翻了把

The code I wrote is as follows:

File file1 = new File("ChineseWord.txt");
Scanner sc = new Scanner(new FileInputStream(file1));
ArrayList<String> list = new ArrayList<String>();
ArrayList<String> newList = new ArrayList<String>();

while (sc.hasNext()) {
    list.add(sc.next());
}

sc.close();

File file2 = new File("RandomData.txt");

Scanner newScanner = new Scanner(new FileInputStream(file2));

int count = 0;

for (int i = 0; i < list.size(); i++) {

    while (newScanner.hasNext()) {

        String word = newScanner.nextLine();
        String toMatch = list.get(i);

        if (word.contains(toMatch)) {
            System.out.println("Success");
            count++;
        }

    }

    String test = list.get(i);
    newList.add(test + "exists" + count + "times");
    count = 0;

}

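The problem is that it returns 0 for every word, even though the first word in File1 occurs in the first line of File2. If I hard-code the word like this: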

if(word.contains("發表")){ 
         System.out.println("Success"); 
         count++; 
        }

it prints "Success", but with the word read from the file it does not. Why is that?


2016-04-26 indexOutOfBounds

See http://stackoverflow.com/questions/22048692/check-if-string-contains-cjk-chinese-characters and http://stackoverflow.com/questions/26357938/detect-chinese-character-in-java – Adi

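I would make sure the character encoding you read with is the one the files were written with. You can try UTF-8 or UTF-16LE, but you have to be consistent.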
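To illustrate the "be consistent" advice, here is a minimal sketch that writes a word list as UTF-8 and reads it back with the same charset named explicitly instead of relying on the platform default. The class name `EncodingRoundTrip`, the helper `roundTrip`, and the temporary file are made up for this illustration; they are not part of the code in the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class EncodingRoundTrip {
    // Writes the lines as UTF-8 and reads them back with the same charset.
    // The temp file is a stand-in for ChineseWord.txt (an assumption for this sketch).
    static List<String> roundTrip(List<String> lines) {
        try {
            Path tmp = Files.createTempFile("words", ".txt");
            try {
                Files.write(tmp, lines, StandardCharsets.UTF_8);
                return Files.readAllLines(tmp, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> words = roundTrip(Arrays.asList("發表", "發愁"));
        // The word read back from disk is found in the sentence.
        System.out.println("這篇論文是什麼時候發表的？".contains(words.get(0)));
    }
}
```

The same idea applies to `Scanner`: its two-argument constructor takes a charset name, so `new Scanner(new FileInputStream(file), "UTF-8")` avoids depending on the platform default.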

The character encoding is UTF-8 – indexOutOfBounds

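The problem is in your logic: you loop over every word in list, but the Scanner for File2 is created only once, outside that loop. The pass for the first word consumes the whole file, so the inner while loop never runs again for the remaining words.

You should probably move the loop over list inside the while loop, so that it wraps the if (word.contains(toMatch)) check.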
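The exhaustion effect can be seen in isolation with a small sketch. It reads from an in-memory string instead of RandomData.txt, and the class name `ScannerExhaustion` and the helper `drain` are hypothetical names used only here.

```java
import java.util.Scanner;

class ScannerExhaustion {
    // Reads every remaining line from the scanner and returns how many there were.
    static int drain(Scanner sc) {
        int lines = 0;
        while (sc.hasNextLine()) {
            sc.nextLine();
            lines++;
        }
        return lines;
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner("line one\nline two\nline three");
        System.out.println(drain(sc)); // first pass sees all three lines
        System.out.println(drain(sc)); // second pass sees nothing: the scanner is exhausted
    }
}
```

This prints 3 and then 0, which is why only the first word in the list ever gets compared against the file contents in the original loop order.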

Following your comment, I put together a quick test:

package so36862093;

import com.google.common.io.Resources;

import java.io.File;
import java.io.FileInputStream;
import java.nio.file.Files;
import java.util.*;

public class App {
    public static void main(final String[] args) throws Exception {
        final File file1 = new File(Resources.getResource("so36862093/ChineseWord.txt").toURI());
        final List<String> list = Files.readAllLines(file1.toPath());
        final File file2 = new File(Resources.getResource("so36862093/RandomData.txt").toURI());
        final Scanner newScanner = new Scanner(new FileInputStream(file2));
        final Map<String, Integer> count = new HashMap<>();

        while (newScanner.hasNext()) {
            final String word = newScanner.nextLine();

            for (String toMatch : list) {
                if (word.contains(toMatch)) {
                    System.out.println("Success");
                    count.put(toMatch, count.getOrDefault(toMatch, 0) + 1);
                }
            }
        }

        for (Map.Entry<String, Integer> e : count.entrySet()) {
            System.out.println(e.getKey() + " exists " + e.getValue() + " times.");
        }
    }
}

with ChineseWord.txt (UTF-8):

發表 
發愁 
發達 
發抖 
發揮

and RandomData.txt (UTF-8):

The output is:

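Follow-up: I played a bit with the project you shared, and the problem is that your files have a zero-width no-break space, U+FEFF (decimal 65279, the byte order mark), at the start of every line, which mine do not.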


So you should strip that character before doing anything else.
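A minimal sketch of that stripping step, assuming the stray character only appears at the start of a line. The class name `BomStrip` and the helper `stripBom` are hypothetical, and the BOM-prefixed string below simulates what the Scanner would return from such a file.

```java
class BomStrip {
    // U+FEFF: zero-width no-break space, also used as the byte order mark.
    private static final char BOM = '\uFEFF';

    // Returns the line without a leading U+FEFF, or the line unchanged if there is none.
    static String stripBom(String line) {
        return (!line.isEmpty() && line.charAt(0) == BOM) ? line.substring(1) : line;
    }

    public static void main(String[] args) {
        String sentence = "這篇論文是什麼時候發表的？";
        String fromFile = "\uFEFF發表"; // simulated word read from a BOM-prefixed line

        // The invisible prefix makes the word differ from the literal, so contains fails.
        System.out.println(sentence.contains(fromFile));
        // After stripping, the word matches.
        System.out.println(sentence.contains(stripBom(fromFile)));
    }
}
```

This prints false and then true. It also explains why the hard-coded literal matched while the word read from the file did not: the two strings look identical on screen but differ by one invisible character. If the character could also appear in the middle of a line, `line.replace("\uFEFF", "")` would remove every occurrence instead.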


2016-04-26 10:26:13

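Yes, I understand that. I tried the for loop inside the while loop, along with many other things, which is why the last code I posted is a bit messed up. The problem is that even if you do it that way, the first word still does not match. Why? – indexOutOfBounds

The fixed code works for me; you should double-check your code and your input. – 2016-04-26 11:23:10

I just copied your code and checked the encoding of all the text files, but it does not work for me. What is wrong with it? – indexOutOfBounds

Right now you read the whole file and compare it against the first element of the list only. It should be the other way around: read one line from File2 and compare it against the whole list.

Change your code to: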

while (newScanner.hasNext()) {
    String word = newScanner.nextLine();
    for (int i = 0; i < list.size(); i++) {
        String toMatch = list.get(i);

        if (word.contains(toMatch)) {
            System.out.println("Success");
            count++;
        }
    }
}


2016-04-26 10:30:15

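I completely understand that. I deleted every word in File1 except the first one. if (word.contains("發表")) { System.out.println("Success"); count++; } works, but the other way does not? – indexOutOfBounds

I think your problem is the encoding: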

Scanner newScanner = new Scanner(new FileInputStream(file2),"UNICODE");

Try this:

File file1 = new File("data/ChineseWord.txt");
Scanner sc = new Scanner(new FileInputStream(file1), "UNICODE");
ArrayList<String> list = new ArrayList<String>();
ArrayList<String> newList = new ArrayList<String>();

while (sc.hasNext()) {
    list.add(sc.next());
}

sc.close();

File file2 = new File("data/RandomData.txt");
Scanner newScanner = new Scanner(new FileInputStream(file2), "UNICODE");

int count = 0;

for (int i = 0; i < list.size(); i++) {

    while (newScanner.hasNext()) {

        String word = newScanner.nextLine();
        String toMatch = list.get(i);

        if (word.contains(toMatch)) {
            System.out.println("Success");
            count++;
        }

    }

    String test = list.get(i);
    newList.add(test + "exists" + count + "times");
    count = 0;

}


2016-04-26 10:45:09

No, it does not work. The console output has changed to some weird characters. – indexOutOfBounds
