Java text file comparison - excess and missing

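I have a requirement to compare two text files (MasterCopy.txt and ClientCopy.txt). I want to get the list of strings that are missing from ClientCopy.txt, and I also need the list of strings that are in excess.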

Contents of MasterCopy.txt

London
Paris
Rome

Contents of ClientCopy.txt

London
Berlin
Rome
Amsterdam

I would like to get these results

Missing:

Paris

Excess:

Berlin
Amsterdam

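Source

2015-12-22 Zoro

How big are your files? What have you tried so far? –

In its current form, this question is not a good fit for Stack Overflow. SO is mainly for helping to debug existing code. This question effectively asks SO users to write the code for you. What have you tried? What hasn't worked yet? Post some of your code and someone can help you debug and fix it. – eestrada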

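Two ideas come to mind for getting the difference between the two files:

https://code.google.com/p/java-diff-utils/

From their wiki:

Task 1: Compute the difference between two files and print its deltas. Solution: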

import difflib.*;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

public class BasicJavaApp_Task1 {
    // Helper method to read a file's content into a list of lines
    private static List<String> fileToLines(String filename) {
        List<String> lines = new LinkedList<String>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(new FileReader(filename))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> original = fileToLines("originalFile.txt");
        List<String> revised = fileToLines("revisedFile.txt");

        // Compute diff. Get the Patch object. Patch is the container for computed deltas.
        Patch patch = DiffUtils.diff(original, revised);

        for (Delta delta : patch.getDeltas()) {
            System.out.println(delta);
        }
    }
}

Or use a HashSet:

http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html

A version of @Nick's answer modified to use HashSet:

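import java.io.File;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Scanner;

// Note: new Scanner(File) throws FileNotFoundException, so the enclosing
// method must declare or handle it.
// Read each file into a set, one entry per line
Scanner s = new Scanner(new File("MasterCopy.txt"));
HashSet<String> masterlist = new HashSet<String>();
while (s.hasNextLine()) {
    masterlist.add(s.nextLine()); // HashSet uses add(), not put()
}
s.close();
s = new Scanner(new File("ClientCopy.txt"));
HashSet<String> clientlist = new HashSet<String>();
while (s.hasNextLine()) {
    clientlist.add(s.nextLine());
}
s.close();

// Do the comparison
ArrayList<String> missing = new ArrayList<String>();
ArrayList<String> excess = new ArrayList<String>();
// Check for missing or excess; HashSet uses contains(), not get()
for (String line : masterlist) {
    if (!clientlist.contains(line)) missing.add(line);
}
for (String line : clientlist) {
    if (!masterlist.contains(line)) excess.add(line);
}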
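
The two loops above each compute a set difference, so the same result can also be sketched with `removeAll`. This is only an illustration on in-memory data: the class name `LineSetDiff` and the method `difference` are hypothetical, and a `LinkedHashSet` is assumed so that the output keeps the order in which lines first appear.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the answer above.
final class LineSetDiff {
    // Returns the elements of `left` that do not occur in `right`,
    // in the order in which they first appear in `left`.
    static Set<String> difference(Collection<String> left, Collection<String> right) {
        Set<String> result = new LinkedHashSet<>(left);
        result.removeAll(new HashSet<>(right));
        return result;
    }

    public static void main(String[] args) {
        // Sample data taken from the question.
        List<String> master = Arrays.asList("London", "Paris", "Rome");
        List<String> client = Arrays.asList("London", "Berlin", "Rome", "Amsterdam");

        System.out.println("Missing: " + difference(master, client)); // Missing: [Paris]
        System.out.println("Excess: " + difference(client, master));  // Excess: [Berlin, Amsterdam]
    }
}
```

Because both sides are treated as sets, a line that appears twice in one file counts only once.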

Source

2015-12-22 19:57:31 Nielsvh

If execution time is not an important factor, you can do it like this, assuming you are only comparing whole lines:

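import java.io.File;
import java.util.HashSet;
import java.util.Scanner;

// Note: new Scanner(File) throws FileNotFoundException, so the enclosing
// method must declare or handle it.
// Get the files into sets, one entry per line
Scanner s = new Scanner(new File("MasterCopy.txt"));
HashSet<String> masterlist = new HashSet<String>();
while (s.hasNextLine()) {
    masterlist.add(s.nextLine());
}
s.close();
s = new Scanner(new File("ClientCopy.txt"));
HashSet<String> clientlist = new HashSet<String>();
while (s.hasNextLine()) {
    clientlist.add(s.nextLine());
}
s.close();

// Do the comparison
HashSet<String> missing = new HashSet<String>();
HashSet<String> excess = new HashSet<String>();
// Check for missing or excess
// (the loop variable is named `line` so it does not clash with the Scanner `s`)
for (String line : masterlist) {
    if (!clientlist.contains(line)) missing.add(line);
}
for (String line : clientlist) {
    if (!masterlist.contains(line)) excess.add(line);
}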
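
For a complete program that prints the layout asked for in the question, something like the following sketch may work. It makes several assumptions that are not in the original answer: the files are UTF-8, surrounding whitespace and blank lines are ignored, and the names `CompareCopies`, `readLines` and `report` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical class, a sketch rather than the answer's own code.
final class CompareCopies {
    // Reads a UTF-8 file and returns its distinct, trimmed, non-empty lines in file order.
    // Trimming and skipping blank lines are assumptions made for this sketch.
    static Set<String> readLines(Path file) throws IOException {
        Set<String> lines = new LinkedHashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    // Builds a report with a "Missing:" section followed by an "Excess:" section.
    static String report(Set<String> master, Set<String> client) {
        StringBuilder out = new StringBuilder("Missing:\n");
        for (String line : master) {
            if (!client.contains(line)) out.append(line).append('\n');
        }
        out.append("Excess:\n");
        for (String line : client) {
            if (!master.contains(line)) out.append(line).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        Set<String> master = readLines(Paths.get("MasterCopy.txt"));
        Set<String> client = readLines(Paths.get("ClientCopy.txt"));
        System.out.print(report(master, client));
    }
}
```

`Files.readAllLines` loads each file fully into memory, which is usually fine for small lists like the ones in the question; for very large files a streaming reader may be preferable.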

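Source

2015-12-22 20:12:38

I was thinking that a HashSet would be the most advantageous, because the lookup time for a string is O(1): the size of the backing array depends on the hash function and the number of elements, not on the length of the strings being stored. – Nielsvh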
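
The lookup-cost point in this comment can be sketched as follows. `List.contains` scans the list on every call, so checking each master line against a client list costs roughly O(n·m) overall, while `HashSet.contains` is expected O(1) per lookup once the set is built. Hashing and comparing a string still takes time proportional to its length, although `String` caches its hash code after the first computation. The class and method names here are hypothetical, and the timings printed by `main` will vary by machine.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class for illustration only.
final class LookupCost {
    // Linear scan per lookup: List.contains walks the list until it finds a match.
    static List<String> missingWithList(List<String> master, List<String> client) {
        List<String> missing = new ArrayList<>();
        for (String line : master) {
            if (!client.contains(line)) missing.add(line);
        }
        return missing;
    }

    // Hash lookup: build the set once, then each contains() is expected O(1).
    static List<String> missingWithSet(List<String> master, List<String> client) {
        Set<String> clientSet = new HashSet<>(client);
        List<String> missing = new ArrayList<>();
        for (String line : master) {
            if (!clientSet.contains(line)) missing.add(line);
        }
        return missing;
    }

    public static void main(String[] args) {
        // Synthetic data: the client holds every second master line.
        List<String> master = new ArrayList<>();
        List<String> client = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            master.add("line-" + i);
            if (i % 2 == 0) client.add("line-" + i);
        }

        long t0 = System.nanoTime();
        int viaList = missingWithList(master, client).size();
        long t1 = System.nanoTime();
        int viaSet = missingWithSet(master, client).size();
        long t2 = System.nanoTime();

        System.out.println("list: " + viaList + " missing in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("set:  " + viaSet + " missing in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```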

Edited to use HashSets. –

Thanks, it works well! – Zoro
