Comparing two files, or why is the Java code faster than the C++ code?

Why is the Java code faster than the C++ code? I need to compare two files byte by byte. For example, comparing two 650 MB files takes 40 seconds in C++ and 10 seconds in Java.

The C++ code:

//bufferSize = 8mb 
std::ifstream lFile(lFilePath.c_str(), std::ios::in | std::ios::binary); 
std::ifstream rFile(rFilePath.c_str(), std::ios::in | std::ios::binary); 

std::streamsize lReadBytesCount = 0; 
std::streamsize rReadBytesCount = 0; 

do { 
    lFile.read(p_lBuffer, *bufferSize); 
    rFile.read(p_rBuffer, *bufferSize); 
    lReadBytesCount = lFile.gcount(); 
    rReadBytesCount = rFile.gcount(); 

    if (lReadBytesCount != rReadBytesCount ||
        std::memcmp(p_lBuffer, p_rBuffer, lReadBytesCount) != 0)
    {
        return false;
    }
} while (lFile.good() || rFile.good()); 

return true;

And the Java code:

InputStream is1 = new BufferedInputStream(new FileInputStream(f1)); 
InputStream is2 = new BufferedInputStream(new FileInputStream(f2)); 

byte[] buffer1 = new byte[64]; 
byte[] buffer2 = new byte[64]; 

int readBytesCount1 = 0, readBytesCount2 = 0; 

while (
    (readBytesCount1 = is1.read(buffer1)) != -1 && 
    (readBytesCount2 = is2.read(buffer2)) != -1 
) {    
    if (Arrays.equals(buffer1, buffer2) && readBytesCount1 == readBytesCount2)
        countItr++;
    else {
        result = false;
        break;
    }
}

Source

2013-03-02 Silnet

What happens if you make the buffer sizes the same? – Xymostech 2013-03-02 17:20:06

Have you ruled out file caching as a possible factor? – NPE 2013-03-02 17:20:38

With the files cached, the same code still takes longer in C++. – Silnet 2013-03-02 17:32:33

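One possible answer is that the C++ code uses an 8-megabyte buffer, while the Java version uses 64 bytes. What happens if the difference is within the first few bytes? Then the Java version only has to read 64 bytes to find it, while the C++ version has to read 8 million bytes. If you want to compare the two programs, you should use the same buffer size in both.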
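The suggestion to use the same buffer size in both programs could be sketched roughly as follows. This is only an illustration, not the asker's code: `files_equal` and its `buffer_size` parameter are hypothetical names, and treating an unreadable file as "not equal" is an assumption made to keep the sketch short.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): compares two files block by
// block. The block size is a parameter, so the C++ and Java versions can be
// benchmarked with the same value (for example 64 bytes or 8 MB).
bool files_equal(const std::string &path1, const std::string &path2,
                 std::size_t buffer_size)
{
    if (buffer_size == 0) {
        throw std::invalid_argument("buffer_size must be greater than zero");
    }
    std::ifstream f1(path1, std::ios::binary);
    std::ifstream f2(path2, std::ios::binary);
    if (!f1 || !f2) {
        return false;  // assumption: an unreadable file counts as "not equal"
    }

    std::vector<char> buf1(buffer_size), buf2(buffer_size);
    do {
        f1.read(buf1.data(), static_cast<std::streamsize>(buf1.size()));
        f2.read(buf2.data(), static_cast<std::streamsize>(buf2.size()));
        // gcount() reports the bytes actually read, including the final,
        // partial block at the end of each file.
        const std::streamsize n1 = f1.gcount();
        const std::streamsize n2 = f2.gcount();
        if (n1 != n2 ||
            std::memcmp(buf1.data(), buf2.data(),
                        static_cast<std::size_t>(n1)) != 0) {
            return false;
        }
    } while (f1 && f2);
    return true;
}
```

Calling it once with `buffer_size` set to 64 and once with 8 * 1024 * 1024 on the same pair of files should show how much of the measured difference, if any, comes from the buffer size alone.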

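Also, if the files are identical, there may be other reasons for the difference. Consider the time needed to allocate 8 MB of data (which may even span multiple pages) compared with the time needed to allocate just 64 bytes. Since you are reading sequentially, the overhead is really on the memory side.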

Source

2013-03-02 17:23:38

Comparing two **identical** 650 MB files takes 40 seconds in C++ and 10 seconds in Java. – Silnet 2013-03-02 17:30:53

@Silnet If they are identical, you still have a huge overhead in allocating the memory, and the memory may even span multiple pages. – 2013-03-02 17:48:07

That is a bogus claim. Reading 8 MB takes no time on current hardware, let alone 10 or 40 seconds. – 2013-03-02 17:50:34

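While the buffer size answer is very good and probably very important, another possible source of your problem is the use of the iostream library. I would not normally use that library for this kind of work. One problem it may cause is extra copying, because iostream does its own buffering for you. I would use the raw read and write calls.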

For example, on a Linux platform with C++11, I would do it like this:

#include <array> 
#include <algorithm> 
#include <string> 
#include <stdexcept> 

// Needed for open and close on a Linux platform 
#include <sys/types.h> 
#include <sys/stat.h> 
#include <fcntl.h> 
#include <unistd.h> 

using ::std::string; 

bool same_contents(const string &fname1, const string &fname2)
{
    int fd1 = ::open(fname1.c_str(), O_RDONLY);
    if (fd1 < 0) {
        throw ::std::runtime_error("Open of " + fname1 + " failed.");
    }
    int fd2 = ::open(fname2.c_str(), O_RDONLY);
    if (fd2 < 0) {
        ::close(fd1);
        fd1 = -1;
        throw ::std::runtime_error("Open of " + fname2 + " failed.");
    }

    bool same = true;
    try {
        ::std::array<char, 4096> buf1;
        ::std::array<char, 4096> buf2;
        bool done = false;

        while (!done) {
            // read() returns ssize_t: -1 on error, 0 at end of file.
            const ssize_t read1 = ::read(fd1, buf1.data(), buf1.size());
            if (read1 < 0) {
                throw ::std::runtime_error("Error reading " + fname1);
            }
            const ssize_t read2 = ::read(fd2, buf2.data(), buf2.size());
            if (read2 < 0) {
                throw ::std::runtime_error("Error reading " + fname2);
            }
            if (read1 != read2) {
                same = false;
                done = true;
            }
            if (same && read1 > 0) {
                const auto compare_result = ::std::mismatch(buf1.begin(),
                                                            buf1.begin() + read1,
                                                            buf2.begin());
                if (compare_result.first != (buf1.begin() + read1)) {
                    same = false;
                }
            }
            // A short read is treated as end of file here.
            if (!same || (static_cast<size_t>(read1) < buf1.size())) {
                done = true;
            }
        }
    } catch (...) {
        if (fd1 >= 0) ::close(fd1);
        if (fd2 >= 0) ::close(fd2);
        throw;
    }
    if (fd1 >= 0) ::close(fd1);
    if (fd2 >= 0) ::close(fd2);
    return same;
}

Source

2013-03-02 17:30:19 Omnifarious

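I just took your Java program and wrote an equivalent C++ program, and both take almost the same time to compare two identical files, give or take a second.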

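One possible, trivial explanation is that you ran the C++ program first and the Java program afterwards. If that was your only test, the difference in execution time could be explained by caching, although 40 seconds is a very long time for reading 650 MB on today's hardware.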

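On the second run, the data blocks are already in the system file cache, so no disk access is needed to retrieve the files. To get comparable results, run the test several times with both the C++ and the Java program.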
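Running the test several times can be done by hand, or with a small timing harness along these lines. This is a minimal sketch, not part of the original answer: `time_runs` is a hypothetical helper name, and `work` stands for whatever comparison routine is being measured.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical harness: calls `work` `runs` times and returns the wall-clock
// duration of each call in milliseconds, in the order the calls were made.
std::vector<double> time_runs(const std::function<void()> &work,
                              std::size_t runs)
{
    std::vector<double> millis;
    millis.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        millis.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return millis;
}
```

Passing a lambda that performs the file comparison and printing all returned values makes the caching effect visible: if the files were not cached beforehand, the first entry includes the disk reads, while later entries are mostly served from the file cache. Only the later, warm runs are directly comparable between the two programs.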

In your code, you have

lFile.read(p_lBuffer, *bufferSize);

which contradicts the comment at the top:

//bufferSize = 8mb

So unless you show the real, complete code, anyone's guess is valid.

Eating my own dog food:

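#include <iostream>
#include <fstream>
#include <cstring>

const size_t N = 8 * 1024 * 1024;
char buf1[N], buf2[N];

int main(int argc, char **argv)
{
    if (argc < 3) {
        std::cerr << "usage: " << argv[0] << " file1 file2\n";
        return 2;
    }
    std::ios::sync_with_stdio(false);
    std::ifstream f1(argv[1], std::ios::binary);
    std::ifstream f2(argv[2], std::ios::binary);
    if (!f1 || !f2)
        return 2;
    do {
        f1.read(buf1, sizeof(buf1));
        f2.read(buf2, sizeof(buf2));
        // gcount() also reports the final, partial block,
        // so the tail of the files is compared too.
        size_t n1 = f1.gcount(), n2 = f2.gcount();
        if (n1 != n2 || std::memcmp(buf1, buf2, n1) != 0)
            return 1;
    } while (f1 && f2);

    return 0;
}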

Source

2013-03-02 17:45:40
