Stream several files consecutively in C++

My question is similar to this, but I haven't found any C++ reference for this problem.

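I have a list of large files to read and process. What is the best way to create an input stream that takes data from the files one after another, automatically opening the next file when the previous one ends? This stream will be handed to a processing function that sequentially reads variable-size chunks, across file boundaries.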

2016-07-29 xivaxy

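Well, the "Unixy" way is to write the program as a filter (i.e. read from stdin and write to stdout) and then use existing building blocks, like 'cat input_file*.dat | myprogram'. But without more details (are the files all in one directory with names that can be globbed, are they scattered in different places, does the order need to be different), it's hard to say more than that... – twalberg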
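A minimal sketch of the filter approach, assuming the shell does the concatenation (for example 'cat input_file*.dat | myprogram'), so the program only ever sees one continuous stream on stdin. The function name process_chunks and the chunk size are illustrative assumptions, and the sketch only counts bytes where real code would process each chunk:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a pipe
#include <string>
#include <vector>

// Reads `in` in chunks of up to chunk_size bytes (must be > 0) until EOF and
// returns the total number of bytes seen. Placeholder: a real filter would
// process each chunk instead of only counting it.
std::size_t process_chunks(std::istream& in, std::size_t chunk_size)
{
    std::vector<char> buf(chunk_size);
    std::size_t total = 0;
    // read() fails on the final short chunk, but gcount() still reports the
    // bytes it delivered, so that last partial chunk is not lost.
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) || in.gcount() > 0) {
        const auto got = static_cast<std::size_t>(in.gcount());
        // ... process buf[0 .. got) here ...
        total += got;
    }
    return total;
}
```

From main, the whole program then reduces to process_chunks(std::cin, 4096) (with <iostream> included); because the shell delivers the files as one byte stream, the C++ code never sees a file boundary.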

You could create a new class derived from std::istream that holds a 'std::vector' of 'std::ifstream' and automatically switches to the next one on EOF or a failed read – KABoissonneault
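One caveat with this idea: std::istream's extraction functions are not virtual, so a subclass cannot reroute them by itself; in practice, making 'is >> x' switch files means writing a streambuf anyway. If the processing function can call a plain read function instead of taking a stream, a lighter variation is a class that owns one std::ifstream at a time and moves on whenever a read comes up short. A sketch under that assumption; multi_file_reader and its read member are hypothetical names:

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads a list of files back to back, one std::ifstream at a time.
class multi_file_reader {
public:
    explicit multi_file_reader(std::vector<std::string> names)
        : names_(std::move(names)) {}

    // Fills dst with up to n bytes, crossing file boundaries as needed.
    // Returns the number of bytes read; fewer than n means every file is exhausted.
    std::streamsize read(char* dst, std::streamsize n)
    {
        std::streamsize total = 0;
        while (total < n) {
            if (!file_.is_open() && !open_next()) {
                break;  // no files left
            }
            file_.read(dst + total, n - total);
            total += file_.gcount();
            if (!file_) {
                file_.close();  // short read: this file is exhausted
            }
        }
        return total;
    }

private:
    // Opens the next file that can be opened; unreadable names are skipped.
    bool open_next()
    {
        while (next_ < names_.size()) {
            file_.open(names_[next_++], std::ios::binary);  // open() clears the old EOF state on success
            if (file_.is_open()) {
                return true;
            }
            file_.clear();
        }
        return false;
    }

    std::vector<std::string> names_;
    std::size_t next_ = 0;
    std::ifstream file_;
};
```

A call such as read(buf, 5) over files containing "abc", "" and "defg" returns 5 bytes "abcde"; the next call returns the remaining 2.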

Collect them into a buffer file and then read that afterwards? So a two-part operation – Charlie
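A sketch of that two-step approach: copy every input into one combined file, then open the combined file as an ordinary std::ifstream and hand it to the processing function. The helper name concatenate_files and the file names are assumptions. It is simple, but it costs an extra full copy of the data on disk and twice the I/O:

```cpp
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Step 1: append every input file, in order, to one combined file.
// Returns false if any input could not be opened or writing failed.
bool concatenate_files(const std::vector<std::string>& inputs, const std::string& combined)
{
    std::ofstream out(combined, std::ios::binary | std::ios::trunc);
    bool ok = static_cast<bool>(out);
    for (const auto& name : inputs) {
        std::ifstream in(name, std::ios::binary);
        if (!in) {
            ok = false;
            continue;
        }
        // 'out << rdbuf()' sets failbit on out when it copies nothing,
        // so empty files are skipped instead of copied.
        if (in.peek() != std::ifstream::traits_type::eof()) {
            out << in.rdbuf();
        }
    }
    return ok && static_cast<bool>(out);
}

// Step 2 (caller side): std::ifstream all(combined, std::ios::binary);
// then pass 'all' to the processing function like any single file.
```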

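What you want to do is provide a type that inherits from std::basic_streambuf. It has many virtual member functions; the relevant ones here are showmanyc(), underflow(), uflow() and xsgetn(). You would override them so that, on underflow, they automatically open the next file in the list (if there is one).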

Here's a sample implementation. We act as a std::filebuf and just keep a deque<string> of the next files we need to read:

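#include <deque>
#include <fstream>
#include <initializer_list>
#include <ios>
#include <string>

class multifilebuf : public std::filebuf
{
public:
    multifilebuf(std::initializer_list<std::string> filenames)
    : next_filenames(filenames.begin(), filenames.end())
    {
        open_next();  // an empty list is fine: the buffer just reports EOF
    }

protected:
    int_type underflow() override
    {
        for (;;) {
            int_type res = std::filebuf::underflow();
            if (!traits_type::eq_int_type(res, traits_type::eof())) {
                return res;  // the current file still has data
            }
            if (next_filenames.empty()) {
                return res;  // super done: no files left
            }
            // done with this file, move on to the next one
            close();
            open_next();
        }
    }

    // libstdc++'s filebuf::xsgetn may serve large reads straight from the file
    // without calling underflow(), which would stop at the end of the current
    // file. Route bulk reads through the generic streambuf loop instead.
    std::streamsize xsgetn(char_type* s, std::streamsize n) override
    {
        return std::streambuf::xsgetn(s, n);
    }

private:
    void open_next()
    {
        if (!next_filenames.empty()) {
            // if open() fails, underflow() simply sees EOF and skips this file
            open(next_filenames.front(), std::ios::in);
            next_filenames.pop_front();
        }
    }

    std::deque<std::string> next_filenames;
};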

This makes everything transparent to the end user:

multifilebuf mfb{"file1", "file2", "file3"}; 

std::istream is(&mfb); 
std::string word; 
while (is >> word) { 
    // transparently read words from all the files 
}
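The same underflow() pattern can be tried without any files. When the get area runs dry, underflow() must either expose more characters with setg() and return the next one without consuming it, or return traits_type::eof(); a read that spans a boundary then just works, because the generic basic_streambuf machinery keeps calling it. A sketch that serves a list of in-memory strings instead of files (chained_stringbuf is a hypothetical name):

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Hypothetical example: presents several strings as one continuous input sequence.
class chained_stringbuf : public std::streambuf {
public:
    explicit chained_stringbuf(std::vector<std::string> parts)
        : parts_(std::move(parts)) {}

protected:
    int_type underflow() override
    {
        // Skip empty parts: an empty get area would otherwise look like EOF.
        while (next_ < parts_.size() && parts_[next_].empty()) {
            ++next_;
        }
        if (next_ == parts_.size()) {
            return traits_type::eof();  // every part has been consumed
        }
        std::string& s = parts_[next_++];
        setg(&s[0], &s[0], &s[0] + s.size());  // expose the whole part as the get area
        return traits_type::to_int_type(*gptr());
    }

private:
    std::vector<std::string> parts_;  // never modified after construction, so the pointers stay valid
    std::size_t next_ = 0;
};
```

Reading 4 bytes from the parts "hel", "", "lo wo", "rld" yields "hell": the read crosses two part boundaries and skips the empty part, and a following std::getline returns "o world".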


2016-07-29 18:04:11 Barry

This is going into my list of questions for people who claim to know everything about C++. Nice find! – KABoissonneault

@KABoissonneault I even went ahead and figured out how to make a working example. I guess this case isn't that bad; only 'underflow()' is needed. – Barry

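For a simple solution, use Boost's join with ranges of istream iterators over the files. I don't know of a similar function in the current C++ standard library, but one probably exists in the Ranges TS / range-v3.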

You could also write it yourself: writing join on your own is entirely doable.
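Writing a join yourself can start much smaller than a full iterator: a push-style loop over the files that hands every element of each file's istream_iterator range to a callback. A sketch; for_each_word is a hypothetical name. The trade-off is composability: the caller gets a callback instead of an iterator or stream it can pass on, which is what the iterator design below is for.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: a push-style "join" of the istream_iterator ranges of
// several files. Calls f(word) for every whitespace-separated word, file by
// file. A file that fails to open contributes an empty range.
template <class F>
void for_each_word(const std::vector<std::string>& filenames, F f)
{
    for (const auto& name : filenames) {
        std::ifstream in(name);
        std::istream_iterator<std::string> it(in), end;  // one inner range per file
        for (; it != end; ++it) {
            f(*it);
        }
    }
}
```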

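I would write it as a "flattening" input-only iterator: an iterator over a range of ranges that walks the contents of each inner range in turn. The iterator keeps track of the remaining ranges and of an iterator to the current element.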

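Here is a very simple zip iterator, to give you a sense of how much code you would have to write (a zip iterator is a different concept, and that one is deliberately simple code that only works in for(:) loops).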

Here is a sketch of how to do it in C++14:

#include <deque>
#include <utility>

template<class It>
struct range_t {
    It b{};
    It e{};
    It begin() const { return b; }
    It end() const { return e; }
    bool empty() const { return begin()==end(); }
};

template<class It>
struct range_of_range_t {
    std::deque<range_t<It>> ranges;
    It cur;
    friend bool operator==(range_of_range_t const& lhs, range_of_range_t const& rhs) {
        return lhs.cur==rhs.cur;
    }
    friend bool operator!=(range_of_range_t const& lhs, range_of_range_t const& rhs) {
        return !(lhs==rhs);
    }
    range_of_range_t& operator++(){
        ++cur;
        if (ranges.front().end() == cur) {
            next_range();
        }
        return *this;
    }
    void next_range() {
        // drop the finished range and move to the first non-empty one after it;
        // if none is left, cur ends up at the end of the last range
        while(ranges.size() > 1) {
            ranges.pop_front();
            cur = ranges.front().begin();
            if (!ranges.front().empty()) break;
        }
    }
    decltype(auto) operator*() const {
        return *cur;
    }
    range_of_range_t(std::deque<range_t<It>> in):
        ranges(std::move(in)),
        cur{}
    {
        // easy way to find the starting cur:
        ranges.push_front({});
        next_range();
    }
};

The iterator still needs work: it should satisfy all of the iterator axioms, and getting the end iterator right takes some care.

This is not a stream but an iterator.


2016-07-29 17:58:21 Yakk
