2013-10-04 74 views
9

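Java - Read a file and split it into multiple files

I have a file that I want to read in Java and split into n (user input) output files. Here is how I read the file: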

int n = 4; 
BufferedReader br = new BufferedReader(new FileReader("file.csv")); 
try { 
    String line = br.readLine(); 

    while (line != null) { 
     line = br.readLine(); 
    } 
} finally { 
    br.close(); 
} 

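How do I split the file file.csv into n files?

Note - Since the file contains 100k entries, I cannot store its contents in an array and then split that and save it into multiple files.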

+0

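In the while loop, just collect as many lines as you need into a String or StringBuilder and write them to a separate file. Since the number of files is not known in advance, it is better to define a maximum number of lines per file.

+0

You may need to loop twice: once to get the line count and once to split. Or you could guess the number of lines and split on that basis.

+0

@kw4nta Why on earth would you store the lines? 1) The OP said storing all the lines is not an option, 2) you can presumably write the lines directly to another file...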

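Answers

11

Since the file may be very large, each split file may be large as well.

Example:

Source file size: 5GB

Number of splits: 5

Destination file size: 1GB each (5 files)

Even if we had that much memory, there is no way to read such a large split chunk in one go. Basically, for each split we can read a fixed-size byte array, which we know should be feasible in terms of both performance and memory.

NumSplits: 10, MaxReadBytes: 8KB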

public static void main(String[] args) throws Exception 
    { 
     RandomAccessFile raf = new RandomAccessFile("test.csv", "r"); 
     long numSplits = 10; //from user input, extract it from args 
     long sourceSize = raf.length(); 
     long bytesPerSplit = sourceSize/numSplits ; 
     long remainingBytes = sourceSize % numSplits; 

     int maxReadBufferSize = 8 * 1024; //8KB 
     for(int destIx=1; destIx <= numSplits; destIx++) { 
      BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream("split."+destIx)); 
      if(bytesPerSplit > maxReadBufferSize) { 
       long numReads = bytesPerSplit/maxReadBufferSize; 
       long numRemainingRead = bytesPerSplit % maxReadBufferSize; 
       for(int i=0; i<numReads; i++) { 
        readWrite(raf, bw, maxReadBufferSize); 
       } 
       if(numRemainingRead > 0) { 
        readWrite(raf, bw, numRemainingRead); 
       } 
      }else { 
       readWrite(raf, bw, bytesPerSplit); 
      } 
      bw.close(); 
     } 
     if(remainingBytes > 0) { 
      BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream("split."+(numSplits+1))); 
      readWrite(raf, bw, remainingBytes); 
      bw.close(); 
     } 
      raf.close(); 
    } 

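    static void readWrite(RandomAccessFile raf, BufferedOutputStream bw, long numBytes) throws IOException { 
     byte[] buf = new byte[(int) numBytes]; 
     // read(buf) may return fewer bytes than requested; readFully blocks until
     // the buffer is full, so the whole array can be written safely.
     raf.readFully(buf); 
     bw.write(buf); 
    } 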
+5

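It may split a line in the middle, which matters for CSV files.

+0

Is there a way to overcome this, so that it doesn't split mid-line? – Julian

+0

At my company we have a fixed record size for each column, which we pad in the CSV, so we divide the file size by the record size and then split. Each line is also sent over MQ for insertion as it is read, so that the work is asynchronous. Anyway, your solution is good.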
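The fixed-record idea from the comment above could look roughly like the following. This is only a sketch and not the commenter's code: the class name `FixedRecordSplitter`, the `records-N.csv` output names, and the assumption that `recordSize` is the byte length of one padded record including its line terminator are all hypothetical. Because every record has the same length, each output file ends exactly on a record boundary and no line is cut in half.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FixedRecordSplitter {
    /**
     * Splits a file of fixed-length records into at most n files.
     * recordSize is assumed to include the line terminator.
     * Returns the number of files written.
     */
    public static int split(File source, File targetDir, int recordSize, int n) throws IOException {
        if (recordSize < 1 || n < 1) {
            throw new IllegalArgumentException("recordSize and n must be positive");
        }
        if (source.length() % recordSize != 0) {
            throw new IllegalArgumentException("file size is not a multiple of the record size");
        }
        if (!targetDir.isDirectory() && !targetDir.mkdirs()) {
            throw new IOException("cannot create " + targetDir);
        }
        long totalRecords = source.length() / recordSize;
        long recordsPerFile = (totalRecords + n - 1) / n; // rounded up
        byte[] record = new byte[recordSize];
        int fileCount = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(source)))) {
            long remaining = totalRecords;
            while (remaining > 0) {
                fileCount++;
                long inThisFile = Math.min(recordsPerFile, remaining);
                File part = new File(targetDir, "records-" + fileCount + ".csv");
                try (OutputStream out = new BufferedOutputStream(new FileOutputStream(part))) {
                    for (long i = 0; i < inThisFile; i++) {
                        in.readFully(record); // always reads one whole record
                        out.write(record);
                    }
                }
                remaining -= inThisFile;
            }
        }
        return fileCount;
    }
}
```

Only one record is held in memory at a time, so the approach should work regardless of file size. It does not apply to ordinary CSV files whose lines vary in length.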

0

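Keep a counter for the number of entries, assuming one entry per line.

Step 1: Create a new sub-file and set counter = 0.

Step 2: Increment the counter as you read each entry from the source file into the buffer.

Step 3: When the counter reaches the limit of entries you want in each sub-file, flush the buffer's contents to the sub-file and close it.

Step 4: Go back to step 1 until there is no more data to read from the source file.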
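The steps above can be sketched as follows. This is one possible reading of the description, not code from the answer: the class name `LineCountSplitter`, the `part-N.txt` file names, and the UTF-8 encoding are assumptions made for illustration. The `BufferedWriter` plays the role of the buffer, and closing it flushes the buffered lines to the sub-file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineCountSplitter {
    /** Splits source into files of at most linesPerFile lines; returns the number of files written. */
    public static int split(Path source, Path targetDir, int linesPerFile) throws IOException {
        if (linesPerFile < 1) {
            throw new IllegalArgumentException("linesPerFile must be positive");
        }
        Files.createDirectories(targetDir);
        int fileCount = 0;
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String line = reader.readLine();
            while (line != null) {
                // Step 1: open a new sub-file and reset the counter.
                fileCount++;
                int counter = 0;
                Path part = targetDir.resolve("part-" + fileCount + ".txt");
                try (BufferedWriter writer = Files.newBufferedWriter(part, StandardCharsets.UTF_8)) {
                    // Steps 2 and 3: copy entries until the limit is reached or the input ends.
                    while (line != null && counter < linesPerFile) {
                        writer.write(line);
                        writer.newLine();
                        counter++;
                        line = reader.readLine();
                    }
                } // closing the writer flushes the buffer to the sub-file
                // Step 4: the outer loop repeats while the source still has data.
            }
        }
        return fileCount;
    }
}
```

Only one line is held in memory at a time, so the 100k-entry file from the question never has to fit in an array. A new sub-file is opened only when another line is available, which avoids an empty trailing file.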

0

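There is no need to loop through the file twice. You can estimate the size of each chunk as the source file size divided by the number of chunks required. Then you stop filling each chunk as soon as its size exceeds the estimate.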
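A single-pass version of this idea might look like the sketch below. It is an illustration under stated assumptions, not the answerer's code: the class name `SizeEstimateSplitter`, the `chunk-N.csv` file names, UTF-8 encoding, and counting one byte per line terminator are all hypothetical choices. Each chunk is closed only at the end of a line, so no CSV row is split, and everything left over goes into the n-th chunk.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SizeEstimateSplitter {
    /** Splits source into at most n files of roughly equal size without breaking a line. */
    public static int split(Path source, Path targetDir, int n) throws IOException {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        Files.createDirectories(targetDir);
        // Estimated chunk size: source size divided by the number of chunks, rounded up.
        long target = (Files.size(source) + n - 1) / n;
        int fileCount = 0;
        long written = 0;
        BufferedWriter writer = null;
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Start the next chunk once the current one has reached the estimate,
                // but keep whatever is left in the last (n-th) chunk.
                if (writer == null || (written >= target && fileCount < n)) {
                    if (writer != null) {
                        writer.close();
                    }
                    fileCount++;
                    written = 0;
                    Path chunk = targetDir.resolve("chunk-" + fileCount + ".csv");
                    writer = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                }
                writer.write(line);
                writer.write('\n');
                // Assumption: one byte per line terminator when tracking the chunk size.
                written += line.getBytes(StandardCharsets.UTF_8).length + 1;
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        return fileCount;
    }
}
```

Because a chunk may overshoot the estimate by up to one line, the method can produce fewer than n files when lines are long relative to the chunk size. It never produces more than n.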

5
import java.io.*; 
import java.util.Scanner; 
public class split { 
public static void main(String args[]) 
{ 
try{ 
    // Reading file and getting no. of files to be generated 
    String inputfile = "C:/test.txt"; // Source File Name. 
    double nol = 2000.0; // No. of lines to be split and saved in each output file. 
    File file = new File(inputfile); 
    Scanner scanner = new Scanner(file); 
    int count = 0; 
    while (scanner.hasNextLine()) 
    { 
    scanner.nextLine(); 
    count++; 
    } 
    scanner.close(); // release the file handle once the counting pass is done
    System.out.println("Lines in the file: " + count);  // Displays no. of lines in the input file. 

    // Round up so that a final, partially filled file holds the leftover lines.
    int nof = (int) Math.ceil(count / nol); 
    System.out.println("No. of files to be generated :"+nof); // Displays no. of files to be generated. 

    //--------------------------------------------------------------------------------------------------------- 

    // Actual splitting of file into smaller files 

    FileInputStream fstream = new FileInputStream(inputfile); DataInputStream in = new DataInputStream(fstream); 

    BufferedReader br = new BufferedReader(new InputStreamReader(in)); String strLine; 

    for (int j=1;j<=nof;j++) 
    { 
    FileWriter fstream1 = new FileWriter("C:/New Folder/File"+j+".txt");  // Destination File Location 
    BufferedWriter out = new BufferedWriter(fstream1); 
    for (int i=1;i<=nol;i++) 
    { 
    strLine = br.readLine(); 
    if (strLine!= null) 
    { 
    out.write(strLine); 
    if(i!=nol) 
    { 
     out.newLine(); 
    } 
    } 
    } 
    out.close(); 
    } 

    in.close(); 
}catch (Exception e) 
{ 
    System.err.println("Error: " + e.getMessage()); 
} 

} 

} 
+1

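This doesn't do what the OP wanted (setting the number of files), but it does what I wanted (setting the number of lines). Nice code! I modified it into a function that takes the file name and names the created files dynamically.

+0

C&P from http://javaprogramming.language-tutorial.com/2012/10/split-huge-files-into-small-text-files.html? (The blog entry dates from 2012) – bish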

2

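Although this is an old question, for reference I am listing the code I used to split large files into pieces of any size. It works with any Java version above 1.4.

Sample split and join blocks are shown below: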

public void join(String FilePath) { 
    long leninfile = 0, leng = 0; 
    int count = 1, data = 0; 
    try { 
     File filename = new File(FilePath); 
     //RandomAccessFile outfile = new RandomAccessFile(filename,"rw"); 

     OutputStream outfile = new BufferedOutputStream(new FileOutputStream(filename)); 
     while (true) { 
      filename = new File(FilePath + count + ".sp"); 
      if (filename.exists()) { 
       //RandomAccessFile infile = new RandomAccessFile(filename,"r"); 
       InputStream infile = new BufferedInputStream(new FileInputStream(filename)); 
       data = infile.read(); 
       while (data != -1) { 
        outfile.write(data); 
        data = infile.read(); 
       } 
       leng++; 
       infile.close(); 
       count++; 
      } else { 
       break; 
      } 
     } 
     outfile.close(); 
    } catch (Exception e) { 
     e.printStackTrace(); 
    } 
} 

public void split(String FilePath, long splitlen) { 
    long leninfile = 0, leng = 0; 
    int count = 1, data; 
    try { 
     File filename = new File(FilePath); 
     //RandomAccessFile infile = new RandomAccessFile(filename, "r"); 
     InputStream infile = new BufferedInputStream(new FileInputStream(filename)); 
     data = infile.read(); 
     while (data != -1) { 
      filename = new File(FilePath + count + ".sp"); 
      //RandomAccessFile outfile = new RandomAccessFile(filename, "rw"); 
      OutputStream outfile = new BufferedOutputStream(new FileOutputStream(filename)); 
      while (data != -1 && leng < splitlen) { 
       outfile.write(data); 
       leng++; 
       data = infile.read(); 
      } 
      leninfile += leng; 
      leng = 0; 
      outfile.close(); 
      count++; 
     } 
     infile.close(); // close the source stream once every part has been written
    } catch (Exception e) { 
     e.printStackTrace(); 
    } 
} 

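The complete Java code is available at this link: File Split in Java Program.

+1

While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - [From Review](/review/low-quality-posts/12423371) – CubeJockey

+1

Thanks, updated the comment. – user1472187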

0

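This is the one that worked for me; I used it to split a 10GB file. It also lets you add a header and a footer. That is very useful when splitting document-based formats such as XML and JSON, because you need to add the document wrapper to each new split file.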

import java.io.BufferedReader; 
import java.io.BufferedWriter; 
import java.io.File; 
import java.io.IOException; 
import java.nio.file.Files; 
import java.nio.file.Path; 
import java.nio.file.Paths; 
import java.nio.file.StandardOpenOption; 

public class FileSpliter 
{ 
    public static void main(String[] args) throws IOException 
    { 
     splitTextFiles("D:\\xref.csx", 750000, "", "", null); 
    } 

    public static void splitTextFiles(String fileName, int maxRows, String header, String footer, String targetDir) throws IOException 
    { 
     File bigFile = new File(fileName); 
     int i = 1; 
     String ext = fileName.substring(fileName.lastIndexOf(".")); 

     String fileNoExt = bigFile.getName().replace(ext, ""); 
     File newDir = null; 
     if(targetDir != null) 
     { 
      newDir = new File(targetDir);   
     } 
     else 
     { 
      newDir = new File(bigFile.getParent() + "\\" + fileNoExt + "_split"); 
     } 
     newDir.mkdirs(); 
     try (BufferedReader reader = Files.newBufferedReader(Paths.get(fileName))) 
     { 
      String line = null; 
      int lineNum = 1; 
      Path splitFile = Paths.get(newDir.getPath() + "\\" + fileNoExt + "_" + String.format("%02d", i) + ext); 
      BufferedWriter writer = Files.newBufferedWriter(splitFile, StandardOpenOption.CREATE); 
      while ((line = reader.readLine()) != null) 
      { 
       if(lineNum == 1) 
       { 
        System.out.print("new file created '" + splitFile.toString()); 
        if(header != null && header.length() > 0) 
        { 
         writer.append(header); 
         writer.newLine(); 
        } 
       } 
       writer.append(line); 

       if (lineNum >= maxRows) 
       { 
        if(footer != null && footer.length() > 0) 
        { 
         writer.newLine(); 
         writer.append(footer); 
        } 
        writer.close(); 
        System.out.println(", " + lineNum + " lines written to file"); 
        lineNum = 1; 
        i++; 
        splitFile = Paths.get(newDir.getPath() + "\\" + fileNoExt + "_" + String.format("%02d", i) + ext); 
        writer = Files.newBufferedWriter(splitFile, StandardOpenOption.CREATE); 
       } 
       else 
       { 
        writer.newLine(); 
        lineNum++; 
       } 
      } 
      if(lineNum <= maxRows) // early exit 
      { 
       if(footer != null && footer.length() > 0) 
       { 
        writer.newLine(); 
        lineNum++; 
        writer.append(footer); 
       } 
      } 
      writer.close(); 
      System.out.println(", " + lineNum + " lines written to file"); 
     } 

     System.out.println("file '" + bigFile.getName() + "' split into " + i + " files"); 
    } 
} 
0

The code below splits a large file into smaller files that each contain fewer lines.

long linesWritten = 0; 
    int count = 1; 

    try { 
     File inputFile = new File(inputFilePath); 
     InputStream inputFileStream = new BufferedInputStream(new FileInputStream(inputFile)); 
     BufferedReader reader = new BufferedReader(new InputStreamReader(inputFileStream)); 

     String line = reader.readLine(); 

     String fileName = inputFile.getName(); 
     String outfileName = outputFolderPath + "\\" + fileName; 

     while (line != null) { 
      File outFile = new File(outfileName + "_" + count + ".split"); 
      Writer writer = new OutputStreamWriter(new FileOutputStream(outFile)); 

      while (line != null && linesWritten < linesPerSplit) { 
       writer.write(line); 
       writer.write(System.lineSeparator()); // readLine() strips the terminator, so restore it
       line = reader.readLine(); 
       linesWritten++; 
      } 

      writer.close(); 
      linesWritten = 0;//next file 
      count++;//next file count 
     } 

     reader.close(); 

    } catch (Exception e) { 
     e.printStackTrace(); 
    } 
+0

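The code I wrote above works; I tested it on a file with 40L (4 million) records/lines. It takes about 10 seconds to split the file into chunks of 1L (100,000) lines per file.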