2010-12-21 13 views
1

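I have a function that deletes a line from a file. I'm dealing with large files (more than 100 MB). PHP has a 256 MB memory limit, yet the function that deletes the line blows up on a 100 MB CSV file. How can I refactor it so that memory use stays bounded on large files?

What the function must do is this:

Originally I have a CSV like this: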

Copyright (c) 2007 MaxMind LLC. All Rights Reserved.
locId,country,region,city,postalCode,latitude,longitude,metroCode,areaCode
1,"O1","","","",0.0000,0.0000,,
2,"AP","","","",35.0000,105.0000,,
3,"EU","","","",47.0000,8.0000,,
4,"AD","","","",42.5000,1.5000,,
5,"AE","","","",24.0000,54.0000,,
6,"AF","","","",33.0000,65.0000,,
7,"AG","","","",17.0500,-61.8000,,
8,"AI","","","",18.2500,-63.1667,,
9,"AL","","","",41.0000,20.0000,,

When I pass the CSV file to this function I get:

locId,country,region,city,postalCode,latitude,longitude,metroCode,areaCode
1,"O1","","","",0.0000,0.0000,,
2,"AP","","","",35.0000,105.0000,,
3,"EU","","","",47.0000,8.0000,,
4,"AD","","","",42.5000,1.5000,,
5,"AE","","","",24.0000,54.0000,,
6,"AF","","","",33.0000,65.0000,,
7,"AG","","","",17.0500,-61.8000,,
8,"AI","","","",18.2500,-63.1667,,
9,"AL","","","",41.0000,20.0000,,

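It only removes the first line, nothing more. The problem is how this function behaves on large files: it exhausts the available memory.

The function is: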

public function deleteLine($line_no, $csvFileName) {

    // this function strips a specific line from a file
    // if a line is stripped, the function returns TRUE, else FALSE
    //
    // e.g.
    // deleteLine(-1, 'xyz.csv'); // strip last line
    // deleteLine(1, 'xyz.csv'); // strip first line

    // Assign the file name
    $filename = $csvFileName;

    $strip_return = FALSE;

    // file() loads every line of the file into an array at once
    $data = file($filename);
    $pipe = fopen($filename, 'w');
    $size = count($data);

    if ($line_no == -1) {
        $skip = $size - 1;
    } else {
        $skip = $line_no - 1;
    }

    for ($line = 0; $line < $size; $line++) {
        if ($line != $skip) {
            fputs($pipe, $data[$line]);
        } else {
            $strip_return = TRUE;
        }
    }

    fclose($pipe);

    return $strip_return;
}

Is it possible to refactor this function so that it does not blow up with 256 MB of PHP memory?

Give me some clues.

Best regards

+1

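You can start by not reading the ENTIRE file into memory with file(). Read it line by line, and write each line back to disk as soon as you are done with it. – Craige 2010-12-21 16:55:44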

+0

If you have your own server, consider doing this with `perl` or `sed` ;-) – thedom 2010-12-21 17:00:13

Answers

2

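The cause of your blow-up is the file function, which loads the whole file into memory. To overcome this, read the file line by line, write every line except the one to be deleted to a temporary file, and finally rename the temporary file over the original.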

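public function deleteLine($line_no, $csvFileName) {

     // get a temp file name in current working directory..you can use
     // any other directory say /tmp
     $tmpFileName = tempnam(".", "csv");

     $strip_return = FALSE;

     // open input file for reading.
     $readFD = fopen($csvFileName, 'r');

     // temp file for writing.
     $writeFD = fopen($tmpFileName, 'w');

     // check for fopen errors.
     if ($readFD === false || $writeFD === false) {
       return FALSE;
     }

     if ($line_no == -1) {
       // last line: the line count is not known up front, so count the
       // lines in a first pass and rewind before copying.
       $size = 0;
       while (fgets($readFD) !== false) {
         $size++;
       }
       rewind($readFD);
       $skip = $size - 1;
     } else {
       $skip = $line_no - 1;
     }

     $line = 0;

     // read lines from input file one by one.
     // write all lines except the line to be deleted.
     while (($buffer = fgets($readFD)) !== false) {
       if ($line != $skip) {
         fputs($writeFD, $buffer);
       } else {
         $strip_return = TRUE;
       }
       $line++;
     }

     // close both handles so everything is flushed before the rename.
     fclose($readFD);
     fclose($writeFD);

     // rename temp file to input file.
     rename($tmpFileName, $csvFileName);

     return $strip_return;
}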
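
For the specific case in the question, where only the first line has to go, the temp-file idea can be shortened: discard one line, then let PHP copy the rest in chunks. This is only a sketch under a few assumptions: `dropFirstLine` is a hypothetical name that does not appear in the thread, and the directory holding the file is assumed to be writable.

```php
<?php
// Hypothetical helper (not from the thread): remove only the first line of a file.
// Returns true if a line was removed and the file was replaced, false otherwise.
function dropFirstLine(string $path): bool
{
    $in = fopen($path, 'r');
    if ($in === false) {
        return false;
    }

    // Assumption: the input's directory is writable, so the temp file lives next to it.
    $tmp = tempnam(dirname($path), 'csv');
    if ($tmp === false) {
        fclose($in);
        return false;
    }

    $out = fopen($tmp, 'w');
    if ($out === false) {
        fclose($in);
        unlink($tmp);
        return false;
    }

    // Consume and discard the first line; false here means the file was empty.
    $removed = fgets($in) !== false;

    // Copy everything after the first line; PHP does this in chunks,
    // so only a small buffer is held in memory at any time.
    stream_copy_to_stream($in, $out);

    fclose($in);
    fclose($out);

    // Replace the original with the shortened copy.
    return rename($tmp, $path) && $removed;
}
```

Calling `dropFirstLine('xyz.csv')` on the MaxMind sample would strip the copyright line and leave the header row first.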
0

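The file() function reads the entire file into an array, all at once. I would imagine that is where things go wrong. You probably want a second fopen() handle for the input file, so that you can read it one line at a time.

If your requirement is to handle this task with PHP, that's fine. But this type of thing might be better done with something like awk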
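
Reading one line at a time through a separate read handle could look roughly like the sketch below. The names `readLines` and `countLines` are made up for illustration; the counting helper is included because stripping the last line needs the line count, which file() used to provide via count().

```php
<?php
// Hypothetical helper: lazily yield the lines of a file, one at a time.
function readLines(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // fgets() returns one line (including its newline) per call.
        while (($line = fgets($handle)) !== false) {
            yield $line;
        }
    } finally {
        // Runs when iteration finishes or the generator is destroyed early.
        fclose($handle);
    }
}

// Hypothetical helper: count lines without holding the file in memory.
function countLines(string $path): int
{
    $count = 0;
    foreach (readLines($path) as $unused) {
        $count++;
    }
    return $count;
}
```

With something like this, the index of the last line is `countLines($csvFileName) - 1`, at the cost of one extra pass over the file.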

1

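Well, the simplest answer is: don't do it in PHP. Seriously, sed would be much better, because the entire file is never in memory. Have a look at the usual sed one-liners, but in essence: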

sed '1d' filename 

I know system calls are frowned upon, but I feel this may be one of the cases where one is warranted.
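
If you do shell out from PHP, note that plain `sed '1d' filename` writes to standard output and leaves the file untouched, so the result has to be redirected and moved into place. A rough sketch, assuming a POSIX shell, `sed` on the PATH, and `exec()` not being disabled; `deleteLineWithSed` is a hypothetical wrapper, not something from this thread:

```php
<?php
// Hypothetical wrapper: delete line $lineNo (1-based) by shelling out to sed.
function deleteLineWithSed(string $path, int $lineNo): bool
{
    // Assumption: the input's directory is writable for the temp file.
    $tmp = tempnam(dirname($path), 'csv');
    if ($tmp === false) {
        return false;
    }

    // sed prints every line except $lineNo; the shell redirects that to the temp file.
    // escapeshellarg() quotes each argument so odd file names cannot break the command.
    $cmd = sprintf(
        'sed %s %s > %s',
        escapeshellarg($lineNo . 'd'),
        escapeshellarg($path),
        escapeshellarg($tmp)
    );

    exec($cmd, $output, $status);
    if ($status !== 0) {
        unlink($tmp);
        return false;
    }

    // Replace the original with sed's output.
    return rename($tmp, $path);
}
```

For the last line, the sed script would be `$d` instead of a number. Unlike the pure PHP versions, this sketch does not report whether a line was actually removed, only whether sed and the rename succeeded.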