How to read a file backwards (starting from the end) in Tcl?

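I have a very large text file from which I have to extract some data. I read the file line by line and look for keywords. I know that the keywords I am looking for are much closer to the end of the file than to the beginning. I tried the tac command:

set fh [open "| tac filename"]

but I get this error:

couldn't execute "tac": no such file or directory

How can I read a file in reverse order (starting from the end) in Tcl?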

My file is large, so I can't read all the lines into a list in a loop and then reverse it. Please suggest a solution.


2017-04-03 user7145588

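Reversing a file is actually quite expensive. The best option I can think of is to build a list of the file offsets at which lines start, and then walk that list backwards using a seek; gets pattern.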

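set f [open $filename r]

# Construct the list of line-start offsets. Record an offset only when
# gets actually returns a line, so no spurious end-of-file offset
# ends up in the list.
set indices {}
while {1} {
    set pos [tell $f]
    if {[gets $f line] < 0} {
        break
    }
    lappend indices $pos
}

# Iterate backwards: jump to each line start, from last to first
foreach idx [lreverse $indices] {
    seek $f $idx
    set line [gets $f]

    # DoStuffWithALine stands for your own processing
    DoStuffWithALine $line
}

close $f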

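The cost of this approach is non-trivial (even if you happen to have a cache of the indices, you still have problems), because it doesn't work well with how the OS prefetches disk data: jumping backwards through the file defeats the read-ahead that makes sequential reads fast.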
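If the offset-index approach is needed in more than one place, it can be wrapped in a proc. The sketch below is a hypothetical helper, readLinesBackwards (not part of Tcl), that calls a caller-supplied command prefix with each line, last line first. It assumes Tcl 8.6 or later for try/finally. It still keeps one integer per line in memory, which is far smaller than the lines themselves but does grow with the line count.

```tcl
# Hypothetical helper: call $callback once per line, last line first,
# using the two-pass offset-index approach. Requires Tcl 8.6 (try/finally).
proc readLinesBackwards {filename callback} {
    set f [open $filename r]
    try {
        # Pass 1: remember the byte offset at which each line starts.
        set indices {}
        while {1} {
            set pos [tell $f]
            if {[gets $f line] < 0} {
                break
            }
            lappend indices $pos
        }

        # Pass 2: jump back to each line start, from last to first.
        foreach idx [lreverse $indices] {
            seek $f $idx
            gets $f line
            # The line is appended as the last argument of the command prefix.
            {*}$callback $line
        }
    } finally {
        close $f
    }
}
```

For example, readLinesBackwards data.txt puts would print the file last line first (data.txt is only an illustrative name).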


2017-04-03 08:23:15

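tac itself is a fairly simple program, and you could implement its algorithm in Tcl, at least if you're determined to read each line in strictly reverse order. However, I don't think that constraint is really necessary: you said the content you're looking for is more likely to be near the end than near the beginning, not that you have to scan the lines in reverse order. That means you can do something simpler. Roughly: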

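1. Seek to an offset near the end of the file.
2. Read line by line as usual until you reach data you've already processed.
3. Seek to an offset a bit further back from the end of the file.
4. Read line by line as usual until you reach data you've already processed.
5. And so on, until you reach the start of the file.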

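This way you never hold much more in memory than the single line you're currently processing, and you process data near the end of the file before data near the front. You might squeeze out a little more performance by processing the lines strictly in reverse order, but I doubt that would be significant compared with the advantage you gain from not scanning from start to end.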
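For comparison, the strictly reversed processing mentioned above (what tac does) could be sketched roughly like this: read fixed-size blocks from the end, split each block on newlines, emit the complete lines in reverse, and carry the incomplete first fragment over to the next (earlier) block. readLinesReversed, its parameters, and the 16384-byte default are illustrative assumptions. It reads in binary mode and decodes each line afterwards, assuming UTF-8 (or another encoding in which the byte 0x0A only ever means newline) and LF or CRLF line endings. Memory use is one block plus whatever partial line is being carried.

```tcl
# Hypothetical tac-like reader: calls $callback with each line, strictly
# from last to first. Assumes LF or CRLF line endings and an encoding in
# which byte 0x0A always means newline (e.g. UTF-8). Requires Tcl 8.6.
proc readLinesReversed {filename callback {blocksize 16384} {enc utf-8}} {
    set f [open $filename r]
    fconfigure $f -translation binary
    try {
        set pos [file size $filename]
        set carry ""   ;# bytes of a line whose start lies in an earlier block
        set first 1
        while {$pos > 0} {
            set start [expr {max(0, $pos - $blocksize)}]
            seek $f $start
            set data [read $f [expr {$pos - $start}]]
            append data $carry
            set pos $start
            set parts [split $data \n]
            if {$first} {
                # A trailing newline yields an empty final piece that is not a line.
                if {[lindex $parts end] eq ""} {
                    set parts [lrange $parts 0 end-1]
                }
                set first 0
            }
            if {$pos > 0} {
                # The first piece may be incomplete: keep it for the next block.
                set carry [lindex $parts 0]
                set parts [lrange $parts 1 end]
            } else {
                set carry ""
            }
            foreach line [lreverse $parts] {
                # Drop the CR of a CRLF line ending.
                if {[string index $line end] eq "\r"} {
                    set line [string range $line 0 end-1]
                }
                {*}$callback [encoding convertfrom $enc $line]
            }
        }
    } finally {
        close $f
    }
}
```

Decoding only after splitting is what makes arbitrary block boundaries safe: a block may end in the middle of a multibyte character, but a complete line never does.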

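Below is some sample code implementing the step-back-one-block algorithm outlined above. Note the care taken to avoid processing partial lines: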

set BLOCKSIZE 16384

set f [open $filename r]

# Everything from lastOffset to the end of the file has already been
# processed. pos is the raw seek position; it only ever moves backwards.
set lastOffset [file size $filename]
set pos $lastOffset

while { $lastOffset > 0 } {
    # Step back one block, but not past the start of the file.
    set pos [expr {$pos - $BLOCKSIZE}]
    if { $pos < 0 } {
        set pos 0
    }
    seek $f $pos

    if { $pos > 0 } {
        # We may have landed in the middle of a line, because we don't
        # know where the line boundaries are. Skip to the end of whatever
        # line we're in, and discard the content. We'll get it instead
        # at the end of the _next_ block.
        gets $f
    }
    set blockStart [tell $f]

    # Process every line that starts before the already-processed region.
    while { [tell $f] < $lastOffset && [gets $f line] >= 0 } {
        ### Do whatever you're going to do with the line here
        puts $line
    }

    # Stepping back from pos (rather than from blockStart) keeps the loop
    # moving even when a single line is longer than BLOCKSIZE.
    set lastOffset $blockStart
}
close $f
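Since the original goal is to find a keyword that is probably near the end, the same block loop can stop as soon as a block contains a match. Because blocks are visited from the end, the last match inside the first matching block is the last match in the whole file. The sketch below, findLastLineContaining, is a hypothetical helper (not part of Tcl) that uses a plain substring test with string first, returns an empty string when nothing matches, and assumes Tcl 8.6 or later.

```tcl
# Hypothetical helper: return the last line containing $keyword
# (plain substring match), or "" if no line matches. Scans blocks
# from the end of the file and stops at the first block with a match.
proc findLastLineContaining {filename keyword {blocksize 16384}} {
    set f [open $filename r]
    try {
        set lastOffset [file size $filename]
        set pos $lastOffset
        while {$lastOffset > 0} {
            set pos [expr {max(0, $pos - $blocksize)}]
            seek $f $pos
            if {$pos > 0} {
                gets $f   ;# discard a possibly partial line
            }
            set blockStart [tell $f]
            set found 0
            while {[tell $f] < $lastOffset && [gets $f line] >= 0} {
                if {[string first $keyword $line] >= 0} {
                    # Later matches in this block overwrite earlier ones,
                    # so this ends up holding the block's last match.
                    set match $line
                    set found 1
                }
            }
            if {$found} {
                return $match
            }
            set lastOffset $blockStart
        }
        return ""
    } finally {
        close $f
    }
}
```

For example, findLastLineContaining big.log ERROR would return the last line containing ERROR (big.log is an illustrative filename). The empty-string result is unambiguous as long as the keyword itself is non-empty.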


2017-04-03 23:48:27
