Python 3: find and delete lines in a file


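OK, I have a file that looks like this:

How can I find and delete only this part of the file?

I have been trying lots of things, but I can't get it to work.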


2012-06-12 user1229391

What have you tried? – Ryan

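Read each line into an array of strings; the index is the line number minus 1. Before storing a line, check whether it equals "id:2". If it does, stop storing lines until a line equals "id:3". When all lines have been read, clear the file and write the array back to it, from the first element to the last. This may not be the most efficient way, but it should work.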
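A minimal Python 3 sketch of this read, filter, and rewrite approach, assuming the id lines are exactly "id:2" and "id:3" with nothing else on the line. The function name delete_lines_between and its parameters are hypothetical, chosen only for illustration:

```python
def delete_lines_between(path, start_line="id:2", stop_line="id:3"):
    """Remove the lines from start_line up to (but not including) stop_line."""
    kept = []            # kept[i] is line number i + 1 of the surviving lines
    skipping = False
    with open(path) as f:
        for line in f:
            text = line.rstrip("\n")
            if text == start_line:
                skipping = True      # start dropping lines at "id:2"
            elif text == stop_line:
                skipping = False     # "id:3" itself is kept
            if not skipping:
                kept.append(line)

    # Opening with "w" truncates (clears) the file before the array is written back.
    with open(path, "w") as f:
        f.writelines(kept)
```

The whole file is held in memory, which is fine for a few thousand lines.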


2012-06-12 19:26:02 Manto

Is the id what decides which sequence gets deleted, or is the decision based on the list of values?

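You could build a dictionary where the id number is the key (converted to int, because of the sorting later) and the value is the list of strings made from the lines that follow. Then you can delete the item with key 2, iterate over the items sorted by key, and output a new "id:" line with the key, followed by the formatted list of strings.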
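A rough sketch of the dictionary idea, assuming the ids are unique and that any lines before the first id line can be ignored. parse_sections and format_sections are hypothetical helper names:

```python
import re

ID_LINE = re.compile(r"^id:(\d+)\s*$")

def parse_sections(lines):
    """Map each numeric id to the list of lines that follow its id line."""
    sections = {}
    current = None
    for line in lines:
        m = ID_LINE.match(line)
        if m:
            current = int(m.group(1))   # int, so that sorting is numeric
            sections[current] = []
        elif current is not None:
            sections[current].append(line.rstrip("\n"))
        # lines before the first id line are ignored in this sketch
    return sections

def format_sections(sections):
    """Return the output lines, with the sections in ascending id order."""
    out = []
    for key, body in sorted(sections.items()):
        out.append(f"id:{key}\n")
        out.extend(text + "\n" for text in body)
    return out
```

Usage would be roughly: build sections from the open input file, run del sections[2], then write the result of format_sections(sections) to the output file with writelines().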

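Or you can build a list of lists, which preserves the order. If you want to preserve the id sequence (that is, not renumber), you can also keep the "id:n" line inside each inner list.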
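A sketch of the list-of-lists variant under the same assumptions. Each inner list keeps its original "id:n" line as its first element, so the ids are not renumbered; split_sections and drop_sections are hypothetical names:

```python
def split_sections(lines):
    """Group lines into a list of lists; each inner list starts with its id line."""
    sections = []
    for line in lines:
        if line.startswith("id:"):
            sections.append([line])        # remember the original "id:n" line
        elif sections:
            sections[-1].append(line)      # lines before the first id are dropped
    return sections

def drop_sections(sections, del_ids):
    """Remove the sections whose id is in del_ids, keeping the original order."""
    return [sec for sec in sections
            if int(sec[0].strip()[len("id:"):]) not in del_ids]
```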

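This is fine for files of reasonable size. If the file is huge, you should copy the source to the destination and skip the unwanted sequences on the fly. That last approach is also quite easy for small files.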

[Added after clarification]

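I suggest learning the following approach; it is useful in many situations like this one. It uses a so-called finite automaton, which performs actions on the transitions from one state to another (see Mealy machine).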

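The text lines are the input elements here. The nodes that represent the context states are simply numbered. (My experience is that it is not worth giving them names; let them be just dumb numbers.) Only two states are used here, and status could easily be replaced by a boolean variable. However, if the situation becomes more complicated, that would lead to introducing another boolean variable, and the code would become more error-prone.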
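To illustrate the remark about a boolean, here is a sketch of the same two-state filter with a boolean flag in place of status. It assumes the same "id:n" line format as the filterSections function in this answer, and filter_sections_bool is a hypothetical name. It works here because every id line sets the next state the same way whatever the current state is, so the two branches collapse into one:

```python
import re

ID_LINE = re.compile(r"^id:(\d+)\s*$")

def filter_sections_bool(del_set, lines):
    """Yield the lines of every section whose id is not in del_set."""
    copying = False          # False plays the role of status 1, True of status 2
    for line in lines:
        m = ID_LINE.match(line)
        if m:
            # Each id line decides whether its section is copied or skipped.
            copying = int(m.group(1)) not in del_set
        if copying:
            yield line
```

As soon as a third state is needed, this flag stops being enough, which is exactly the point made above.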

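The filterSections code below may look very complicated at first, but it is easy to understand once you realize that each "if status == number" branch can be considered separately: it is the processing done in that context. Don't try to optimize it; leave the code like this. It can then easily be decoded by hand later, and you can draw a diagram similar to the Mealy machine example. If you do, it becomes even easier to understand.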
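As an illustration of "drawing the machine", the same two-state Mealy machine can also be written as an explicit transition table, where each (state, input class) pair maps to the next state and an output action. This is only a sketch; the input-class names and the helpers classify and run_machine are hypothetical:

```python
import re

ID_LINE = re.compile(r"^id:(\d+)\s*$")

# (state, input class) -> (next state, output action)
TRANSITIONS = {
    (1, "del_id"): (1, "skip"),
    (1, "keep_id"): (2, "copy"),
    (1, "other"): (1, "skip"),
    (2, "del_id"): (1, "skip"),
    (2, "keep_id"): (2, "copy"),
    (2, "other"): (2, "copy"),
}

def classify(line, del_set):
    """Reduce a text line to one of the three input classes."""
    m = ID_LINE.match(line)
    if m is None:
        return "other"
    return "del_id" if int(m.group(1)) in del_set else "keep_id"

def run_machine(del_set, lines):
    status = 1                      # initial status: expecting an id line
    out = []
    for line in lines:
        status, action = TRANSITIONS[(status, classify(line, del_set))]
        if action == "copy":        # the output depends on state and input (Mealy)
            out.append(line)
    return out
```

Each table row corresponds to one arrow in the diagram, labelled with its input class and action.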

The function is usefully generalized a bit: the set of sections to skip is passed as the first argument:

import re 

def filterSections(del_set, fname_in, fname_out): 
    '''Filtering out the del_set sections from fname_in. Result in fname_out.''' 

    # The regular expression was chosen for detecting and parsing the id-line. 
    # It can be done differently, but I consider it just fine and efficient. 
    rex_id = re.compile(r'^id:(\d+)\s*$') 

    # Let's open the input and output file. The files will be closed 
    # automatically. 
    with open(fname_in) as fin, open(fname_out, 'w') as fout:
        status = 1     # initial status -- expecting the id line
        for line in fin:
            m = rex_id.match(line) # get the match object if it is the id-line

            if status == 1:  # skipping the non-id lines
                if m:    # you can also write "if m is not None:"
                    num_id = int(m.group(1)) # get the numeric value of the id
                    if num_id in del_set:  # if this id should be deleted
                        status = 1   # or pass (to stay in this status)
                    else:
                        fout.write(line)  # copy this id-line
                        status = 2   # to copy the following non-id lines
                #else ignore this line (no code needed to ignore it :)

            elif status == 2:  # copy the non-id lines
                if m:       # the id-line found
                    num_id = int(m.group(1)) # get the numeric value of the id
                    if num_id in del_set:  # if this id should be deleted
                        status = 1   # or pass (to stay in this status)
                    else:
                        fout.write(line)  # copy this id-line
                        status = 2   # to copy the following non-id lines
                else:
                    fout.write(line)   # copy this non-id line


if __name__ == '__main__': 
    filterSections({1, 3}, 'data.txt', 'output.txt') 
    # or you can write the older set([1, 3]) for the first argument.

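Here the id lines keep their original numbers. If you want to renumber the sections, it can be done with a simple modification. Try the code and ask for details.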
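One possible form of that modification, sketched as a generator over lines and assuming the kept sections should be renumbered consecutively from 1. The name filter_and_renumber and its first_id parameter are hypothetical:

```python
import re

ID_LINE = re.compile(r"^id:(\d+)\s*$")

def filter_and_renumber(del_set, lines, first_id=1):
    """Drop the sections in del_set and renumber the remaining id lines."""
    status = 1                      # 1: skipping lines, 2: copying the current section
    new_id = first_id
    for line in lines:
        m = ID_LINE.match(line)
        if m:
            if int(m.group(1)) in del_set:
                status = 1
            else:
                yield f"id:{new_id}\n"   # write the new number instead of the original
                new_id += 1
                status = 2
        elif status == 2:
            yield line
```

With files, pass the open input file as lines and hand the result to fout.writelines().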

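Note that finite automata have limited power. They cannot be used to parse typical programming languages, because they cannot capture nested paired structures (such as parentheses).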

P.S. From the computer's point of view, 7,000 lines is actually a small file ;)


2012-06-12 20:36:18 pepr

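The identifier is used to decide which sequence to delete. The file contains 7,000 lines, so it is big. Sorry I provided so little information. – user1229391

@user1229391: After a sequence is deleted, should the following sequences keep their original numbers, or should their ids be corrected (decreased)? – pepr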

This should work no matter what values lie between the id lines....

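import fileinput
...
def deleteIdGroup(number):
    deleted = False
    # inplace=True redirects print() output back into testid.txt
    for line in fileinput.input("testid.txt", inplace=True):
        line = line.strip('\n')
        if line.count("id:" + str(number)):  # > 0: the id line of the group to delete
            deleted = True
        elif line.count("id:"):  # > 0: any other id line ends the deleted group
            deleted = False
        if not deleted:
            print(line)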

EDIT:

Sorry, this will delete both id:2 and id:20... You can modify it so that the first if checks line == "id:" + number.
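A sketch of the fix described in the edit, assuming each id line contains nothing but "id:n" apart from the newline. It compares the whole line instead of counting substrings, so "id:2" no longer matches "id:20". The name delete_id_group and the path parameter are hypothetical, and, unlike the original, only lines starting with "id:" end the deleted group:

```python
import fileinput

def delete_id_group(path, number):
    """Delete, in place, the section whose id line is exactly "id:<number>"."""
    target = "id:" + str(number)
    deleted = False
    # With inplace=True, standard output is redirected into the file being read.
    with fileinput.input(path, inplace=True) as f:
        for line in f:
            text = line.rstrip("\n")
            if text == target:              # exact match: "id:2" does not match "id:20"
                deleted = True
            elif text.startswith("id:"):    # any other id line ends the deleted group
                deleted = False
            if not deleted:
                print(text)
```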


2012-06-13 18:18:31 corn3lius
