Python: perform an operation only at specific locations in a text file

I have a text file that contains data like this:

AA 331    
line1 ... 
line2 ...  
% information here  
AA 332 
line1 ...  
line2 ...  
line3 ... 
%information here  
AA 1021 
line1 ... 
line2 ... 
% information here  
AA 1022  
line1 ... 
% information here  
AA 1023  
line1 ...  
line2 ...  
% information here

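I want to perform an operation only on the "% information" line of the block with the smallest integer in each group, i.e. the blocks that start with "AA 331" and "AA 1021", and skip the blocks that start with "AA 332", "AA 1022" and "AA 1023".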

P.S. The real file is large; this is just sample data.

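In the code below I try to parse the text file and collect the integers that follow "AA" in a list, list1. In the second function I group those integers and store the smallest value of each group in list2. This returns integers like [331, 1021, ...]. So I thought of extracting the lines after "AA 331" and then performing the operation, but I don't know how to proceed.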

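from itertools import groupby

def getlineindex(textfile):
    # Collect the integer that follows "AA" on each header line.
    list1 = []
    with open(textfile) as infile:
        for line in infile:
            if line.startswith("AA"):
                intid = int(line[3:])  # int() tolerates surrounding whitespace
                list1.append(intid)
    return list1

def minimalinteger(list1):
    # Group consecutive IDs by their tens bucket and keep the smallest of each group.
    list2 = []
    for k, v in groupby(list1, key=lambda x: x // 10):
        minimalint = min(v)
        list2.append(minimalint)
    return list2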

list2 contains the smallest integers that follow "AA": [331, 1021, ...]
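As a quick sanity check of the grouping step, here is a minimal sketch that applies the same groupby key to the IDs from the sample data. It assumes the IDs are already integers and appear in file order; smallest_per_group is a name chosen only for this illustration.

```python
from itertools import groupby

def smallest_per_group(ids):
    # groupby only merges *consecutive* items that share a key,
    # so this relies on the IDs of one group being adjacent in the file.
    return [min(group) for _, group in groupby(ids, key=lambda x: x // 10)]

sample_ids = [331, 332, 1021, 1022, 1023]
print(smallest_per_group(sample_ids))  # [331, 1021]
```

Because 331 and 332 both give 33 under integer division by 10, and 1021 to 1023 all give 102, the sample data falls into exactly two groups.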

Source

2015-02-06 Danira

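I think your question could use some clarification. What is the 'smallest integer' after the lines you specify? On which line does it occur, and is that position consistent/reliable? Also, how did you arrive at 'AA 331' and 'AA 1021' as the markers for the data you want to process? Is that something you expect to receive as human input, or is there a way to compute it? – bmhkim 2015-02-06 21:49:37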

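I mean the smallest integer in the sense that 331 < 332 and 1021 < 1022 < 1023. So I want to perform the operation on the "% information" line that follows the AA 331 block, which comes before AA 332. – Danira 2015-02-06 21:53:32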

Of course, you will notice that 331 < 332 < 1021 < 1022 < 1023. So why process 1021? – bmhkim 2015-02-06 22:00:27

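OK, here is my solution. At a high level, I go through the file line by line, looking for AA lines so I know when I have found the start/end of a data block, and I watch what I call the run number to decide whether the next block should be processed. I then have a subroutine that handles any given block: it reads all the relevant lines and processes them as needed. That subroutine is the one that watches for the next AA line, so that it knows when it is done.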

import re

runIdRegex = re.compile(r'AA (\d+)')

def processFile(fileHandle):
    lastNumber = None  # Last run number, needed so we know if there's been a gap or if we're in a new block of ten.
    try:
        line = next(fileHandle)
    except StopIteration:
        line = None  # Empty file.
    while line is not None:  # None is a special value indicating we've hit the end of the file.
        processData = False
        match = runIdRegex.match(line)
        if match:
            runNumber = int(match.group(1))
            if lastNumber is None:
                # Startup/first iteration
                processData = True
            elif runNumber - lastNumber == 1:
                # Continuation, see if the tens are the same.
                lastNumberTens = lastNumber // 10
                runNumberTens = runNumber // 10
                if lastNumberTens != runNumberTens:
                    processData = True
            else:
                processData = True

            # Always remember where we were.
            lastNumber = runNumber

            # And grab and process data.
            line = dataBlock(fileHandle, process=processData)
        else:
            try:
                line = next(fileHandle)
            except StopIteration:
                line = None

def dataBlock(fileHandle, process=False):
    runData = []
    try:
        line = next(fileHandle)
        match = runIdRegex.match(line)
        while not match:
            runData.append(line)
            line = next(fileHandle)
            match = runIdRegex.match(line)
    except StopIteration:
        # Hit end of file
        line = None

    if process:
        # Data processing call here
        # processData(runData)
        pass

    # Return line so we don't lose it!
    return line

A few notes for you. First, I agree with Jimilian that you should use a regular expression to match the AA lines.

Second, the logic we discussed for deciding which data to process lives in processFile. Specifically, these lines:

    processData = False
    match = runIdRegex.match(line)
    if match:
        runNumber = int(match.group(1))
        if lastNumber is None:
            # Startup/first iteration
            processData = True
        elif runNumber - lastNumber == 1:
            # Continuation, see if the tens are the same.
            lastNumberTens = lastNumber // 10
            runNumberTens = runNumber // 10
            if lastNumberTens != runNumberTens:
                processData = True
        else:
            processData = True

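I start by assuming we don't want to process the data, and then determine when we do. Logically you could do the opposite: assume you want to process the data and then determine when you don't. Next, we need to store the value of the most recent run so that we know whether to process this run's data. (Note the edge case of the first run.) We know we want to process the data when the sequence is broken (the difference between two runs is greater than 1), which is handled by the else statement. We also know we want to process the data when the sequence moves into the next group of ten, which is what the integer division by 10 checks.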
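The decision rule described here can be isolated into a small predicate, which makes it easy to check against the sample IDs. This is only a sketch: should_process is a hypothetical helper name, not part of the answer's code.

```python
def should_process(last_number, run_number):
    """Return True if the block headed by run_number should be processed."""
    if last_number is None:
        return True  # first block in the file
    if run_number - last_number != 1:
        return True  # the sequence is broken, so a new group starts here
    # Consecutive numbers: process only when the tens bucket changes.
    return last_number // 10 != run_number // 10

ids = [331, 332, 1021, 1022, 1023]
flags = [should_process(prev, cur) for prev, cur in zip([None] + ids[:-1], ids)]
print(flags)  # [True, False, True, False, False]
```

Only the blocks for 331 and 1021 are flagged, which matches what the question asks for.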

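Third, note the line that dataBlock returns. If you don't return it, you lose the AA line that caused dataBlock to stop iterating, and processFile needs that line to decide whether the next data block should be processed.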

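Finally, I chose to call next() on the file handle and use exception handling to detect when I have reached the end of the file. But don't think that this is the only way to do it. :)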
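As one alternative to explicit next() calls, a generator driven by a plain for loop avoids the StopIteration handling altogether. The following is a minimal sketch under the same selection rule; iter_selected_blocks and the (run_number, lines) tuples it yields are assumptions made for illustration, not the answer's original interface.

```python
import io
import re

RUN_ID = re.compile(r'AA (\d+)')

def iter_selected_blocks(lines):
    """Yield (run_number, block_lines) for each block that should be processed."""
    last_number = None
    current = None  # (run_number, lines) of the block being collected, or None
    for line in lines:
        match = RUN_ID.match(line)
        if not match:
            if current is not None:
                current[1].append(line)
            continue
        # A new AA header closes the previous block.
        if current is not None:
            yield current
        run_number = int(match.group(1))
        selected = (
            last_number is None
            or run_number - last_number != 1
            or last_number // 10 != run_number // 10
        )
        last_number = run_number
        current = (run_number, []) if selected else None
    if current is not None:
        yield current  # last block in the file

sample = io.StringIO(
    "AA 331\nline1\n% information here\n"
    "AA 332\nline1\n% information here\n"
    "AA 1021\nline1\nline2\n% information here\n"
    "AA 1022\nline1\n% information here\n"
)
for run_number, block in iter_selected_blocks(sample):
    print(run_number, len(block))  # prints "331 2", then "1021 3"
```

Since the for loop never consumes a line it has not yet examined, there is no header line to hand back to the caller, at the cost of keeping the block state inside one function.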

Let me know in the comments if you have any questions.

Source

2015-02-06 23:42:33 bmhkim

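You explained my problem perfectly. Good solution; I will try it on my sample data and let you know. In any case, this is the answer I want to accept. Thank you very much for your time :-) – Danira 2015-02-07 21:53:22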

You can use something like:

import re

matcher = re.compile(r"AA (\d+)")
already_was = []  # tens buckets that have already been seen
good_block = False

# filename and do_action() are placeholders for your own file path and operation.
with open(filename) as f:
    for line in f:
        m = matcher.match(line)
        if m:
            v = int(m.group(1)) // 10
            if v not in already_was:
                # First header seen for this bucket: act on its block.
                good_block = True
                already_was.append(v)
            else:
                good_block = False
        elif good_block:
            do_action()

This code works only if the first value in each group is the smallest one.
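If the smallest ID of a group is not guaranteed to come first, one possible workaround is to read the data in two steps: first collect every block, then keep only the block with the smallest ID in each tens bucket. This is a sketch, assuming the whole file fits in memory and that groups are defined by the tens bucket across the entire file rather than by adjacency; select_min_blocks is a hypothetical name.

```python
import re

HEADER = re.compile(r"AA (\d+)")

def select_min_blocks(lines):
    """Return {run_id: block_lines} for the smallest ID in each tens bucket."""
    blocks = {}  # run_id -> lines belonging to that block
    current = None
    for line in lines:
        m = HEADER.match(line)
        if m:
            current = int(m.group(1))
            blocks[current] = []
        elif current is not None:
            blocks[current].append(line)
    # Smallest ID seen for each bucket (ID // 10).
    smallest = {}
    for run_id in blocks:
        bucket = run_id // 10
        if bucket not in smallest or run_id < smallest[bucket]:
            smallest[bucket] = run_id
    return {run_id: blocks[run_id] for run_id in sorted(smallest.values())}

data = ["AA 332\n", "x\n", "AA 331\n", "y\n", "AA 1022\n", "z\n", "AA 1021\n", "w\n"]
print(select_min_blocks(data))  # {331: ['y\n'], 1021: ['w\n']}
```

For a file too large to hold in memory, the same idea could be split into two passes over the file, one to find the smallest IDs and one to process the matching blocks.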

Source

2015-02-06 21:35:35 Jimilian

Yes, I edited my question after your answer. Thanks a lot for the help :-) – Danira 2015-02-07 21:55:05
