Python - Text processing

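I want to use Python to process text scraped from a .pdf.

One way I'm trying to do this is to find a specific item and print the same line, the previous line, or the following line.

I've looked around and followed some tutorials, which got me this far, but I don't know how to move forward.

The code below uses the "find" function to locate and print information in the current line, but I need to be able to use it to print the following and preceding lines as well.

That is, the scraped text looks like this:

Smith, John

Per End 12/12/12

File:

The code I'm using is this: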

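def main():
    # The with-block closes the file automatically (file.close without () did nothing).
    with open("Register.txt", "r") as file:
        lines = file.readlines()
    # Initialise the counters once, before the loop, so they are not reset on every line.
    countPerEnd = 0
    countFile = 0
    for line in lines:
        line = line.strip()
        if line.find("Per End") != -1:
            countPerEnd = countPerEnd + 1
        if line.find("File:") != -1:
            countFile = countFile + 1
    print("Per End: ", countPerEnd)
    print("File: ", countFile)

main()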

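I can only get the line I searched for to print, but I need to be able to find other items too, such as, in this case, the name and the number after "File:".

Those can be anything, but the strings "Per End" and "File:" will always be the same.

I print the results to see where the output is.

The output is: Per End: 12/12/12

And the output I need, based on finding "Per End", is: Smith, John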

Source

2017-04-06 Jason Jabbour

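Your question is unclear. Is the text you posted the input, or is it what you get after running the function you created? What are you trying to do? Extract the values "Smith, John", "12/12/12" and "12345" from this text?

I added the desired result. I hope it clears things up.

I'm not 100% sure what you're trying to do, but I think this should get you on the right track: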

with open("register.txt", "r") as f:
    lines = f.readlines()

search_counters = {
    "Per End": 0,
    "File:": 0,
}

# remove blank lines; readlines() keeps the trailing "\n", so strip before testing
lines = [line.strip() for line in lines if line.strip()]
for i, line in enumerate(lines):
    for search_key in search_counters:
        if search_key in line:
            search_counters[search_key] += 1
            # print the previous line if the current line contains "Per End"
            # (i > 0 stops lines[i - 1] from wrapping around to the last line)
            if search_key == "Per End" and i > 0:
                print("previous line:", lines[i - 1])
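The same index-based idea can be extended to fetch the line after a match as well. The following is only a minimal sketch: the helper name `find_with_context` is hypothetical, and it assumes blank lines should be skipped, as in the sample text from the question.

```python
def find_with_context(lines, marker):
    """Return (previous, current, next) tuples for every line containing marker.

    previous/next are None when the match is the first/last non-blank line.
    """
    # Drop blank lines and surrounding whitespace so neighbours are real content.
    cleaned = [line.strip() for line in lines if line.strip()]
    results = []
    for i, line in enumerate(cleaned):
        if marker in line:
            previous = cleaned[i - 1] if i > 0 else None
            following = cleaned[i + 1] if i + 1 < len(cleaned) else None
            results.append((previous, line, following))
    return results


# Sample text shaped like the excerpt in the question.
sample = ["Smith, John\n", "\n", "Per End 12/12/12\n", "\n", "File:\n"]
for previous, current, following in find_with_context(sample, "Per End"):
    print("previous line:", previous)
    print("matching line:", current)
    print("next line:", following)
```

Returning `None` at the edges, rather than indexing blindly, keeps a match on the first or last line from raising an error or picking up the wrong neighbour.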

Source

2017-04-07 13:12:56 smassey

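So this works well, but only up to a point. After adding more if statements, I found that one variable is reported as "undefined", even though it is set in the dictionary and all the variables are coded the same way as the other items. I'm not sure what I'm doing wrong.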
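The modified code isn't shown, so the cause of the "undefined" error can only be guessed at. One common cause (an assumption here) is a name that is assigned only inside an `if` branch and then used at a point where that branch never ran. A sketch that sidesteps this by creating every result entry before the loop, and that pulls out the text after each marker with `str.partition`; the function name `extract_values` is hypothetical, and the sample values are the ones mentioned in the comment above:

```python
def extract_values(lines, markers):
    """Collect the text that follows each marker on matching lines."""
    # Every marker gets an entry before the loop, so no key is ever missing.
    found = {marker: [] for marker in markers}
    for line in lines:
        line = line.strip()
        for marker in markers:
            if marker in line:
                # partition splits at the first occurrence of marker;
                # the third element is everything after it.
                _, _, value = line.partition(marker)
                found[marker].append(value.strip())
    return found


sample = ["Smith, John\n", "Per End 12/12/12\n", "File: 12345\n"]
values = extract_values(sample, ["Per End", "File:"])
print(values["Per End"])
print(values["File:"])
```

Adding another search string then only means adding it to the `markers` list, with no extra `if` statement or variable to keep in sync.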
