Extracting text from files in Python

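I have several test files saved in a directory.
I want to go through each file, search for the strings "Text1" and "Text2", and print everything in front of that text to an output file.
I have already done this with the Python script below.
What I want next is only the first instance of "Text1" and "Text2" in each file. If I add a break to the current script, nothing gets printed to the output file.

Please guide me; I am a Python beginner.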

import os 
path = "D:\test" 
in_files = os.listdir(path) 
desc = open("desc.txt", "w") 
print >> desc, "Mol_ID, Text1, Text2" 
moldesc = ['Text1', 'Text2'] 
for f in in_files:
    file = os.path.join(path, f)
    text = open(file, "r")
    hit_count = 0
    hit_count1 = 0
    for line in text:
        if moldesc[0] in line:
            Text1 = line.split()[-1]
        if moldesc[1] in line:
            Text2 = line.split()[-1]
            print >> desc, f + "," + Text1 + "," + Text2
text.close()
print "Text extraction done !!!"

Source

2012-09-25 nilesh

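Do you want the first instance of both, or of either one? –

I'm having trouble understanding your question. Could you provide a sample input and the output you want? – devsnd

Why not use find, xargs, grep and sed? – njzk2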

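There are a couple of problems with your code:

Your text.close() should be at the same indentation level as the for line in text loop, so that each file is closed after it has been read.
The print >> desc statement is in the wrong place: you should print only once both Text1 and Text2 are defined. You could set them to None just outside the for line in text loop and test that neither is None. (Alternatively, you could set hit_count = 1 inside the if moldesc[0] test and hit_count1 = 1 inside the if moldesc[1] test, then test hit_count and hit_count1.) Once both are found, print the output and use break to leave the loop.

(So, in plain code:)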

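for f in in_files:
    file = os.path.join(path, f)
    with open(file, "r") as text:   # the file is closed automatically on exit
        hit_count = 0               # set to 1 once Text1 has been seen
        hit_count1 = 0              # set to 1 once Text2 has been seen
        for line in text:
            if moldesc[0] in line:
                Text1 = line.split()[-1]
                hit_count = 1
            if moldesc[1] in line:
                Text2 = line.split()[-1]
                hit_count1 = 1
            if hit_count and hit_count1:
                # both values found: write one row and stop reading this file
                print >> desc, f + "," + Text1 + "," + Text2
                break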
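
The None-based variant mentioned above could look roughly like the sketch below. It is written for Python 3, unlike the Python 2 snippets in this thread, and the helper names first_values and write_summary are made up for illustration. It also assumes that "first instance" means the first line containing each marker, so a later line with the same marker does not overwrite the stored value.

```python
import os


def first_values(file_path, markers=("Text1", "Text2")):
    """Return the last token of the first line containing each marker.

    The result is a list aligned with `markers`; an entry stays None
    when its marker never appears in the file.
    """
    found = [None] * len(markers)
    with open(file_path, "r") as text:
        for line in text:
            for i, marker in enumerate(markers):
                # Record only the first line that mentions this marker.
                if found[i] is None and marker in line:
                    found[i] = line.split()[-1]
            # Stop reading as soon as every marker has a value.
            if all(value is not None for value in found):
                break
    return found


def write_summary(directory, out_path, markers=("Text1", "Text2")):
    """Write one CSV-style row per file in which all markers were found."""
    with open(out_path, "w") as desc:
        desc.write("Mol_ID," + ",".join(markers) + "\n")
        for name in sorted(os.listdir(directory)):
            file_path = os.path.join(directory, name)
            if not os.path.isfile(file_path):
                continue  # skip sub-directories
            values = first_values(file_path, markers)
            if all(value is not None for value in values):
                desc.write(name + "," + ",".join(values) + "\n")
```

Calling write_summary(r"D:\test", "desc.txt") would mirror the original script; files missing either marker are skipped instead of raising a NameError.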

There is also a third issue:

You mentioned that you want the text before Text1? Then you probably want Text1 = line[:line.index(moldesc[0])] rather than Text1 = line.split()[-1] ...
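
To make the difference between the two expressions concrete, here is a small Python 3 sketch. The sample line and the helper name text_before are invented for illustration; str.find is used instead of str.index so that a missing marker yields None rather than a ValueError.

```python
def text_before(line, marker):
    """Return the part of `line` that precedes the first `marker`, or None."""
    pos = line.find(marker)  # -1 when the marker is absent
    if pos == -1:
        return None
    return line[:pos]


# Hypothetical sample line; real input files may look different.
line = "mol_001 logP Text1 3.14\n"

print(repr(text_before(line, "Text1")))  # 'mol_001 logP '
print(repr(line.split()[-1]))            # '3.14'
```

The slice keeps what comes before the marker, while line.split()[-1] keeps the last whitespace-separated token on the line, which is usually the value after it.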

Source

2012-09-25 12:03:40

I'm getting a syntax error on the with statement??? – nilesh
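
One possible explanation, assuming the script runs on an interpreter older than Python 2.6: the with statement is only enabled by default from 2.6 onwards (2.5 needs from __future__ import with_statement at the top of the file). A try/finally block gives the same guarantee that the file is closed and works on any version. The function name below is made up for illustration.

```python
def first_line_containing(file_path, marker):
    """Return the first line containing `marker`, or None if there is none."""
    text = open(file_path, "r")
    try:
        for line in text:
            if marker in line:
                return line
        return None
    finally:
        # Runs on normal exit, on return, and when an exception is raised.
        text.close()
```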

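I would go for mmap, and possibly write the result file with csv, something like the following. It is untested and rough around the edges: it needs better error handling, you might want to use mm.find() instead of a regular expression, some of the code is copied verbatim from the OP, and so on (my computer's battery is about to die...).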

import os
import re   # needed for re.findall below
import csv
import mmap
from collections import defaultdict

path = r"D:\test"  # note 'r' prefix so '\t' is not read as a tab
in_files = os.listdir(path)

fout = open('desc.txt', 'w')
csvout = csv.writer(fout)
csvout.writerow(['Mol_ID', 'Text1', 'Text2'])

dd = defaultdict(list)

for filename in in_files:
    fin = open(os.path.join(path, filename))
    mm = mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ)
    # Find stuff: text preceding each marker on the same line
    matches = re.findall(r'(.*?)(Text[12])', mm)  # maybe use finditer depending on exact needs
    for text, matched in matches:
        dd[matched].append(text)
    # do something with dd - write output using csvout.writerow()...
    mm.close()
    fin.close()
fout.close()  # csv.writer has no close(); close the underlying file instead
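
The mm.find() alternative mentioned in this answer might be sketched as follows. This is Python 3, where mmap objects search for bytes, and the function names text_before_first and write_csv are hypothetical. Like the regular expression above, it returns only the text that precedes the marker on the same line, and it keeps just the first occurrence per file.

```python
import csv
import mmap
import os


def text_before_first(file_path, marker):
    """Return the bytes preceding the first `marker` on its line, or None."""
    with open(file_path, "rb") as fin:
        try:
            mm = mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            return None  # an empty file cannot be memory-mapped
        try:
            pos = mm.find(marker)
            if pos == -1:
                return None
            # rfind gives -1 when no newline precedes pos, so start becomes 0.
            start = mm.rfind(b"\n", 0, pos) + 1
            return mm[start:pos]
        finally:
            mm.close()


def write_csv(file_paths, out_path, markers=(b"Text1", b"Text2")):
    """Write one row per file; a missing marker leaves its cell empty."""
    with open(out_path, "w", newline="") as fout:
        csvout = csv.writer(fout)
        csvout.writerow(["Mol_ID"] + [m.decode("ascii") for m in markers])
        for file_path in file_paths:
            cells = []
            for marker in markers:
                before = text_before_first(file_path, marker)
                cells.append("" if before is None
                             else before.decode("utf-8", "replace"))
            csvout.writerow([os.path.basename(file_path)] + cells)
```

Whether memory-mapping pays off depends on file size; for small files, plain line-by-line reading is likely just as fast and simpler.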

Source

2012-09-25 12:19:08
