How do I process this text file and parse what I need?

I'm trying to parse the output of the Python doctest module and store it in an HTML file.

I have output similar to this:

********************************************************************** 
File "example.py", line 16, in __main__.factorial 
Failed example: 
    [factorial(n) for n in range(6)] 
Expected: 
    [0, 1, 2, 6, 24, 120] 
Got: 
    [1, 1, 2, 6, 24, 120] 
********************************************************************** 
File "example.py", line 20, in __main__.factorial 
Failed example: 
    factorial(30) 
Expected: 
    25252859812191058636308480000000L 
Got: 
    265252859812191058636308480000000L 
********************************************************************** 
1 items had failures: 
    2 of 8 in __main__.factorial 
***Test Failed*** 2 failures.

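Each failure is preceded by a line of asterisks, which separates one test failure from the next.

What I'd like to do is pull out the filename and method that failed, as well as the expected and actual results. Then I'd like to create an HTML document from this (or store it in a text file and do a second round of parsing).

How can I do this using just Python, or some combination of UNIX shell utilities?

EDIT: I came up with the following shell script, which matches each block the way I want, but I'm not sure how to redirect each sed match to its own file.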

python example.py | sed -n '/.*/,/^\**$/p' > `mktemp error.XXX`

2009-08-07 samoz

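If you strip out the file, method, expected and actual results, what's left? – juanjux 2009-08-07 20:20:22

Well, I just can't parse them into separate pieces. So far I've only been able to grab a whole block at once, not the individual fields. – samoz 2009-08-07 20:26:12

Here is a quick and dirty script that parses the output into tuples containing the relevant information: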

import sys 
import re 

stars_re = re.compile('^[*]+$', re.MULTILINE) 
file_line_re = re.compile(r'^File "(.*?)", line (\d*), in (.*)$') 

doctest_output = sys.stdin.read() 
chunks = stars_re.split(doctest_output)[1:-1] 

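for chunk in chunks:
    chunk_lines = chunk.strip().splitlines()
    m = file_line_re.match(chunk_lines[0])
    if m is None:
        continue  # not a failure block; skip it

    file, line, module = m.groups()
    # Assumes the example, the expected value and the actual value each fit on one line
    failed_example = chunk_lines[2].strip()
    expected = chunk_lines[4].strip()
    got = chunk_lines[6].strip()

    # The extra parentheses print a tuple on both Python 2 and Python 3
    print((file, line, module, failed_example, expected, got))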
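
Since the end goal is an HTML document, here is a rough sketch of how tuples in that shape could be turned into a simple HTML table. It assumes Python 3 (for the `html` module); the function name `failures_to_html` and the table layout are made up for illustration and are not part of the script above.

```python
import html


def failures_to_html(failures):
    """Render (file, line, module, example, expected, got) tuples as an HTML table."""
    header = ("File", "Line", "Test", "Failed example", "Expected", "Got")
    rows = ["<tr>" + "".join("<th>%s</th>" % h for h in header) + "</tr>"]
    for failure in failures:
        # Escape every field so characters such as < and & cannot break the markup
        cells = "".join(
            "<td><code>%s</code></td>" % html.escape(str(field)) for field in failure
        )
        rows.append("<tr>" + cells + "</tr>")
    return "<html>\n<body>\n<table>\n%s\n</table>\n</body>\n</html>\n" % "\n".join(rows)


# Hypothetical input in the same shape as the tuples printed by the script
sample = [
    ("example.py", "16", "__main__.factorial",
     "[factorial(n) for n in range(6)]",
     "[0, 1, 2, 6, 24, 120]", "[1, 1, 2, 6, 24, 120]"),
]
page = failures_to_html(sample)
```

To use it with the script, you would collect the tuples in a list instead of printing them and write the returned string to a file.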

2009-08-07 21:11:11

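You could write a Python program to pick this apart, but a better approach might be to modify doctest so that it outputs the report you want in the first place. From the docs for doctest.DocTestRunner: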

    ... the display output
    can be also customized by subclassing DocTestRunner, and
    overriding the methods `report_start`, `report_success`,
    `report_unexpected_exception`, and `report_failure`.
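
As a minimal sketch of what that subclassing might look like: the class below overrides `report_failure` so that each failure is stored as a dict instead of being printed. The names `CollectingRunner` and `failure_records`, and the inline sample test, are invented for this example; the line-number arithmetic mirrors what doctest does for its own failure header, as far as I can tell.

```python
import doctest


class CollectingRunner(doctest.DocTestRunner):
    """Hypothetical runner that records failures instead of printing them."""

    def __init__(self, *args, **kwargs):
        doctest.DocTestRunner.__init__(self, *args, **kwargs)
        self.failure_records = []

    def report_failure(self, out, test, example, got):
        # test.lineno may be None when the source line cannot be determined
        line = None
        if test.lineno is not None and example.lineno is not None:
            line = test.lineno + example.lineno + 1
        self.failure_records.append({
            "file": test.filename,
            "line": line,
            "name": test.name,
            "source": example.source.strip(),
            "expected": example.want.strip(),
            "got": got.strip(),
        })


# A deliberately failing doctest, built from a string for demonstration
sample = """
>>> 1 + 1
3
"""
test = doctest.DocTestParser().get_doctest(sample, {}, "sample", "sample.txt", 0)
runner = CollectingRunner(verbose=False)
runner.run(test)

for record in runner.failure_records:
    print(record)
```

For a real module you would get the tests from `doctest.DocTestFinder().find(...)` rather than building one from a string. Unexpected exceptions go through `report_unexpected_exception`, which this sketch leaves at its default behavior.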

2009-08-07 21:10:35

I'll definitely take a look at this! – samoz 2009-08-07 22:20:50

I wrote a quick parser with pyparsing to do this.

from pyparsing import * 

str = """ 
********************************************************************** 
File "example.py", line 16, in __main__.factorial 
Failed example: 
    [factorial(n) for n in range(6)] 
Expected: 
    [0, 1, 2, 6, 24, 120] 
Got: 
    [1, 1, 2, 6, 24, 120] 
********************************************************************** 
File "example.py", line 20, in __main__.factorial 
Failed example: 
    factorial(30) 
Expected: 
    25252859812191058636308480000000L 
Got: 
    265252859812191058636308480000000L 
********************************************************************** 
""" 

quote = Literal('"').suppress() 
comma = Literal(',').suppress() 
in_ = Keyword('in').suppress() 
block = OneOrMore("**").suppress() + \ 
     Keyword("File").suppress() + \ 
     quote + Word(alphanums + ".") + quote + \ 
     comma + Keyword("line").suppress() + Word(nums) + comma + \ 
     in_ + Word(alphanums + "._") + \ 
     LineStart() + restOfLine.suppress() + \ 
     LineStart() + restOfLine + \ 
     LineStart() + restOfLine.suppress() + \ 
     LineStart() + restOfLine + \ 
     LineStart() + restOfLine.suppress() + \ 
     LineStart() + restOfLine 

# Named all_blocks rather than all, to avoid shadowing the built-in
all_blocks = OneOrMore(Group(block))

result = all_blocks.parseString(str)

for section in result:
    print(section)

This gives:

['example.py', '16', '__main__.factorial', ' [factorial(n) for n in range(6)]', ' [0, 1, 2, 6, 24, 120]', ' [1, 1, 2, 6, 24, 120]'] 
['example.py', '20', '__main__.factorial', ' factorial(30)', ' 25252859812191058636308480000000L', ' 265252859812191058636308480000000L']

2009-08-07 21:43:23

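Very nice work! I think I'll play around with this... – samoz 2009-08-07 22:21:24

Why does str have 3 " marks before and after the text? Sorry, my Python really isn't that good. – samoz 2009-08-08 20:23:13

The triple quotes just denote a text string that can span multiple lines. – 2009-08-08 22:23:39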
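
To illustrate the point about triple quotes, here is a tiny example (the variable names are just for demonstration): a triple-quoted string keeps its line breaks, so it is equal to the same text written on one line with explicit `\n` escapes.

```python
# Written on one line, with explicit newline escapes
single_line = "line one\nline two\n"

# The same text as a triple-quoted string spanning several lines
triple_quoted = """line one
line two
"""

print(single_line == triple_quoted)
```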

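This is probably one of the least elegant Python scripts I've ever written, but it should give you the framework to do what you want without resorting to UNIX utilities and separate scripts to create the HTML. It's untested, but it should work with only minor tweaking.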

import os

# Make sure the output directory exists
if not os.path.isdir('processed'):
    os.mkdir('processed')

# Go through every file in the current directory,
# ignoring anything that isn't a .txt file
for thisFile in os.listdir('.'):
    if not thisFile.endswith(".txt"):
        continue

    # Read in the text, then split it into a list of lines
    infile = open(thisFile, 'r')
    rawText = infile.read()
    infile.close()
    yourList = rawText.split('\n')

    compiledText = ''

    for i in yourList:
        # Clunky way of deciding whether or not the current line
        # should be included in compiledText
        if i.startswith("*****"):
            compiledText += "\n\n--- New Report ---\n"
        elif i.startswith(("File", "Fail", "Expe", "Got", " ")):
            compiledText += i + '\n'

    # Insert your HTML template below
    htmlText = '<html>...\n <body> \n ' + compiledText + '</body>... </html>'

    # Write out to file
    outfile = open('processed/' + thisFile + '.html', 'w')
    outfile.write(htmlText)
    outfile.close()

2009-08-07 22:28:11 Sean
