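One thought: this information would be a bit easier to parse if it were in a standard format such as CSV or TSV.

Personally, I find a regex-based solution for parsing this input format hard to read. The existing regex answer is well written, but it relies on one monster of a regular expression, so it is not the implementation I would choose for this particular problem. I thought it would be useful to add a simpler alternative answer whose parsing logic is built on commonly used str functions such as split, replace, and strip to remove newlines and boilerplate text.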
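
I thought this problem could also be solved by using csv.reader with its custom delimiter set to '\n\n', but, for whatever reason, that function only supports 1-character delimiter strings.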
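
To illustrate the delimiter restriction, here is a minimal sketch. The helper name `try_reader` is made up for this example; it only reports whether `csv.reader` accepts a given delimiter.

```python
import csv

def try_reader(delimiter):
    """Return None if csv.reader accepts the delimiter, else the error message."""
    try:
        # An empty list is enough: the delimiter is validated when the reader is built
        csv.reader([], delimiter=delimiter)
    except (TypeError, ValueError, csv.Error) as exc:
        return str(exc)
    return None

print(try_reader(','))     # a single character is accepted
print(try_reader('\n\n'))  # a two-character delimiter is rejected
```

This is why the answer below splits the file contents with `str.split('\n\n')` instead.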

You should add a few things to this example, such as exception handling for when the file does not exist, but the core needed to solve the problem at hand is here. Of course you do not need to wrap it in a class either, but that felt natural to me.

    class Question:
        def __init__(self, num, text, option_a, option_b, option_c, option_d,
                     answer, explanation):
            self.num = num
            self.text = text
            self.option_a = option_a
            self.option_b = option_b
            self.option_c = option_c
            self.option_d = option_d
            self.answer = answer
            self.explanation = explanation

        @classmethod
        def parse_input_file(cls, filename):
            """
            Parse an input file of questions delimited by double newline.

            Note: misspelling of "explanation" as "explaination" is intentionally
            preserved from asker's question.
            """
            with open(filename) as fp:
                data = fp.read()
            # Each question occupies 8 consecutive blocks separated by blank lines
            data = data.split('\n\n')
            questions = []
            for i in range(0, len(data), 8):
                question_data = data[i:i+8]
                question = cls(
                    num=int(question_data[0].lstrip('Q')),
                    text=question_data[1],
                    # lstrip('A.') removes leading 'A' and '.' characters (a
                    # character set, not a literal prefix); the space after
                    # "A." is what stops it, and strip() then removes that space
                    option_a=question_data[2].lstrip('A.').strip(),
                    option_b=question_data[3].lstrip('B.').strip(),
                    option_c=question_data[4].lstrip('C.').strip(),
                    option_d=question_data[5].lstrip('D.').strip(),
                    answer=question_data[6].replace('Answer: ', '', 1),
                    explanation=question_data[7].replace('Explaination: ', '', 1),
                )
                questions.append(question)
            return questions
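
For the missing-file handling mentioned above, one possible shape is sketched here. The helper `read_blocks` and the constant `BLOCKS_PER_QUESTION` are hypothetical names, not part of the answer's code, and returning None for a missing file is just one choice among several. Note that, unlike the parser above, this sketch also trims newlines at the start and end of the file.

```python
import sys

# Assumed from the parser above: number, text, four options, answer, explanation
BLOCKS_PER_QUESTION = 8

def read_blocks(filename):
    """Return the blank-line-delimited blocks of a file, or None if it is missing."""
    try:
        with open(filename) as fp:
            data = fp.read()
    except FileNotFoundError:
        print(f'Input file not found: {filename}', file=sys.stderr)
        return None
    # Trim newlines at both ends so a trailing newline does not end up in the last block
    blocks = data.strip('\n').split('\n\n')
    if len(blocks) % BLOCKS_PER_QUESTION != 0:
        raise ValueError(
            f'Expected a multiple of {BLOCKS_PER_QUESTION} blocks, got {len(blocks)}'
        )
    return blocks
```

A caller could check for None before building Question objects, and the block-count check catches a stray blank line inside a question before it silently shifts every later field.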

An example of using it:

    from pprint import pprint

    questions = Question.parse_input_file('questions.txt')
    for i in questions:
        pprint(i.__dict__)

Output:

    {'answer': 'D',
     'explanation': 'Policies are considered the first and highest level of '
                    'documentation, from which the lower level elements of '
                    'standards, procedures, and guidelines flow. This order, '
                    'however, does not mean that policies are more important than '
                    'the lower elements. These higher-level policies, which are '
                    'the more general policies and statements, should be created '
                    'first in the process for strategic reasons, and then the more '
                    'tactical elements can follow. -Ronald Krutz The CISSP PREP '
                    'Guide (gold edition) pg 13',
     'num': 1,
     'option_a': 'definition of the issue and statement of relevant terms.',
     'option_b': 'statement of roles and responsibilities.',
     'option_c': 'statement of applicability and compliance requirements.',
     'option_d': 'statement of performance of characteristics and requirements.',
     'text': 'All of the following are basic components of a security policy '
             'EXCEPT the'}
    {'answer': 'B',
     'explanation': 'Procedures are looked at as the lowest level in the policy '
                    'chain because they are closest to the computers and provide '
                    'detailed steps for configuration and installation issues. '
                    'They provide the steps to actually implement the statements '
                    'in the policies, standards, and guidelines...Security '
                    'procedures, standards, measures, practices, and policies '
                    'cover a number of different subject areas. - Shon Harris '
                    'All-in-one CISSP Certification Guide pg 44-45',
     'num': 2,
     'option_a': 'Encryption Security',
     'option_b': 'Procedural Security.',
     'option_c': 'Logical Security',
     'option_d': 'On-line Security',
     'text': 'Ensuring the integrity of business information is the PRIMARY '
             'concern of'}
    {'answer': 'A',
     'explanation': 'Information security policies area high-level plans that '
                    'describe the goals of the procedures. Policies are not '
                    'guidelines or standards, nor are they procedures or controls. '
                    'Policies describe security in general terms, not specifics. '
                    'They provide the blueprints for an overall security program '
                    'just as a specification defines your next product - Roberta '
                    'Bragg CISSP Certification Training Guide (que) pg 206\n',
     'num': 3,
     'option_a': 'Identifies major functional areas of information.',
     'option_b': 'Quantifies the effect of the loss of the information.',
     'option_c': 'Requires the identification of information owners.',
     'option_d': 'Lists applications that support the business function.',
     'text': 'Which one of the following is an important characteristic of an '
             'information security policy?'}
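
What have you done so far? What isn't working? – cco

(a) Doing this with a regex makes it harder than it needs to be. Instead, just read the text file line by line and use the first few characters of each line as a guide for putting the content into the fields of a database. For example, a line that starts with "Answer" obviously has to go into the "Answer" field. (b) On SO, you need to show us code you *have written*, with a specific question about how to correct it or make it do what you want. –

For the regex, you just need to repeat the pattern you used to match A for the others... so the [regex](https://regex101.com/r/yg075G/1) is very simple... However, I wouldn't use a regex. Do what Bill says; it should be easier and less error-prone (errors would occur if, for example, they accidentally sent you a string with a newline where there shouldn't be one, etc.) – Mateus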