2013-01-24 13 views
0

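Write a program that prompts for a file name, then reads through the file and looks for lines of the form: X-DSPAM-Confidence: 0.8475. When you encounter a line that starts with "X-DSPAM-Confidence:", pull the line apart to extract the floating-point number on it. Count these lines and compute the total of the spam confidence values from them. When you reach the end of the file, print out the average spam confidence.

How do I parse the lines in Python using a for loop?

Enter the file name: mbox.txt
Average spam confidence: 0.894128046745

Enter the file name: mbox-short.txt
Average spam confidence: 0.750718518519

Test your program on the mbox.txt and mbox-short.txt files.

So far I have: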

fname = raw_input("Enter file name: ") 
fh = open(fname) 
for line in fh: 
    pos = fh.find(':0.750718518519') 
    x = float(fh[pos:]) 
    print x 

What is wrong with the code?

Answers

4

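It sounds like they are asking you to average all the 'X-DSPAM-Confidence' numbers, rather than to find 0.750718518519.

Personally, I would find the text you are looking for, extract the number, put all of these numbers into a list, and average them at the end.

Something like this: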

# Get the filename from the user 
filename = raw_input("Enter file name: ") 

# An empty list to contain all our floats 
spamflts = [] 

# Open the file to read ('r'), and loop through each line 
for line in open(filename, 'r'): 

    # If the line starts with the text we want (with all whitespace stripped) 
    if line.strip().startswith('X-DSPAM-Confidence'): 

        # Then extract the number from the second half of the line
        # "text:number".split(':') will give you ['text', 'number']
        # So you use [1] to get the second half
        # Then we use .strip() to remove whitespace, and convert to a float
        flt = float(line.split(':')[1].strip())

        print flt

        # We then add the number to our list
        spamflts.append(flt)

print spamflts 
# At the end of the loop, we work out the average - the sum divided by the length 
average = sum(spamflts)/len(spamflts) 

print average 

>>> lines = """X-DSPAM-Confidence: 1 
X-DSPAM-Confidence: 5 
Nothing on this line 
X-DSPAM-Confidence: 4""" 

>>> for line in lines.splitlines(): 
    print line 


X-DSPAM-Confidence: 1 
X-DSPAM-Confidence: 5 
Nothing on this line 
X-DSPAM-Confidence: 4 

Using find:

>>> for line in lines.splitlines(): 
    pos = line.find('X-DSPAM-Confidence:') 
    print pos 

0 
0 
-1 
0 

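As we can see, find() only gives us the position of 'X-DSPAM-Confidence:' within each line (or -1 when it is absent), not the position of the number that follows it.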
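The position returned by find() can still be used to reach the number: adding the length of the searched prefix gives the index where the value begins. A minimal sketch, written for Python 3 (the session in this answer is Python 2); the helper name `extract_after_prefix` is hypothetical and not from the thread.

```python
PREFIX = 'X-DSPAM-Confidence:'

def extract_after_prefix(line, prefix=PREFIX):
    """Return the float that follows prefix, or None if prefix is absent."""
    pos = line.find(prefix)
    if pos == -1:  # find() signals "not found" with -1
        return None
    # Skip past the prefix itself; float() tolerates surrounding whitespace
    return float(line[pos + len(prefix):])

lines = """X-DSPAM-Confidence: 1
X-DSPAM-Confidence: 5
Nothing on this line
X-DSPAM-Confidence: 4"""

values = [extract_after_prefix(line) for line in lines.splitlines()]
print(values)  # [1.0, 5.0, None, 4.0]
```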

It is easier to check whether the line starts with 'X-DSPAM-Confidence:' and then extract just the number, like this:

>>> for line in lines.splitlines(): 
    print line.startswith('X-DSPAM-Confidence') 


True 
True 
False 
True 

>>> for line in lines.splitlines(): 
    if line.startswith('X-DSPAM-Confidence'): 
        print line.split(':')


['X-DSPAM-Confidence', ' 1'] 
['X-DSPAM-Confidence', ' 5'] 
['X-DSPAM-Confidence', ' 4'] 

>>> for line in lines.splitlines(): 
    if line.startswith('X-DSPAM-Confidence'): 
        print float(line.split(':')[1])


1.0 
5.0 
4.0 
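
The exercise asks for a count and a running total rather than a list, so the steps shown in this session can also be combined without storing the values. A sketch under that reading, in Python 3; `average_confidence` is a hypothetical name, and returning None when no line matches is an assumption of this sketch, not something the exercise specifies.

```python
def average_confidence(lines, prefix='X-DSPAM-Confidence:'):
    """Average the floats on lines that start with prefix."""
    count = 0
    total = 0.0
    for line in lines:
        if line.startswith(prefix):
            # split(':', 1) cuts only at the first colon
            total += float(line.split(':', 1)[1])
            count += 1
    if count == 0:
        return None  # assumption: no matching lines means no average
    return total / count

sample = [
    'X-DSPAM-Confidence: 1',
    'X-DSPAM-Confidence: 5',
    'Nothing on this line',
    'X-DSPAM-Confidence: 4',
]
print(average_confidence(sample))
```

Because a file object yields its lines when iterated, the same function should also accept an open file in place of the list.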
+0

Hey, you pretty much just did his assignment for him :P (still +1 for a correct answer)

+0

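@JoranBeasley Yeah, I guess so. It may not be the best way to learn, but hopefully he will read through it and try to understand it. (Is there no tag for this any more?)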

+0

@JoranBeasley I have added some comments that should help him understand it.

-1

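line.find  # ..... so you search the line, not the file ....

print pos  # prints help with debugging ;)

float(line[pos+1:])  # the index you have is actually that of the ':', so you need to move over by 1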
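
To see why the extra 1 matters: find(':') returns the index of the colon itself, so a slice starting there still contains the colon and float() rejects it. A small Python 3 illustration using the sample value from the question; the variable names are only for demonstration.

```python
line = 'X-DSPAM-Confidence: 0.8475'

pos = line.find(':')           # index of the colon itself
print(repr(line[pos:]))        # ': 0.8475'

try:
    float(line[pos:])          # the leading ':' is not part of a number
    colon_rejected = False
except ValueError:
    colon_rejected = True

value = float(line[pos + 1:])  # start one character after the colon
print(colon_rejected, value)   # True 0.8475
```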

0
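fname = raw_input("Enter file name: ")
with open(fname) as f:
    # Keep the float after the first ':' on every matching header line
    spam = [float(line.split(':', 1)[1]) for line in f if line.startswith('X-DSPAM-Confidence: ')]
count = len(spam)
avg = sum(spam) / count
print "Average spam confidence:", avg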
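
One caveat with this approach: if the file contains no matching lines, spam is empty and the division raises ZeroDivisionError. A hedged Python 3 variant that guards against this; the function name `mean_confidence` and the throwaway demo file are illustrative assumptions, not part of the answer.

```python
import os
import tempfile

def mean_confidence(path, prefix='X-DSPAM-Confidence:'):
    """Return the mean confidence in the file at path, or None if none found."""
    with open(path) as f:
        spam = [float(line.split(':', 1)[1]) for line in f if line.startswith(prefix)]
    return sum(spam) / len(spam) if spam else None

# Demo on a temporary file so the sketch runs without mbox.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('From: someone\nX-DSPAM-Confidence: 0.5\nX-DSPAM-Confidence: 1.0\n')
    demo_path = tmp.name

result = mean_confidence(demo_path)
os.remove(demo_path)
print(result)  # 0.75
```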