2016-10-18

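Extracting data using regular expressions: Python

The basic outline of the problem is to read a file, look for integers with re.findall() using the regular expression [0-9]+, then convert the extracted strings to integers and sum them.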
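As a quick illustration of the building blocks: re.findall() returns every non-overlapping match as a list of strings, so the matches still need int() before they can be summed. The sample line below is made up for illustration and is not from the assignment file.

```python
import re

# A made-up line of text containing several numbers
sample = "There are 7746 apples, 12 pears, 1929 plums and 8827 figs"

# re.findall returns a list of *strings*, one per match of [0-9]+
matches = re.findall(r'[0-9]+', sample)
print(matches)  # ['7746', '12', '1929', '8827']

# Convert each string to int before adding them up
total = sum(int(m) for m in matches)
print(total)  # 18514
```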

I'm having trouble appending to the list. With my code below, it only appends the first (index 0) element of each line. Please help. Thanks.

import re 
hand = open ('a.txt') 
lst = list() 
for line in hand: 
    line = line.rstrip() 
    stuff = re.findall('[0-9]+', line) 
    if len(stuff)!= 1 : continue 
    num = int (stuff[0]) 
    lst.append(num) 
print sum(lst) 

Can you show some of the lines in 'a.txt'? – mitoRibo


Thanks for the reply. The link below leads to the full text of the file. http://python-data.dr-chuck.net/regex_sum_325354.txt –

Answer


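Great, thanks for including the whole txt file! Your main problem is the if len(stuff)... line: it skips a line not only when stuff is empty but also when it has 2, 3 or more elements, so you only keep the lines where stuff has exactly one element. I've added comments to the code, but if anything is unclear, feel free to ask.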
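To see why the len(stuff) != 1 check loses data, here is a small sketch that applies the question's filter and the corrected logic to a few lines. The lines are hypothetical, not taken from the linked file.

```python
import re

# Hypothetical input: one line with no numbers, one with one, one with three
lines = [
    "no numbers here",
    "just 4563 on this line",
    "three: 9607 4292 4498",
]

question_total = 0  # mimics the original `if len(stuff) != 1: continue`
fixed_total = 0     # keeps every match on every line

for line in lines:
    stuff = re.findall(r'[0-9]+', line)
    if len(stuff) == 1:
        # Only lines with exactly one number survive the original check
        question_total += int(stuff[0])
    # The fix: add all matches (an empty list simply adds nothing)
    fixed_total += sum(int(s) for s in stuff)

print(question_total)  # 4563
print(fixed_total)     # 22960
```

Note that summing an empty list adds 0, so the len(stuff) == 0 check in the answer below is a readability aid rather than a requirement.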

import re
hand = open('a.txt')
str_num_lst = list()
for line in hand:
    line = line.rstrip()
    stuff = re.findall('[0-9]+', line)
    # If we didn't find anything on this line, then continue
    if len(stuff) == 0: continue
    # if len(stuff) != 1: continue  # <-- This line was wrong, as it skips lists with more than 1 element

    # If we did find something, stuff will be a list of strings
    # (e.g. stuff = ['9607', '4292', '4498'] or stuff = ['4563']).
    # For now, let's just add this list onto our str_num_lst
    # without worrying about converting to int.
    # We use '+=' instead of 'append' since both stuff and str_num_lst are lists
    str_num_lst += stuff
hand.close()

# Print out str_num_lst to check that everything's OK
print(str_num_lst)

# Get an overall sum by looping over the string numbers in str_num_lst;
# we can convert to int inside the loop
overall_sum = 0
for str_num in str_num_lst:
    overall_sum += int(str_num)

# Print the sum
print('Overall sum is:')
print(overall_sum)
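The comment about using '+=' instead of 'append' is easiest to see side by side. In this minimal sketch, append adds the whole list as a single nested element, while += (like list.extend) adds each element individually.

```python
stuff = ['9607', '4292', '4498']

appended = []
appended.append(stuff)       # the whole list becomes ONE element
print(appended)              # [['9607', '4292', '4498']]

extended = []
extended += stuff            # each string is added individually
print(extended)              # ['9607', '4292', '4498']

also_extended = []
also_extended.extend(stuff)  # same result as +=
print(also_extended == extended)  # True
```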

Edit:

You're right: reading the whole file in as one line is a good solution, and it isn't hard to do. Check out this post. Here's what the code looks like:

import re

hand = open('a.txt')
all_lines = hand.read()  # Reads in all lines as one long string
all_str_nums_as_one_line = re.findall('[0-9]+', all_lines)
hand.close()  # <-- can close the file now since we've read it in

# Go through all the matches to get a total
tot = 0
for str_num in all_str_nums_as_one_line:
    tot += int(str_num)

print('Overall sum is:', tot)
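The whole-file approach can also be written more compactly. This is a sketch, not the answerer's code: the helper name sum_numbers is hypothetical, sum() with a generator replaces the manual loop, and a with block (shown commented out, assuming the same a.txt) closes the file automatically. Reading everything into one string is fine for a small file like this one.

```python
import re

def sum_numbers(text):
    """Return the sum of every run of digits found in text (hypothetical helper)."""
    return sum(int(n) for n in re.findall(r'[0-9]+', text))

# Usage with a file (assumes a.txt exists, as in the answer above):
# with open('a.txt') as hand:
#     print('Overall sum is:', sum_numbers(hand.read()))

print(sum_numbers("abc 12 def 30\nghi 8"))  # 50
```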

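Thank you very much. I knew I was doing something wrong on the if len(stuff)... line, but I couldn't figure out the problem. '+=' is the right option. Thanks for sharing. As an entry-level programmer, I'd like to know: can we read the whole file as a single string and use '[0-9]+' on that string to extract the numbers? –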


Yes, good point! I've edited my answer to include that option. – mitoRibo


That's great. Thanks very much. –