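Here's a fairly time- and resource-efficient approach that reads the values and calculates their averages for all the files in parallel, yet only reads one line per file at a time. It does, however, temporarily read the entire first .dat file into memory in order to determine how many rows and columns of numbers each file will have.
You didn't say whether your "numbers" are integers, floats, or something else, so this reads them as floating point (which will work even if they aren't). Regardless, the averages are calculated and output as floating point numbers.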
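The lockstep reading described above can be sketched on a small scale. This is only an illustration, assuming two tiny in-memory "files" built with `io.StringIO`; the sample numbers and the names `cell_values`, `file_a`, and `file_b` are made up for the example. Each file is consumed lazily, one line at a time, and the values found at the same position are averaged together.

```python
import io

def cell_values(file, datatype=float):
    """Lazily yield each number in a text file, left to right, top to bottom."""
    for line in file:
        for word in line.split():
            yield datatype(word)

# Hypothetical stand-ins for two .dat files with the same 2 x 2 layout.
file_a = io.StringIO("1 2\n3 4\n")
file_b = io.StringIO("3 4\n5 6\n")

# zip() pulls one value from every file per step, so only the current
# line of each file has to be held in memory.
streams = [cell_values(f) for f in (file_a, file_b)]
means = [sum(values) / len(values) for values in zip(*streams)]

print(means)
```

Because `zip` stops at the shortest input, a file with too few values would silently truncate the result here, so a real script should check that all files have the same shape.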
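Update
I've modified my original answer to also calculate the sample standard deviation (sigma, using Bessel's correction) of the values at each row and column position, as per your comment. It does this right after computing their mean, so a second pass to re-read all the data isn't necessary. In addition, in response to a suggestion made in the comments, a context manager has been added to ensure that all the input files get closed.
Note that the standard deviations are only printed and are not written to the output file, but writing them to the same file or to a separate one should be easy enough to add.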
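As a quick sanity check on the formula, here is a small sketch that computes the mean and then the standard deviation of the values at one position, and compares the result with the standard library's `statistics.stdev`. The sample values and the helper name `mean_and_std_dev` are invented for the illustration. Dividing by `n - 1` gives the sample standard deviation, whereas dividing by `n` would give the population value (`statistics.pstdev`).

```python
from math import isclose, sqrt
from statistics import pstdev, stdev

def mean_and_std_dev(values):
    """Return (mean, sample standard deviation) of a sequence of numbers."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_diff = sum((v - mean) ** 2 for v in values)
    return mean, sqrt(sum_sq_diff / (n - 1))  # n - 1: Bessel's correction

# Made-up values found at the same row/column position in four files.
values = (2.0, 4.0, 4.0, 6.0)
mean, std_dev = mean_and_std_dev(values)

print('{:.2f} ({:.2f})'.format(mean, std_dev))
assert isclose(std_dev, stdev(values))       # matches the sample std dev
assert not isclose(std_dev, pstdev(values))  # differs from the population one
```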

    from contextlib import contextmanager
    from glob import iglob
    from math import sqrt
    from sys import exit

    @contextmanager
    def multi_file_manager(files, mode='rt'):
        """Open all the files and guarantee they are closed afterwards."""
        files = [open(file, mode) for file in files]
        try:
            yield files
        finally:  # also runs if the with-block raises or calls exit()
            for file in files:
                file.close()

    # generator function to read, convert, and yield each value from a text file
    def read_values(file, datatype=float):
        for line in file:
            for value in (datatype(word) for word in line.split()):
                yield value

    # enumerate multiple equal length iterables simultaneously as (i, n0, n1, ...)
    def multi_enumerate(*iterables, **kwds):
        start = kwds.get('start', 0)
        return ((n,)+t for n, t in enumerate(zip(*iterables), start))

    DATA_FILE_PATTERN = 'data*.dat'
    MIN_DATA_FILES = 2

    with multi_file_manager(iglob(DATA_FILE_PATTERN)) as datfiles:
        num_files = len(datfiles)
        if num_files < MIN_DATA_FILES:
            print('Less than {} .dat files were found to process, '
                  'terminating.'.format(MIN_DATA_FILES))
            exit(1)

        # determine number of rows and cols from first file
        temp = [line.split() for line in datfiles[0]]
        num_rows = len(temp)
        num_cols = len(temp[0])
        datfiles[0].seek(0)  # rewind first file
        del temp  # no longer needed

        print('{} .dat files found, each must have {} rows x {} cols\n'.format(
              num_files, num_rows, num_cols))

        means = []
        std_devs = []
        divisor = float(num_files-1)  # Bessel's correction for sample standard dev
        generators = [read_values(file) for file in datfiles]
        for _ in range(num_rows):  # main processing loop
            for _ in range(num_cols):
                # create a sequence of next cell values from each file
                values = tuple(next(g) for g in generators)
                mean = float(sum(values))/num_files
                means.append(mean)
                means_diff_sq = ((value-mean)**2 for value in values)
                std_dev = sqrt(sum(means_diff_sq)/divisor)
                std_devs.append(std_dev)

    print('Average and (standard deviation) of values:')
    with open('means.txt', 'wt') as averages:
        for i, mean, std_dev in multi_enumerate(means, std_devs):
            print('{:.2f} ({:.2f})'.format(mean, std_dev), end=' ')
            averages.write('{:.2f}'.format(mean))  # note std dev not written
            if i % num_cols != num_cols-1:  # not last column?
                averages.write(' ')  # delimiter between values on line
            else:
                print()  # newline
                averages.write('\n')

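
Writing the standard deviations out as well could look something like the following sketch. The helper name `write_grid` and the output file name `std_devs.txt` are hypothetical, and the sample list merely stands in for the `std_devs` list built by the main loop. It lays a flat list of values back out in rows of `num_cols` values each, in the same format used for `means.txt`.

```python
import os
import tempfile

def write_grid(path, values, num_cols, fmt='{:.2f}'):
    """Write a flat sequence of values as rows of num_cols values each."""
    with open(path, 'wt') as output:
        for start in range(0, len(values), num_cols):
            row = values[start:start + num_cols]
            output.write(' '.join(fmt.format(v) for v in row) + '\n')

# Stand-in for the std_devs list computed by the main loop (2 rows x 3 cols).
sample_std_devs = [0.5, 1.25, 0.0, 2.0, 0.75, 1.0]
out_path = os.path.join(tempfile.gettempdir(), 'std_devs.txt')
write_grid(out_path, sample_std_devs, num_cols=3)

with open(out_path) as written:
    print(written.read())
```

In the script itself, a call along the lines of `write_grid('std_devs.txt', std_devs, num_cols)` after the final loop would presumably be enough.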
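It's not obvious what you mean by the average of files. Please post example files. –
Think of each file as a matrix of the same dimensions. I want to determine the mean of all the numbers at any given position in the matrices, and then place that mean at the same position in a matrix of means. – user1757550
What in particular are you having trouble with? – jdi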