Splitting a big text file with many headers in Python

I have a big text file that looks like this:
lat lon altitude pressure
3 lines data group bsas
2.3 4.5 45.0 875
5.6 6.5 46.2 676
3.4 3.4 48.2 565
6 lines data group sdad
3.4 4.5 56.1 535
5.6 6.5 46.2 676
3.4 4.5 56.1 535
2.3 4.5 45.0 875
5.6 6.5 46.2 676
3.4 3.4 48.2 565
50 lines data group asdasd
5.5 6.6 44.5 343
...
3.7 8.4 56.5 456
... and so on
I want to split the whole text file into separate data groups, with each group stored in its own 2D array. So far I have tried two ways to do this.

The first way goes through the file line by line and collects the data like this:
class Wave(object):
    # each object has 4 attributes: lat, lon, altitude, pressure
    def __init__(self):
        self.lat = []
        self.lon = []
        self.altitude = []
        self.pressure = []

def read_waves(filename):
    wave_list = []
    with open(filename, 'r') as f:
        next(f)  # skip the header line
        wave = Wave()
        for line in f:
            if 'data' in line:      # start of a new group
                if wave.lat:        # keep the previous group if it has data
                    wave_list.append(wave)
                wave = Wave()
            else:
                parts = line.split()
                wave.lat.append(float(parts[0]))
                wave.lon.append(float(parts[1]))
                wave.altitude.append(float(parts[2]))
                wave.pressure.append(float(parts[3]))
        wave_list.append(wave)      # don't forget the last group
    return wave_list
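Since each group header already states how many data lines follow (e.g. `3 lines data group bsas`), the loop above could also read exactly that many lines per group and skip the `'data' in line` test entirely. A minimal sketch (an assumption on my part: the first token of every group header is the line count and the last token is the group name):

```python
def read_groups(lines):
    """Yield (group_name, rows) pairs from an iterable of lines."""
    it = iter(lines)
    next(it)  # skip the file header ("lat lon altitude pressure")
    for header in it:
        tokens = header.split()
        n = int(tokens[0])    # e.g. 3 in "3 lines data group bsas"
        name = tokens[-1]     # e.g. "bsas"
        # read exactly n data lines for this group
        rows = [[float(x) for x in next(it).split()] for _ in range(n)]
        yield name, rows
```

Because this is a generator fed line by line, it streams the file and never holds more than one group in memory at a time.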
The second way uses numpy's loadtxt:
import numpy as np
from io import StringIO

def read_waves(filename):
    with open(filename, 'r') as f:
        txt = f.read()
    # split on "data"; the first piece is the file header, so drop it
    raw_chunks = txt.split("data")[1:]
    wave_list = []
    for rc in raw_chunks:
        # the first "\n" ends the remainder of this group's header line,
        # the last "\n" starts the next group's header line
        first_id = rc.find("\n")
        last_id = rc.rfind("\n")
        temp_chunk = rc[first_id:last_id]
        # load the chunk as a 2D array
        data = np.loadtxt(StringIO(temp_chunk))
        wave = Wave()
        wave.lat = data.T[0]
        wave.lon = data.T[1]
        wave.altitude = data.T[2]
        wave.pressure = data.T[3]
        wave_list.append(wave)
    return wave_list
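For comparison, a single-pass variant that collects each group's rows into a plain list and converts it to a NumPy array once per group avoids both the per-line attribute appends of the first method and the many small `loadtxt` calls of the second. A sketch assuming the same four-column layout and that every group header contains the word "data" (the name `read_groups_fast` is mine, not from the original post):

```python
import numpy as np

def read_groups_fast(filename):
    """Return a list of (N, 4) float arrays, one per data group."""
    groups, rows = [], []
    with open(filename) as f:
        next(f)  # skip the file header
        for line in f:
            if 'data' in line:  # a group-header line starts a new group
                if rows:
                    groups.append(np.array(rows, dtype=float))
                    rows = []
            else:
                rows.append(line.split())
        if rows:  # flush the last group
            groups.append(np.array(rows, dtype=float))
    return groups
```

Each element of the returned list is an `(N, 4)` array, so the columns can be recovered with `arr[:, 0]` (lat), `arr[:, 1]` (lon), and so on.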
However, both of these methods are quite slow. I looked at the pandas documentation, but could not find a way to skip headers in the middle of the file. I have also looked at examples from other questions:
Splitting a file based on text in Python
How to split and parse a big text file in python in a memory-efficient way?
but none of them solves my problem. Is there a faster way to read this kind of text file? Thanks in advance.
What data do you want to split on? – 2014-10-08 20:25:54
@Padraic The data shown above, for example. Or what do you mean? Sorry, I don't quite understand – 2014-10-08 20:45:04
Yes, on what text do you want to split? – 2014-10-08 20:45:51