If you need to deal with fixed-width format right now, you could use something like the following:
    def fixed_width_to_items(filename, fields, first_column_is_index=False, ignore_first_rows=0):
        # `fields` lists the column boundaries as character offsets into each line
        reader = open(filename, 'r')
        # skip first rows
        for i in range(ignore_first_rows):
            next(reader)
        if first_column_is_index:
            # the first column becomes the key, the remaining columns the values
            index = slice(0, fields[1])
            fields = [slice(*x) for x in zip(fields[1:-1], fields[2:])]
            return ((line[index], [line[x].strip() for x in fields]) for line in reader)
        else:
            # no index column: use the running line number as the key
            fields = [slice(*x) for x in zip(fields[:-1], fields[1:])]
            return ((i, [line[x].strip() for x in fields]) for i, line in enumerate(reader))
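To make the slicing step concrete, here is a minimal standalone sketch of how a list of boundary offsets turns into slice objects, and how those slices cut up a single line. The offsets and the sample line are made up for illustration; they are not taken from any real file.

```python
# Boundary offsets: column k spans fields[k] up to (not including) fields[k + 1].
fields = [0, 3, 12, 22]

# Pair each offset with its successor to get one slice per column.
slices = [slice(start, stop) for start, stop in zip(fields[:-1], fields[1:])]

# Build a made-up line whose column widths (3, 9, 10) match the offsets above.
line = "7".rjust(3) + "1.500".rjust(9) + "-2.25".rjust(10)

# Cut the line at the boundaries and strip the padding from each piece.
row = [line[s].strip() for s in slices]
print(row)
```

Every value comes back as a string, which is why the caller has to convert types afterwards.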
Here is a test program:
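    import pandas
    import numpy
    import tempfile

    # create a data frame
    df = pandas.DataFrame(numpy.random.randn(100, 5))
    # text mode, so that write() accepts a str on Python 3 as well
    file_ = tempfile.NamedTemporaryFile(mode='w', delete=True)
    file_.write(df.to_string())
    file_.flush()

    # specify fields (offsets match the to_string() layout at the time of writing)
    fields = [0, 3, 12, 22, 32, 42, 52]
    # note: DataFrame.from_items was removed in pandas 1.0; on newer versions,
    # DataFrame.from_dict(dict(items), orient='index') builds the same frame without .T
    df2 = pandas.DataFrame.from_items(fixed_width_to_items(file_.name, fields, first_column_is_index=True, ignore_first_rows=1)).T

    # need to specify the datatypes, otherwise everything is a string
    df2 = pandas.DataFrame(df2, dtype=float)
    df2.index = [int(x) for x in df2.index]

    # check
    assert (df - df2).abs().max().max() < 1E-6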
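This should do the trick if you need it right now, but keep in mind that the function above is very simple; in particular, it does nothing about data types.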
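One way to deal with the data types is to convert each parsed row before building the frame. The following is only a sketch under that assumption: `convert_row` is a hypothetical helper, not part of the function above or of pandas, and the sample row is made up.

```python
def convert_row(values, converters):
    """Apply one converter per column to a row of stripped strings.

    Hypothetical helper for illustration only.
    """
    if len(values) != len(converters):
        raise ValueError("expected %d columns, got %d" % (len(converters), len(values)))
    return [conv(text) for conv, text in zip(converters, values)]


# A made-up row as the parser would yield it: every field is still a string.
raw = ['7', '1.500', '-2.25']

# One callable per column; int and float are the built-in constructors.
typed = convert_row(raw, [int, float, float])
print(typed)
```

The test program gets the same effect after the fact, by rebuilding the frame with `dtype=float` and converting the index with `int`.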
Can pandas consume your data? – hochl 2012-03-15 14:12:59
Can you show the first few lines of the file? – 2012-03-15 14:41:02