It is actually faster to use the csv lib and str.replace:
    import csv
    import pandas as pd

    with open("test.txt") as f:
        next(f)  # skip the header line
        # On Python 2, use itertools.imap instead of map to keep this lazy
        df = pd.DataFrame.from_records(
            csv.reader(map(lambda x: x.rstrip().replace("|", ";"), f), delimiter=";"),
            columns=["Column 1", "Column 2", "ID", "Age", "Height"]).astype(int)
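To try this without the original file, here is a minimal self-contained sketch. The contents of `sample` are an assumption about what test.txt looks like (a header row, then rows that mix `;` and `|` as separators); they are not taken from the question. It also collects the rows into a list before building the frame, which is a small deviation from the snippet above.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for test.txt: a header row, then rows whose first
# fields are separated by ";" and whose last fields are separated by "|".
sample = "Column 1;Column 2;Column 3\n1;2;3|4|5\n6;7;8|9|10\n"

names = ["Column 1", "Column 2", "ID", "Age", "Height"]

with io.StringIO(sample) as f:
    next(f)  # skip the header row
    # Turn every "|" into ";" so csv.reader only has to handle one delimiter.
    lines = (line.rstrip().replace("|", ";") for line in f)
    rows = list(csv.reader(lines, delimiter=";"))

# csv.reader yields strings, so convert the whole frame to integers.
df = pd.DataFrame.from_records(rows, columns=names).astype(int)
print(df)
```

Swapping `io.StringIO(sample)` for `open("test.txt")` gives back the file-based version.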
Some timings:
    In [35]: %%timeit
    pd.read_csv("test.txt", sep="[;|]", engine='python', skiprows=1,
                names=["Column 1", "Column 2", "ID", "Age", "Height"])
       ....:
    100 loops, best of 3: 14.7 ms per loop
    In [36]: %%timeit
    with open("test.txt") as f:
        next(f)
        df = pd.DataFrame.from_records(
            csv.reader(map(lambda x: x.rstrip().replace("|", ";"), f), delimiter=";"),
            columns=["Column 1", "Column 2", "ID", "Age", "Height"]).astype(int)
       ....:
    100 loops, best of 3: 6.05 ms per loop
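If you want to repeat the comparison outside IPython, something like the following rough sketch with the standard `timeit` module should work. The generated file, its 1000 rows, and the helper names `with_read_csv` and `with_csv_reader` are all made up for illustration, so the numbers you get will not match the ones above and will depend on your data and pandas version.

```python
import csv
import os
import tempfile
import timeit

import pandas as pd

names = ["Column 1", "Column 2", "ID", "Age", "Height"]

# Hypothetical input file: header row, then rows mixing ";" and "|".
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Column 1;Column 2;Column 3\n")
    for i in range(1000):
        tmp.write(f"{i};{i + 1};{i + 2}|{i + 3}|{i + 4}\n")
    path = tmp.name


def with_read_csv():
    # Regex separator, which requires the slower Python parsing engine.
    return pd.read_csv(path, sep="[;|]", engine="python", skiprows=1, names=names)


def with_csv_reader():
    # Normalise the delimiter by hand, then let the csv module split the rows.
    with open(path) as f:
        next(f)
        lines = (line.rstrip().replace("|", ";") for line in f)
        rows = list(csv.reader(lines, delimiter=";"))
    return pd.DataFrame.from_records(rows, columns=names).astype(int)


# Sanity check that both approaches parse the same values before timing them.
frames_match = with_read_csv().values.tolist() == with_csv_reader().values.tolist()

t_read_csv = timeit.timeit(with_read_csv, number=20)
t_csv_reader = timeit.timeit(with_csv_reader, number=20)
print(f"read_csv:   {t_read_csv / 20 * 1000:.2f} ms per loop")
print(f"csv.reader: {t_csv_reader / 20 * 1000:.2f} ms per loop")

os.remove(path)  # clean up the temporary file
```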
You can also use str.split:
    with open("test.txt") as f:
        next(f)
        df = pd.DataFrame.from_records(
            map(lambda x: x.rstrip().replace("|", ";").split(";"), f),
            columns=["Column 1", "Column 2", "ID", "Age", "Height"])
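One thing to keep in mind with the str.split version is that every field comes back as a string, so you would still need a conversion step if you want numbers. A minimal sketch, again assuming a made-up `sample` in place of test.txt and building the rows as a list rather than with map:

```python
import io

import pandas as pd

# Hypothetical stand-in for test.txt (header row, then ";" and "|" mixed).
sample = "Column 1;Column 2;Column 3\n1;2;3|4|5\n6;7;8|9|10\n"

names = ["Column 1", "Column 2", "ID", "Age", "Height"]

with io.StringIO(sample) as f:
    next(f)  # skip the header row
    # Normalise "|" to ";" and split each line into its five fields.
    rows = [line.rstrip().replace("|", ";").split(";") for line in f]

# The fields are still strings at this point.
df = pd.DataFrame.from_records(rows, columns=names)

# Convert to integers only if the columns really are all numeric.
df_int = df.astype(int)
print(df_int.dtypes)
```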
You could parse the last column as it is and then [split it](http://stackoverflow.com/questions/14745022/pandas-dataframe-how-do- –