如果文本number
在所有文件一樣,和文字code
也一樣,你可以使用read_csv
,然後通過子組contains
和isnull
篩選列行, notnull
:
import pandas as pd
import numpy as np
import io
temp=u""" number amount
10 34
23 65
54 07
code volume dim
1 56 34
23 57 565
number amount
40 674
73 2365
code volume dim
341 3456 6534
23 0957 908565
number amount
210 4534
2343 4565"""
#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), sep="\s+", index_col=None, skip_blank_lines=True, names=['a', 'b', 'c'])
print df
a b c
0 number amount NaN
1 10 34 NaN
2 23 65 NaN
3 54 07 NaN
4 code volume dim
5 1 56 34
6 23 57 565
7 number amount NaN
8 40 674 NaN
9 73 2365 NaN
10 code volume dim
11 341 3456 6534
12 23 0957 908565
13 number amount NaN
14 210 4534 NaN
15 2343 4565 NaN
df = df[~((df.a.str.contains('number')) | (df.a.str.contains('code')))]
df1 = df[df.c.isnull()]
df1 = df1[['a', 'b']].reset_index(drop=True)
df1.columns = ['number','amount']
#if all numbers are integers
df1 = df1.astype(int)
print df1
number amount
0 10 34
1 23 65
2 54 7
3 40 674
4 73 2365
5 210 4534
6 2343 4565
df2 = df[df.c.notnull()].reset_index(drop=True)
df2.columns = ['code','volume', 'dim']
#if all numbers are integers
df2 = df2.astype(int)
print df2
code volume dim
0 1 56 34
1 23 57 565
2 341 3456 6534
3 23 957 908565
所有文件中的'number'是否相同?所有數字都是整數嗎? – jezrael