If performance is key... use numpy
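import numpy as np
import pandas as pd
# numpy.core.defchararray.add is also exposed publicly as np.char.add
from numpy.core.defchararray import add as cadd
from functools import reduce

def proc(d1):
    v = d1.values
    n, m = v.shape
    # repeat each index label once per column; tile the column labels once per row
    dates = np.repeat(d1.index.values.astype(str), m)
    cols = np.tile(d1.columns.values.astype(str), n)
    # flatten the values row by row so they line up with the (date, column) pairs
    vals = v.ravel().astype(str)
    # element-wise string concatenation: dates + '-' + cols + '-' + vals
    return pd.Series(reduce(cadd, [dates, '-', cols, '-', vals]))

proc(df.set_index('Date'))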
0    t1-A-1
1    t1-B-2
2    t2-A-3
3    t2-B-4
dtype: object
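To see why the pieces line up, here is a minimal sketch of the intermediate arrays. The sample frame is an assumption, reconstructed from the output above (the original `df` is not shown here), and I use `to_numpy()` and `np.char.add` in place of `.values` and the `cadd` alias purely for illustration.

```python
import numpy as np
import pandas as pd

# Assumed sample frame, reconstructed from the output shown above.
df = pd.DataFrame({'Date': ['t1', 't2'], 'A': [1, 3], 'B': [2, 4]})
d1 = df.set_index('Date')
n, m = d1.shape  # 2 rows, 2 columns

# Each index label is repeated once per column: t1, t1, t2, t2
dates = np.repeat(d1.index.to_numpy().astype(str), m)
# The column labels are tiled once per row: A, B, A, B
cols = np.tile(d1.columns.to_numpy().astype(str), n)
# Values flattened row by row: 1, 2, 3, 4
vals = d1.to_numpy().ravel().astype(str)

# Element-wise string concatenation, one step at a time.
result = np.char.add(np.char.add(np.char.add(np.char.add(dates, '-'), cols), '-'), vals)
print(result.tolist())
```

`np.repeat` and `np.tile` together enumerate every (row, column) pair in the same row-major order that `ravel()` uses, so the three arrays can be joined position by position.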
Timing
%timeit proc(df.set_index('Date'))
%timeit df.set_index('Date').stack().reset_index().apply(lambda x: '-'.join(x.astype(str)), axis=1)
small data
1000 loops, best of 3: 494 µs per loop
100 loops, best of 3: 2.17 ms per loop
large data
from string import ascii_letters

np.random.seed([3,1415])
df = pd.DataFrame(
    np.random.randint(10, size=(1000, 52)),
    pd.Index(['t{:05d}'.format(i) for i in range(1000)], name='Date'),
    list(ascii_letters)
).reset_index()
10 loops, best of 3: 156 ms per loop
1 loop, best of 3: 3.75 s per loop
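Outside IPython, a rough way to reproduce the comparison is the standard `timeit` module. The sketch below first checks that both approaches give the same strings, then times them. The helper name `stack_join`, the small sample frame, and the loop counts are my own assumptions for illustration; absolute timings will depend on your machine and library versions.

```python
import timeit
from functools import reduce

import numpy as np
import pandas as pd

def proc(d1):
    # numpy approach: build the three aligned string arrays, then concatenate
    n, m = d1.shape
    dates = np.repeat(d1.index.to_numpy().astype(str), m)
    cols = np.tile(d1.columns.to_numpy().astype(str), n)
    vals = d1.to_numpy().ravel().astype(str)
    return pd.Series(reduce(np.char.add, [dates, '-', cols, '-', vals]))

def stack_join(d1):
    # pandas approach from the timing above (hypothetical helper name)
    return d1.stack().reset_index().apply(lambda x: '-'.join(x.astype(str)), axis=1)

# Assumed small sample frame matching the output shown earlier.
df = pd.DataFrame({'Date': ['t1', 't2'], 'A': [1, 3], 'B': [2, 4]})
d1 = df.set_index('Date')

# Sanity check before timing: both approaches should agree.
same = proc(d1).tolist() == stack_join(d1).tolist()

# Best of 3 repeats, 10 calls each.
t_numpy = min(timeit.repeat(lambda: proc(d1), number=10, repeat=3))
t_pandas = min(timeit.repeat(lambda: stack_join(d1), number=10, repeat=3))
print(same, t_numpy, t_pandas)
```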
What exactly are B5 and C3? – Allen