可以使用numpy.tile
進行重複列名和numpy.ravel
爲扁平化值DataFrame
:
#random dataframe
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)), columns=list('ABCDE'))
print (df)
A B C D E
0 8 8 3 7 7
1 0 4 2 5 2
2 2 2 1 0 8
3 4 0 9 6 2
4 4 1 5 3 4
df2 = pd.DataFrame({
"name": np.tile(df.columns, len(df.index)),
"value": df.values.ravel()})
print (df2)
name value
0 A 8
1 B 8
2 C 3
3 D 7
4 E 7
5 A 0
6 B 4
7 C 2
8 D 5
9 E 2
10 A 2
11 B 2
12 C 1
13 D 0
14 E 8
15 A 4
16 B 0
17 C 9
18 D 6
19 E 2
20 A 4
21 B 1
22 C 5
23 D 3
24 E 4
時序(len(df) = 1M
):
#random dataframe
np.random.seed(100)
N = 1000000
df = pd.DataFrame(np.random.randint(10, size=(N,5)), columns=list('abcde'))
print (df)
In [86]: %timeit (pd.DataFrame({"name": np.tile(df.columns, len(df.index)),"value": df.values.ravel()}))
10 loops, best of 3: 84.8 ms per loop
In [87]: %timeit (pd.DataFrame(np.column_stack((np.tile(df.columns, df.shape[0]), df.values.reshape(-1,1))), columns=['name', 'value']))
10 loops, best of 3: 196 ms per loop
In [88]: %timeit (df.stack().reset_index(level=0, drop=True).reset_index(name='value').rename(columns={'index':'name'}))
1 loop, best of 3: 344 ms per loop
如果需要輸出numpy array
添加numpy.column_stack
:
print (np.column_stack((np.tile(df.columns, len(df.index)), df.values.ravel())))
[['a' 8]
['b' 8]
['c' 3]
['d' 7]
['e' 7]
['a' 0]
['b' 4]
['c' 2]
['d' 5]
['e' 2]
['a' 2]
['b' 2]
['c' 1]
['d' 0]
['e' 8]
['a' 4]
['b' 0]
['c' 9]
['d' 6]
['e' 2]
['a' 4]
['b' 1]
['c' 5]
['d' 3]
['e' 4]]
代碼生成 「數據」 是不完整的 –