2016-12-03 72 views
1

Concatenating numpy arrays into two arrays

I have a series of numpy arrays, generated for example like this:

import random
import pandas as pd
N = 5 
data = [[random.random() for i in range(N)] for j in range(N)] 
names = ['a','b','c','d','e'] 
df = pd.DataFrame(data) 
df = df.transpose() 
df.columns = names 

i.e.:

a b c d e 
0.01 0.03 0.01 0.2 0.04 
0.2 0.01 0.02 0.01 0.1 
... 

and I would like to reformat it so that it looks like this:

name value 
a  0.01 
b  0.03 
c  0.01 
d  0.2 
e  0.04 
a  0.2 
b  0.01 
.... 

(The order of the data does not matter.)

I tried transposing the pandas DataFrame:

df = pd.DataFrame(data) 
df = df.transpose() 
df.columns = names 

but the result looks like this:

a 0.1 0.2 0.01 0.2 
b 0.3 0.1 0.2 0.01 
.... 

Any ideas on how to format the numpy array / pandas DataFrame so that it has two columns of data?

+1

The code that generates `data` is incomplete. –

Answers

2

You can use numpy.tile to repeat the column names and numpy.ravel to flatten the values, then build a new DataFrame from both:

import numpy as np
import pandas as pd

# random DataFrame
np.random.seed(100) 
df = pd.DataFrame(np.random.randint(10, size=(5,5)), columns=list('ABCDE')) 
print (df) 
    A B C D E 
0 8 8 3 7 7 
1 0 4 2 5 2 
2 2 2 1 0 8 
3 4 0 9 6 2 
4 4 1 5 3 4 
df2 = pd.DataFrame({ 
     "name": np.tile(df.columns, len(df.index)), 
     "value": df.values.ravel()}) 
print (df2)   
    name value 
0  A  8 
1  B  8 
2  C  3 
3  D  7 
4  E  7 
5  A  0 
6  B  4 
7  C  2 
8  D  5 
9  E  2 
10 A  2 
11 B  2 
12 C  1 
13 D  0 
14 E  8 
15 A  4 
16 B  0 
17 C  9 
18 D  6 
19 E  2 
20 A  4 
21 B  1 
22 C  5 
23 D  3 
24 E  4 
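
The pairing of names and values in the output above depends on both arrays following the same order. Here is a minimal sketch of that check, assuming a small float frame like the one in the question (the sample data and the variable names `values`, `names` and `n_cols` are illustrative and not from the answer):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 3 rows, columns a..e (not the asker's actual values).
rng = np.random.RandomState(100)
df = pd.DataFrame(rng.rand(3, 5), columns=list("abcde"))

# ravel() walks the values row by row (C order): row 0, then row 1, ...
values = df.values.ravel()

# tile() repeats the whole label sequence once per row: a..e, a..e, ...
# (np.repeat would give a,a,a,b,b,b,... and pair labels with the wrong values.)
names = np.tile(df.columns, len(df.index))

df2 = pd.DataFrame({"name": names, "value": values})

# Spot-check: flat position i * n_cols + j must hold df.iloc[i, j] under column j.
n_cols = df.shape[1]
for i in range(len(df)):
    for j in range(n_cols):
        k = i * n_cols + j
        assert df2.loc[k, "name"] == df.columns[j]
        assert df2.loc[k, "value"] == df.iloc[i, j]
```

If a column-by-column order were wanted instead, both calls would have to change together (for example np.repeat for the names and a transposed ravel for the values).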

Timings (len(df) = 1M):

#random dataframe 
np.random.seed(100) 
N = 1000000 
df = pd.DataFrame(np.random.randint(10, size=(N,5)), columns=list('abcde')) 
print (df) 

In [86]: %timeit (pd.DataFrame({"name": np.tile(df.columns, len(df.index)),"value": df.values.ravel()})) 
10 loops, best of 3: 84.8 ms per loop 

In [87]: %timeit (pd.DataFrame(np.column_stack((np.tile(df.columns, df.shape[0]), df.values.reshape(-1,1))), columns=['name', 'value'])) 
10 loops, best of 3: 196 ms per loop 

In [88]: %timeit (df.stack().reset_index(level=0, drop=True).reset_index(name='value').rename(columns={'index':'name'})) 
1 loop, best of 3: 344 ms per loop 

If you need the output as a numpy array, add numpy.column_stack:

print (np.column_stack((np.tile(df.columns, len(df.index)), df.values.ravel()))) 
[['a' 8] 
['b' 8] 
['c' 3] 
['d' 7] 
['e' 7] 
['a' 0] 
['b' 4] 
['c' 2] 
['d' 5] 
['e' 2] 
['a' 2] 
['b' 2] 
['c' 1] 
['d' 0] 
['e' 8] 
['a' 4] 
['b' 0] 
['c' 9] 
['d' 6] 
['e' 2] 
['a' 4] 
['b' 1] 
['c' 5] 
['d' 3] 
['e' 4]] 
+0

Nice solution! It also scales well. Note, however, that `np.column_stack` does not preserve dtypes. –

+1

@NickilMaveli - Thank you. – jezrael
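
The dtype remark in the comments can be illustrated with a short sketch. It assumes the same 5x5 integer frame as in the answer. The variable names `stacked`, `from_stacked`, `from_dict` and `restored` are made up for this example, and the final astype call is only one possible way to recover the integer column:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5, 5)), columns=list("abcde"))

names = np.tile(df.columns.to_numpy(dtype=object), len(df.index))
values = df.values.ravel()

# A single 2-D array has one dtype, so mixing labels and integers falls back to object.
stacked = np.column_stack((names, values))
from_stacked = pd.DataFrame(stacked, columns=["name", "value"])

# A dict of 1-D arrays lets pandas keep a separate dtype per column.
from_dict = pd.DataFrame({"name": names, "value": values})

print(stacked.dtype)
print(from_stacked["value"].dtype)
print(from_dict["value"].dtype)

# One possible repair when the stacked form is needed anyway.
restored = from_stacked.astype({"value": values.dtype})
```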

1

Is this what you want?

In [11]: df 
Out[11]: 
      a   b   c   d   e 
0 0.791796 0.428642 0.887860 0.803709 0.860545 
1 0.230401 0.105232 0.617007 0.557678 0.590459 
2 0.448462 0.314422 0.207188 0.785642 0.022271 
3 0.075631 0.707029 0.111538 0.769387 0.174297 
4 0.707566 0.299966 0.197642 0.145841 0.231135 

In [12]: df.stack().reset_index(level=0, drop=True).reset_index() 
Out[12]: 
    index   0 
0  a 0.791796 
1  b 0.428642 
2  c 0.887860 
3  d 0.803709 
4  e 0.860545 
5  a 0.230401 
6  b 0.105232 
7  c 0.617007 
8  d 0.557678 
9  e 0.590459 
10  a 0.448462 
11  b 0.314422 
12  c 0.207188 
13  d 0.785642 
14  e 0.022271 
15  a 0.075631 
16  b 0.707029 
17  c 0.111538 
18  d 0.769387 
19  e 0.174297 
20  a 0.707566 
21  b 0.299966 
22  c 0.197642 
23  d 0.145841 
24  e 0.231135 
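
The output above still carries the default column labels index and 0. A minimal sketch of the same chain with the labels the question asked for, using the reset_index(name=...) and rename calls that also appear in the timing comparison of the other answer (the sample data and the name `long_df` are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 3 rows, columns a..e.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(3, 5), columns=list("abcde"))

long_df = (
    df.stack()                            # Series indexed by (row label, column label)
      .reset_index(level=0, drop=True)    # drop the row label, keep the column label
      .reset_index(name="value")          # label column becomes 'index', data becomes 'value'
      .rename(columns={"index": "name"})  # give the label column its final name
)
print(long_df.head())
```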
1

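You just need to concat all the columns of df together. Because the columns have different names, you first need to give them all the same name. Otherwise, pandas will add new columns to the concat result.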

import random 
import pandas as pd 

N = 5 
data = [[random.random() for i in range(N)] for j in range(N)] 
names = ['a','b','c','d','e'] 

df = pd.DataFrame(data) 
df.columns = names 
df = df.transpose() 
print(df)

#   0   1   2   3   4 
# a 0.643042 0.061476 0.415979 0.209272 0.394414 
# b 0.175363 0.580336 0.056173 0.468121 0.388956 
# c 0.096257 0.570860 0.516667 0.892087 0.956790 
# d 0.082906 0.340805 0.466074 0.01 0.293006
# e 0.430240 0.759413 0.083779 0.442159 0.434603 

df_col = [df[[i]] for i in df.columns]  # separate the columns of df
for col in df_col:
    col.columns = ['value']  # give every piece the same column name

res = pd.concat(df_col)      # concat them all together 
res.index.names=['name'] 

print(res)

#   value 
# name   
# a  0.643042 
# b  0.175363 
# c  0.096257 
# d  0.082906 
# e  0.430240 
# a  0.061476 
# b  0.580336 
# c  0.570860 
# d  0.340805 
# e  0.759413 
# a  0.415979 
# b  0.056173 
# c  0.516667 
# d  0.466074 
# e  0.083779 
# a  0.209272 
# b  0.468121 
# c  0.892087 
# d  0.01
# e  0.442159 
# a  0.394414 
# b  0.388956 
# c  0.956790 
# d  0.293006 
# e  0.434603
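
The same concat idea can be written more compactly for Python 3. This is a sketch under assumptions rather than the answer's exact code: it labels the rows a..e directly through the index argument, renames each single-column piece with rename instead of assigning to .columns, and finishes with rename_axis plus reset_index so that name becomes an ordinary column.

```python
import random

import pandas as pd

N = 5
names = ["a", "b", "c", "d", "e"]
data = [[random.random() for _ in range(N)] for _ in range(N)]

# Rows are labelled a..e and columns are 0..4, the same layout as the transposed frame above.
df = pd.DataFrame(data, index=names)

# Give every single-column slice the same column name so concat stacks them
# vertically instead of creating one output column per input column.
pieces = [df[[c]].rename(columns={c: "value"}) for c in df.columns]

# Name the index and turn it into a regular 'name' column.
res = pd.concat(pieces).rename_axis("name").reset_index()
print(res)
```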