2017-04-12 31 views
1

pandas: How do I stack my data correctly?

I have a dataframe that, when first loaded from a list of lists, looks like this:

   0  1  2 3  4  5  6  7  8 \ 
0  Segment Nov-12 Dec-12  Jan-13 Feb-13 Mar-13 Apr-13 May-13 
1   A      N/A  N/A  N/A  N/A  N/A 
2   B      N/A  N/A  N/A  N/A  N/A 
3   C      N/A  N/A  N/A  N/A  N/A 
4   D      N/A  N/A  N/A  N/A  N/A 
5   Total     N/A  N/A  N/A  N/A  N/A 

The values under each month will be floats. I want to pivot the dataframe so that I end up with something like:

Segment Month Value 
0 A  month value 
1 A  month value 
2 B  month value 
3 B  month value 
etc... 

What is the best way to do this?

Answers

2
v = df.values[1:, 1:].astype(float) 

mux = pd.MultiIndex.from_product(
    [df.iloc[1:, 0], df.iloc[0, 1:]], 
    names=['Segment', 'Month'] 
) 

d1 = pd.Series(v.ravel(), mux).reset_index(name='Value') 
print(d1) 

Segment Month Value 
0  A Nov-12 NaN 
1  A Dec-12 NaN 
2  A Jan-13 NaN 
3  A Feb-13 NaN 
4  A Mar-13 NaN 
5  A Apr-13 NaN 
6  A May-13 NaN 
7  B Nov-12 NaN 
8  B Dec-12 NaN 
9  B Jan-13 NaN 
10  B Feb-13 NaN 
11  B Mar-13 NaN 
12  B Apr-13 NaN 
13  B May-13 NaN 
14  C Nov-12 NaN 
15  C Dec-12 NaN 
16  C Jan-13 NaN 
17  C Feb-13 NaN 
18  C Mar-13 NaN 
19  C Apr-13 NaN 
20  C May-13 NaN 
21  D Nov-12 NaN 
22  D Dec-12 NaN 
23  D Jan-13 NaN 
24  D Feb-13 NaN 
25  D Mar-13 NaN 
26  D Apr-13 NaN 
27  D May-13 NaN 
28 Total Nov-12 NaN 
29 Total Dec-12 NaN 
30 Total Jan-13 NaN 
31 Total Feb-13 NaN 
32 Total Mar-13 NaN 
33 Total Apr-13 NaN 
34 Total May-13 NaN 
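
For comparison, the same long layout can also be produced with `DataFrame.melt`. This is a different technique from the `MultiIndex` approach above, shown here only as a rough sketch; the `raw` list and its numbers are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical list of lists: header row first, then one row per segment.
raw = [
    ['Segment', 'Nov-12', 'Dec-12'],
    ['A', 1.0, 2.0],
    ['B', 3.0, 4.0],
]

# Use the first row as column headers and the rest as data.
df = pd.DataFrame(raw[1:], columns=raw[0])

# melt keeps 'Segment' as an identifier and unpivots every other column.
long_df = df.melt(id_vars='Segment', var_name='Month', value_name='Value')

# melt emits one month at a time; a stable sort groups rows by segment
# while keeping the original month order within each segment.
long_df = long_df.sort_values('Segment', kind='stable').reset_index(drop=True)
print(long_df)
```

Whether this is preferable probably depends on whether the header row has already been promoted to column names; if it has, `melt` avoids building the index by hand.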

Explanation

# Your data obviously has an index in the first column 
# and column headers in the first row 
# I grab the underlying `numpy` array 
# from the 2nd column and 2nd row onward 
# and convert to float 
v = df.values[1:, 1:].astype(float) 

# I'm going to create a `pd.MultiIndex` to label 
# the flattened `pd.Series` I'll create 
# the first level of the index will be that first column 
# that was obviously the index 
# the second level will be the first row that was 
# obviously the column headers 
# the trick here is that I use `from_product` 
# which gives me every combination of those arrays 
# `ravel` unwinds or flattens the matrix and now 
# lines up with this `pd.MultiIndex` that has every combination 
# of row and column labels 
mux = pd.MultiIndex.from_product(
    [df.iloc[1:, 0], df.iloc[0, 1:]], 
    names=['Segment', 'Month'] 
) 

# I construct the `pd.Series` from the flattened values and the index 
# `reset_index` takes those levels of the index and pushes them out 
# into the dataframe data part. `name='Value'` just makes sure the 
# values of the series get a column name 
d1 = pd.Series(v.ravel(), mux).reset_index(name='Value') 
print(d1) 
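
To see the `reset_index(name='Value')` step in isolation, here is a minimal sketch of the same steps on a tiny frame. The segment labels, months, and numbers are assumptions chosen for illustration, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's frame: the first row holds the
# month headers and the first column holds the segment labels.
df = pd.DataFrame([
    ['Segment', 'Nov-12', 'Dec-12'],
    ['A', 1.5, 2.5],
    ['B', 3.0, np.nan],
])

# 2x2 float array of the data cells only (no header row, no label column).
v = df.values[1:, 1:].astype(float)

# Every (Segment, Month) combination, in row-major order.
mux = pd.MultiIndex.from_product(
    [df.iloc[1:, 0], df.iloc[0, 1:]],
    names=['Segment', 'Month']
)

# Before reset_index: a Series whose index carries Segment and Month.
s = pd.Series(v.ravel(), mux)

# After reset_index: both index levels become ordinary columns, and the
# Series values land in a column called 'Value'.
d1 = s.reset_index(name='Value')
print(d1)
```

Printing `s` before the last step shows the two index levels on the left; after `reset_index` they sit next to `Value` as regular columns under a fresh integer index.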
+0

Your solution works perfectly. I was wondering if you could explain how the line `d1 = pd.Series(v.ravel(), mux).reset_index(name='Value')` works? The `.reset_index` part is throwing me off. – flybonzai

+1

@flybonzai I've updated the answer with some explanation. – piRSquared

0

I eventually found a solution, but please let me know how I could improve it.

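# `vals` is the list of lists the data is loaded from
cac_df = pd.DataFrame(data=vals)
# Use the first column (the segment labels) as the index, then drop it
cac_df.rename(index=cac_df[0], inplace=True)
del cac_df[0]
# Promote the 'Segment' row to column headers, then drop that row
cac_df = cac_df.rename(columns=cac_df.loc['Segment']).drop('Segment')
# Treat empty cells and 'N/A' as missing
cac_df = cac_df.applymap(lambda x: None if not x or x == 'N/A' else x)
# Drop all-empty columns, then stack the months into the index
cac_df = pd.DataFrame(
    cac_df.dropna(axis=1, how='all').stack()
)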

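`stack` threw me for a loop, because it returns a Series instead of a DataFrame, which the documentation says happens when the columns have only a single level.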
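
One possible way to turn that stacked Series into a frame with named columns is to name the index levels and then call `reset_index`. The sketch below uses a small hypothetical `wide` frame (segments as the index, months as columns) rather than the real data.

```python
import pandas as pd

# Hypothetical wide frame: segments as the index, months as the columns.
wide = pd.DataFrame(
    {'Nov-12': [1.0, 3.0], 'Dec-12': [2.0, 4.0]},
    index=['A', 'B'],
)

# With single-level columns, stack returns a Series whose index is a
# two-level MultiIndex of (row label, column label).
stacked = wide.stack()

long_df = (
    stacked
    .rename_axis(['Segment', 'Month'])  # name the two index levels
    .reset_index(name='Value')          # move them into ordinary columns
)
print(long_df)
```

This avoids wrapping the result in `pd.DataFrame(...)`, which would leave the segment and month in the index and the value column named `0`.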