pandas: How do I correctly stack my data?

I have a dataframe that looks like this when it is first loaded from a list of lists:

   0  1  2 3  4  5  6  7  8 \ 
0  Segment Nov-12 Dec-12  Jan-13 Feb-13 Mar-13 Apr-13 May-13 
1   A      N/A  N/A  N/A  N/A  N/A 
2   B      N/A  N/A  N/A  N/A  N/A 
3   C      N/A  N/A  N/A  N/A  N/A 
4   D      N/A  N/A  N/A  N/A  N/A 
5   Total     N/A  N/A  N/A  N/A  N/A

The values under each month will be floats. I want to pivot the dataframe so that I end up with something like:

Segment Month Value 
0 A  month value 
1 A  month value 
2 B  month value 
3 B  month value 
etc...

What is the best way to do this?
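For comparison, one common way to get this long layout is DataFrame.melt. The sketch below is an illustration on made-up data, assuming the header row has already been promoted to column labels; it is not one of the approaches posted on this page.

```python
import pandas as pd

# Hypothetical sample shaped like the question's list of lists: a header row
# ("Segment" plus month labels), then one row per segment. Values are made up;
# None marks a missing cell.
rows = [
    ["Segment", "Nov-12", "Dec-12"],
    ["A", 1.0, None],
    ["B", 3.0, 4.0],
]

# Use the first row as column labels and the remaining rows as data.
wide = pd.DataFrame(rows[1:], columns=rows[0])

# melt turns every month column into (Month, Value) pairs keyed by Segment.
long = wide.melt(id_vars="Segment", var_name="Month", value_name="Value")

# melt emits all rows for the first month, then the next month, and so on.
# A stable sort on Segment groups rows by segment and keeps month order.
long = long.sort_values("Segment", kind="mergesort").reset_index(drop=True)
print(long)
```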


2017-04-12 flybonzai

import pandas as pd

v = df.values[1:, 1:].astype(float)

mux = pd.MultiIndex.from_product(
    [df.iloc[1:, 0], df.iloc[0, 1:]], 
    names=['Segment', 'Month'] 
) 

d1 = pd.Series(v.ravel(), mux).reset_index(name='Value') 
print(d1)

Segment Month Value 
0  A Nov-12 NaN 
1  A Dec-12 NaN 
2  A Jan-13 NaN 
3  A Feb-13 NaN 
4  A Mar-13 NaN 
5  A Apr-13 NaN 
6  A May-13 NaN 
7  B Nov-12 NaN 
8  B Dec-12 NaN 
9  B Jan-13 NaN 
10  B Feb-13 NaN 
11  B Mar-13 NaN 
12  B Apr-13 NaN 
13  B May-13 NaN 
14  C Nov-12 NaN 
15  C Dec-12 NaN 
16  C Jan-13 NaN 
17  C Feb-13 NaN 
18  C Mar-13 NaN 
19  C Apr-13 NaN 
20  C May-13 NaN 
21  D Nov-12 NaN 
22  D Dec-12 NaN 
23  D Jan-13 NaN 
24  D Feb-13 NaN 
25  D Mar-13 NaN 
26  D Apr-13 NaN 
27  D May-13 NaN 
28 Total Nov-12 NaN 
29 Total Dec-12 NaN 
30 Total Jan-13 NaN 
31 Total Feb-13 NaN 
32 Total Mar-13 NaN 
33 Total Apr-13 NaN 
34 Total May-13 NaN
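A self-contained way to try the answer's code is sketched below on a small hypothetical list of lists (the numbers and the shortened month list are made up). It assumes missing cells are None; with the literal string 'N/A' in the data, astype(float) would raise a ValueError instead of producing NaN.

```python
import pandas as pd

# Hypothetical data: header row first, segment labels in column 0.
# None stands in for missing values.
rows = [
    ["Segment", "Nov-12", "Dec-12", "Jan-13"],
    ["A", 1.5, None, 2.0],
    ["B", None, 3.0, 4.5],
    ["Total", 1.5, 3.0, 6.5],
]
df = pd.DataFrame(rows)

# Same steps as the answer: numeric block, then a product index of labels.
v = df.values[1:, 1:].astype(float)
mux = pd.MultiIndex.from_product(
    [df.iloc[1:, 0], df.iloc[0, 1:]],
    names=["Segment", "Month"],
)
d1 = pd.Series(v.ravel(), mux).reset_index(name="Value")
print(d1)
```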

Explanation

import pandas as pd

# Your data has the row labels in the first column
# and the column headers in the first row.
# I grab the underlying `numpy` array from the 2nd row
# and 2nd column onward and convert it to float.
# (This assumes missing cells are None/NaN; a literal
# string such as 'N/A' would make astype(float) raise.)
v = df.values[1:, 1:].astype(float)

# I build a `pd.MultiIndex` for the `pd.Series` I'll create.
# The first level is the first column (the row labels);
# the second level is the first row (the column headers).
# The trick is `from_product`, which gives every combination
# of those two arrays, with the last level varying fastest.
# `ravel` flattens the matrix row by row, so the flattened
# values line up with this `pd.MultiIndex` of every
# (row label, column label) pair.
mux = pd.MultiIndex.from_product(
    [df.iloc[1:, 0], df.iloc[0, 1:]],
    names=['Segment', 'Month']
)

# I construct the `pd.Series` on that index. `reset_index` takes
# the index levels and pushes them out into the data part of the
# dataframe as columns. `name='Value'` gives the values of the
# series a column name.
d1 = pd.Series(v.ravel(), mux).reset_index(name='Value')
print(d1)
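The claim that ravel lines up with from_product can be checked directly. The sketch below uses a hypothetical 2x3 matrix and made-up labels ("A", "B", "m1" to "m3"): each (segment, month) pair should point back at the matrix cell it came from, and unstack should rebuild the matrix.

```python
import numpy as np
import pandas as pd

# A small matrix standing in for `v`: rows are segments, columns are months.
v = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# ravel() flattens row by row (C order): 1, 2, 3, 4, 5, 6.
flat = v.ravel()

# from_product varies the last level fastest: (A, m1), (A, m2), (A, m3),
# (B, m1), ... which is exactly the row-by-row order of `flat`.
segments = ["A", "B"]
months = ["m1", "m2", "m3"]
mux = pd.MultiIndex.from_product([segments, months], names=["Segment", "Month"])
s = pd.Series(flat, index=mux)

# Every label pair maps back to its original cell.
for i, seg in enumerate(segments):
    for j, month in enumerate(months):
        assert s[(seg, month)] == v[i, j]

# unstack() moves the Month level back into columns, recovering the matrix.
print(s.unstack())
```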


2017-04-12 22:54:27 piRSquared

Your solution works perfectly. I was wondering if you could explain how the line 'd1 = pd.Series(v.ravel(), mux).reset_index(name='Value')' works? The '.reset_index' part throws me off. – flybonzai

@flybonzai I've added some explanation. – piRSquared
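To isolate what reset_index does here, the sketch below applies it to a tiny hypothetical Series with a two-level MultiIndex (labels and numbers are made up). Each index level becomes a column named after the level, the result is a DataFrame with a fresh 0..n-1 index, and name= supplies the header for the values column, since the Series itself has no name.

```python
import pandas as pd

# A Series indexed by a two-level MultiIndex, like the one built in the answer.
mux = pd.MultiIndex.from_tuples(
    [("A", "Nov-12"), ("A", "Dec-12"), ("B", "Nov-12")],
    names=["Segment", "Month"],
)
s = pd.Series([1.0, 2.0, 3.0], index=mux)

# reset_index moves the Segment and Month levels into ordinary columns.
# Without name=, the values column would be labeled 0.
d1 = s.reset_index(name="Value")
print(d1)
```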

I eventually found a solution, but please let me know how I could improve it.

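import pandas as pd

# `vals` is the list of lists loaded earlier (not shown)
cac_df = pd.DataFrame(data=vals)
# Use the first column (segment names) as the row index
cac_df.rename(index=cac_df[0], inplace=True)
del cac_df[0]
# Use the 'Segment' row as the column headers, then drop that row
cac_df = cac_df.rename(columns=cac_df.loc['Segment']).drop('Segment')
# Treat empty cells and 'N/A' as missing
# (applymap is deprecated in favor of DataFrame.map since pandas 2.1)
cac_df = cac_df.applymap(lambda x: None if not x or x == 'N/A' else x)
# Drop all-empty columns, then stack to one value per (segment, month)
cac_df = pd.DataFrame(
    cac_df.dropna(axis=1, how='all').stack()
)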

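stack threw me for a loop because it returns a Series instead of a DataFrame, which the docs point out happens if you only have a single level of column hierarchy.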
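The Series that stack returns can be turned into the long Segment / Month / Value table without wrapping it in pd.DataFrame(...). The sketch below assumes a hypothetical wide frame with segments as the index and one plain column per month (values made up, no missing cells, since stack's NaN-dropping default differs between older pandas and the newer implementation).

```python
import pandas as pd

# Hypothetical wide frame: segments as the index, one single-level
# column per month.
wide = pd.DataFrame(
    {"Nov-12": [1.0, 3.0], "Dec-12": [2.0, 4.0]},
    index=pd.Index(["A", "B"], name="Segment"),
)

# With a single column level, stack() moves that level into the row index
# and returns a Series, not a DataFrame.
stacked = wide.stack()
print(type(stacked).__name__)  # Series

# Name both index levels, then reset_index(name=...) produces the long
# layout with Segment, Month and Value columns.
long = stacked.rename_axis(["Segment", "Month"]).reset_index(name="Value")
print(long)
```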


2017-04-12 22:52:48 flybonzai
