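Transpose multiple variables in rows to columns by group using pandas
This refers to a question that was previously answered for SAS: "SAS - transpose multiple variables in rows to columns".
What is new here is that the number of values per group is not always two but varies. Here is an example: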
   acct     la   ln  seq1  seq2
0  9999  20.01  100     1    10
1  9999  19.05    1     1    10
2  9999  30.00    1     1    10
3  9999  26.77  100     2    11
4  9999  24.96    1     2    11
5  8888  38.43  218     3    20
6  8888  37.53    1     3    20
My desired output is:
   acct     la   ln  seq1  seq2    la0    la1  la2  la3  ln0  ln1  ln2
5  8888  38.43  218     3    20  38.43  37.53  NaN  NaN  218    1  NaN
0  9999  20.01  100     1    10  20.01  19.05   30  NaN  100    1    1
3  9999  26.77  100     2    11  26.77  24.96  NaN  NaN  100    1  NaN
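In SAS I could do this fairly simply with PROC SUMMARY, but I want to do it in Python because I can no longer use SAS.
I have already come up with a solution that I can reuse, but I wonder whether pandas has an easier option that I have not seen. Here is my solution. If someone has a faster approach, that would be interesting!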
# write multiple rows to columns based on groupby
import pandas as pd
import numpy as np

data = pd.DataFrame({
    "acct": [9999, 9999, 9999, 9999, 9999, 8888, 8888],
    "seq1": [1, 1, 1, 2, 2, 3, 3],
    "seq2": [10, 10, 10, 11, 11, 20, 20],
    "la": [20.01, 19.05, 30, 26.77, 24.96, 38.43, 37.53],
    "ln": [100, 1, 1, 100, 1, 218, 1]
})

# group the variables by some classes
grouped = data.groupby(["acct", "seq1", "seq2"])


def rows_to_col(column, size):
    # collect a copy of each group's sub-frame so it can be modified safely
    contain = []
    for _, group in grouped:
        contain.append(group.copy())
    # transpose the values of the column into one list per group
    contain_transpose = []
    for group in contain:
        contain_transpose.append(group[column].tolist())
    # pad lists shorter than the expected size with missing values
    for values in contain_transpose:
        values.extend([np.nan] * (size - len(values)))
    # assign the transposed values to new columns column0, column1, ...
    for group, values in zip(contain, contain_transpose):
        for j in range(size):
            group[column + str(j)] = values[j]
    # now always take the first row of each group
    concat_list = []
    for group in contain:
        concat_list.append(group[:1])
    return pd.concat(concat_list)  # concatenate the list


# fill in column name and expected size of the column
data_la = rows_to_col("la", 4)
data_ln = rows_to_col("ln", 3)
# merge the two data frames together
cols_use = data_ln.columns.difference(data_la.columns)
data_final = pd.merge(data_la, data_ln[cols_use], left_index=True, right_index=True, how="outer")
# drop() returns a new DataFrame; assign the result if la and ln should be removed for good
data_final.drop(["la", "ln"], axis=1)
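For comparison, the same reshaping can probably be done without explicit Python loops. The following is only a minimal sketch, not something taken from the original post: it assumes the goal is one row per (acct, seq1, seq2) group, numbers the rows inside each group with `groupby().cumcount()`, and pivots that position into the columns with `unstack()`. The names `keys`, `pos` and `wide` are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "acct": [9999, 9999, 9999, 9999, 9999, 8888, 8888],
    "seq1": [1, 1, 1, 2, 2, 3, 3],
    "seq2": [10, 10, 10, 11, 11, 20, 20],
    "la": [20.01, 19.05, 30, 26.77, 24.96, 38.43, 37.53],
    "ln": [100, 1, 1, 100, 1, 218, 1],
})

keys = ["acct", "seq1", "seq2"]

# number the rows inside each group: 0, 1, 2, ...
pos = data.groupby(keys).cumcount()

# put the group keys and the position into the index, then pivot the
# position (the last index level) into the columns; groups with fewer
# rows get NaN in the positions they do not have
wide = data.set_index(keys + [pos])[["la", "ln"]].unstack()

# flatten the (variable, position) column pairs into la0, la1, ..., ln0, ...
wide.columns = [f"{name}{i}" for name, i in wide.columns]
wide = wide.reset_index()

print(wide)
```

Unlike the function above, this sketch creates only as many columns as the longest group needs (so no all-NaN `la3`), and it does not carry over the original `la` and `ln` values of the first row. If a fixed number of columns is required, the result could be reindexed to the desired column list.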
Cool, this one is much shorter! Thanks.
Hopefully it is faster too. Loops in Python are usually slow. Always happy to help a fellow SAS user.