Building a single row from multiple rows for each index value

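I have some data (42 features) collected on people over several months (6 at most; the number varies between entries). Each month's values are in a row of their own.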

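There are 9267 unique ID values (set as the index) and up to 50,000 rows in the df. I want to convert it into one 42 * 6 feature vector per ID (even though some of them will contain a lot of NaNs), so that I can train on them.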

Here is my solution:

import pandas as pd

def flatten_features(f_matrix, ID):
    '''Constructs a 1x(6*n) vector from a 6xn matrix.'''
    # Check whether it is a Series, not a DataFrame (an ID with a single row).
    if len(f_matrix.shape) == 1:
        f_matrix = f_matrix.copy()
        f_matrix['ID'] = ID
        return f_matrix

    flattened_vector = f_matrix.iloc[0]

    for i in range(1, f_matrix.shape[0]):
        vector_append = f_matrix.iloc[i].copy()
        # Suffix every feature name with the month number, e.g. 'name_1'.
        vector_append.index = vector_append.index.map(lambda name: name + '_' + str(i))
        # Series.append was removed in pandas 2.0; pd.concat does the same job.
        flattened_vector = pd.concat([flattened_vector, vector_append])

    flattened_vector['ID'] = ID
    return flattened_vector


# Construct a dataframe of flattened vectors for the numerical features.
new_indices = flatten_features(numerical_f.iloc[:6], 1).index
new_indices

flattened_num_f = pd.DataFrame(columns=new_indices)
flattened_num_f

for label in numerical_f.index.unique():
    matr = numerical_f.loc[label]
    # DataFrame.append was also removed in pandas 2.0; add the vector as one row.
    row = flatten_features(matr, label).to_frame().T
    flattened_num_f = pd.concat([flattened_num_f, row])

It produces the desired result, but it runs very slowly. Is there a more elegant and faster solution?
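
The shape check at the top of `flatten_features` matters because label-based selection behaves differently for IDs with one row and IDs with several. A minimal sketch on a made-up two-column frame (the names `toy`, `f0` and `f1` are illustrative and not taken from the original data):

```python
import pandas as pd

# Hypothetical toy data: ID 1 has two monthly rows, ID 2 has only one.
toy = pd.DataFrame(
    {"f0": [0.1, 0.2, 0.3], "f1": [10, 20, 30]},
    index=pd.Index([1, 1, 2], name="ID"),
)

several = toy.loc[1]  # duplicated label -> DataFrame with one row per month
single = toy.loc[2]   # label that occurs once -> Series holding that row

print(type(several).__name__, several.shape)
print(type(single).__name__, single.shape)
```

Selecting with a list instead, as in `toy.loc[[2]]`, returns a DataFrame in both cases, which would make the special case unnecessary.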

2017-10-10 TheSmokingGnu

It is entirely unclear to me what your expected output is. Can you give an example of your input, **not as an image**, and the expected output? –

@juanpa.arrivillaga How should I show my huge input df, if not the way the Jupyter notebook renders it? – TheSmokingGnu

Provide a [mcve], as required. An image is useless. See [this question](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) on how to create a good, reproducible `pandas` example. –

If you want to transpose the df, you can use the T attribute. I assume you already have the IDs stored in the unique_id variable.

new_f = numerical_f.T 
new_f.columns = unique_id

2017-10-10 18:09:39 galaxyan

But the transposed matrix will have ~50,000 columns, and your second line tries to replace them with only ~9,000 labels, which raises an error – TheSmokingGnu
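
The mismatch described in this comment can be reproduced on a small scale. The sketch below uses made-up data (`toy` and `unique_ids` are illustrative names); assigning fewer column labels than there are columns makes pandas raise a `ValueError`:

```python
import pandas as pd

# Hypothetical toy data: 3 rows in total, but only 2 distinct IDs.
toy = pd.DataFrame(
    {"f0": [0.1, 0.2, 0.3], "f1": [10, 20, 30]},
    index=pd.Index([1, 1, 2], name="ID"),
)

transposed = toy.T               # one column per original row -> 3 columns
unique_ids = toy.index.unique()  # only 2 labels

try:
    transposed.columns = unique_ids
    error_message = None
except ValueError as exc:
    # pandas refuses the assignment because the lengths differ.
    error_message = str(exc)

print(transposed.shape)
print(error_message)
```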

@TheSmokingGnu Do you want to aggregate all the rows with the same ID together? – galaxyan

Yes, so that each unique ID gets one 42 * 6 dimensional row containing the values of up to six rows for that ID (all that exist) – TheSmokingGnu
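
One possible vectorized way to get the layout described here, offered as a sketch and not verified against the original data: number the rows within each ID with `groupby(...).cumcount()`, then move that month counter into the columns with `unstack`. The frame `toy`, the feature names `f0` and `f1`, and the helper column `month` are made up for illustration. Unlike the code in the question, this version also gives month 0 a suffix.

```python
import pandas as pd

# Hypothetical toy data: ID 1 has three monthly rows, ID 2 has one.
toy = pd.DataFrame(
    {"f0": [0.1, 0.2, 0.3, 0.4], "f1": [10.0, 20.0, 30.0, 40.0]},
    index=pd.Index([1, 1, 1, 2], name="ID"),
)

# Number the rows within each ID: 0, 1, 2, ...
# (assumes the rows of each ID are already in month order).
month = toy.groupby(level="ID").cumcount().to_numpy()

# Add the counter as a second index level, then pivot it into the columns.
wide = toy.assign(month=month).set_index("month", append=True).unstack("month")

# Flatten the (feature, month) column MultiIndex into names like 'f0_1'.
wide.columns = [f"{feature}_{m}" for feature, m in wide.columns]

print(wide)
```

On data shaped like that in the question, this should give one row per ID and up to 42 * 6 = 252 columns, with NaN wherever an ID has fewer than six months.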
