2017-04-12 56 views
0

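Converting a list of 2D pandas DataFrames to a 3D DataFrame

I am trying to create a pandas DataFrame that maps a label value to a 2D DataFrame. This is what I have done so far:

I read the CSV files with pd.read_csv and append them to a list. For the purpose of this question, consider the following code: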

import numpy as np 
import pandas as pd 

raw_sample = [] 
labels = [1,1,1,2,2,2] 
samples = np.random.randn(6, 5, 4) 
for contents in range(samples.shape[0]): 
    raw_sample.append(pd.DataFrame(samples[contents])) 

Then I put raw_sample into a DataFrame with df = pd.DataFrame(raw_sample), and I add the labels to df as follows:

df = df.set_index([df.index, labels]) 
df.index = df.index.set_names('index', level=0) 
df.index = df.index.set_names('labels', level=1) 

When I print this, I get:

               0 
index labels             
0  1     0   1   2   3 
0 0... 
1  1     0   1   2   3 
0 0... 
2  1     0   1   2   3 
0 1... 
3  2     0   1   2   3 
0 -0... 
4  2     0   1   2   3 
0 0... 
5  2     0   1   2   3 
0 -0... 

I also tried printing df[0], and I still got the same thing.

I would like to know whether this is the right way to go about it.

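I know that a DataFrame cannot hold 2D arrays as its elements. The other option is to use pd.Panel. For this, I converted every element of raw_sample to a NumPy array, then converted raw_sample itself to a NumPy array, and did the following: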

p1 = pd.Panel(samples, items=map(str, labels)) 

But when I print it, I get:

<class 'pandas.core.panel.Panel'> 
Dimensions: 6 (items) x 5 (major_axis) x 4 (minor_axis) 
Items axis: 1 to 2 
Major_axis axis: 0 to 4 
Minor_axis axis: 0 to 3 

Looking at Items, it seems that all the common values have been grouped together.

I do not know how to proceed. Please help!
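A note for readers on current pandas: pd.Panel was deprecated and later removed from the library, so the call above does not run there. One possible alternative, sketched below under the assumption that every sample has the same shape, is to keep the samples in a single 3D NumPy array next to a parallel label array and select with a boolean mask. The names `labels_arr` and `class_one` are illustrative and do not come from the original post.

```python
import numpy as np

np.random.seed(10)
labels = [1, 1, 1, 2, 2, 2]
samples = np.random.randn(6, 5, 4)   # 6 samples, each 5 x 4

# Parallel array of labels: labels_arr[i] is the label of samples[i].
labels_arr = np.asarray(labels)

# The boolean mask keeps every sample whose label is 1.
# Repeated labels are not a problem here.
class_one = samples[labels_arr == 1]
print(class_one.shape)
```

This keeps the one-label-per-sample mapping without requiring the labels to be unique.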

Update

Input:

labels = [1,1,1,2,2,2] 
samples = [5x4 pd.DataFrame, 5x4 pd.DataFrame, 5x4 pd.DataFrame, 5x4 pd.DataFrame, 5x4 pd.DataFrame, 5x4 pd.DataFrame] 

Desired output:

index labels  samples 
    0  1  1 2 3 4 5 6 7 
       3 5 6 7 9 5 4 
       3 4 5 6 7 8 9 
    1  1  4 3 2 4 5 6 7 
       3 5 6 7 4 5 6 
       2 3 4 3 4 5 3 
... 
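A layout close to the one above can be sketched with pd.concat and one key per sample. This is a minimal sketch, not a definitive answer: the tuple keys and the level name `row` are assumptions made for illustration, while `index` and `labels` follow the names used earlier in the question.

```python
import numpy as np
import pandas as pd

np.random.seed(10)
labels = [1, 1, 1, 2, 2, 2]
samples = np.random.randn(6, 5, 4)
raw_sample = [pd.DataFrame(s) for s in samples]

# One key per sample: (position in the list, label). The position keeps
# every key unique even though the labels repeat.
keys = list(zip(range(len(labels)), labels))
df = pd.concat(raw_sample, keys=keys)
df.index = df.index.set_names(["index", "labels", "row"])

# All rows that belong to label 2 (three samples of five rows each).
label_two = df.xs(2, level="labels")

# The single 5 x 4 sample stored at position 3.
sample_three = df.xs(3, level="index")
```

Because the sample position is part of the key, the index stays unique and each sample remains individually addressable.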
+0

I am not sure what exactly you need. Can you give us your input and desired output? – Allen

+0

@Allen Updated. Thanks. – Akshay

+0

I am not sure, but it seems you need unique 'labels', so change 'labels = [1,1,1,2,2,2]' to 'labels = list('abcdef')' and then you can select with 'print(p1['a'])' – jezrael

Answers

1

If you select an item whose label is not unique, you get another Panel:

np.random.seed(10) 
labels = [1,1,1,2,2,2] 
samples = np.random.randn(6, 5, 4) 
p1 = pd.Panel(samples, items=map(str, labels)) 
print (p1) 
<class 'pandas.core.panel.Panel'> 
Dimensions: 6 (items) x 5 (major_axis) x 4 (minor_axis) 
Items axis: 1 to 2 
Major_axis axis: 0 to 4 
Minor_axis axis: 0 to 3 

print (p1['1']) 
<class 'pandas.core.panel.Panel'> 
Dimensions: 3 (items) x 5 (major_axis) x 4 (minor_axis) 
Items axis: 1 to 1 
Major_axis axis: 0 to 4 
Minor_axis axis: 0 to 3 
print (p1.to_frame()) 
        1   1   1   2   2   2 
major minor                
0  0  1.331587 1.331587 1.331587 -0.232182 -0.232182 -0.232182 
     1  0.715279 0.715279 0.715279 -0.501729 -0.501729 -0.501729 
     2  -1.545400 -1.545400 -1.545400 1.128785 1.128785 1.128785 
     3  -0.008384 -0.008384 -0.008384 -0.697810 -0.697810 -0.697810 
1  0  0.621336 0.621336 0.621336 -0.081122 -0.081122 -0.081122 
     1  -0.720086 -0.720086 -0.720086 -0.529296 -0.529296 -0.529296 
     2  0.265512 0.265512 0.265512 1.046183 1.046183 1.046183 
     3  0.108549 0.108549 0.108549 -1.418556 -1.418556 -1.418556 
2  0  0.004291 0.004291 0.004291 -0.362499 -0.362499 -0.362499 
     1  -0.174600 -0.174600 -0.174600 -0.121906 -0.121906 -0.121906 
     2  0.433026 0.433026 0.433026 0.319356 0.319356 0.319356 
     3  1.203037 1.203037 1.203037 0.460903 0.460903 0.460903 
3  0  -0.965066 -0.965066 -0.965066 -0.215790 -0.215790 -0.215790 
     1  1.028274 1.028274 1.028274 0.989072 0.989072 0.989072 
     2  0.228630 0.228630 0.228630 0.314754 0.314754 0.314754 
     3  0.445138 0.445138 0.445138 2.467651 2.467651 2.467651 
4  0  -1.136602 -1.136602 -1.136602 -1.508321 -1.508321 -1.508321 
     1  0.135137 0.135137 0.135137 0.620601 0.620601 0.620601 
     2  1.484537 1.484537 1.484537 -1.045133 -1.045133 -1.045133 
     3  -1.079805 -1.079805 -1.079805 -0.798009 -0.798009 -0.798009 

But if the item labels are unique, you get a DataFrame:

np.random.seed(10) 
labels = list('abcdef') 
samples = np.random.randn(6, 5, 4) 
p1 = pd.Panel(samples, items=labels) 
print (p1) 
<class 'pandas.core.panel.Panel'> 
Dimensions: 6 (items) x 5 (major_axis) x 4 (minor_axis) 
Items axis: a to f 
Major_axis axis: 0 to 4 
Minor_axis axis: 0 to 3 

print (p1['a']) 
      0   1   2   3 
0 1.331587 0.715279 -1.545400 -0.008384 
1 0.621336 -0.720086 0.265512 0.108549 
2 0.004291 -0.174600 0.433026 1.203037 
3 -0.965066 1.028274 0.228630 0.445138 
4 -1.136602 0.135137 1.484537 -1.079805 
print (p1.to_frame()) 
        a   b   c   d   e   f 
major minor                
0  0  1.331587 -1.977728 0.660232 -0.232182 1.985085 0.117476 
     1  0.715279 -1.743372 -0.350872 -0.501729 1.744814 -1.907457 
     2  -1.545400 0.266070 -0.939433 1.128785 -1.856185 -0.922909 
     3  -0.008384 2.384967 -0.489337 -0.697810 -0.222774 0.469751 
1  0  0.621336 1.123691 -0.804591 -0.081122 -0.065848 -0.144367 
     1  -0.720086 1.672622 -0.212698 -0.529296 -2.131712 -0.400138 
     2  0.265512 0.099149 -0.339140 1.046183 -0.048831 -0.295984 
     3  0.108549 1.397996 0.312170 -1.418556 0.393341 0.848209 
2  0  0.004291 -0.271248 0.565153 -0.362499 0.217265 0.706830 
     1  -0.174600 0.613204 -0.147420 -0.121906 -1.994394 -0.787269 
     2  0.433026 -0.267317 -0.025905 0.319356 1.107708 0.292941 
     3  1.203037 -0.549309 0.289094 0.460903 0.244544 -0.470807 
3  0  -0.965066 0.132708 -0.539879 -0.215790 -0.061912 2.404326 
     1  1.028274 -0.476142 0.708160 0.989072 -0.753893 -0.739357 
     2  0.228630 1.308473 0.842225 0.314754 0.711959 -0.312829 
     3  0.445138 0.195013 0.203581 2.467651 0.918269 -0.348882 
4  0  -1.136602 0.400210 2.394704 -1.508321 -0.482093 -0.439026 
     1  0.135137 -0.337632 0.917459 0.620601 0.089588 0.141104 
     2  1.484537 1.256472 -0.112272 -1.045133 0.826999 0.273049 
     3  -1.079805 -0.731970 -0.362180 -0.798009 -1.954512 -1.618571 

It is the same with a DataFrame that has non-unique columns:

samples = np.random.randn(6, 5) 
df = pd.DataFrame(samples, columns=list('11122')) 
print (df) 
      1   1   1   2   2 
0 0.346338 -0.855797 -0.932463 -2.289259 0.634696 
1 0.272794 -0.924357 -1.898270 -0.743083 -1.587480 
2 -0.519975 -0.136836 0.530178 -0.730629 2.520821 
3 0.137530 -1.232763 0.508548 -0.480384 -1.213064 
4 -0.157787 -1.600004 -1.287620 0.384642 -0.568072 
5 -0.649427 -0.659585 -0.813359 -1.487412 -0.044206 

print (df['1']) 
      1   1   1 
0 0.346338 -0.855797 -0.932463 
1 0.272794 -0.924357 -1.898270 
2 -0.519975 -0.136836 0.530178 
3 0.137530 -1.232763 0.508548 
4 -0.157787 -1.600004 -1.287620 
5 -0.649427 -0.659585 -0.813359 
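The same behaviour can be checked on a small frame. In this sketch (the column names and values are made up for illustration), selecting a label that occurs more than once returns a DataFrame, while selecting a label that occurs once returns a Series.

```python
import numpy as np
import pandas as pd

# Column label "1" appears twice, "2" and "3" appear once.
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("1123"))

dup = df["1"]     # repeated label: both matching columns come back
single = df["2"]  # unique label: a single column comes back

print(type(dup).__name__, dup.shape)
print(type(single).__name__, single.shape)
```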

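Edit:

Creating df from the list also requires unique labels (non-unique labels raise an error). Use the function concat with the parameter keys, and call to_panel to get a Panel: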

np.random.seed(100) 
raw_sample = [] 
labels = list('abcdef') 
samples = np.random.randn(6, 5, 4) 
for contents in range(samples.shape[0]): 
    raw_sample.append(pd.DataFrame(samples[contents])) 

df = pd.concat(raw_sample, keys=labels) 
print (df) 
      0   1   2   3 
a 0 -1.749765 0.342680 1.153036 -0.252436 
    1 0.981321 0.514219 0.221180 -1.070043 
    2 -0.189496 0.255001 -0.458027 0.435163 
    3 -0.583595 0.816847 0.672721 -0.104411 
    4 -0.531280 1.029733 -0.438136 -1.118318 
b 0 1.618982 1.541605 -0.251879 -0.842436 
    1 0.184519 0.937082 0.731000 1.361556 
    2 -0.326238 0.055676 0.222400 -1.443217 
    3 -0.756352 0.816454 0.750445 -0.455947 
    4 1.189622 -1.690617 -1.356399 -1.232435 
c 0 -0.544439 -0.668172 0.007315 -0.612939 
    1 1.299748 -1.733096 -0.983310 0.357508 
    2 -1.613579 1.470714 -1.188018 -0.549746 
    3 -0.940046 -0.827932 0.108863 0.507810 
    4 -0.862227 1.249470 -0.079611 -0.889731 
d 0 -0.881798 0.018639 0.237845 0.013549 
    1 -1.635529 -1.044210 0.613039 0.736205 
    2 1.026921 -1.432191 -1.841188 0.366093 
    3 -0.331777 -0.689218 2.034608 -0.550714 
    4 0.750453 -1.306992 0.580573 -1.104523 
e 0 0.690121 0.686890 -1.566688 0.904974 
    1 0.778822 0.428233 0.108872 0.028284 
    2 -0.578826 -1.199451 -1.705952 0.369164 
    3 1.876573 -0.376903 1.831936 0.003017 
    4 -0.076023 0.003958 -0.185014 -2.487152 
f 0 -1.704651 -1.136261 -2.973315 0.033317 
    1 -0.248889 -0.450176 0.132428 0.022214 
    2 0.317368 -0.752414 -1.296392 0.095139 
    3 -0.423715 -1.185984 -0.365462 -1.271023 
    4 1.586171 0.693391 -1.958081 -0.134801 

p1 = df.to_panel() 
print (p1) 
<class 'pandas.core.panel.Panel'> 
Dimensions: 4 (items) x 6 (major_axis) x 5 (minor_axis) 
Items axis: 0 to 3 
Major_axis axis: a to f 
Minor_axis axis: 0 to 4 
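On pandas versions where to_panel is no longer available, a 3D structure can still be recovered from the concatenated frame. The following is a sketch under the assumption that every sample has the same number of rows. It reshapes the values into a NumPy array whose axes are (label, row, column), which is a different axis order from the Panel printed above. The names `cube` and `block_c` are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(100)
labels = list("abcdef")
samples = np.random.randn(6, 5, 4)
df = pd.concat([pd.DataFrame(s) for s in samples], keys=labels)

# Rows are ordered label by label, so a plain reshape recovers the cube:
# axis 0 = label, axis 1 = row within a sample, axis 2 = column.
n_rows = len(df.index.levels[1])
cube = df.to_numpy().reshape(len(labels), n_rows, df.shape[1])

# Position of a label along axis 0, then the matching 5 x 4 block.
block_c = cube[labels.index("c")]
```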

EDIT1:

If you need a MultiIndex DataFrame, you can create a helper range of unique values, use concat, and finally remove the helper level from the MultiIndex:

np.random.seed(100) 
raw_sample = [] 
labels = [1,1,1,2,2,2] 
mux = pd.MultiIndex.from_arrays([labels, range(len(labels))]) 

samples = np.random.randn(6, 5, 4) 
for contents in range(samples.shape[0]): 
    raw_sample.append(pd.DataFrame(samples[contents])) 

df = pd.concat(raw_sample, keys=mux) 

df = df.reset_index(level=1, drop=True) 
print (df) 
      0   1   2   3 
1 0 -1.749765 0.342680 1.153036 -0.252436 
    1 0.981321 0.514219 0.221180 -1.070043 
    2 -0.189496 0.255001 -0.458027 0.435163 
    3 -0.583595 0.816847 0.672721 -0.104411 
    4 -0.531280 1.029733 -0.438136 -1.118318 
    0 1.618982 1.541605 -0.251879 -0.842436 
    1 0.184519 0.937082 0.731000 1.361556 
    2 -0.326238 0.055676 0.222400 -1.443217 
    3 -0.756352 0.816454 0.750445 -0.455947 
    4 1.189622 -1.690617 -1.356399 -1.232435 
    0 -0.544439 -0.668172 0.007315 -0.612939 
    1 1.299748 -1.733096 -0.983310 0.357508 
    2 -1.613579 1.470714 -1.188018 -0.549746 
    3 -0.940046 -0.827932 0.108863 0.507810 
    4 -0.862227 1.249470 -0.079611 -0.889731 
2 0 -0.881798 0.018639 0.237845 0.013549 
    1 -1.635529 -1.044210 0.613039 0.736205 
    2 1.026921 -1.432191 -1.841188 0.366093 
    3 -0.331777 -0.689218 2.034608 -0.550714 
    4 0.750453 -1.306992 0.580573 -1.104523 
    0 0.690121 0.686890 -1.566688 0.904974 
    1 0.778822 0.428233 0.108872 0.028284 
    2 -0.578826 -1.199451 -1.705952 0.369164 
    3 1.876573 -0.376903 1.831936 0.003017 
    4 -0.076023 0.003958 -0.185014 -2.487152 
    0 -1.704651 -1.136261 -2.973315 0.033317 
    1 -0.248889 -0.450176 0.132428 0.022214 
    2 0.317368 -0.752414 -1.296392 0.095139 
    3 -0.423715 -1.185984 -0.365462 -1.271023 
    4 1.586171 0.693391 -1.958081 -0.134801 

But creating a Panel from it is not possible:

p1 = df.to_panel() 
print (p1) 

>ValueError: Can't convert non-uniquely indexed DataFrame to Panel 
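The error comes from the duplicated index entries left behind once the helper level is dropped. A minimal sketch of how to check this, and of how keeping the helper level still allows samples to be regrouped by label, follows. The names `with_helper`, `without_helper` and `blocks` are illustrative, and the tuple keys stand in for the MultiIndex used above.

```python
import numpy as np
import pandas as pd

np.random.seed(100)
labels = [1, 1, 1, 2, 2, 2]
samples = np.random.randn(6, 5, 4)
raw_sample = [pd.DataFrame(s) for s in samples]

# (label, helper) pairs: the helper counter makes each key unique.
keys = list(zip(labels, range(len(labels))))
with_helper = pd.concat(raw_sample, keys=keys)
without_helper = with_helper.reset_index(level=1, drop=True)

print(with_helper.index.is_unique)     # True
print(without_helper.index.is_unique)  # False

# With the helper level kept, each label's samples can be regrouped
# into an array of shape (n_samples, 5, 4).
blocks = {
    label: part.to_numpy().reshape(-1, 5, 4)
    for label, part in with_helper.groupby(level=0)
}
```

Here `blocks[1]` and `blocks[2]` each hold the three samples that share a label, so repeated labels can be kept as long as the helper level is not dropped.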
+0

Please check the edit for creating the 'DataFrame' from the list. – jezrael

+0

The problem is that the labels cannot be unique: each label maps to a sample. They are like samples for machine learning. – Akshay

+0

:( Duplicates are supported in pandas, but some functions, such as 'reindex' and 'concat', do not work with them. – jezrael
