2017-09-10 14 views
-3

I have several arrays like the one below, and I want to reindex the data so that missing data points are filled with NaN:

[[ 0.   1.   0.73475787 0.36224658 0.08579446 -0.11767365 
    -0.09927562 0.17444341 0.47212111 1.00584593 1.69147789 1.89421069 
    1.4718292 ] 
[ 2.   1.   0.68744907 0.38420843 0.25922927 0.04719614 
    0.00841919 0.21967246 0.22183329 0.28910002 0.54637077 -0.04389335 
    -1.33445338] 
[ 3.   1.   0.77854922 0.41093192 0.0713814 -0.08194854 
    -0.07885753 0.1491798 0.56297583 1.0759857 1.57149366 1.37958867 
    0.64409152] 
[ 5.   1.   0.09182989 0.14988215 -0.1272845 0.12154707 
    -0.01194815 -0.06136953 0.18783772 0.46631855 0.78850281 0.64755372 
    0.69757144]] 

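Note that array[i, 0] gives me a count. In this particular array, counts 1, 4 and 6 are missing. In other cases, 2, 3, 5 or some other counts may be missing.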
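To make the gaps explicit, here is a minimal sketch of how the missing counts could be listed. The small array `arr` and the value `n_expected` are hypothetical stand-ins (the real array has 13 columns, and the expected number of counts is assumed to be `len(vec_conc)`):

```python
import numpy as np

# Hypothetical, shortened stand-in for the array above: column 0 holds the count.
arr = np.array([
    [0.0, 1.0, 0.73],
    [2.0, 1.0, 0.69],
    [3.0, 1.0, 0.78],
    [5.0, 1.0, 0.09],
])
n_expected = 7  # assumed total number of counts (0..6), i.e. len(vec_conc)

# Counts that actually occur in the first column.
present = arr[:, 0].astype(int)

# Counts in 0..n_expected-1 that do not occur in the array.
missing = np.setdiff1d(np.arange(n_expected), present)
print(missing)  # [1 4 6]
```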

Now, for my later meta-analysis, I want the array to contain a row of NaNs for every missing count.

For the example above, I would like to have

[[ 0.   1.   0.73475787 0.36224658 0.08579446 -0.11767365 
    -0.09927562 0.17444341 0.47212111 1.00584593 1.69147789 1.89421069 
    1.4718292 ] 
[ 1.   NaN   NaN   NaN  NaN   NaN
    NaN   NaN   NaN   NaN  NaN   NaN 
    NaN ] 
[ 2.   1.   0.68744907 0.38420843 0.25922927 0.04719614 
    0.00841919 0.21967246 0.22183329 0.28910002 0.54637077 -0.04389335 
    -1.33445338] 
[ 3.   1.   0.77854922 0.41093192 0.0713814 -0.08194854 
    -0.07885753 0.1491798 0.56297583 1.0759857 1.57149366 1.37958867 
    0.64409152] 
[ 4.   NaN   NaN   NaN  NaN   NaN
    NaN   NaN   NaN   NaN  NaN   NaN 
    NaN ] 
[ 5.   1.   0.09182989 0.14988215 -0.1272845 0.12154707 
    -0.01194815 -0.06136953 0.18783772 0.46631855 0.78850281 0.64755372 
    0.69757144]
[ 6.   NaN   NaN   NaN  NaN   NaN
    NaN   NaN   NaN   NaN  NaN   NaN
    NaN ]]

I have tried to rebuild my array as follows:

influence_incl_missing = np.ones((len(vec_conc), len(results) + 1))
for i, conc in enumerate(vec_conc):
    if i == influence[i, 0]:
        influence_incl_missing[i, :] = influence[i, :]
    else:
        influence_incl_missing[i, 1:] = np.full(len(results), np.nan)
        influence_incl_missing[i, 0] = i

This obviously gives me the error

IndexError: index 4 is out of bounds for axis 0 with size 4 

because len(influence) < len(vec_conc).

How can I do this in Python?

Many thanks!

+1

Do you have pandas? –

+0

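How does the background, "meta-analysis of drug interference studies", help us answer the question "sort data in Python so that missing data points are filled with NaN"? Please make your question more abstract. – RedEyed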

+0

No, I don't have pandas. Sounds like it might be worth getting? –

Answers

0

Install pandas:

pip install pandas 

Load the data into a pandas DataFrame and apply a reindex operation; that should do it.

import pandas as pd 

df = pd.DataFrame(arr) # arr is your array 

arr = df.set_index(df.columns[0])\ 
     .reindex(range(len(vec_conc)))\ 
     .reset_index().values 

arr 
array([[ 0.  , 1.  , 0.73475787, 0.36224658, 0.08579446, 
     -0.11767365, -0.09927562, 0.17444341, 0.47212111, 1.00584593, 
     1.69147789, 1.89421069, 1.4718292 ], 
     [ 1.  ,   nan,   nan,   nan,   nan, 
       nan,   nan,   nan,   nan,   nan, 
       nan,   nan,   nan], 
     [ 2.  , 1.  , 0.68744907, 0.38420843, 0.25922927, 
     0.04719614, 0.00841919, 0.21967246, 0.22183329, 0.28910002, 
     0.54637077, -0.04389335, -1.33445338], 
     [ 3.  , 1.  , 0.77854922, 0.41093192, 0.0713814 , 
     -0.08194854, -0.07885753, 0.1491798 , 0.56297583, 1.0759857 , 
     1.57149366, 1.37958867, 0.64409152], 
     [ 4.  ,   nan,   nan,   nan,   nan, 
       nan,   nan,   nan,   nan,   nan, 
       nan,   nan,   nan], 
     [ 5.  , 1.  , 0.09182989, 0.14988215, -0.1272845 , 
     0.12154707, -0.01194815, -0.06136953, 0.18783772, 0.46631855, 
     0.78850281, 0.64755372, 0.69757144], 
     [ 6.  ,   nan,   nan,   nan,   nan, 
       nan,   nan,   nan,   nan,   nan, 
       nan,   nan,   nan]]) 
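
If pandas is not available, roughly the same result can be sketched with NumPy alone: preallocate an array full of NaN, write the counts into the first column, and copy each existing row to the position given by its own count. The helper `fill_missing_rows` and the shortened array `small` are hypothetical names for illustration; the sketch assumes the first column holds unique, non-negative integer counts smaller than `n_rows`.

```python
import numpy as np

def fill_missing_rows(arr, n_rows):
    """Place each row at the index given by its first column; other rows are NaN.

    Hypothetical helper: assumes column 0 holds unique integer counts
    in the range 0..n_rows-1.
    """
    arr = np.asarray(arr, dtype=float)
    # Start with NaN everywhere, then fill in the count column.
    out = np.full((n_rows, arr.shape[1]), np.nan)
    out[:, 0] = np.arange(n_rows)
    # Copy each existing row to the position named by its own count.
    idx = arr[:, 0].astype(int)
    out[idx] = arr
    return out

# Shortened stand-in for the array in the question (3 columns instead of 13).
small = np.array([
    [0.0, 1.0, 0.73],
    [2.0, 1.0, 0.69],
    [3.0, 1.0, 0.78],
    [5.0, 1.0, 0.09],
])
filled = fill_missing_rows(small, 7)  # 7 stands in for len(vec_conc)
print(filled)
```

This avoids the IndexError from the loop in the question because the existing rows are addressed by their count value rather than by their position in the shorter input array.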
+0

Thanks, I will try pandas –

+0

Sweet, pandas was actually already installed with Anaconda and did the job right away! –
