2015-01-11 73 views

Convert and pad a list to a NumPy array

I have an arbitrarily deeply nested list whose elements have different lengths:

my_list = [[[1,2],[4]],[[4,4,3]],[[1,2,1],[4,3,4,5],[4,1]]] 

I want to convert this into a valid numeric (not object) NumPy array by padding each axis with NaN. The result should look like this:

padded_list = np.array([[[  1,   2, nan, nan],
                         [  4, nan, nan, nan],
                         [nan, nan, nan, nan]],
                        [[  4,   4,   3, nan],
                         [nan, nan, nan, nan],
                         [nan, nan, nan, nan]],
                        [[  1,   2,   1, nan],
                         [  4,   3,   4,   5],
                         [  4,   1, nan, nan]]])

How can I do this?
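For context, here is a small sketch of why a plain `np.array` call does not solve the problem. The name `ragged` is only illustrative. Depending on the NumPy version, calling `np.array(my_list)` without a dtype either raises an error or falls back to an object array, so `dtype=object` is passed explicitly here to show the unwanted result:

```python
import numpy as np

my_list = [[[1, 2], [4]], [[4, 4, 3]], [[1, 2, 1], [4, 3, 4, 5], [4, 1]]]

# dtype=object is passed explicitly so that the ragged input is accepted.
ragged = np.array(my_list, dtype=object)

# The result is a 1-D array holding three Python lists, not a numeric 3-D array.
print(ragged.dtype)
print(ragged.shape)
```

Numeric operations such as `np.nanmean` along an axis are not meaningful on an array like this, which is why the padding step is needed.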

Answers


First, compute the maximum lengths of the rows and columns:

from itertools import chain

len1 = max(len(el) for el in my_list)          # longest second-level list
len2 = max(len(el) for el in chain(*my_list))  # longest third-level list

Second, append the missing NaNs:

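import numpy

for el1 in my_list:  # note: this modifies my_list in place
    # Append distinct empty lists; [[]] * n would repeat one shared list object.
    el1.extend([] for _ in range(len1 - len(el1)))
    for el2 in el1:
        el2.extend([numpy.nan] * (len2 - len(el2)))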
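As a hedged, self-contained sketch, the two steps can be run on a copy of the sample and the padded lists converted to a float array. The names `work` and `padded` and the use of `deepcopy` are additions for illustration, not part of the answer, and the approach still only handles exactly three levels of nesting:

```python
from copy import deepcopy
from itertools import chain

import numpy as np

my_list = [[[1, 2], [4]], [[4, 4, 3]], [[1, 2, 1], [4, 3, 4, 5], [4, 1]]]

# Work on a copy, because the padding loop modifies the lists in place.
work = deepcopy(my_list)

len1 = max(len(el) for el in work)          # longest second-level list
len2 = max(len(el) for el in chain(*work))  # longest third-level list

for el1 in work:
    # Pad the second level with fresh empty lists.
    el1.extend([] for _ in range(len1 - len(el1)))
    for el2 in el1:
        # Pad the third level with NaN.
        el2.extend([np.nan] * (len2 - len(el2)))

# Every sublist now has the same length, so a numeric array can be built.
padded = np.array(work, dtype=float)
print(padded.shape)  # (3, 3, 4)
```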
+1

This is good, but it should work with arbitrarily deeply nested lists, the way numpy.array does. – siamii


This works for your sample; I am not sure it handles all the corner cases properly:

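from itertools import zip_longest  # named izip_longest on Python 2

import numpy as np

def find_shape(seq):
    # Return the smallest shape that can hold every nested sequence in seq.
    try:
        len_ = len(seq)
    except TypeError:
        return ()  # scalars have no length
    shapes = [find_shape(subseq) for subseq in seq]
    return (len_,) + tuple(max(sizes) for sizes in zip_longest(*shapes,
                                                               fillvalue=1))

def fill_array(arr, seq):
    # Copy seq into arr in place, padding the missing entries with NaN.
    if arr.ndim == 1:
        try:
            len_ = len(seq)
        except TypeError:
            len_ = 0
        arr[:len_] = seq
        arr[len_:] = np.nan
    else:
        # Missing subsequences are replaced by (), which fills as all NaN.
        for subarr, subseq in zip_longest(arr, seq, fillvalue=()):
            fill_array(subarr, subseq)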

Now:

>>> arr = np.empty(find_shape(my_list)) 
>>> fill_array(arr, my_list) 
>>> arr 
array([[[ 1., 2., nan, nan], 
     [ 4., nan, nan, nan], 
     [ nan, nan, nan, nan]], 

     [[ 4., 4., 3., nan], 
     [ nan, nan, nan, nan], 
     [ nan, nan, nan, nan]], 

     [[ 1., 2., 1., nan], 
     [ 4., 3., 4., 5.], 
     [ 4., 1., nan, nan]]]) 

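I think this is roughly what NumPy's shape-finding routines do. Because many Python function calls are involved in any case, it probably cannot be compared directly with the C implementation.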
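The same idea can be sketched more compactly for Python 3 by allocating a NaN-filled array up front and copying only the existing values into it. `find_shape` follows the answer above, while the wrapper name `pad_to_array` and its inner helper `fill` are hypothetical. This variant assumes that all leaves sit at the same nesting depth, and it is not tuned for speed:

```python
from itertools import zip_longest

import numpy as np


def find_shape(seq):
    """Return the smallest shape that can hold every nested sequence in seq."""
    try:
        len_ = len(seq)
    except TypeError:
        return ()  # scalars contribute no dimension
    shapes = [find_shape(sub) for sub in seq]
    return (len_,) + tuple(
        max(sizes) for sizes in zip_longest(*shapes, fillvalue=1)
    )


def pad_to_array(seq):
    """Hypothetical wrapper: start from an all-NaN array and copy values in."""
    out = np.full(find_shape(seq), np.nan)

    def fill(arr, sub):
        if arr.ndim == 1:
            # Overwrite only the leading entries; the rest stay NaN.
            arr[:len(sub)] = sub
        else:
            # zip stops at the shorter input, so missing rows stay NaN.
            for subarr, subsub in zip(arr, sub):
                fill(subarr, subsub)

    fill(out, seq)
    return out


my_list = [[[1, 2], [4]], [[4, 4, 3]], [[1, 2, 1], [4, 3, 4, 5], [4, 1]]]
result = pad_to_array(my_list)
print(result.shape)  # (3, 3, 4)

# The same function applied to a list nested four levels deep.
deep = [[[[1], [2, 3]]], [[[4, 5, 6]], [[7]]]]
print(pad_to_array(deep).shape)  # (2, 2, 2, 3)
```

Because iterating over an array yields views, each recursive call writes directly into `out`, so no intermediate arrays need to be stitched together.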