Convert a 2D numpy array to a structured array

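I'm trying to convert a two-dimensional array into a structured array with named fields. I want each row of the 2D array to become a new record in the structured array. Unfortunately, nothing I have tried works the way I expect.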

I start with:

>>> myarray = numpy.array([("Hello",2.5,3),("World",3.6,2)]) 
>>> print myarray 
[['Hello' '2.5' '3'] 
['World' '3.6' '2']]

I want to convert it into something that looks like this:

>>> newarray = numpy.array([("Hello",2.5,3),("World",3.6,2)], dtype=[("Col1","S8"),("Col2","f8"),("Col3","i8")]) 
>>> print newarray 
[('Hello', 2.5, 3L) ('World', 3.6000000000000001, 2L)]

I have tried:

>>> newarray = myarray.astype([("Col1","S8"),("Col2","f8"),("Col3","i8")]) 
>>> print newarray 
[[('Hello', 0.0, 0L) ('2.5', 0.0, 0L) ('3', 0.0, 0L)] 
[('World', 0.0, 0L) ('3.6', 0.0, 0L) ('2', 0.0, 0L)]] 

>>> newarray = numpy.array(myarray, dtype=[("Col1","S8"),("Col2","f8"),("Col3","i8")]) 
>>> print newarray 
[[('Hello', 0.0, 0L) ('2.5', 0.0, 0L) ('3', 0.0, 0L)] 
[('World', 0.0, 0L) ('3.6', 0.0, 0L) ('2', 0.0, 0L)]]

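Both of these approaches try to convert each individual entry in myarray into a record of the given dtype, so extra zeros are inserted. I can't figure out how to make it convert each row into a record.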

Another attempt:

>>> newarray = myarray.copy() 
>>> newarray.dtype = [("Col1","S8"),("Col2","f8"),("Col3","i8")] 
>>> print newarray 
[[('Hello', 1.7219343871178711e-317, 51L)] 
[('World', 1.7543139673493688e-317, 50L)]]

This time no actual conversion is performed. The existing data in memory is simply reinterpreted as the new data type.

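The array I start with is read in from a text file. The data types are not known ahead of time, so I can't set the dtype when the array is created. I need a fast and elegant solution that works well for the general case, since I will be doing this kind of conversion many times across many applications.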

Thanks!


2010-09-01 Emma

You can "create a record array from a (flat) list of arrays" using numpy.core.records.fromarrays, as follows:

>>> import numpy as np 
>>> myarray = np.array([("Hello",2.5,3),("World",3.6,2)]) 
>>> print myarray 
[['Hello' '2.5' '3'] 
['World' '3.6' '2']] 


>>> newrecarray = np.core.records.fromarrays(myarray.transpose(), 
              names='col1, col2, col3', 
              formats = 'S8, f8, i8') 

>>> print newrecarray 
[('Hello', 2.5, 3) ('World', 3.5999999046325684, 2)]

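I was trying to do the same thing. I found that when numpy creates a structured array from an existing 2D array (using np.core.records.fromarrays), it takes each array it is given as one field of the result, and iterating over a 2D array yields its rows, so each row (rather than each column) would end up as a field. That is why you have to transpose it first. This numpy behavior doesn't seem intuitive, but maybe there is a good reason for it.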
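For comparison, here is a minimal Python 3 sketch of the same fromarrays approach. It assumes np.rec.fromarrays, which refers to the same function as numpy.core.records.fromarrays, and uses a dtype list instead of separate names/formats strings; .T is simply shorthand for .transpose().

```python
import numpy as np

# The 2D string array from the question (NumPy stores every entry as a string).
myarray = np.array([("Hello", 2.5, 3), ("World", 3.6, 2)])

# Target field names and formats.
dt = [("Col1", "S8"), ("Col2", "f8"), ("Col3", "i8")]

# fromarrays expects one array per *field*; iterating a 2D array yields rows,
# so transpose first so that each column becomes one field. NumPy casts the
# strings to the requested field types while filling the result.
newrec = np.rec.fromarrays(myarray.T, dtype=dt)

print(newrec)
```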


2011-03-05 14:08:47 Curious2learn

With fromrecords you can avoid the transpose(). – 2014-04-01 21:10:16

Well, I had been struggling with this for a while, but I found a way to do it that doesn't take too much effort. Apologies if this code is "dirty"...

Let's start with a 2D array:

mydata = numpy.array([['text1', 1, 'longertext1', 0.1111], 
        ['text2', 2, 'longertext2', 0.2222], 
        ['text3', 3, 'longertext3', 0.3333], 
        ['text4', 4, 'longertext4', 0.4444], 
        ['text5', 5, 'longertext5', 0.5555]])

So we end up with a 2D array with 4 columns and 5 rows:

mydata.shape 
Out[30]: (5L, 4L)

To use numpy.core.records.array, we need to supply the input argument as a list of arrays, so:

tuple(mydata) 
Out[31]: 
(array(['text1', '1', 'longertext1', '0.1111'], 
     dtype='|S11'), 
array(['text2', '2', 'longertext2', '0.2222'], 
     dtype='|S11'), 
array(['text3', '3', 'longertext3', '0.3333'], 
     dtype='|S11'), 
array(['text4', '4', 'longertext4', '0.4444'], 
     dtype='|S11'), 
array(['text5', '5', 'longertext5', '0.5555'], 
     dtype='|S11'))

This produces a separate array for each row of data, but the input arrays we need are by column, so what we need is:

tuple(mydata.transpose()) 
Out[32]: 
(array(['text1', 'text2', 'text3', 'text4', 'text5'], 
     dtype='|S11'), 
array(['1', '2', '3', '4', '5'], 
     dtype='|S11'), 
array(['longertext1', 'longertext2', 'longertext3', 'longertext4', 
     'longertext5'], 
     dtype='|S11'), 
array(['0.1111', '0.2222', '0.3333', '0.4444', '0.5555'], 
     dtype='|S11'))

Finally, it needs a list of arrays rather than a tuple, so we wrap the above in list() as follows:

list(tuple(mydata.transpose()))
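A small Python 3 sketch of the row-versus-column point above, assuming the same data; list(mydata.T) is used here as a shorter equivalent of list(tuple(mydata.transpose())), since iterating an array already yields its first-axis slices.

```python
import numpy as np

# Same 5x4 mixed-type data; NumPy stores it as a 2D string array.
mydata = np.array([['text1', 1, 'longertext1', 0.1111],
                   ['text2', 2, 'longertext2', 0.2222],
                   ['text3', 3, 'longertext3', 0.3333],
                   ['text4', 4, 'longertext4', 0.4444],
                   ['text5', 5, 'longertext5', 0.5555]])

# Iterating a 2D array yields its rows ...
rows = list(mydata)
# ... while iterating its transpose yields its columns, one array per field.
columns = list(mydata.T)

print(len(rows), len(columns))
```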

That sorts out our data input argument... next is the dtype:

mydtype = numpy.dtype([('My short text Column', 'S5'), 
         ('My integer Column', numpy.int16), 
         ('My long text Column', 'S11'), 
         ('My float Column', numpy.float32)]) 
mydtype 
Out[37]: dtype([('My short text Column', '|S5'), ('My integer Column', '<i2'), ('My long text Column', '|S11'), ('My float Column', '<f4')])
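Because the question says the column types are not known in advance, the dtype does not have to be written out literally. A sketch, assuming the column names and format codes become available at runtime as two parallel lists (the lists below are placeholders, for example something parsed from a file header):

```python
import numpy as np

# Placeholder names/formats standing in for values discovered at runtime.
names = ['My short text Column', 'My integer Column',
         'My long text Column', 'My float Column']
formats = ['S5', np.int16, 'S11', np.float32]

# np.dtype accepts a list of (name, format) pairs.
mydtype = np.dtype(list(zip(names, formats)))
print(mydtype)
```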

OK, now we can pass these to numpy.core.records.array():

myRecord = numpy.core.records.array(list(tuple(mydata.transpose())), dtype=mydtype)

...and fingers crossed:

myRecord 
Out[36]: 
rec.array([('text1', 1, 'longertext1', 0.11110000312328339), 
     ('text2', 2, 'longertext2', 0.22220000624656677), 
     ('text3', 3, 'longertext3', 0.33329999446868896), 
     ('text4', 4, 'longertext4', 0.44440001249313354), 
     ('text5', 5, 'longertext5', 0.5554999709129333)], 
     dtype=[('My short text Column', '|S5'), ('My integer Column', '<i2'), ('My long text Column', '|S11'), ('My float Column', '<f4')])

Voilà! You can index by column name:

myRecord['My float Column'] 
Out[39]: array([ 0.1111 , 0.22220001, 0.33329999, 0.44440001, 0.55549997], dtype=float32)
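Putting the steps together, a Python 3 sketch of this answer's method, assuming np.rec.array refers to the same function as numpy.core.records.array. When it receives a list of plain arrays (not tuples), it builds the record array column by column, one array per field.

```python
import numpy as np

mydata = np.array([['text1', 1, 'longertext1', 0.1111],
                   ['text2', 2, 'longertext2', 0.2222],
                   ['text3', 3, 'longertext3', 0.3333],
                   ['text4', 4, 'longertext4', 0.4444],
                   ['text5', 5, 'longertext5', 0.5555]])

mydtype = np.dtype([('My short text Column', 'S5'),
                    ('My integer Column', np.int16),
                    ('My long text Column', 'S11'),
                    ('My float Column', np.float32)])

# One 1D string array per column; NumPy casts each to its field's type.
myRecord = np.rec.array(list(mydata.T), dtype=mydtype)

print(myRecord['My float Column'])
```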

I hope this helps; I wasted far too much time with numpy.asarray, mydata.astype, etc. trying to get this to work before finally working out this method.


2013-03-01 13:50:21

I think

new_array = np.core.records.fromrecords([("Hello",2.5,3),("World",3.6,2)], 
             names='Col1,Col2,Col3', 
             formats='S8,f8,i8')

is what you want.
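To apply fromrecords to the question's existing 2D string array rather than to literal tuples, one option is to turn each row into a tuple first, which also avoids the transpose. A sketch assuming np.rec.fromrecords (the same function as np.core.records.fromrecords) and that NumPy parses the numeric strings into the float and integer fields, as it does elsewhere in this thread:

```python
import numpy as np

myarray = np.array([("Hello", 2.5, 3), ("World", 3.6, 2)])

# fromrecords wants one tuple per record, so convert each row of the
# 2D string array into a tuple; no transpose is needed this way.
records = [tuple(row) for row in myarray]
new_array = np.rec.fromrecords(records, names='Col1,Col2,Col3',
                               formats='S8,f8,i8')
print(new_array)
```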


2014-04-01 21:09:42

If the data starts out as a list of tuples, then creating a structured array is straightforward:

In [228]: alist = [("Hello",2.5,3),("World",3.6,2)] 
In [229]: dt = [("Col1","S8"),("Col2","f8"),("Col3","i8")] 
In [230]: np.array(alist, dtype=dt) 
Out[230]: 
array([(b'Hello', 2.5, 3), (b'World', 3.6, 2)], 
     dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])

The complication here is that the list of tuples has been turned into a 2D array of strings:

In [231]: arr = np.array(alist) 
In [232]: arr 
Out[232]: 
array([['Hello', '2.5', '3'], 
     ['World', '3.6', '2']], 
     dtype='<U5')

We can use the well-known zip(*...) idiom to "transpose" this array; in fact, we want a double transpose:

In [234]: list(zip(*arr.T)) 
Out[234]: [('Hello', '2.5', '3'), ('World', '3.6', '2')]

zip has conveniently given us a list of tuples. Now we can recast it with the desired dtype:

In [235]: np.array(_, dtype=dt) 
Out[235]: 
array([(b'Hello', 2.5, 3), (b'World', 3.6, 2)], 
     dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])
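Since transposing twice just gets back to the rows, the same list of tuples can also be built by iterating over the rows directly. A sketch comparing the two spellings (map(tuple, arr) is an alternative chosen for illustration, not part of the original answer):

```python
import numpy as np

alist = [("Hello", 2.5, 3), ("World", 3.6, 2)]
dt = [("Col1", "S8"), ("Col2", "f8"), ("Col3", "i8")]
arr = np.array(alist)   # 2D array of strings

# zip(*arr.T): transpose, then zip the columns back into row tuples.
via_zip = np.array(list(zip(*arr.T)), dtype=dt)
# map(tuple, arr): iterate the rows and make each one a tuple directly.
via_rows = np.array(list(map(tuple, arr)), dtype=dt)

print(via_rows)
```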

The accepted answer uses fromarrays:

In [236]: np.rec.fromarrays(arr.T, dtype=dt) 
Out[236]: 
rec.array([(b'Hello', 2.5, 3), (b'World', 3.6, 2)], 
      dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])

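Internally, fromarrays uses an approach that is common in recfunctions: create the target array, then copy the values in field by field, by name. In effect it does: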

In [237]: newarr = np.empty(arr.shape[0], dtype=dt) 
In [238]: for n, v in zip(newarr.dtype.names, arr.T): 
    ...:  newarr[n] = v 
    ...:  
In [239]: newarr 
Out[239]: 
array([(b'Hello', 2.5, 3), (b'World', 3.6, 2)], 
     dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])
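Because the question needs this repeatedly with a dtype that is only known at runtime, the loop above can be wrapped in a small function. rows_to_structured is a hypothetical helper name, and the shape check is an added assumption (one column per field); NumPy does the string-to-number casting on each field assignment.

```python
import numpy as np

def rows_to_structured(arr, dtype):
    """Hypothetical helper: turn each row of a 2D array into one record.

    Mirrors the loop above: allocate the target array, then copy each
    column into the field with the matching name.
    """
    dtype = np.dtype(dtype)
    if arr.ndim != 2 or arr.shape[1] != len(dtype.names):
        raise ValueError("need a 2D array with one column per field")
    out = np.empty(arr.shape[0], dtype=dtype)
    for name, column in zip(dtype.names, arr.T):
        out[name] = column  # NumPy casts the strings to the field type
    return out

arr = np.array([("Hello", 2.5, 3), ("World", 3.6, 2)])
newarr = rows_to_structured(arr, [("Col1", "S8"), ("Col2", "f8"), ("Col3", "i8")])
print(newarr)
```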


2017-06-02 21:56:58 hpaulj
