Automatic string length

If I create a recarray this way:

In [29]: np.rec.fromrecords([(1,'hello'),(2,'world')],names=['a','b'])

The result looks fine:

Out[29]: 
rec.array([(1, 'hello'), (2, 'world')], 
     dtype=[('a', '<i8'), ('b', '|S5')])

However, if I specify the data types:

In [32]: np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),('b',np.str)])

the string field is given a length of zero:

Out[32]: 
rec.array([(1, ''), (2, '')], 
     dtype=[('a', '|i1'), ('b', '|S0')])

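I need to specify the data types for all the numerical columns, because I care about int8/16/32 and so on, but I would still like to benefit from the automatic string length detection that I get when I don't specify a dtype. I tried replacing np.str with None, but with no luck. I know I can specify '|S5', for example, but I don't know in advance what the string length should be.

Source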

2009-11-03 astrofrog

I don't know how to ask numpy to determine some aspects of a dtype for you but not others, but couldn't you do something like this:

data = [(1,'hello'),(2,'world')] 
dlen = max(len(s) for i, s in data) 
st = '|S%d' % dlen 
np.rec.fromrecords(data, dtype=[('a',np.int8), ('b',st)])

Source

2009-11-03 02:52:29

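Since I am converting arbitrary lists of tuples to recarrays, this is not an ideal solution (I don't know which columns will be strings). Of course, I could search for the string lengths manually, but I was hoping to avoid that. – astrofrog 2009-11-03 04:25:06

If you don't know in advance which columns are strings, how do you know which ones are int8 vs int16 vs int32, since you say you need to control _that_ "manually"?! The bottom line is that you can discover the types and sizes yourself, or let numpy do it, or let numpy do it (on some or all of the data) and then overrule its opinion by re-parsing the data with a different dtype. I'm not sure what further options you could want (you say you want to control the types of some columns but not others, yet you don't know in advance which ones?!) –  2009-11-03 04:33:19

Sorry, you are right. I meant that the solution you propose is too simple as written, because it only works for two-element tuples whose second element is a string, but of course I can write a loop over the columns and, for the ones I know contain strings, find the maximum length. I was just hoping to avoid duplicating code that might already exist in Numpy. – astrofrog 2009-11-03 04:39:44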
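The loop over columns described in the last comment could be sketched roughly as follows. This is only an illustration, not something numpy provides: `build_dtype` and its `int_types` argument are hypothetical names, the example rows are made up, and Python 3 is assumed, so the string fields are given a 'U' (unicode) width rather than the '|S' width used above.

```python
import numpy as np

def build_dtype(rows, names, int_types):
    """Hypothetical helper: use the explicit integer type where one is
    given, and measure the longest string for every other column."""
    fields = []
    for col, name in enumerate(names):
        if name in int_types:
            fields.append((name, int_types[name]))
        else:
            # Assumes every value in this column is a str.
            width = max((len(row[col]) for row in rows), default=1)
            fields.append((name, 'U%d' % max(width, 1)))
    return np.dtype(fields)

# Made-up sample data with one string column between two integer columns.
data = [(1, 'hello', 300), (2, 'hi', 400)]
dt = build_dtype(data, ['a', 'b', 'c'], {'a': np.int8, 'c': np.int16})
rec = np.rec.fromrecords(data, dtype=dt)
```

Only the integer columns have to be named explicitly; every remaining column is treated as a string column, which matches the situation described in the question.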

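If you don't need to manipulate the strings as bytes, you can use the object data type to represent them. This essentially stores a pointer to each string rather than the actual bytes: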

In [38]: np.array(data, dtype=[('a', np.uint8), ('b', np.object)]) 
Out[38]: 
array([(1, 'hello'), (2, 'world')], 
     dtype=[('a', '|u1'), ('b', '|O8')])
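As a small illustration of why the object dtype sidesteps the length problem, the sketch below stores strings of different lengths without any truncation. It uses the builtin `object` because the `np.object` alias was removed in NumPy 1.24; the sample data is made up for this example.

```python
import numpy as np

# Made-up rows whose strings differ a lot in length.
data = [(1, 'hello'), (2, 'a much longer string')]

# The 'b' field holds references to Python str objects, so no fixed
# width has to be chosen up front.
arr = np.array(data, dtype=[('a', np.uint8), ('b', object)])
longest = arr['b'][1]
```

The trade-off is that the field no longer holds fixed-width bytes, so anything that relies on the raw byte layout of the strings will not work on it.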

Alternatively, Alex's idea would work well:

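import numpy as np

data = [(1, 'hello'), (2, 'world')]

# Assumption: dt is the dtype numpy infers on its own (string length included).
dt = np.rec.fromrecords(data).dtype

new_dt = []

# For each field, in column order, determine whether it is an integer.
# If so, represent it as a byte; otherwise keep the inferred type.
# dt.fields maps each name to (dtype, byte offset).
for f in dt.names:
    T = dt.fields[f][0]
    if np.issubdtype(T, np.integer):
        new_dt.append((f, np.uint8))
    else:
        new_dt.append((f, T))

new_dt = np.dtype(new_dt)
np.array(data, dtype=new_dt)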

This should produce (on Python 3 the string field is reported as '<U5' rather than '|S5'):

array([(1, 'hello'), (2, 'world')], 
     dtype=[('f0', '|u1'), ('f1', '|S5')])

Source

2009-11-19 22:45:10
