2013-06-21 50 views

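OK, I've done a fair amount of research on this topic, and I know that NumPy only supports homogeneous arrays. How can I get NumPy to create a matrix that holds both strings and floats?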

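I'm using the NLTK package in Python to process some corpus-linguistics data, and I just want a matrix whose first row holds strings as 'column names' and whose remaining rows hold the actual data values (floats).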

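So far I have made two matrices, one of strings and one of floats, and put them together with vstack. That seemed fine until I passed the stacked "matrix" to NumPy's savetxt(): it won't write the .csv file, because the matrix is not "array-like", since it isn't homogeneous. FML.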

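I'd really like to be able to use all of NumPy's awesome methods on the actual data points, but I can't get one pesky array of strings to sit on top of the matrix and end up in a .csv. Any ideas? I'd really rather not have to redo all of this with Python's list-of-lists approach to multidimensional arrays.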

Here is the code:

import os.path 
import sys 
import nltk 
from numpy import * 
from nltk.corpus.reader import CHILDESCorpusReader 
from nltk.probability import ConditionalFreqDist, FreqDist 

n_rows = 12 
n_cols = 19 
init_row = 0 
init_col = 0 
neg_words = ["Age", "MLU", "All Tokens","no","not","don't","can't","won't","isn't","wasn't","wouldn't","shouldn't","couldn't","didn't","haven't","aren't","haven't","hasn't","doesn't"] 

Matrix_headers = array(range(len(neg_words)), dtype='a12') 
Matrix_values = zeros(n_rows*n_cols).reshape((n_rows, n_cols)) #the matrix with the data points (floats) 

for entry in range(len(neg_words)): 
    Matrix_headers[entry] = neg_words[entry] 

p = neg_words 
q = Matrix_values 
Matrix = vstack([p,q]) 


out_name = "/Users/nicholasmoores/Documents/Research/neg_table.csv" 
savetxt(out_name, Matrix, fmt='%.3e',delimiter = "\t") 

raw_input("\n\nPress the enter key to exit.") 

How about a `pandas` `DataFrame`?


Yes, you should use pandas for this.


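I was finally able to get pandas downloaded and installed, so I'll try a pandas DataFrame. My whole point was that I didn't want to have to import this into a DataFrame in R, so I'm really excited that pandas exists.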
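A minimal sketch of the pandas route suggested in the comments above, assuming pandas is installed. The shortened `neg_words` list, the three-row size, and the sample `Age` values are illustrative only and do not come from the question's data. The output goes to an in-memory buffer rather than the question's file path.

```python
import io

import numpy as np
import pandas as pd

# Shortened, illustrative header list (the question uses 19 names).
neg_words = ["Age", "MLU", "All Tokens", "no", "not"]
n_rows = 3

# The floats stay in an ordinary NumPy array; the strings become column labels.
values = np.zeros((n_rows, len(neg_words)))
df = pd.DataFrame(values, columns=neg_words)

# Numeric operations still work column by column.
df["Age"] = [2.0, 2.5, 3.0]  # made-up sample values
col_means = df.mean()

# Write a tab-separated table with the header row on top.
buf = io.StringIO()
df.to_csv(buf, sep="\t", index=False, float_format="%.3e")
csv_text = buf.getvalue()
```

Passing a file path instead of `buf` to `to_csv` writes the same text to disk. Note that the full `neg_words` list in the question contains "haven't" twice, which would produce two columns with the same label.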

Answer


You can use a structured array.

For example:

>>> import numpy as np
>>> ym = np.zeros(len(neg_words), dtype=[('heads', 'S14'), ('vals', 'f4', (n_rows,))])
>>> ym
array([('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 
     ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])], 
     dtype=[('heads', 'S14'), ('vals', '<f4', (12,))]) 

To set the header values:

ym['heads'] = neg_words 

To access the headers:

>>> ym['heads'] 
array(['Age', 'MLU', 'All Tokens', 'no', 'not', "don't", "can't", 
    "won't", "isn't", "wasn't", "wouldn't", "shouldn't", "couldn't", 
    "didn't", "haven't", "aren't", "haven't", "hasn't", "doesn't"], 
    dtype='|S14') 
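The output above is from Python 2. As a hedged sketch of how this looks on Python 3, where an `S14` field holds bytes and a `U14` field holds text: the names `as_bytes` and `as_text` and the shortened `neg_words` list are illustrative, not part of the original answer.

```python
import numpy as np

# Shortened, illustrative header list.
neg_words = ["Age", "MLU", "All Tokens", "don't"]
n_rows = 2

# 'S14': fixed-width byte strings; 'U14': fixed-width unicode strings.
as_bytes = np.zeros(len(neg_words),
                    dtype=[("heads", "S14"), ("vals", "f4", (n_rows,))])
as_text = np.zeros(len(neg_words),
                   dtype=[("heads", "U14"), ("vals", "f4", (n_rows,))])

# ASCII-only str values can be assigned to either field.
as_bytes["heads"] = neg_words
as_text["heads"] = neg_words

first_bytes = as_bytes["heads"][0]  # a bytes value on Python 3
first_text = as_text["heads"][0]    # a str value on Python 3
```

On Python 3 the `U14` variant is usually the more convenient one, because the headers can be joined and written out as ordinary strings. Either way, 14 is a fixed width, so longer names would be cut off.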

Similarly, to access the values:

ym['vals']
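To get from the structured array to the .csv file the question asks about, one possible approach (a sketch, not part of the original answer) is to pass only the float block to `np.savetxt` and supply the names through its `header` argument. The temporary directory, the shortened `neg_words` list, and the `U14` field type are assumptions made so the snippet runs on its own.

```python
import os
import tempfile

import numpy as np

# Shortened, illustrative header list (the question uses 19 names).
neg_words = ["Age", "MLU", "All Tokens", "no", "not"]
n_rows = 3

# One record per column name, each carrying that column's n_rows values.
ym = np.zeros(len(neg_words),
              dtype=[("heads", "U14"), ("vals", "f4", (n_rows,))])
ym["heads"] = neg_words

# Hypothetical output location; the question writes to a fixed user path.
out_name = os.path.join(tempfile.mkdtemp(), "neg_table.csv")

# ym["vals"] has shape (n_cols, n_rows); transpose so each name heads a column.
# comments="" stops savetxt from prefixing the header line with "# ".
np.savetxt(out_name, ym["vals"].T, fmt="%.3e", delimiter="\t",
           header="\t".join(ym["heads"]), comments="")

with open(out_name) as fh:
    lines = fh.read().splitlines()
```

This keeps the numeric data homogeneous, so all the usual NumPy methods still apply to `ym['vals']`, and the strings only meet the numbers at the moment the file is written.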