2017-07-18

Reading a text file into a NumPy array

I am trying to load a text file into a NumPy array.

The structure is as follows:

THE 77534223 
AND 30997177 
ING 30679488 
ENT 17902107 
ION 17769261 
HER 15277018 
FOR 14686159 
THA 14222073 
NTH 14115952 
[...] 

But when I use the code below, I get the output shown after it, with every value cut short:

import numpy as np 

data = np.genfromtxt("english_trigrams.txt", dtype=(str,int), delimiter=' ')             
print(data) 

[['TH' '77'] 
['AN' '30'] 
['IN' '30'] 
..., 
['JX' '1'] 
['JQ' '1'] 
['JQ' '1']] 

I would like an (x, 2) array with dtype str in the first column and dtype int in the second.

Thanks a lot!


P.S:

  • Python 3.6.1
  • NumPy 1.13.0
Maybe try something like np.loadtxt

Possible duplicate of [How to use numpy.genfromtxt when first column is string and the remaining columns are numbers?](https://stackoverflow.com/questions/12319969/how-to-use-numpy-genfromtxt-when-first-column-is-string-and-the-remaining-column)

np.loadtxt("english_trigrams.txt", dtype=[('f0', '|S3'), ('f1', '…

Answer

Various ways of loading this text:

In [470]: txt=b"""THE 77534223 
    ...: AND 30997177 
    ...: ING 30679488 
    ...: ENT 17902107 
    ...: ION 17769261 
    ...: HER 15277018 
    ...: FOR 14686159 
    ...: THA 14222073 
    ...: NTH 14115952""" 

With dtype=None, genfromtxt deduces the correct column dtypes:

In [471]: data = np.genfromtxt(txt.splitlines(),dtype=None) 
In [472]: data 
Out[472]: 
array([(b'THE', 77534223), (b'AND', 30997177), (b'ING', 30679488), 
     (b'ENT', 17902107), (b'ION', 17769261), (b'HER', 15277018), 
     (b'FOR', 14686159), (b'THA', 14222073), (b'NTH', 14115952)], 
     dtype=[('f0', 'S3'), ('f1', '<i4')]) 
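
Note that this result is a 1-D structured array with one record per line, not a 2-D (x, 2) array; the columns are reached by field name. The sketch below rebuilds three of the records by hand with the dtype reported above (so it does not depend on how a particular NumPy version infers string columns) and shows one possible way to pull out the columns and turn the bytes field into unicode strings. The variable names `words` and `counts` are only illustrative.

```python
import numpy as np

# Records rebuilt by hand, using the dtype shown in Out[472].
data = np.array(
    [(b"THE", 77534223), (b"AND", 30997177), (b"ING", 30679488)],
    dtype=[("f0", "S3"), ("f1", "<i4")],
)

print(data.shape)  # (3,): one record per line, not (3, 2)

# Columns are selected by field name.
words = data["f0"].astype("U3")  # bytes -> unicode (works for ASCII text)
counts = data["f1"]

print(words[0], counts[0])  # THE 77534223
```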

An incorrect dtype specification, like yours, but with only 1 character per element:

In [473]: data = np.genfromtxt(txt.splitlines(),dtype=(str, int)) 
In [474]: data 
Out[474]: 
array([['T', '7'], 
     ['A', '3'], 
     ['I', '3'], 
     ['E', '1'], 
     ['I', '1'], 
     ['H', '1'], 
     ['F', '1'], 
     ['T', '1'], 
     ['N', '1']], 
     dtype='<U1') 

Better, but the string field is too short (the dtype is '<U' with no length, so every string comes out empty):

In [475]: data = np.genfromtxt(txt.splitlines(),dtype='str,int') 
In [476]: data 
Out[476]: 
array([('', 77534223), ('', 30997177), ('', 30679488), ('', 17902107), 
     ('', 17769261), ('', 15277018), ('', 14686159), ('', 14222073), 
     ('', 14115952)], 
     dtype=[('f0', '<U'), ('f1', '<i4')]) 
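
The empty strings follow from the size of the string dtype: a bare `str` carries no length, so the field has no room for any characters. A quick check of the item sizes, as a small sketch:

```python
import numpy as np

# A bare str dtype has no length, so it occupies zero bytes per item.
print(np.dtype(str).itemsize)    # 0

# A sized unicode dtype reserves 4 bytes per character.
print(np.dtype("U3").itemsize)   # 12
print(np.dtype("U10").itemsize)  # 40
```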

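Specifying the string width explicitly (U10) gives the same records as the dtype=None case, with unicode strings instead of bytes: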

In [477]: data = np.genfromtxt(txt.splitlines(),dtype='U10,int') 
In [478]: data 
Out[478]: 
array([('THE', 77534223), ('AND', 30997177), ('ING', 30679488), 
     ('ENT', 17902107), ('ION', 17769261), ('HER', 15277018), 
     ('FOR', 14686159), ('THA', 14222073), ('NTH', 14115952)], 
     dtype=[('f0', '<U10'), ('f1', '<i4')])
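
Putting this together for a file on disk, here is a minimal sketch. It assumes a whitespace-separated file like the one in the question; the three-line sample file, the field names `trigram` and `count`, and the `lookup` dictionary are made up for illustration. The dtype is written as a list of (name, type) pairs rather than the 'U10,int' string used above, which also lets the fields have descriptive names.

```python
import os
import tempfile

import numpy as np

# Hypothetical three-line sample standing in for english_trigrams.txt.
sample = "THE 77534223\nAND 30997177\nING 30679488\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trigrams_sample.txt")
    with open(path, "w") as fh:
        fh.write(sample)

    # Explicit string width (U10) so the trigrams are not truncated.
    data = np.genfromtxt(
        path, dtype=[("trigram", "U10"), ("count", np.int64)]
    )

trigrams = data["trigram"]  # unicode string column
counts = data["count"]      # integer column

# Optional: a plain dict for looking up a count by trigram.
lookup = dict(zip(trigrams.tolist(), counts.tolist()))
print(lookup["THE"])  # 77534223
```

The structured array keeps a separate dtype per column, which a plain 2-D array cannot do, since all elements of an ordinary ndarray share one dtype.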