import scipy as sp
import matplotlib.pyplot as plt
data=sp.genfromtxt("data/train.tsv", delimiter ="\t", dtype="string", comments=None, skip_header=1)
x = data[:,0]
y = data[:,1]
x = x[~sp.isnan(y)]
y = x[~sp.isnan(y)]
DataOfInterest=x["avglinksize"]
EphemeralOrEvergreen=x["label"]
plt.scatter(DataOfInterest,EphemeralOrEvergreen)
plt.title("Training data")
plt.xlabel("Single feature from training set")
plt.ylabel("Ephemeral or Evergreen")
plt.grid()
plt.show()
Output:
python GenGraphs.py
Traceback (most recent call last):
  File "GenGraphs.py", line 4, in <module>
    data=sp.genfromtxt("data/train.tsv", delimiter ="\t", dtype="string", comments=None, skip_header=1)
  File "/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 1746, in genfromtxt
    output = np.array(data, dtype)
MemoryError
I want to plot one column of a TSV file against another.
What am I misunderstanding here? How else can I do this?
How big is 'train.tsv'? – mdml
@mtitan8 It can be found here: http://www.kaggle.com/c/stumbleupon/data. It is 20.6 MB, with 27 columns and 7,396 rows.
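One likely cause of the MemoryError: with a string dtype, NumPy stores every cell at the width of the longest string in the whole table, so a single very long text field makes all 27 x 7,396 cells that wide. A minimal sketch of one way around it is to read only the two columns you need with the standard csv module and convert them to floats yourself. This assumes Python 3, a header row containing the column names "avglinksize" and "label" (taken from the question), and that rows with non-numeric entries can simply be skipped. The helper names load_two_columns and plot_columns are made up for illustration.

```python
import csv


def load_two_columns(source, x_name, y_name, delimiter="\t"):
    """Return two lists of floats for the named columns.

    `source` is any iterable of text lines (an open file, for example).
    Rows where either value is missing or not numeric are skipped.
    """
    xs, ys = [], []
    reader = csv.DictReader(source, delimiter=delimiter)
    for row in reader:
        try:
            x_val = float(row[x_name])
            y_val = float(row[y_name])
        except (TypeError, ValueError):
            # TypeError: short row (value is None); ValueError: not a number.
            continue
        xs.append(x_val)
        ys.append(y_val)
    return xs, ys


def plot_columns(path, x_name, y_name):
    """Scatter-plot one column of a TSV file against another."""
    # Imported here so the loader above works without matplotlib installed.
    import matplotlib.pyplot as plt

    # Assumption: some text fields may exceed csv's default 128 KiB limit,
    # so raise it to 10 MiB before reading.
    csv.field_size_limit(10 * 1024 * 1024)

    with open(path, newline="") as handle:
        x, y = load_two_columns(handle, x_name, y_name)

    plt.scatter(x, y)
    plt.title("Training data")
    plt.xlabel(x_name)
    plt.ylabel(y_name)
    plt.grid()
    plt.show()
```

Calling plot_columns("data/train.tsv", "avglinksize", "label") should then produce the scatter plot the question is after. Only two Python lists of floats are held in memory, so the long text columns never get loaded into an array.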