2017-06-21 9 views

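Initial visualization of a dataset in Scikit

Given the potential equivalence of Python and R for data processing, I am working through the basics. In particular, after loading a dataset such as iris in R, the simple command head() prints a nicely formatted table to the screen: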

head(iris) 
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Scikit includes datasets, but if I am unfamiliar with one and just want to see what it looks like, this is the unfortunate first result:

from sklearn.datasets import load_iris 
iris = load_iris() 
iris 
{'DESCR': 'Iris Plants Database\n\nNotes\n-----\nData Set Characteristics:\n :Number of Instances: 150 (50 in each of three classes)\n :Number of Attributes: 4 numeric, predictive attributes and the class\n :Attribute Information:\n  - sepal length in cm\n  - sepal width in cm\n  - petal length in cm\n  - petal width in cm\n  - class:\n    - Iris-Setosa\n    - Iris-Versicolour\n    - Iris-Virginica\n :Summary Statistics:\n\n ============== ==== ==== ======= ===== ====================\n     Min Max Mean SD Class Correlation\n ============== ==== ==== ======= ===== ====================\n sepal length: 4.3 7.9 5.84 0.83 0.7826\n sepal width: 2.0 4.4 3.05 0.43 -0.4194\n petal length: 1.0 6.9 3.76 1.76 0.9490 (high!)\n petal width: 0.1 2.5 1.20 0.76  0.9565 (high!)\n ============== ==== ==== ======= ===== ====================\n\n :Missing Attribute Values: None\n :Class Distribution: 33.3% for each of 3 classes.\n :Creator: R.A. Fisher\n :Donor: Michael Marshall (MARSHALL%[email protected])\n :Date: July, 1988\n\nThis is a copy of UCI ML iris datasets.\nhttp://archive.ics.uci.edu/ml/datasets/Iris\n\nThe famous Iris database, first used by Sir R.A Fisher\n\nThis is perhaps the best known database to be found in the\npattern recognition literature. Fisher\'s paper is a classic in the field and\nis referenced frequently to this day. (See Duda & Hart, for example.) The\ndata set contains 3 classes of 50 instances each, where each class refers to a\ntype of iris plant. One class is linearly separable from the other 2; the\nlatter are NOT linearly separable from each other.\n\nReferences\n----------\n - Fisher,R.A. "The use of multiple measurements in taxonomic problems"\n  Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to\n  Mathematical Statistics" (John Wiley, NY, 1950).\n - Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.\n  (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.\n - Dasarathy, B.V. 
(1980) "Nosing Around the Neighborhood: A New System\n  Structure and Classification Rule for Recognition in Partially Exposed\n  Environments". IEEE Transactions on Pattern Analysis and Machine\n  Intelligence, Vol. PAMI-2, No. 1, 67-71.\n - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions\n  on Information Theory, May 1972, 431-433.\n - See also: 1988 MLC Proceedings, 54-64. Cheeseman et al"s AUTOCLASS II\n  conceptual clustering system finds 3 classes in the data.\n - Many, many more ...\n', 
'data': array([[ 5.1, 3.5, 1.4, 0.2], 
     [ 4.9, 3. , 1.4, 0.2], 
     [ 4.7, 3.2, 1.3, 0.2], 
     [ 4.6, 3.1, 1.5, 0.2], 
     [ 5. , 3.6, 1.4, 0.2], 
     [ 5.4, 3.9, 1.7, 0.4], 
     [ 4.6, 3.4, 1.4, 0.3], 
     [ 5. , 3.4, 1.5, 0.2], 
     [ 4.4, 2.9, 1.4, 0.2], 
     [ 4.9, 3.1, 1.5, 0.1], 
     [ 5.4, 3.7, 1.5, 0.2], 
     [ 4.8, 3.4, 1.6, 0.2], 
     [ 4.8, 3. , 1.4, 0.1], 
     [ 4.3, 3. , 1.1, 0.1], 
     [ 5.8, 4. , 1.2, 0.2], 
     [ 5.7, 4.4, 1.5, 0.4], 
     [ 5.4, 3.9, 1.3, 0.4], 
     [ 5.1, 3.5, 1.4, 0.3], 
     [ 5.7, 3.8, 1.7, 0.3], 
     [ 5.1, 3.8, 1.5, 0.3], 
     [ 5.4, 3.4, 1.7, 0.2], 
     [ 5.1, 3.7, 1.5, 0.4], 
     [ 4.6, 3.6, 1. , 0.2], 
     [ 5.1, 3.3, 1.7, 0.5], 
     [ 4.8, 3.4, 1.9, 0.2], 
     [ 5. , 3. , 1.6, 0.2], 
     [ 5. , 3.4, 1.6, 0.4], 
     [ 5.2, 3.5, 1.5, 0.2], 
     [ 5.2, 3.4, 1.4, 0.2], 
     [ 4.7, 3.2, 1.6, 0.2], 
     [ 4.8, 3.1, 1.6, 0.2], 
     [ 5.4, 3.4, 1.5, 0.4], 
     [ 5.2, 4.1, 1.5, 0.1], 
     [ 5.5, 4.2, 1.4, 0.2], 
     [ 4.9, 3.1, 1.5, 0.1], 
     [ 5. , 3.2, 1.2, 0.2], 
     [ 5.5, 3.5, 1.3, 0.2], 
     [ 4.9, 3.1, 1.5, 0.1], 
     [ 4.4, 3. , 1.3, 0.2], 
     [ 5.1, 3.4, 1.5, 0.2], 
     [ 5. , 3.5, 1.3, 0.3], 
     [ 4.5, 2.3, 1.3, 0.3], 
     [ 4.4, 3.2, 1.3, 0.2], 
     [ 5. , 3.5, 1.6, 0.6], 
     [ 5.1, 3.8, 1.9, 0.4], 
     [ 4.8, 3. , 1.4, 0.3], 
     [ 5.1, 3.8, 1.6, 0.2], 
     [ 4.6, 3.2, 1.4, 0.2], 
     [ 5.3, 3.7, 1.5, 0.2], 
     [ 5. , 3.3, 1.4, 0.2], 
     [ 7. , 3.2, 4.7, 1.4], 
     [ 6.4, 3.2, 4.5, 1.5], 
     [ 6.9, 3.1, 4.9, 1.5], 
     [ 5.5, 2.3, 4. , 1.3], 
     [ 6.5, 2.8, 4.6, 1.5], 
     [ 5.7, 2.8, 4.5, 1.3], 
     [ 6.3, 3.3, 4.7, 1.6], 
     [ 4.9, 2.4, 3.3, 1. ], 
     [ 6.6, 2.9, 4.6, 1.3], 
     [ 5.2, 2.7, 3.9, 1.4], 
     [ 5. , 2. , 3.5, 1. ], 
     [ 5.9, 3. , 4.2, 1.5], 
     [ 6. , 2.2, 4. , 1. ], 
     [ 6.1, 2.9, 4.7, 1.4], 
     [ 5.6, 2.9, 3.6, 1.3], 
     [ 6.7, 3.1, 4.4, 1.4], 
     [ 5.6, 3. , 4.5, 1.5], 
     [ 5.8, 2.7, 4.1, 1. ], 
     [ 6.2, 2.2, 4.5, 1.5], 
     [ 5.6, 2.5, 3.9, 1.1], 
     [ 5.9, 3.2, 4.8, 1.8], 
     [ 6.1, 2.8, 4. , 1.3], 
     [ 6.3, 2.5, 4.9, 1.5], 
     [ 6.1, 2.8, 4.7, 1.2], 
     [ 6.4, 2.9, 4.3, 1.3], 
     [ 6.6, 3. , 4.4, 1.4], 
     [ 6.8, 2.8, 4.8, 1.4], 
     [ 6.7, 3. , 5. , 1.7], 
     [ 6. , 2.9, 4.5, 1.5], 
     [ 5.7, 2.6, 3.5, 1. ], 
     [ 5.5, 2.4, 3.8, 1.1], 
     [ 5.5, 2.4, 3.7, 1. ], 
     [ 5.8, 2.7, 3.9, 1.2], 
     [ 6. , 2.7, 5.1, 1.6], 
     [ 5.4, 3. , 4.5, 1.5], 
     [ 6. , 3.4, 4.5, 1.6], 
     [ 6.7, 3.1, 4.7, 1.5], 
     [ 6.3, 2.3, 4.4, 1.3], 
     [ 5.6, 3. , 4.1, 1.3], 
     [ 5.5, 2.5, 4. , 1.3], 
     [ 5.5, 2.6, 4.4, 1.2], 
     [ 6.1, 3. , 4.6, 1.4], 
     [ 5.8, 2.6, 4. , 1.2], 
     [ 5. , 2.3, 3.3, 1. ], 
     [ 5.6, 2.7, 4.2, 1.3], 
     [ 5.7, 3. , 4.2, 1.2], 
     [ 5.7, 2.9, 4.2, 1.3], 
     [ 6.2, 2.9, 4.3, 1.3], 
     [ 5.1, 2.5, 3. , 1.1], 
     [ 5.7, 2.8, 4.1, 1.3], 
     [ 6.3, 3.3, 6. , 2.5], 
     [ 5.8, 2.7, 5.1, 1.9], 
     [ 7.1, 3. , 5.9, 2.1], 
     [ 6.3, 2.9, 5.6, 1.8], 
     [ 6.5, 3. , 5.8, 2.2], 
     [ 7.6, 3. , 6.6, 2.1], 
     [ 4.9, 2.5, 4.5, 1.7], 
     [ 7.3, 2.9, 6.3, 1.8], 
     [ 6.7, 2.5, 5.8, 1.8], 
     [ 7.2, 3.6, 6.1, 2.5], 
     [ 6.5, 3.2, 5.1, 2. ], 
     [ 6.4, 2.7, 5.3, 1.9], 
     [ 6.8, 3. , 5.5, 2.1], 
     [ 5.7, 2.5, 5. , 2. ], 
     [ 5.8, 2.8, 5.1, 2.4], 
     [ 6.4, 3.2, 5.3, 2.3], 
     [ 6.5, 3. , 5.5, 1.8], 
     [ 7.7, 3.8, 6.7, 2.2], 
     [ 7.7, 2.6, 6.9, 2.3], 
     [ 6. , 2.2, 5. , 1.5], 
     [ 6.9, 3.2, 5.7, 2.3], 
     [ 5.6, 2.8, 4.9, 2. ], 
     [ 7.7, 2.8, 6.7, 2. ], 
     [ 6.3, 2.7, 4.9, 1.8], 
     [ 6.7, 3.3, 5.7, 2.1], 
     [ 7.2, 3.2, 6. , 1.8], 
     [ 6.2, 2.8, 4.8, 1.8], 
     [ 6.1, 3. , 4.9, 1.8], 
     [ 6.4, 2.8, 5.6, 2.1], 
     [ 7.2, 3. , 5.8, 1.6], 
     [ 7.4, 2.8, 6.1, 1.9], 
     [ 7.9, 3.8, 6.4, 2. ], 
     [ 6.4, 2.8, 5.6, 2.2], 
     [ 6.3, 2.8, 5.1, 1.5], 
     [ 6.1, 2.6, 5.6, 1.4], 
     [ 7.7, 3. , 6.1, 2.3], 
     [ 6.3, 3.4, 5.6, 2.4], 
     [ 6.4, 3.1, 5.5, 1.8], 
     [ 6. , 3. , 4.8, 1.8], 
     [ 6.9, 3.1, 5.4, 2.1], 
     [ 6.7, 3.1, 5.6, 2.4], 
     [ 6.9, 3.1, 5.1, 2.3], 
     [ 5.8, 2.7, 5.1, 1.9], 
     [ 6.8, 3.2, 5.9, 2.3], 
     [ 6.7, 3.3, 5.7, 2.5], 
     [ 6.7, 3. , 5.2, 2.3], 
     [ 6.3, 2.5, 5. , 1.9], 
     [ 6.5, 3. , 5.2, 2. ], 
     [ 6.2, 3.4, 5.4, 2.3], 
     [ 5.9, 3. , 5.1, 1.8]]), 
'feature_names': ['sepal length (cm)', 
    'sepal width (cm)', 
    'petal length (cm)', 
    'petal width (cm)'], 
'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
     1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
     2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
     2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]), 
'target_names': array(['setosa', 'versicolor', 'virginica'], 
     dtype='<U10')} 

Tough! I can try converting the dataset into a DataFrame using pandas...

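from pandas import DataFrame
from sklearn.datasets import load_iris

iris = load_iris()
# Build a DataFrame from the feature matrix, using the feature names as column labels
df = DataFrame(iris.data, columns=iris.feature_names)
df.head()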

[image: screenshot of the df.head() output]

Aesthetics aside (the ugly frames and boldface are subjective), this is much closer to the R output, except for the missing Species column.

What am I doing wrong?

Answers


When using Jupyter Notebook, a DataFrame is rendered as an HTML table by default. Use print to get plain text instead.

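import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
# Map the integer targets to species names and attach them as a Species column
df = pd.DataFrame(
    iris['data'], columns=iris['feature_names']
).assign(Species=iris['target_names'][iris['target']])

# Print wide frames on a single line instead of wrapping the columns
with pd.option_context('expand_frame_repr', False):
    print(df.head())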

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm) Species
0                5.1               3.5                1.4               0.2  setosa
1                4.9               3.0                1.4               0.2  setosa
2                4.7               3.2                1.3               0.2  setosa
3                4.6               3.1                1.5               0.2  setosa
4                5.0               3.6                1.4               0.2  setosa
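
If this conversion is needed often, the same steps could be wrapped in a small function. The following is only a sketch: bunch_head and its label_column parameter are made-up names, and it assumes the dataset object exposes data, feature_names, target and target_names the way load_iris() does.

```python
import pandas as pd
from sklearn.datasets import load_iris


def bunch_head(bunch, n=5, label_column="Species"):
    """Return the first n rows of a classification dataset as a DataFrame.

    Hypothetical helper: assumes the object has data, feature_names,
    target and target_names entries, as the iris Bunch does.
    """
    df = pd.DataFrame(bunch["data"], columns=bunch["feature_names"])
    # Translate the integer class codes into their readable labels
    df[label_column] = bunch["target_names"][bunch["target"]]
    return df.head(n)


preview = bunch_head(load_iris())
# to_string() gives plain text even where the notebook would render HTML
print(preview.to_string())
```

Datasets without target_names (regression sets, for example) would need a different treatment of the label column.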

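That is a lot of legwork... Would you also need to specify the row names, if there happen to be any? It is no big deal when you already know the dataset, but for a quick look... – Toni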


Sorry to disappoint you – piRSquared


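I upvoted your answer, realizing that there may not be any shortcut, and I will come back to accept it if neither you nor another answer simplifies it. Thank you. – Toni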
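
On the question of a shortcut: releases of scikit-learn newer than this thread (0.23 and later, as far as I know) accept an as_frame=True argument in load_iris(), which returns pandas objects directly. A minimal sketch, assuming such a version and pandas are installed; the Species column name is chosen here only to mirror the R output:

```python
from sklearn.datasets import load_iris

# as_frame=True asks scikit-learn to return pandas objects instead of arrays
iris = load_iris(as_frame=True)

# iris.frame holds the four feature columns plus an integer "target" column;
# map those codes to species names to mirror the Species column in R
df = iris.frame.assign(
    Species=iris.target_names[iris.frame["target"].to_numpy()]
)
print(df.head().to_string())
```

The integer target column stays in the frame; drop it with df.drop(columns="target") if only the names are wanted.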