scikit-learn: problem classifying multi-label data

I am using the code below to classify multi-label data:

import numpy as np 
from sklearn.pipeline import Pipeline 
from sklearn.feature_extraction.text import CountVectorizer 
from sklearn.svm import LinearSVC 
from sklearn.feature_extraction.text import TfidfTransformer 
from sklearn.multiclass import OneVsRestClassifier 
from sklearn import preprocessing 

X_train = np.array(["new york is a hell of a town", 
        "new york was originally dutch", 
        "the big apple is great", 
        "new york is also called the big apple", 
        "nyc is nice", 
        "people abbreviate new york city as nyc", 
        "the capital of great britain is london", 
        "london is in the uk", 
        "london is in england", 
        "london is in great britain", 
        "it rains a lot in london", 
        "london hosts the british museum", 
        "new york is great and so is london", 
        "i like london better than new york"]) 
y_train_text = [[1],[1],[1],[1],[1],[1],[2],[2],[2],[2],[2],[2],[12],[12]] 

X_test = np.array(['nice day in nyc', 
        'welcome to london', 
        'london is rainy', 
        'it is raining in britian', 
        'it is raining in britian and the big apple', 
        'it is raining in britian and nyc', 
        'hello welcome to new york. enjoy it here and london too']) 
target_names = ['New York', 'London'] 

lb = preprocessing.MultiLabelBinarizer() 
Y = lb.fit_transform(y_train_text) 

classifier = Pipeline([ 
    ('vectorizer', CountVectorizer()), 
    ('tfidf', TfidfTransformer()), 
    ('clf', OneVsRestClassifier(LinearSVC()))]) 

classifier.fit(X_train, Y) 
predicted = classifier.predict(X_test) 

====== Output =====

[1, 0, 0],'New York' 
[0, 1, 0],'London' 
[0, 1, 0],'London' 
[0, 1, 0],'London' 
[1, 0, 0],'New York' 
[0, 0, 0], 
[0, 0, 0]] 

The last two predictions are wrong; they should be [0, 0, 1] for ['New York', 'London'].

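So I have these questions:
1.] What exactly is wrong with the code?
2.] Is this a proper way to handle multi-label data, or are there better methods? This and one or two other code samples are all I could find online about multi-label data, whereas there are thousands for binary classification. Please help me with this.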

Answer

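12 is not the pair of labels "1" and "2"; it is the single label 12, which MultiLabelBinarizer treats as a third class (hence the three-column output). So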

[[1],[1],[1],[1],[1],[1],[2],[2],[2],[2],[2],[2],[12],[12]] 

should be

[[1],[1],[1],[1],[1],[1],[2],[2],[2],[2],[2],[2],[1, 2],[1, 2]]
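
To see the difference, here is a minimal sketch, assuming scikit-learn is installed. It first compares how MultiLabelBinarizer encodes [12] versus [1, 2], then refits the question's pipeline with the corrected labels. The shortened test list and the `label_names` mapping are illustrative additions, not part of the original code, and the actual predictions depend on the fitted model.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Each inner list is the label set of one sample.
wrong_labels = [[1], [2], [12]]    # 12 is a third, separate class
right_labels = [[1], [2], [1, 2]]  # the sample carries both labels

wrong_lb = MultiLabelBinarizer().fit(wrong_labels)
right_lb = MultiLabelBinarizer().fit(right_labels)
print(wrong_lb.classes_)  # three classes: 1, 2 and 12
print(right_lb.classes_)  # two classes: 1 and 2
print(right_lb.transform(right_labels))  # last row has both columns set

# Training data from the question, with the corrected label sets.
X_train = [
    "new york is a hell of a town",
    "new york was originally dutch",
    "the big apple is great",
    "new york is also called the big apple",
    "nyc is nice",
    "people abbreviate new york city as nyc",
    "the capital of great britain is london",
    "london is in the uk",
    "london is in england",
    "london is in great britain",
    "it rains a lot in london",
    "london hosts the british museum",
    "new york is great and so is london",
    "i like london better than new york",
]
y_train = [[1]] * 6 + [[2]] * 6 + [[1, 2]] * 2

# Shortened test set (illustrative subset of the question's X_test).
X_test = [
    "nice day in nyc",
    "welcome to london",
    "hello welcome to new york. enjoy it here and london too",
]

lb = MultiLabelBinarizer()
Y = lb.fit_transform(y_train)  # indicator matrix with one column per label

classifier = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
])
classifier.fit(X_train, Y)
predicted = classifier.predict(X_test)  # one row per sentence, one column per label

# Hypothetical helper mapping: numeric label -> display name.
label_names = {1: "New York", 2: "London"}
decoded = lb.inverse_transform(predicted)  # tuples of numeric labels per sentence
for sentence, labels in zip(X_test, decoded):
    print(sentence, "=>", [label_names[label] for label in labels])
```

With the corrected labels the indicator matrix has two columns instead of three, so a sentence about both cities can be predicted as [1, 1] rather than needing a separate "12" class.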