2016-11-21
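Running a dataset through Tomek links using MATLAB or Python

I have a machine learning dataset, the Thoracic Surgery Data set, and I want to apply Tomek links to it using MATLAB or Python.

Here is the dataset link: http://archive.ics.uci.edu/ml/datasets/Thoracic+Surgery+Data

Is it possible to do this? Please help me...

Regards.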

Here is an example of using Tomek links: [link](http://stackoverflow.com/questions/12670253) – Amid

Please use this link: http://contrib.scikit-learn.org/imbalanced-learn/auto_examples/under-sampling/plot_tomek_links.html – usct01

Answers

This link provides code and plot details for applying Tomek links to a dataset in Python: http://contrib.scikit-learn.org/imbalanced-learn/auto_examples/under-sampling/plot_tomek_links.html

import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

from imblearn.under_sampling import TomekLinks

# Build an imbalanced two-class toy set: 500 points of class 0, 50 of class 1
rng = np.random.RandomState(0)
n_samples_1 = 500
n_samples_2 = 50
X_syn = np.r_[1.5 * rng.randn(n_samples_1, 2),
              0.5 * rng.randn(n_samples_2, 2) + [2, 2]]
y_syn = np.array([0] * n_samples_1 + [1] * n_samples_2)
X_syn, y_syn = shuffle(X_syn, y_syn)
# Split kept from the original example; the plot below uses the full set
X_syn_train, X_syn_test, y_syn_train, y_syn_test = train_test_split(X_syn,
                                                                    y_syn)

# remove Tomek links
# (imbalanced-learn >= 0.4: fit_resample() and the sample_indices_ attribute
# replace the older fit_sample() and return_indices=True)
tl = TomekLinks()
X_resampled, y_resampled = tl.fit_resample(X_syn, y_syn)
idx_resampled = tl.sample_indices_

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# Indices of the samples that the resampling dropped
idx_samples_removed = np.setdiff1d(np.arange(X_syn.shape[0]),
                                   idx_resampled)
idx_class_0 = y_resampled == 0
plt.scatter(X_resampled[idx_class_0, 0], X_resampled[idx_class_0, 1],
            alpha=.8, label='Class #0')
plt.scatter(X_resampled[~idx_class_0, 0], X_resampled[~idx_class_0, 1],
            alpha=.8, label='Class #1')
plt.scatter(X_syn[idx_samples_removed, 0], X_syn[idx_samples_removed, 1],
            alpha=.8, label='Removed samples')
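
To make the idea itself concrete: two samples form a Tomek link when they belong to different classes and each one is the other's nearest neighbour. Below is a minimal sketch of that definition using only the standard library. It is not the imbalanced-learn implementation. The function names `find_tomek_links` and `remove_majority_in_links`, the Euclidean distance, the toy data, and the choice to drop only the majority-class member of each link are all assumptions made for illustration.

```python
import math


def find_tomek_links(X, y):
    """Return index pairs (i, j) with i < j that form Tomek links.

    Assumes at least two samples. Distance is Euclidean (an assumption).
    """
    n = len(X)
    # nearest[i] = index of the sample closest to sample i (excluding i itself)
    nearest = [
        min((j for j in range(n) if j != i),
            key=lambda j, i=i: math.dist(X[i], X[j]))
        for i in range(n)
    ]
    # A Tomek link: mutual nearest neighbours with different class labels
    return [(i, j) for i, j in enumerate(nearest)
            if i < j and nearest[j] == i and y[i] != y[j]]


def remove_majority_in_links(X, y, majority_label):
    """Drop the majority-class member of each Tomek link; return kept indices."""
    drop = {i if y[i] == majority_label else j
            for i, j in find_tomek_links(X, y)}
    return [k for k in range(len(X)) if k not in drop]


# Hypothetical toy data: class 0 is the majority, class 1 the minority
X_toy = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
y_toy = [0, 0, 0, 1, 1]

links = find_tomek_links(X_toy, y_toy)
kept = remove_majority_in_links(X_toy, y_toy, majority_label=0)
print("Tomek links:", links)
print("Kept indices:", kept)
```

The brute-force neighbour search is quadratic in the number of samples, which should be acceptable for small datasets. For a real table, any categorical columns would first need a numeric encoding, since the distance is only defined on numeric vectors.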