How do I get `skbio` PCoA (Principal Coordinate Analysis) results?

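I'm looking at the attributes of skbio's PCoA method (listed below). I'm new to this API, and I'd like to get the eigenvectors and the original points projected onto the new axes, similar to sklearn.decomposition.PCA, so that I can create some PC_1 vs PC_2 style plots. I figured out how to get eigvals and proportion_explained, but features comes back as None.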

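Is this because it's in beta?

If there are any tutorials that use it, I would greatly appreciate them. I'm a huge fan of scikit-learn and would like to start using more of the scikit family of packages.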

| Attributes 
| ---------- 
| short_method_name : str 
|  Abbreviated ordination method name. 
| long_method_name : str 
|  Ordination method name. 
| eigvals : pd.Series 
|  The resulting eigenvalues. The index corresponds to the ordination 
|  axis labels 
| samples : pd.DataFrame 
|  The position of the samples in the ordination space, row-indexed by the 
|  sample id. 
| features : pd.DataFrame 
|  The position of the features in the ordination space, row-indexed by 
|  the feature id. 
| biplot_scores : pd.DataFrame 
|  Correlation coefficients of the samples with respect to the features. 
| sample_constraints : pd.DataFrame 
|  Site constraints (linear combinations of constraining variables): 
|  coordinates of the sites in the space of the explanatory variables X. 
|  These are the fitted site scores 
| proportion_explained : pd.Series 
|  Proportion explained by each of the dimensions in the ordination space. 
|  The index corresponds to the ordination axis labels

Here is my code to generate the PCoA object.

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
from sklearn.datasets import load_iris 
from sklearn.preprocessing import StandardScaler 
from sklearn import decomposition 
import seaborn as sns; sns.set_style("whitegrid", {'axes.grid' : False}) 
import skbio 
from scipy.spatial import distance 

%matplotlib inline 
np.random.seed(0) 

# Iris dataset 
DF_data = pd.DataFrame(load_iris().data, 
         index = ["iris_%d" % i for i in range(load_iris().data.shape[0])], 
         columns = load_iris().feature_names) 
n,m = DF_data.shape 
# print(n,m) 
# 150 4 

Se_targets = pd.Series(load_iris().target, 
         index = ["iris_%d" % i for i in range(load_iris().data.shape[0])], 
         name = "Species") 

# Scaling mean = 0, var = 1 
DF_standard = pd.DataFrame(StandardScaler().fit_transform(DF_data), 
          index = DF_data.index, 
          columns = DF_data.columns) 

# Distance Matrix 
Ar_dist = distance.squareform(distance.pdist(DF_standard.T, metric="braycurtis")) # (m x m) distance measure 
DM_dist = skbio.stats.distance.DistanceMatrix(Ar_dist, ids=DF_standard.columns) 
PCoA = skbio.stats.ordination.pcoa(DM_dist)

2016-07-14 O.rka

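You can access the transformed sample coordinates with OrdinationResults.samples. This returns a pandas.DataFrame row-indexed by sample ID (i.e. the IDs in your distance matrix). Since principal coordinate analysis operates on a distance matrix of samples, transformed feature coordinates (OrdinationResults.features) are not available. Other ordination methods in scikit-bio that accept a sample x feature table as input (e.g. CA, CCA, RDA) do make transformed feature coordinates available.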

Side note: the distance.squareform call is unnecessary, because skbio.DistanceMatrix accepts arrays in either square or vector (condensed) form.

2016-07-14 21:51:10 jairideout

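I believe `.samples` was returning nothing. I can try again, and I'll make sure my `skbio` is up to date. I've been reading about PCoA, and a lot of the resources are cryptic. Compared with PCA, is it the same procedure, except that the eigendecomposition is done on a distance matrix instead of a covariance matrix?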

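`.samples` is a required part of the `OrdinationResults` produced by `pcoa`. If you are still getting `None`, could you post an issue on the [scikit-bio issue tracker](https://github.com/biocore/scikit-bio/issues)? My understanding is that PCoA is applied to a distance matrix, which allows non-Euclidean distance metrics, whereas PCA is applied to a feature table and uses Euclidean distance. Running PCoA on a Euclidean distance matrix is therefore equivalent to PCA. [Here's](http://ordination.okstate.edu/overview.htm#Principal_coordinates_analysis) a useful resource on ordination methods. – jairideout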
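To illustrate the equivalence mentioned in the comment above, here is a sketch of textbook classical PCoA (double-centering the squared distances, then eigendecomposition) written with NumPy only. It is not scikit-bio's implementation; the function name pcoa_sketch and the random data are assumptions made for the example. On a Euclidean distance matrix, the resulting coordinates match the PCA scores up to the sign of each axis.

```python
import numpy as np

def pcoa_sketch(D):
    """Classical PCoA on a square, symmetric distance matrix D."""
    n = D.shape[0]
    # Double-center -0.5 * D**2 (Gower centering)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Eigendecomposition of the centered matrix, largest eigenvalue first
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Non-Euclidean distances can yield negative eigenvalues; clip them to zero
    eigvals = np.clip(eigvals, 0, None)
    # Sample coordinates: eigenvectors scaled by sqrt(eigenvalue)
    coords = eigvecs * np.sqrt(eigvals)
    return eigvals, coords

# Hypothetical data: 20 samples x 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

# Pairwise Euclidean distances between samples (rows)
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
eigvals, coords = pcoa_sketch(D)

# PCA scores via SVD of the column-centered data
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = U * S

# Axes are only defined up to sign, so compare absolute values
same = np.allclose(np.abs(coords[:, :3]), np.abs(pca_scores), atol=1e-6)
```

With a non-Euclidean metric such as Bray-Curtis, the same steps still run, but the coordinates no longer correspond to a PCA of the feature table.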

`DF = skbio.OrdinationResults(long_method_name="TESTING", short_method_name="test", eigvals=PCoA.eigvals, samples=DF_data)` followed by `DF.samples` gives me back my original, untransformed data. Am I doing something wrong?
