How can I use a PCA model to predict scores for new data in Stata?

My question is similar to "R: using predict() on new data with high dimensionality", but for Stata.

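I want to run a principal components model (pca) on one subset of the data (the control group of an experiment) to extract the first component. Then I want to apply that same PCA model to a separate subset of the data (the treatment group of the experiment) and obtain scores for those observations. Essentially, I want to use the pca model fitted on dataset 1 to predict scores in a new dataset 2.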

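In R, one would simply fit the model to the control group and then call "predict" on the fitted model, passing the full dataset as the "newdata" argument. That yields predictions for all observations from a model fitted only to the control group. But how is this done in Stata? Following Nick's answer, here is the code I had been using: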

global xlist2a std_agreedisagree1_1_a std_revagreedisagree1_2_a std_revagreedisagree1_3_a std_agreedisagree1_4_a std_revagreedisagree1_10_a std_revagreedisagree1_5_a 
pca $xlist2a 
screeplot, yline(1)  
rotate, clear  
pca $xlist2a, com(3) 
rotate, varimax blanks (.30) 
predict pca5_p1b pca5_p2b pca5_p3b, score

Corrected code:

global xlist2a std_agreedisagree1_1_a std_revagreedisagree1_2_a std_revagreedisagree1_3_a std_agreedisagree1_4_a std_revagreedisagree1_10_a std_revagreedisagree1_5_a 
pca $xlist2a if zgroupa10==1 
screeplot, yline(1)  
rotate, clear  
pca $xlist2a if zgroupa10==1, com(3) 
rotate, varimax blanks (.30) 
predict pca5_p1b pca5_p2b pca5_p3b, score

2016-11-22 emily004

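Good questions here show **some** code attempt. –

Thanks, I edited the post to include the code. – emily004

Thanks for adding code, but the code above runs 'pca' on **all** observations for certain variables and then runs 'predict' on all observations. That is not what you should be doing, but your comment under my answer implies that your real code applied the desired method. –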

What code did you try? The simplest of experiments shows that the same approach works in Stata too:

. sysuse auto, clear 
(1978 Automobile Data) 

. pca headroom trunk length displacement if foreign 

Principal components/correlation                 Number of obs    =         22
                                                 Number of comp.  =          4
                                                 Trace            =          4
    Rotation: (unrotated = principal)            Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |     Eigenvalue     Difference     Proportion     Cumulative
    -------------+------------------------------------------------------------
           Comp1 |        1.93666        .656823         0.4842         0.4842
           Comp2 |        1.27983        .615381         0.3200         0.8041
           Comp3 |        .664453        .545396         0.1661         0.9702
           Comp4 |        .119057              .         0.0298         1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors)

    --------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4 | Unexplained
    -------------+----------------------------------------+-------------
        headroom |   0.0288    0.7373    0.6749    0.0083 |           0
           trunk |   0.2443    0.6496   -0.7199   -0.0090 |           0
          length |   0.6849   -0.1313    0.1229   -0.7061 |           0
    displacement |   0.6858   -0.1313    0.1054    0.7080 |           0
    --------------------------------------------------------------------

. predict score1 score2 if !foreign 
(score assumed) 
(2 components skipped) 

Scoring coefficients
    sum of squares(column-loading) = 1

    ------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4
    -------------+----------------------------------------
        headroom |   0.0288    0.7373    0.6749    0.0083
           trunk |   0.2443    0.6496   -0.7199   -0.0090
          length |   0.6849   -0.1313    0.1229   -0.7061
    displacement |   0.6858   -0.1313    0.1054    0.7080
    ------------------------------------------------------
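
To make the link to the control/treatment setup explicit, here is a minimal sketch on the same auto data, with the foreign cars standing in for the control group. The names sc1, sc2, in_fit, sc1_manual, gap and the matrices Lmat, Mvec, Svec are made up for illustration. The second half rests on an assumption: that predict standardizes each variable with the means and standard deviations stored by pca in e(means) and e(sds), that is, those of the estimation sample. If that holds, the hand-built score should match the predicted one up to float rounding.

```stata
* Sketch only: all new variable and matrix names below are hypothetical
sysuse auto, clear

* Fit the PCA on the foreign cars only (stand-in for the control group)
pca headroom trunk length displacement if foreign, components(2)

* No -if- on predict, so scores are computed for every observation
predict sc1 sc2, score

* e(sample) is 1 for observations used in the fit and 0 otherwise
generate byte in_fit = e(sample)
tabstat sc1 sc2, by(in_fit) statistics(n mean sd)

* Assumption: predict standardizes with the stored e(means) and e(sds).
* Rebuild the first score by hand to check that assumption.
matrix Lmat = e(L)
matrix Mvec = e(means)
matrix Svec = e(sds)
matrix Mvec = vec(Mvec)    // force a column vector, whatever the stored shape
matrix Svec = vec(Svec)

generate double sc1_manual = 0
local j = 0
foreach v of varlist headroom trunk length displacement {
    local ++j
    replace sc1_manual = sc1_manual + Lmat[`j', 1] * (`v' - Mvec[`j', 1]) / Svec[`j', 1]
}

* Differences near zero would support the assumption
generate double gap = sc1 - sc1_manual
summarize gap
```

In the question's setting, the pca line would carry the condition that selects the control group, and the predict line would either carry no condition (scores for everyone) or a condition that selects the treatment group.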

2016-11-22 17:43:01

You answered my question: I had not yet inserted zgroup10 == 1, and once I inserted zgroup10 == 1 it worked fine. Thank you for your patience. – emily004
