2017-08-28 50 views
2

How to sort results with matrix labels

Say I have data created as follows:

clear all 
set obs 150 
set seed 1234 
foreach i in 1 2 { 
    gen year`i' = round(runiform()*4) 
    tostring year`i', replace 
    replace year`i' = "Super Low" if year`i'=="0" 
    replace year`i' = "Kinda Low" if year`i'=="1" 
    replace year`i' = "Average to Mediocre" if year`i'=="2" 
    replace year`i' = "Pretty High" if year`i'=="3" 
    replace year`i' = "Incredibly High" if year`i'=="4" 
} 

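Ultimately I want to create a LaTeX table that presents the frequencies, the percentages, and the difference in percentages for these two variables. Importantly, I want the rows sorted by the frequency in year 1.

Something along these lines: [image: desired table layout]

This turned out to be harder than I expected. I came up with the following code (thanks to https://www.statalist.org/forums/forum/general-stata-discussion/general/1124796-any-way-to-save-row-percentages-output-as-a-matrix):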

label define order 1 "Pretty High" 2 "Average to Mediocre" 3 "Kinda Low" 4 "Incredibly High" 5 "Super Low" 

foreach i in 1 2 { 
    encode year`i', gen(y`i'_freq) label(order) 
    tab y`i'_freq, matcell(y`i'_freq) 
    mata: st_matrix("y`i'_pct", (st_matrix("y`i'_freq") :/ colsum(st_matrix("y`i'_freq")))) 
} 

matrix combined = y1_freq, y1_pct 
foreach i in 2 { 
    matrix combined = combined, y`i'_freq, y`i'_pct 
} 

mata: st_matrix("c", (st_matrix("combined"), st_matrix("combined")[.,2] - st_matrix("combined")[.,4])) 

matrix rownames c = "Pretty High" "Average to Mediocre" "Kinda Low" "Incredibly High" "Super Low" 
matrix colnames c = "No. 1 Freq" "No. 1 Pct" "No. 2 Freq" "No. 2 Pct" "Difference" 
esttab matrix(c), nomtitles 

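The problem with the above is that I hard-coded the sort order of the categories. How can I generalize this so that it is done automatically?

Any other tips for improving my code are also appreciated.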

+1

'matrix combined = y1_freq, y1_pct, y2_freq, y2_pct' would save you three lines.

Answers

2

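I suggest a simple solution that uses contract on each year followed by a merge of the two results. Run this after your initial data-creation code: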

foreach i in 1 2 { 
    preserve 
    contract year`i', f(freq`i') p(pct`i') 
    tempfile year`i' 
    save `year`i'' 
    restore 
} 

use `year1', clear 
ren year1 year2 
merge m:m year2 using `year2', nogen 
ren year2 type 
gsort -freq1 
replace pct1 = pct1/100 
replace pct2 = pct2/100 
gen diff = pct1 - pct2 
list, clean 

This gives you the following result:

                          type   freq1   pct1   freq2   pct2        diff
      1.             Kinda Low      39   0.26      27   0.18         .08
      2.           Pretty High      37   0.25      33   0.22    .0266667
      3.   Average to Mediocre      29   0.19      44   0.29         -.1
      4.       Incredibly High      24   0.16      23   0.15    .0066667
      5.             Super Low      21   0.14      23   0.15   -.0133333

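Notes:

contract replaces the dataset in memory with a dataset of the frequencies and percentages of year`i'. Each result is saved to a temporary file, which keeps the file system clean and means you do not have to delete anything afterwards.

The first dataset is then merged with the second. The second dataset contributes only its frequency and percentage variables.

Sorting in descending order is done with gsort -freq1. To sort in ascending order, run gsort freq1 instead.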
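
Because contract leaves exactly one row per category, the merge key is unique in both temporary files, so a one-to-one merge should produce the same table while also checking that uniqueness. A minimal sketch, assuming the tempfiles `year1' and `year2' from the loop above still exist (that is, everything runs in the same do-file):

* assumes `year1' and `year2' were saved by the contract loop above
use `year1', clear
rename year1 year2
* merge 1:1 stops with an error if year2 is not unique in either file
merge 1:1 year2 using `year2', nogen
rename year2 type
gsort -freq1
list, clean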

+0

Thanks - this is much simpler than mine! – bill999

+0

At this point, what is the easiest way to export the current dataset to LaTeX? – bill999

+1

Glad it works! There is a package, '-dataout-', that should do the job. Run these to export the dataset to LaTeX: 'ssc install dataout' and 'dataout, save(myfile) tex replace'

3

This is a variation on the answer posted earlier by @Andrey Ampilogov.

* sandbox code from OP 
clear all 
set obs 150 
set seed 1234 
foreach i in 1 2 { 
    gen year`i' = round(runiform()*4) 
} 

preserve 

stack year1 year2, into(year) clear 
contract year _stack, f(freq) p(percent) 
reshape wide freq percent, i(year) j(_stack) 

* define labels once when needed 
label define year 0 "Super Low"  /// 
    1 "Kinda Low" 2 "Average to Mediocre" /// 
    3 "Pretty High" 4 "Incredibly High" 
label val year year 

gsort -freq1 
list 

     +-----------------------------------------------------------+
     |                year   freq1   percent1   freq2   percent2 |
     |-----------------------------------------------------------|
  1. |           Kinda Low      39      13.00      27       9.00 |
  2. |         Pretty High      37      12.33      33      11.00 |
  3. | Average to Mediocre      29       9.67      44      14.67 |
  4. |     Incredibly High      24       8.00      23       7.67 |
  5. |           Super Low      21       7.00      23       7.67 |
     +-----------------------------------------------------------+

restore

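The technical points I want to emphasize are:

  1. Converting integer values to strings does not seem a good idea when you can keep the integers and attach value labels whenever convenient. With strings, you have to go back to your original definitions to recover the order information.

  2. merge m:m is supported by Stata, but it is overkill here even though it works. There is no need for elaborate file choreography.

  3. To me, percentages in a problem like this are bounded by 0 and 100. With the right data structure, though, rescaling to proportions and calculating differences is easy.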
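
As a rough illustration of the third point, the sketch below rescales within each year and takes the difference. It assumes the reshaped dataset from the code above (with variables freq1 and freq2) is still in memory, so it would have to run before the restore; the names prop1, prop2 and diff are made up for this example. The proportions are rebuilt from the frequencies because contract year _stack computes percent over all 300 stacked observations rather than within each year, which is why 39 shows as 13.00 in the listing.

* assumes the reshaped data (year, freq1, freq2, ...) are in memory
foreach j in 1 2 {
    * summarize leaves the column total of freq`j' in r(sum)
    summarize freq`j', meanonly
    generate prop`j' = freq`j' / r(sum)
}
* difference in within-year proportions (hypothetical variable name)
generate diff = prop1 - prop2
list year freq1 prop1 freq2 prop2 diff, clean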

+0

Thank you for the additional clarification, Nick. This is a good way to learn about the '-stack-' command.