2016-04-26

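Boxplot for all variables in SPSS

I want to generate boxplots for all of my variables (90 in total).

This is the syntax I would use for a single variable: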

GGRAPH
    /GRAPHDATASET NAME="graphdataset" VARIABLES=AnScol MISSING=LISTWISE REPORTMISSING=NO
    /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
    SOURCE: s=userSource(id("graphdataset"))
    DATA: AnScol=col(source(s), name("AnScol"))
    DATA: id=col(source(s), name("$CASENUM"), unit.category())
    COORD: rect(dim(1), transpose())
    GUIDE: axis(dim(1), label("AnScol"))
    ELEMENT: schema(position(bin.quantile.letter(AnScol)), label(id))
END GPL.

How can I do this for all variables without editing the variable name each time?

Thanks in advance!

Maxime M.


Normally you would use a macro for this, but macros do not work with GPL. Another way is to use Python programmability; you could try the following source: https://www.ibm.com/developerworks/community/blogs/ab16c38e-2f7b-4912-a47e-85682d124d32/entry/how_can_i_parameterize_ggraph_and_gpl_code6?lang=en

Answer


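Here I will illustrate several different approaches. First, we make some fake data: ten numeric variables and a string Id variable that identifies each case in the dataset.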

*Make fake data. 
MATRIX. 
SAVE {UNIFORM(100,10)} /VARS = V1 TO V10 /OUTFILE = *. 
END MATRIX. 
DATASET NAME Sim. 
COMPUTE MyId = $casenum. 
FORMATS MyId (F3.0). 
ALTER TYPE MyId (A3). 

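One simple solution is to use EXAMINE to draw boxplots of the variables side by side. The same chart is available through the legacy dialog charts.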

*If the variables all have the same scale. 
EXAMINE VARIABLES=V1 TO V10 
    /COMPARE VARIABLE 
    /PLOT=BOXPLOT 
    /STATISTICS=NONE 
    /NOTOTAL 
    /ID=MyId 
    /MISSING=LISTWISE. 


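This works fine because all of the variables were constructed to be on the same scale. There are no outliers here, but if there were, they would be labeled with the MyId variable in the plot.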
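The question mentions 90 variables, which may be too many boxes for one readable panel. As a rough sketch that is not part of the approach above, the Python below writes one EXAMINE command per batch of variable names. The helper name `build_examine_batches`, the batch size of 10, and the `V1` to `V25` names are assumptions made for illustration; the generated text would still have to be run in SPSS, and grouping variables that share a scale is left to you.

```python
def build_examine_batches(names, id_var="MyId", batch_size=10):
    """Return one EXAMINE command string per batch of variable names.

    Hypothetical helper: it only builds text and does not talk to SPSS.
    """
    commands = []
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        commands.append(
            "EXAMINE VARIABLES={0}\n"
            "    /COMPARE VARIABLE\n"
            "    /PLOT=BOXPLOT\n"
            "    /STATISTICS=NONE\n"
            "    /NOTOTAL\n"
            "    /ID={1}\n"
            "    /MISSING=LISTWISE.".format(" ".join(batch), id_var)
        )
    return commands


# Illustrative variable names only; replace with the real list.
names = ["V{0}".format(i) for i in range(1, 26)]
commands = build_examine_batches(names)
print(len(commands))
print(commands[-1])
```

Each string can be pasted into a syntax window, or passed to `spss.Submit` from inside a `BEGIN PROGRAM` block.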

You can do something very similar with GGRAPH. Here I introduce an outlier; note that with this GGRAPH code you cannot easily label the outliers with a variable.

*Make one variable not on the same scale and have outliers. 
COMPUTE V1 = V1*100. 
IF LTRIM(MyId) = "5" V1 = 250.
EXECUTE. 
*Synonymous with GGRAPH - can't label outliers though.
GGRAPH 
    /GRAPHDATASET NAME="graphdataset" VARIABLES=V1 TO V10 
    TRANSFORM=VARSTOCASES(SUMMARY="V" INDEX="Vars") 
    /GRAPHSPEC SOURCE=INLINE. 
BEGIN GPL 
    SOURCE: s=userSource(id("graphdataset")) 
    DATA: Vars=col(source(s), name("Vars"), unit.category()) 
    DATA: V=col(source(s), name("V")) 
    ELEMENT: schema(position(bin.quantile.letter(Vars*V))) 
END GPL. 


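You can see here that because V1 is now on a different scale, you cannot effectively view the other variables in the same plot. In a realistic dataset this is what will happen. To make individual plots, you can take eli-k's suggestion and use Python to submit a separate graph for each variable. Here is an example.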

*If they don't - and you want different scales, Python programmability can do that. 
BEGIN PROGRAM Python. 
import spss, string 

#get the variable list and the variable type 
varList = [(spss.GetVariableName(i),spss.GetVariableType(i)) for i in range(spss.GetVariableCount())] 

#make a template to submit boxplot, see https://andrewpwheeler.wordpress.com/2015/02/22/string-substitution-in-python-continued/ 
c = string.Template("""*Boxplots. 
GGRAPH 
    /GRAPHDATASET NAME="graphdataset" VARIABLES=$var MyId MISSING=LISTWISE REPORTMISSING=NO 
    /GRAPHSPEC SOURCE=INLINE. 
BEGIN GPL 
    SOURCE: s=userSource(id("graphdataset")) 
    DATA: V=col(source(s), name("$var")) 
    DATA: MyId=col(source(s), name("MyId"), unit.category()) 
    COORD: rect(dim(1), transpose()) 
    GUIDE: axis(dim(1), label("$var")) 
    ELEMENT: schema(position(bin.quantile.letter(V)), label(MyId)) 
END GPL. 
""") 

#loop over the varlist and plot if numeric 
for i in varList: 
    if i[1] == 0: 
        spss.Submit(c.substitute(var=i[0]))
END PROGRAM. 

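Now each variable has its own boxplot for quick inspection (with outliers labeled by the ID), and so on for every numeric variable in the file.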

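To check what the Python loop submits without opening SPSS, the template step can be tried on its own. This is a minimal sketch: `build_boxplot_syntax` and the `example_vars` list are hypothetical stand-ins for the `(name, type)` pairs that the `spss` module returns, where type 0 means numeric.

```python
import string

# Same GGRAPH template as in the program above; $var is the placeholder.
BOXPLOT_TEMPLATE = string.Template("""GGRAPH
    /GRAPHDATASET NAME="graphdataset" VARIABLES=$var MyId MISSING=LISTWISE REPORTMISSING=NO
    /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
    SOURCE: s=userSource(id("graphdataset"))
    DATA: V=col(source(s), name("$var"))
    DATA: MyId=col(source(s), name("MyId"), unit.category())
    COORD: rect(dim(1), transpose())
    GUIDE: axis(dim(1), label("$var"))
    ELEMENT: schema(position(bin.quantile.letter(V)), label(MyId))
END GPL.
""")


def build_boxplot_syntax(var_list):
    """Return one GGRAPH block per numeric variable (type code 0)."""
    return [BOXPLOT_TEMPLATE.substitute(var=name)
            for name, var_type in var_list
            if var_type == 0]


# Hypothetical variable list; inside SPSS it would come from the spss module.
example_vars = [("V1", 0), ("V2", 0), ("MyId", 3)]
generated = build_boxplot_syntax(example_vars)
print(len(generated))
print(generated[0])
```

Printing the generated text first makes it easier to spot a quoting or placeholder mistake before 90 charts are produced.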

A final way to accomplish something similar is to reshape all of the numeric variables into a single column and then use SPLIT FILE.

*Varstocases and split file - need to know which variables are numeric to begin with. 
VARSTOCASES /MAKE V FROM V1 TO V10 /INDEX VOrig. 
SORT CASES BY VOrig. 
SPLIT FILE BY VOrig. 
GGRAPH 
    /GRAPHDATASET NAME="graphdataset" VARIABLES=V MyId MISSING=LISTWISE REPORTMISSING=NO 
    /GRAPHSPEC SOURCE=INLINE. 
BEGIN GPL 
    SOURCE: s=userSource(id("graphdataset")) 
    DATA: V=col(source(s), name("V")) 
    DATA: MyId=col(source(s), name("MyId"), unit.category()) 
    COORD: rect(dim(1), transpose()) 
    GUIDE: axis(dim(1), label("V")) 
    ELEMENT: schema(position(bin.quantile.letter(V)), label(MyId)) 
END GPL. 
SPLIT FILE OFF.
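For readers less familiar with the reshape step, here is a small pure-Python sketch of what the wide-to-long restructuring does to the rows. It is only an illustration: `vars_to_cases` is a hypothetical helper, the sample values are made up, and it does not model VARSTOCASES options such as the handling of missing values.

```python
def vars_to_cases(rows, value_vars, value_name="V", index_name="VOrig"):
    """Reshape wide rows (dicts) to long format, one output row per value variable.

    Columns not listed in value_vars (here MyId) are carried along unchanged.
    The index is numbered 1, 2, ... in the order of value_vars.
    """
    long_rows = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in value_vars}
        for position, name in enumerate(value_vars, start=1):
            new_row = dict(kept)
            new_row[index_name] = position
            new_row[value_name] = row[name]
            long_rows.append(new_row)
    return long_rows


# Made-up sample data with two cases and two numeric variables.
wide = [
    {"MyId": "1", "V1": 0.2, "V2": 0.9},
    {"MyId": "2", "V1": 0.5, "V2": 0.1},
]

# Sorting by the index mirrors SORT CASES BY VOrig before SPLIT FILE.
long_rows = sorted(vars_to_cases(wide, ["V1", "V2"]), key=lambda r: r["VOrig"])
for r in long_rows:
    print(r)
```

After the sort, all rows with the same VOrig value sit together, which is what lets SPLIT FILE produce one chart per original variable.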