2016-05-18 19 views
0

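Converting rows to columns in SAS or SQL

I'm new to SAS/SQL, and I have a dataset in which some rows need to be transposed into columns. I suspect there is a faster or simpler way to do this than the one I came up with, and I'd like some advice. An example explains my problem best.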

Here is my dataset:

Month  ID    Car      Claim_Type  Cost_of_claim
1      1243  Ferrari  Collision   12,000
2      6437  Peugeot  Fire        50,000
5      0184  Citroen  Stole       3,000
9      1930  Fiat     Medical     1,000
3      2934  GM       Liability   20,000

And I need to create a dataset like this:

Month  ID    Car      Collision  Fire    Stole  Medical  Liability
1      1243  Ferrari  12,000     0       0      0        0
2      6437  Peugeot  0          50,000  0      0        0
5      0184  Citroen  0          0       3,000  0        0
9      1930  Fiat     0          0       0      1,000    0
3      2934  GM       0          0       0      0        20,000

I just need to turn the values of one column (Claim_Type) into columns of their own...

I was thinking of doing something like this to create my new dataset:

proc sql; 
select Month, ID, Car, 
    case when Claim_Type = 'Collision' then Cost_of_claim end as Collision, 
    case when Claim_Type = 'Fire'      then Cost_of_claim end as Fire, 
    case when Claim_Type = 'Stole'     then Cost_of_claim end as Stole, 
    case when Claim_Type = 'Medical'   then Cost_of_claim end as Medical, 
    case when Claim_Type = 'Liability' then Cost_of_claim end as Liability 
from my_table; 
quit; 

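The problem is that there is a lot of data, and I think this approach may not be very efficient. Also, my real dataset has more columns and rows, and I don't want to type every possible value into CASE WHEN expressions, because that doesn't seem easy to maintain (or user-friendly).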

Can someone help me with this?

Answers

0

You can try dynamic SQL with PIVOT (SQL Server syntax), but performance depends on how many distinct claim types you have.

create table #mytable (Month int, ID int, Car varchar(20), Claim_Type varchar(20), Cost_of_claim int) 

insert into #mytable values 
(1, 1243, 'Ferrari', 'Collision', 12000) 
, (2, 6437, 'Peugeot', 'Fire', 50000) 
, (5, 184, 'Citroen', 'Stole', 3000) 
, (9, 1930, 'Fiat', 'Medical', 1000) 
, (3, 2934, 'GM', 'Liability', 20000) 
, (12, 4455, 'Ford', 'Theft', 20) 


DECLARE @cols AS NVARCHAR(MAX), 
    @query AS NVARCHAR(MAX) 

select @cols = STUFF((SELECT ',' + QUOTENAME(Claim_Type) 
        from #mytable 
        group by Claim_Type 
        order by Claim_Type 
      FOR XML PATH(''), TYPE 
      ).value('.', 'NVARCHAR(MAX)') 
     ,1,1,'') 

set @query = N'SELECT ' + 'month,id,car,' + @cols + N' from 
      (
       select month,id, car, Cost_of_claim, Claim_Type 
       from #mytable    
      ) x 
      pivot 
      (
       max(Cost_of_claim) 
       for Claim_Type in (' + @cols + N') 
      ) p 
      ' 

exec sp_executesql @query; 

drop table #mytable 
0

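This approach populates a macro variable with all possible claim types and loops over them, generating the variables the same way your example code does, so you don't need to type out every possible case. The "backstop" variable is there because every iteration of the loop ends with a comma (SAS raises an error in a PROC SQL step when nothing follows the last comma in the select list).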

data have; 
    /* The colon modifier reads each value up to the next blank instead of a fixed 12 columns */
    input Month ID Car :$12. Claim_Type :$12. Cost_of_claim;
    datalines; 
    1 1243 Ferrari Collision  12000 
    2 6437 Peugeot Fire   50000 
    5 0184 Citroen Stole   3000 
    9 1930 Fiat  Medical   1000 
    3 2934 GM   Liability  20000 
    ; 
run; 


%macro your_macro; 

    proc sql noprint; 
     select distinct claim_type into: list_of_claims separated by " " from have; 

     create table want (drop = backstop) as select 
      month, id, car, 
       %do i = 1 %to %sysfunc(countw(&list_of_claims.)); 
       %let this_claim = %scan(&list_of_claims., &i.); 
        case when claim_type = "&this_claim." then cost_of_claim else 0 end as &this_claim., 
       %end; 
      1 as backstop 
     from have; 
    quit; 

%mend your_macro; 

%your_macro; 
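
To make the role of `backstop` concrete, here is roughly what the macro loop expands to for the five claim types in `have`. This is a hand-written sketch of the generated statement, not captured output. I am assuming `select distinct` returns the claim types in alphabetical order, and the table name `want_expanded` is made up for this illustration.

```sas
proc sql;
    /* backstop absorbs the trailing comma left by the last loop iteration,
       then the drop= dataset option removes it from the output table */
    create table want_expanded (drop = backstop) as select
        month, id, car,
        case when claim_type = "Collision" then cost_of_claim else 0 end as Collision,
        case when claim_type = "Fire"      then cost_of_claim else 0 end as Fire,
        case when claim_type = "Liability" then cost_of_claim else 0 end as Liability,
        case when claim_type = "Medical"   then cost_of_claim else 0 end as Medical,
        case when claim_type = "Stole"     then cost_of_claim else 0 end as Stole,
        1 as backstop
    from have;
quit;
```

Note that each claim type is used as a column name, so this only works when the values are valid SAS variable names (no spaces or special characters).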
3

PROC TRANSPOSE should do what you want.

data test; 
    input Month ID  Car $  Claim_Type : $12. Cost_of_claim; 
    cards; 
    1 1243 Ferrari Collision  12000 
    2 6437 Peugeot Fire   50000 
    5 0184 Citroen Stole   3000 
    9 1930 Fiat  Medical   1000 
    3 2934 GM   Liability  20000 
run; 

proc transpose data=test out=transposed; 
    by notsorted month notsorted id notsorted car; 
    var cost_of_claim; 
    id claim_type; 
run; 

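The output dataset has missing values rather than zeros in the off-diagonal cells, but you can fill those in with a data step if you really want them.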
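
If you do want the zeros, a data step along these lines should work. It is a minimal sketch that assumes the `transposed` dataset from the step above and the five claim types from the sample data; `transposed_zeros` and the array name `claims` are names chosen for this example.

```sas
data transposed_zeros;
    /* _NAME_ is added by PROC TRANSPOSE and only holds "Cost_of_claim" here */
    set transposed (drop = _name_);

    /* Group the transposed claim columns so they can be processed in a loop */
    array claims {*} Collision Fire Stole Medical Liability;

    do i = 1 to dim(claims);
        /* Replace missing values with 0, leave real costs untouched */
        if missing(claims{i}) then claims{i} = 0;
    end;

    drop i;
run;
```

With many claim types, the explicit list in the ARRAY statement could be replaced by a variable list, at the cost of depending on the column order in `transposed`.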

+0

NOTSORTED applies to the whole BY statement; it is not like the DESCENDING option, which applies per variable. I usually put it at the end.
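
In line with the comment above, the BY statement from the answer could be written with a single trailing NOTSORTED. A minimal sketch, assuming the `test` dataset from the answer; the output name `transposed_alt` is made up here.

```sas
proc transpose data=test out=transposed_alt;
    by month id car notsorted;  /* one NOTSORTED covers the whole BY statement */
    var cost_of_claim;          /* the values to spread across the new columns */
    id claim_type;              /* each distinct claim type becomes a column name */
run;
```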