2015-10-26
0

How to change the order of data strings based on date

I have a string variable that looks like this:

var_name 
25-DEC-99: A11, B14, C89; 28-FEB-94: A27, B94, C30 
01-APR-11: A25, B82, C65 
04-JUL-09: A21, B55, C26; 12-MAR-03: A11, B72, C68; 08-JUN-11: A62, B47, C82 
12-JUN-00: A77, B19, C73; 03-JUL-12: A99, B04, C54 
27-OCT-15: A22, B95, C08 

and so on. My goal is to split these strings into separate variables named v1_date, v1_A, v1_B, v1_C, v2_date, v2_A, v2_B, v2_C, v3_date, v3_A, v3_B, v3_C.

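I can do this with split var_name, p(";"), renaming the results to v1, v2, and v3, and then using split again. The problem is that I want v1, v2, and v3 to be in chronological order by date, and the data are not currently arranged that way. How can I make the date in v1 precede the date in v2, and the date in v2 precede the date in v3? For example, in the first observation I want 25-DEC-99: A11, B14, C89 to be associated with v2 and 28-FEB-94: A27, B94, C30 to be associated with v1.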

Answers

1

The following gets you close, I believe. It uses split and reshape.

clear 
set more off 

input /// 
str100 myvar 
"25-DEC-99: A11, B14, C89; 28-FEB-94: A27, B94, C30" 
"01-APR-11: A25, B82, C65" 
"04-JUL-09: A21, B55, C26; 12-MAR-03: A11, B72, C68; 08-JUN-11: A62, B47, C82" 
"12-JUN-00: A77, B19, C73; 03-JUL-12: A99, B04, C54" 
"27-OCT-15: A22, B95, C08" 
end 

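* Split on ";" into myvar1, myvar2, ... (one per dated group)
split myvar, p(;) 
drop myvar 

* Go long: one row per dated group within each original observation
gen obs = _n 
reshape long myvar, i(obs) 
drop if missing(myvar) 

* Separate the date (myvar1) from the cells (myvar2)
split myvar, p(:) 
drop myvar 

* Convert to a daily date; 2020 is the top year used for two-digit years
gen myvar11 = date(myvar1, "DMY", 2020) 
format %td myvar11 

drop myvar1 
rename (myvar11 myvar2) (mydate mycells) 
order mydate, before(mycells) 

* Number the groups chronologically within each observation
bysort obs (mydate) : gen neworder = _n 
drop _j 

* Back to wide, now numbered in date order
reshape wide mydate mycells, i(obs) j(neworder) 

list 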

You can loop over the mycells variables if you need to split them further.
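
One way such a loop might look is sketched below. It assumes the code above has just been run, so the data hold mydate1, mycells1, mydate2, mycells2, and so on in wide form, and that every group contains exactly three comma-separated cells, as in the example data. The target names v1_date, v1_A, and so on are taken from the question; the loop itself is an illustration, not part of the original answer.

```stata
* Assumes wide data with mydate1 mycells1 mydate2 mycells2 ... already in memory.
foreach v of varlist mycells* {
    * Recover the numeric suffix: "mycells" is 7 characters long.
    local j = substr("`v'", 8, .)

    * Split " A11, B14, C89" on commas into v1_1 v1_2 v1_3 (for j = 1).
    split `v', parse(,) generate(v`j'_)
    rename (v`j'_1 v`j'_2 v`j'_3) (v`j'_A v`j'_B v`j'_C)

    * Remove the blanks left around each piece.
    foreach s in A B C {
        replace v`j'_`s' = trim(v`j'_`s')
    }

    rename mydate`j' v`j'_date
    drop `v'
}
```

If some group number never occurs with three cells in the data, split creates fewer variables and the rename fails, so the three-cell assumption matters.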

+0

This is what the OP asked for, but my prediction is that the data structure will prove awkward.

+0

@NickCox I agree. The original poster can keep a panel structure by dropping the last `reshape`.

1

In general, please consider using dataex (SSC) to create simple data examples.
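
A minimal sketch of how such an example could be produced, assuming the string variable is called var_name as in the question and that the first five observations are enough to show the problem:

```stata
* Install dataex from SSC (only needed once)
ssc install dataex

* Print the first five observations of var_name as a ready-to-paste input block
dataex var_name in 1/5
```

The output can be pasted straight into a question so that others can recreate the data.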

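You don't give all the (not trivial) code you used to split the variable. As it happens, I don't think your variable names are easy to work with, so I recreated the split in my own way. Once the data are split, sorting by date is easy, but I have pulled up short of reshape wide, because I suspect a long structure is easier to work with.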

clear 
input str80 data 
"25-DEC-99: A11, B14, C89; 28-FEB-94: A27, B94, C30" 
"01-APR-11: A25, B82, C65" 
"04-JUL-09: A21, B55, C26; 12-MAR-03: A11, B72, C68; 08-JUN-11: A62, B47, C82" 
"12-JUN-00: A77, B19, C73; 03-JUL-12: A99, B04, C54" 
"27-OCT-15: A22, B95, C08" 
end 

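* One variable per dated group: x1, x2, ...
split data, p(;) gen(x) 

local j = 1 
gen work = "" 
foreach x of var x* { 
    * Text before the colon is the date
    replace work = substr(`x', 1, strpos(`x', ":") - 1) 
    gen date`j' = daily(work, "DMY", 2050) 
    * Text after the colon holds the three cells
    replace work = substr(`x', strpos(`x', ":") + 1, .) 
    split work, p(,) 
    rename (work1 work2 work3) (vA`j' vB`j' vC`j') 
    local ++j 
} 

drop work 
drop x* 
drop data 

gen id = _n 
* edit opens the Data Editor for a look at the data; it is optional
edit 
reshape long date vA vB vC, i(id) j(which) 
drop if missing(date) 
* Renumber so that which follows date order within each id
bysort id (date): replace which = _n 
list, sepby(id) 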

     +--------------------------------------+
     | id   which    date    vA    vB    vC |
     |--------------------------------------|
  1. |  1       1   12477   A27   B94   C30 |
  2. |  1       2   14603   A11   B14   C89 |
     |--------------------------------------|
  3. |  2       1   18718   A25   B82   C65 |
     |--------------------------------------|
  4. |  3       1   15776   A11   B72   C68 |
  5. |  3       2   18082   A21   B55   C26 |
  6. |  3       3   18786   A62   B47   C82 |
     |--------------------------------------|
  7. |  4       1   14773   A77   B19   C73 |
  8. |  4       2   19177   A99   B04   C54 |
     |--------------------------------------|
  9. |  5       1   20388   A22   B95   C08 |
     +--------------------------------------+
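
The date column in the listing shows Stata's underlying daily date numbers (days since 01jan1960) because no display format has been assigned. The sketch below shows two optional follow-up steps, assuming the data are exactly as the code above leaves them. The final reshape wide is not part of the original answer, which deliberately stops at the long layout.

```stata
* Show the dates in readable form, e.g. 28feb1994 instead of 12477
format date %td
list, sepby(id)

* Optional: only if the wide layout asked for in the question is really needed.
* Because which now follows date order, date1 precedes date2, and so on.
reshape wide date vA vB vC, i(id) j(which)
```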