2016-09-26 41 views

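Create a new data.table from an old data.table based on ifelse

I want to subset a data.table using an ifelse statement, but I am not getting the result I am looking for.

My initial data.table looks like this: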

head(Data_copy, n = 18) 

    Company  Date  DOW variable value Year Month End_of_Month 
1: ASXRI 1991-09-06 Friday  RI NA 1991 Sep   0 
2: ASXRI 1991-09-09 Monday  RI NA 1991 Sep   0 
3: ASXRI 1991-09-10 Tuesday  RI NA 1991 Sep   0 
4: ASXRI 1991-09-11 Wednesday  RI NA 1991 Sep   0 
5: ASXRI 1991-09-12 Thursday  RI NA 1991 Sep   0 
6: ASXRI 1991-09-13 Friday  RI NA 1991 Sep   0 
7: ASXRI 1991-09-16 Monday  RI NA 1991 Sep   0 
8: ASXRI 1991-09-17 Tuesday  RI NA 1991 Sep   0 
9: ASXRI 1991-09-18 Wednesday  RI NA 1991 Sep   0 
10: ASXRI 1991-09-19 Thursday  RI NA 1991 Sep   0 
11: ASXRI 1991-09-20 Friday  RI NA 1991 Sep   0 
12: ASXRI 1991-09-23 Monday  RI NA 1991 Sep   0 
13: ASXRI 1991-09-24 Tuesday  RI NA 1991 Sep   0 
14: ASXRI 1991-09-25 Wednesday  RI NA 1991 Sep   0 
15: ASXRI 1991-09-26 Thursday  RI NA 1991 Sep   0 
16: ASXRI 1991-09-27 Friday  RI NA 1991 Sep   0 
17: ASXRI 1991-09-30 Monday  RI NA 1991 Sep   1 
18: ASXRI 1991-10-01 Tuesday  RI NA 1991 Oct   0 

These are 18 rows out of about 250,000.

What I want is to split this data.table based on an ifelse call, as follows:

Data1 <- ifelse("Weekly" == "Weekly", Data_copy[End_of_Month ==1,], Data_copy) 

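*The "Weekly" == "Weekly" part will later be used inside a function.

I want Data1 to be a new data.table that contains only the rows where End_of_Month == 1.

When I run the code above, all I get is a list of company names, and nothing else.

Here is what the output looks like: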

Data1[[1]] 
    [1] "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" "ASXRI" 

If I scroll further down, I get:

[1387] "AANRI" "AANRI" "AANRI" "AANRI" "AANRI" "AANRI" "APARI" "APARI" "APARI" "APARI" "APARI" 
[1398] "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" "APARI" 

Each of these entries is simply one of the company names.

I get the result I want if I do:

Data2 <- Data_copy[End_of_Month == 1, ] 

Company  Date  DOW variable value Year Month End_of_Month 
1: ASXRI 1991-09-30 Monday  RI NA 1991 Sep   1 
2: ASXRI 1991-10-31 Thursday  RI NA 1991 Oct   1 
3: ASXRI 1991-11-29 Friday  RI NA 1991 Nov   1 
4: ASXRI 1991-12-31 Tuesday  RI NA 1991 Dec   1 
5: ASXRI 1992-01-31 Friday  RI NA 1992 Jan   1 
6: ASXRI 1992-02-28 Friday  RI NA 1992 Feb   1 

Essentially, I want to reproduce Data2, but using an ifelse statement.

Here are the first 100 rows:

dput(head(Data_copy, n = 100)) 
structure(list(Company = c("ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI", 
"ASXRI", "ASXRI", "ASXRI", "ASXRI", "ASXRI"), Date = structure(c(7918, 
7921, 7922, 7923, 7924, 7925, 7928, 7929, 7930, 7931, 7932, 7935, 
7936, 7937, 7938, 7939, 7942, 7943, 7944, 7945, 7946, 7949, 7950, 
7951, 7952, 7953, 7956, 7957, 7958, 7959, 7960, 7963, 7964, 7965, 
7966, 7967, 7970, 7971, 7972, 7973, 7974, 7977, 7978, 7979, 7980, 
7981, 7984, 7985, 7986, 7987, 7988, 7991, 7992, 7993, 7994, 7995, 
7998, 7999, 8000, 8001, 8002, 8005, 8006, 8007, 8008, 8009, 8012, 
8013, 8014, 8015, 8016, 8019, 8020, 8021, 8022, 8023, 8026, 8027, 
8028, 8029, 8030, 8033, 8034, 8035, 8036, 8037, 8040, 8041, 8042, 
8043, 8044, 8047, 8048, 8049, 8050, 8051, 8054, 8055, 8056, 8057 
), class = "Date"), DOW = c("Friday", "Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", 
"Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", 
"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", 
"Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", 
"Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", 
"Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", 
"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", 
"Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", 
"Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", 
"Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", 
"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", 
"Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", 
"Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday" 
), variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("RI", 
"VO", "MV", "TD", "ND"), class = "factor"), value = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_), Year = c("1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1991", "1991", "1991", 
"1991", "1991", "1991", "1991", "1991", "1992", "1992", "1992", 
"1992", "1992", "1992", "1992", "1992", "1992", "1992", "1992", 
"1992", "1992", "1992", "1992", "1992", "1992"), Month = c("Sep", 
"Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", 
"Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Oct", "Oct", 
"Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", 
"Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", 
"Oct", "Oct", "Oct", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", 
"Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", 
"Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Dec", "Dec", "Dec", 
"Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", 
"Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", 
"Dec", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", 
"Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan" 
), End_of_Month = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0)), .Names = c("Company", "Date", "DOW", "variable", "value", 
"Year", "Month", "End_of_Month"), class = c("data.table", "data.frame" 
), row.names = c(NA, -100L), .internal.selfref = <pointer: 0x00000000001f0788>) 

I would like to reproduce your initial data.table. – jangorecki

@jangorecki I have added some data in case you would like to try. –

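Going out of your way to use 'ifelse' is usually a bad idea. The function's syntax is nice, but it has many drawbacks and limitations, so I would stick with the 'Data_copy[End_of_Month == 1]' approach. Maybe I am missing something, since you have not said why you want to use 'ifelse' here. – Frank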

Answer

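Other users have already noted that ifelse is not suited to your purpose. It may be useful to explain why. From ?ifelse, ifelse(test, yes, no) returns

"A vector of the same length and attributes (including dimensions and "class") as test and data values from the values of yes or no."

In other words, if your test vector has length 1, ifelse(...) will return a vector of length 1. For example,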

> ifelse(TRUE, 1:3, 7:9) 
[1] 1 
> ifelse(c(TRUE, FALSE), 1:3, 7:9) 
[1] 1 8 
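
The same rule applies when yes and no are data frames. The following is only a minimal sketch on a made-up data frame (toy and its values are assumptions standing in for Data_copy, and base data.frame syntax is used so that it runs without data.table):

```r
# Toy data frame standing in for Data_copy (hypothetical values).
toy <- data.frame(Company = c("A", "A", "B"),
                  End_of_Month = c(0, 1, 1),
                  stringsAsFactors = FALSE)

# The test has length 1, so the result has length 1:
# a list holding only the first column of the `yes` argument.
res <- ifelse(TRUE, toy[toy$End_of_Month == 1, ], toy)

length(res)   # 1
res[[1]]      # "A" "B"
```

The remaining columns of the subset are silently dropped, because ifelse only fills as many elements as test has.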

In your case,

ifelse("Weekly" == "Weekly", Data_copy[End_of_Month ==1,], Data_copy) 

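will return a vector of length 1. More precisely, since the test evaluates to TRUE, ifelse returns the first element of your yes argument. Because that argument is a data frame (which is a kind of list), its first element is its first column. That is why you get a list of company names. If you really want to use the ifelse construct, try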

ifelse("Weekly" == "Weekly", list(Data_copy[End_of_Month ==1,]), list(Data_copy)) 

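This returns a list of length 1 whose only element is the data table, so you still have to extract it with [[1]]. As others have said, though, you would probably be better off using if {} else {}.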
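
As a rough sketch of the if {} else {} approach inside a function: the function name subset_by_frequency, its frequency argument and the toy data below are all hypothetical, chosen only for illustration. Base subsetting is used so the example runs without data.table; the same indexing should also work on a data.table.

```r
# Hypothetical helper: the name and the `frequency` argument are illustrative only.
subset_by_frequency <- function(data, frequency) {
  # `if` evaluates a single condition and returns the chosen object whole,
  # unlike ifelse(), which builds a result shaped like its test.
  if (frequency == "Weekly") {
    data[data$End_of_Month == 1, ]
  } else {
    data
  }
}

# Toy data standing in for Data_copy (made-up values).
toy <- data.frame(Company = c("A", "A", "B"),
                  End_of_Month = c(0, 1, 1),
                  stringsAsFactors = FALSE)

weekly <- subset_by_frequency(toy, "Weekly")  # only rows with End_of_Month == 1
other  <- subset_by_frequency(toy, "Daily")   # all rows, unchanged
```

Here the result keeps every column and its class, so no list wrapping or [[1]] extraction is needed.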