Using string character positions to create a new variable

So I have been able to achieve the output I want, but I'm sure more efficient code is possible using string functions.

Take this data:

set.seed(123) 

A <- 1:100 
type.a <- rnorm(100, mean=5000, sd=1433) 
type.b <- rnorm(100, mean=5000, sd=1425) 
type.c <- rnorm(100, mean=5000, sd=1125) 
type.d <- rnorm(100, mean=5000, sd=1233) 

df1 <- data.frame(A, type.a, type.b, type.c, type.d)

Now I want to create a new variable in df1 that identifies whether any of type(a:d) starts with the digit 1, so I have used this code:

# 1 if any type column lies strictly between 999 and 2000, else 0
df1$Type_1 <- with(df1, ifelse((type.a < 2000 & type.a > 999)|(type.b < 2000 & type.b > 999)| 
           (type.c < 2000 & type.c > 999)|(type.d < 2000 & type.d > 999), 1,0))

Or, equivalently, this:

df1$type_1 <- with(df1, ifelse(type.a < 2000 & type.a > 999, 1, 
           ifelse(type.b < 2000 & type.b > 999, 1, 
            ifelse(type.c < 2000 & type.c > 999, 1, 
              ifelse(type.d < 2000 & type.d > 999, 1,0)))))

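Now my question comes in two parts.

First, how can you use a string function that looks only at the first digit of type(a:d), to test whether it equals our constraint (in this example, 1)?

Second, I have more than four columns of data, so I don't think specifying every column name each time is efficient. Can [,x:y] be used instead?

The code would then be used to create 9 new columns in the data (i.e. type_1, type_2, ..., type_9), since the first digit of type(a:d) ranges from 1 to 9.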

2015-08-21 lukeg

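How about just 'df1$Type_1 <- rowSums((df1 < 2000) & (df1 > 999))' instead of that huge and unnecessary 'ifelse' statement? (Or '+(!!rowSums((df1 < 2000) & (df1 > 999)))' if more than one column can match the condition within the same row.) –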

I've never used that. I want a value that equals 0 when the TRUE/FALSE vector is all FALSE, and 1 when it contains at least one TRUE. – lukeg

You could use 'any', i.e. 'lapply(yourdf[-1], function(x) +(any(substr(x, 1, 1) == 1)))' – akrun
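The range test suggested in the comments can be sketched on the type columns alone, selected by position as asked in the second part of the question. This assumes the type columns sit at positions 2:5, as in the example df1, and it reuses the question's bounds (> 999 and < 2000), which correspond to a leading 1 only for values with four digits before the decimal point.

```r
set.seed(123)
A <- 1:100
type.a <- rnorm(100, mean = 5000, sd = 1433)
type.b <- rnorm(100, mean = 5000, sd = 1425)
type.c <- rnorm(100, mean = 5000, sd = 1125)
type.d <- rnorm(100, mean = 5000, sd = 1233)
df1 <- data.frame(A, type.a, type.b, type.c, type.d)

# select the type columns by position ([, x:y]) instead of naming each one
# (assumes they are columns 2 to 5, as in this example)
types <- df1[, 2:5]

# logical matrix: TRUE where a value lies strictly between 999 and 2000
in_range <- (types > 999) & (types < 2000)

# number of matching type columns in each row
n_match <- rowSums(in_range)

# !! turns the counts into TRUE/FALSE, + turns that into 0/1
df1$Type_1 <- +(!!n_match)
```

Adding more type columns then only means widening the 2:5 range, rather than adding another clause to the ifelse() chain.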

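We can use substr to extract the first character of a string. Since there are four columns whose names start with type, we can use grep to get their numeric column indices, loop over those columns with lapply, and check whether the first character equals 1. If we want to know whether at least one value in a column satisfies the condition, we can wrap the comparison in any. lapply returns a list with one length-1 element per column. Since we need binary (0/1) rather than logical (FALSE/TRUE) values, we can wrap the result in + to coerce the logical to its binary representation.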

indx <- grep('^type', colnames(df1)) 
lapply(df1[indx], function(x) +(any(substr(x, 1, 1)==1)))

If we need a vector as output (via @akrun):

vapply(df1[indx], function(x) +(any(substr(x, 1, 1)==1)), 1L)
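The individual steps described in this answer can be tried on a small data frame; the values below are made up purely for illustration. Note that substr() coerces a number with as.character(), so a negative value would start with "-" rather than a digit.

```r
# hypothetical toy data: one ID column and two type columns
df <- data.frame(A = 1:3,
                 type.a = c(1234.5, 4000, 5000),
                 type.b = c(987.1, 6000, 2999))

# numeric positions of the columns whose names start with "type"
indx <- grep("^type", colnames(df))    # 2 3

x <- df$type.a
substr(x, 1, 1)                 # "1" "4" "5": numbers are coerced with as.character()
substr(x, 1, 1) == 1            # TRUE FALSE FALSE: 1 is compared as the string "1"
any(substr(x, 1, 1) == 1)       # TRUE: at least one value starts with 1
+(any(substr(x, 1, 1) == 1))    # 1: unary + turns the logical into an integer

# the answer's column-wise check: one 0/1 flag per type column
vapply(df[indx], function(x) +(any(substr(x, 1, 1) == 1)), 1L)
# type.a type.b
#      1      0
```

Because unary + on a logical returns an integer, 1L is the matching FUN.VALUE template for vapply().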

2015-08-21 08:49:28 akrun

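Just one issue with the 'lapply' solution you provided: it searches down each column for the digit '1', but I want to check across each row whether the digit '1' appears. How would you adjust it? – lukeg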

Try '+(!!rowSums(`dim<-`(substring(as.matrix(df1[indx]), 1, 1) == 1, dim(df1[indx]))))' – akrun
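Unpacked into separate steps, the row-wise one-liner looks roughly like this. It is a sketch on the question's df1; the intermediate names m, first and hit are illustrative only. The key point is that substring() drops the matrix dimensions when it coerces the numbers to character, so the shape has to be restored before rowSums() can work across each row.

```r
set.seed(123)
A <- 1:100
type.a <- rnorm(100, mean = 5000, sd = 1433)
type.b <- rnorm(100, mean = 5000, sd = 1425)
type.c <- rnorm(100, mean = 5000, sd = 1125)
type.d <- rnorm(100, mean = 5000, sd = 1233)
df1 <- data.frame(A, type.a, type.b, type.c, type.d)

indx <- grep("^type", colnames(df1))

m <- as.matrix(df1[indx])          # 100 x 4 numeric matrix of the type columns
first <- substring(m, 1, 1)        # coercion to character drops the dimensions
dim(first) <- dim(m)               # restore the 100 x 4 shape (what `dim<-` does inline)
hit <- first == "1"                # TRUE where a value starts with the digit 1
df1$Type_1 <- +(rowSums(hit) > 0)  # 1 if any type column in the row starts with 1
```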

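Great, elegant answer. I'm interested in the second part of your question, specifically how you would use the first part to create the 9 new columns you mentioned. I may be missing something, but rather than checking each time whether the first digit matches 1, 2, 3, etc., you can simply capture the first digit itself. Something like this: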

library(dplyr) 
library(tidyr) 


set.seed(123) 

A <- 1:100 
type.a <- rnorm(100, mean=5000, sd=1433) 
type.b <- rnorm(100, mean=5000, sd=1425) 
type.c <- rnorm(100, mean=5000, sd=1125) 
type.d <- rnorm(100, mean=5000, sd=1233) 

df1 <- data.frame(A, type.a, type.b, type.c, type.d) 


    df1 %>% 
    group_by(A) %>% 
    mutate(across(everything(), ~ substr(.x, 1, 1))) %>%      # keep first digit 
    ungroup %>% 
    gather(variable, type, -A) %>%       # create combinations of rows and digits 
    select(-variable) %>% 
    mutate(type = paste0("type_",type), 
     value = 1) %>% 
    group_by(A,type) %>%          
    summarise(value = sum(value)) %>%      # count how many times the row belongs to each type 
    ungroup %>% 
    spread(type, value, fill=0) %>%       # create the new columns 
    inner_join(df1, by="A") %>%        # join back initial info 
    select(A, starts_with("type."), starts_with("type_")) # order columns 


#  A type.a type.b type.c type.d type_1 type_2 type_3 type_4 type_5 type_6 type_7 type_8 type_9 
# 1 1 4196.838 3987.671 7473.662 4118.106  0  0  1  2  0  0  1  0  0 
# 2 2 4670.156 5366.059 6476.465 4071.935  0  0  0  2  1  1  0  0  0 
# 3 3 7233.629 4648.464 4701.712 3842.782  0  0  1  2  0  0  1  0  0 
# 4 4 5101.039 4504.752 5611.093 3702.251  0  0  1  1  2  0  0  0  0 
# 5 5 5185.269 3643.944 4533.868 4460.982  0  0  1  2  1  0  0  0  0 
# 6 6 7457.688 4935.835 4464.222 5408.344  0  0  0  2  1  0  1  0  0 
# 7 7 5660.493 3881.511 4112.822 2516.478  0  1  1  1  1  0  0  0  0 
# 8 8 3187.167 2623.183 4331.056 5261.372  0  1  1  1  1  0  0  0  0 
# 9 9 4015.740 4458.177 6857.271 6524.820  0  0  0  2  0  2  0  0  0 
# 10 10 4361.366 6309.570 4939.218 7512.329  0  0  0  2  0  1  1  0  0 
# .. ..  ...  ...  ...  ... ... ... ... ... ... ... ... ... ...

For example, when we start with columns A and B:

library(dplyr) 
library(tidyr) 


    set.seed(123) 

    A <- 1:100 
    B <- 101:200 
    type.a <- rnorm(100, mean=5000, sd=1433) 
    type.b <- rnorm(100, mean=5000, sd=1425) 
    type.c <- rnorm(100, mean=5000, sd=1125) 
    type.d <- rnorm(100, mean=5000, sd=1233) 

    df1 <- data.frame(A,B, type.a, type.b, type.c, type.d) 


    # work by grouping on A and B 
df1 %>% 
    group_by(A,B) %>% 
    mutate(across(everything(), ~ substr(.x, 1, 1))) %>%     
    ungroup %>% 
    gather(variable, type, -c(A,B)) %>%      
    select(-variable) %>% 
    mutate(type = paste0("type_",type), 
     value = 1) %>% 
    group_by(A,B,type) %>%          
    summarise(value = sum(value)) %>% 
    ungroup %>% 
    spread(type, value, fill=0) %>%      
    inner_join(df1, by=c("A","B")) %>%        
    select(A,B, starts_with("type."), starts_with("type_")) 


#  A B type.a type.b type.c type.d type_1 type_2 type_3 type_4 type_5 type_6 type_7 type_8 type_9 
# 1 1 101 4196.838 3987.671 7473.662 4118.106  0  0  1  2  0  0  1  0  0 
# 2 2 102 4670.156 5366.059 6476.465 4071.935  0  0  0  2  1  1  0  0  0 
# 3 3 103 7233.629 4648.464 4701.712 3842.782  0  0  1  2  0  0  1  0  0 
# 4 4 104 5101.039 4504.752 5611.093 3702.251  0  0  1  1  2  0  0  0  0 
# 5 5 105 5185.269 3643.944 4533.868 4460.982  0  0  1  2  1  0  0  0  0 
# 6 6 106 7457.688 4935.835 4464.222 5408.344  0  0  0  2  1  0  1  0  0 
# 7 7 107 5660.493 3881.511 4112.822 2516.478  0  1  1  1  1  0  0  0  0 
# 8 8 108 3187.167 2623.183 4331.056 5261.372  0  1  1  1  1  0  0  0  0 
# 9 9 109 4015.740 4458.177 6857.271 6524.820  0  0  0  2  0  2  0  0  0 
# 10 10 110 4361.366 6309.570 4939.218 7512.329  0  0  0  2  0  1  1  0  0 
# .. .. ...  ...  ...  ...  ... ... ... ... ... ... ... ... ... ...

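However, in this case you should notice that B has one value per row (just like A), so B isn't really needed to identify your rows uniquely. You can therefore work exactly as before (as if B didn't exist) and just join B back to your result: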

df1 %>% 
     select(-B) %>% 
     group_by(A) %>% 
     mutate(across(everything(), ~ substr(.x, 1, 1))) %>%     
     ungroup %>% 
     gather(variable, type, -A) %>%       
     select(-variable) %>% 
     mutate(type = paste0("type_",type), 
      value = 1) %>% 
     group_by(A,type) %>%          
     summarise(value = sum(value)) %>%   # count how many times the row belongs to each type 
     ungroup %>% 
     spread(type, value, fill=0) %>%       
     inner_join(df1, by="A") %>%        
     select(A,B, starts_with("type."), starts_with("type_"))
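Since this answer was written, tidyr has superseded gather() and spread() with pivot_longer() and pivot_wider(). A minimal sketch of the same idea with those verbs, assuming tidyr 1.1.0 or later (for the scalar values_fill and the names_sort argument), might look like this. count() stands in for the group_by() plus summarise(sum(value)) step, and selecting the type columns by name replaces the group_by()/mutate_each() step.

```r
library(dplyr)
library(tidyr)

set.seed(123)
A <- 1:100
B <- 101:200
type.a <- rnorm(100, mean = 5000, sd = 1433)
type.b <- rnorm(100, mean = 5000, sd = 1425)
type.c <- rnorm(100, mean = 5000, sd = 1125)
type.d <- rnorm(100, mean = 5000, sd = 1233)
df1 <- data.frame(A, B, type.a, type.b, type.c, type.d)

counts <- df1 %>%
  select(A, starts_with("type.")) %>%                     # A alone identifies each row
  pivot_longer(-A, names_to = "variable", values_to = "value") %>%
  mutate(type = paste0("type_", substr(value, 1, 1))) %>% # first digit -> column name
  count(A, type) %>%                                      # occurrences per row and digit
  pivot_wider(names_from = type, values_from = n,
              values_fill = 0L, names_sort = TRUE)        # one column per digit

result <- df1 %>%
  inner_join(counts, by = "A") %>%                        # add the counts to every original column
  select(A, B, starts_with("type."), starts_with("type_"))

head(result)
```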

2015-08-21 09:33:43 AntoniosK

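OK, thanks for providing a solution, but could you adjust the code so that it keeps the original df1 with its 5 variables and then adds the new variables type_1:type_9, where each type_x is the row-wise count of values whose first digit is x? – lukeg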

I'm on it! I'll update... – AntoniosK

Super stuff, one more thing. Say we have 'B <- 101:200', and hence 'df1 <- data.frame(A, B, type.a, type.b, type.c, type.d)'. Could you adjust the code so that 'B' is also included in the output? – lukeg
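The requests in these comments (keep the original df1, including B, and append type_1 to type_9 as per-row counts) can also be sketched in base R without reshaping the data. This is an alternative technique, not a transcription of the dplyr/tidyr answer; it assumes every type value is positive, so that its first character is one of the digits 1 to 9.

```r
set.seed(123)
A <- 1:100
B <- 101:200
type.a <- rnorm(100, mean = 5000, sd = 1433)
type.b <- rnorm(100, mean = 5000, sd = 1425)
type.c <- rnorm(100, mean = 5000, sd = 1125)
type.d <- rnorm(100, mean = 5000, sd = 1233)
df1 <- data.frame(A, B, type.a, type.b, type.c, type.d)

# positions of the type columns, found by name rather than hard-coded as [, x:y]
type_cols <- grep("^type\\.", names(df1))

# first character of every type value: a 100 x 4 character matrix
first <- sapply(df1[type_cols], function(x) substr(x, 1, 1))

# for each digit 1..9, count how many type columns in each row start with it
counts <- sapply(1:9, function(d) rowSums(first == as.character(d)))
colnames(counts) <- paste0("type_", 1:9)

# keep every original column (A, B, type.*) and append type_1 ... type_9
result <- cbind(df1, counts)
head(result)
```

Unlike spread(), this always creates all nine columns, even for a digit that never occurs.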
