2017-09-18

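Convert a vector into a 2-column data frame by string subset in R

I couldn't find a similar version of this question already posted, since I think it's a fairly unusual problem, but please point me in the right direction if I'm wrong. I'm working with the following vector, which I need to convert into a data frame: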

myvec = structure(c(1.03, 2.3, -1.2, -0.09, -0.31, -0.51, 3.4, 3, 0.07, 
0.02, 1.05, -0.02, 2.03), .Names = c("Intercept", "DEF-1017", 
"DEF-1025", "DEF-103", "DEF-1043", "DEF-1046", "DEF-1048", "DEF-1076", 
"OFF-1017", "OFF-1025", "OFF-103", "OFF-1046", "OFF-1076")) 

head(myvec) 
Intercept  DEF-1017  DEF-1025   DEF-103  DEF-1043  DEF-1046 
     1.03      2.30     -1.20     -0.09     -0.31     -0.51 

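The vector should hold an offensive (OFF) and a defensive (DEF) coefficient for each of 7 users (users 1017, 1025, 103, 1043, 1046, 1048, 1076), but the offensive coefficient is missing for two of them. I need to convert this into a data frame with 4 columns (defensive ID, offensive ID, defensive coefficient, offensive coefficient). More specifically, I want the following data frame, with the missing values represented like this: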
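Each name is a role prefix (`DEF` or `OFF`) and a user ID joined by a hyphen, so splitting the names at the hyphen shows which users lack an offensive coefficient. A minimal base-R sketch; the variable names `coefs`, `role` and `user` are illustrative and not part of the question:

```r
# Example vector from the question (same values and names)
myvec <- structure(c(1.03, 2.3, -1.2, -0.09, -0.31, -0.51, 3.4, 3, 0.07,
  0.02, 1.05, -0.02, 2.03), .Names = c("Intercept", "DEF-1017",
  "DEF-1025", "DEF-103", "DEF-1043", "DEF-1046", "DEF-1048", "DEF-1076",
  "OFF-1017", "OFF-1025", "OFF-103", "OFF-1046", "OFF-1076"))

coefs <- myvec[names(myvec) != "Intercept"]  # drop the intercept
role  <- sub("-.*$", "", names(coefs))       # text before the hyphen: "DEF" or "OFF"
user  <- sub("^[^-]+-", "", names(coefs))    # text after the hyphen: "1017", "1025", ...

table(role)                                   # 7 DEF vs 5 OFF coefficients
setdiff(user[role == "DEF"], user[role == "OFF"])  # users with no OFF coefficient
```

The last line returns the two users (`"1043"` and `"1048"`) whose `OFFID` and `OFFVAL` must end up as `NA`.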

mydf = structure(list(DEFID = c("DEF-1017", "DEF-1025", "DEF-103", "DEF-1043", 
"DEF-1046", "DEF-1048", "DEF-1076"), OFFID = c("OFF-1017", "OFF-1025", 
"OFF-103", NA, "OFF-1046", NA, "OFF-1076"), DEFVAL = c(2.3, -1.2, 
-0.09, -0.31, -0.51, 3.4, 3), OFFVAL = c(0.07, 0.02, 1.05, NA, 
-0.02, NA, 2.03)), .Names = c("DEFID", "OFFID", "DEFVAL", "OFFVAL" 
), row.names = c(NA, -7L), class = "data.frame") 

mydf 
     DEFID    OFFID DEFVAL OFFVAL
1 DEF-1017 OFF-1017   2.30   0.07
2 DEF-1025 OFF-1025  -1.20   0.02
3  DEF-103  OFF-103  -0.09   1.05
4 DEF-1043     <NA>  -0.31     NA
5 DEF-1046 OFF-1046  -0.51  -0.02
6 DEF-1048     <NA>   3.40     NA
7 DEF-1076 OFF-1076   3.00   2.03

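The intercept value is dropped (not included in the table), and everything else is formatted as shown. Any help would be greatly appreciated, thanks!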

Answers


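I use the tidyr package for tasks like this.

First, convert the vector to a data frame:

library(tidyverse) 
# tibble() replaces the now-deprecated data_frame()
df <- tibble(names = names(myvec), 
             values = unname(myvec)) 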

Next, filter out the intercept and reshape with tidyr:

df %>% 
    filter(names != "Intercept") %>%                   # drop the intercept row
    extract(names, into = c("coeff", "user"),
            "([[:alnum:]]+)-([[:alnum:]]+)") %>%       # "DEF-1017" -> "DEF", "1017"
    spread(coeff, values)                              # one column per coefficient type
# A tibble: 7 x 3 
  user    DEF   OFF 
* <chr> <dbl> <dbl> 
1 1017   2.30  0.07 
2 1025  -1.20  0.02 
3 103   -0.09  1.05 
4 1043  -0.31    NA 
5 1046  -0.51 -0.02 
6 1048   3.40    NA 
7 1076   3.00  2.03 
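`spread()` still works, but since tidyr 1.0.0 it is superseded by `pivot_wider()`. Assuming a recent tidyr (and dplyr for `filter()` and the pipe), the same reshaping can be sketched as below; `wide` is just an illustrative name:

```r
library(dplyr)
library(tidyr)
library(tibble)

myvec <- structure(c(1.03, 2.3, -1.2, -0.09, -0.31, -0.51, 3.4, 3, 0.07,
  0.02, 1.05, -0.02, 2.03), .Names = c("Intercept", "DEF-1017",
  "DEF-1025", "DEF-103", "DEF-1043", "DEF-1046", "DEF-1048", "DEF-1076",
  "OFF-1017", "OFF-1025", "OFF-103", "OFF-1046", "OFF-1076"))

df <- tibble(names = names(myvec), values = unname(myvec))

wide <- df %>%
  filter(names != "Intercept") %>%                          # drop the intercept
  extract(names, into = c("coeff", "user"),
          "([[:alnum:]]+)-([[:alnum:]]+)") %>%              # split "DEF-1017"
  pivot_wider(names_from = coeff, values_from = values)     # DEF and OFF columns

wide
```

One difference to keep in mind: `spread()` sorts the rows by `user`, while `pivot_wider()` keeps rows in order of first appearance. For this vector both orders happen to coincide.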

If you want the column names and IDs exactly as listed in the question, a little more processing does it:

df %>% 
    filter(names != "Intercept") %>% 
    extract(names, into = c("coeff", "user"), "([[:alnum:]]+)-([[:alnum:]]+)") %>% 
    spread(coeff, values) %>% 
    # rebuild the IDs; leave OFFID missing where there is no OFF coefficient
    mutate(DEFID = paste("DEF", user, sep = "-"), 
           OFFID = ifelse(is.na(OFF), NA_character_, paste("OFF", user, sep = "-"))) %>% 
    rename(DEFVAL = DEF, 
           OFFVAL = OFF) %>% 
    select(DEFID, OFFID, DEFVAL, OFFVAL) 
# A tibble: 7 x 4 
  DEFID    OFFID    DEFVAL OFFVAL 
  <chr>    <chr>     <dbl>  <dbl> 
1 DEF-1017 OFF-1017   2.30   0.07 
2 DEF-1025 OFF-1025  -1.20   0.02 
3 DEF-103  OFF-103   -0.09   1.05 
4 DEF-1043 <NA>      -0.31     NA 
5 DEF-1046 OFF-1046  -0.51  -0.02 
6 DEF-1048 <NA>       3.40     NA 
7 DEF-1076 OFF-1076   3.00   2.03 

This gives exactly what you want. I use `split`, `substr` and `merge`; I think it is the shortest way to get the result you asked for.

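library(dplyr) 
# Turn the named vector into a data frame with the names in a "rowname" column
DF <- tibble::rownames_to_column(data.frame(myvec)) 
DF <- DF[DF$rowname != "Intercept", ]                   # drop the intercept
# Split into a DEF part and an OFF part using the 3-character prefix
dff <- split(DF, f = substr(DF$rowname, 1, 3)) 
dff2 <- dff$DEF; dff3 <- dff$OFF 
# The user ID is everything after "DEF-" / "OFF-"
dff2$ID <- substr(dff2$rowname, 5, nchar(dff2$rowname)) 
dff3$ID <- substr(dff3$rowname, 5, nchar(dff3$rowname)) 
# Full outer join on ID: users without an OFF row get NA
DF2 <- merge(dff2, dff3, by = "ID", all = TRUE) 
DF2 <- DF2[, c(2, 4, 3, 5)]                             # rowname.x, rowname.y, myvec.x, myvec.y
names(DF2) <- c("DEFID", "OFFID", "DEFVAL", "OFFVAL") 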

DF2 
     DEFID    OFFID DEFVAL OFFVAL
1 DEF-1017 OFF-1017   2.30   0.07
2 DEF-1025 OFF-1025  -1.20   0.02
3  DEF-103  OFF-103  -0.09   1.05
4 DEF-1043     <NA>  -0.31     NA
5 DEF-1046 OFF-1046  -0.51  -0.02
6 DEF-1048     <NA>   3.40     NA
7 DEF-1076 OFF-1076   3.00   2.03
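The step that produces the `NA` rows in this answer is `merge(..., all = TRUE)`, a full outer join that keeps IDs present in only one of the two tables. A small sketch on made-up toy data (`def`, `off`, `outer` and `inner` are hypothetical names) shows the difference from the default inner join:

```r
# Toy tables: user 1043 has a DEF value but no OFF value
def <- data.frame(ID = c("1017", "1043"), DEFVAL = c(2.30, -0.31))
off <- data.frame(ID = "1017", OFFVAL = 0.07)

outer <- merge(def, off, by = "ID", all = TRUE)  # keeps 1043, OFFVAL filled with NA
inner <- merge(def, off, by = "ID")              # default all = FALSE drops 1043

outer
inner
```

With `all = TRUE` the result has both IDs and `NA` for the missing offensive value; without it, user 1043 disappears.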