2017-10-12 174 views

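I have two columns of strings in a data frame, and for each row I want to see the characters that differ, i.e. extract the characters that differ between the two strings in each row.

For example, given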

Lines <- " 
a  b 
cat car 
dog ding 
cow haw" 
df <- read.table(text = Lines, header = TRUE, as.is = TRUE) 

return

a  b  diff 
cat car t 
dog ding o 
cow haw co 

I have seen

Extract characters that differ between two strings

and

Split comma-separated column into separate rows

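which give some neat solutions that either work for individual rows (first reference) or operate row-wise in a clever way but are not quite what I want (second reference).

Ideally I would like to use something like this: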

Reduce(setdiff, strsplit(c(a, b), split = "")) 

I tried:

apply(df, function(a,b) Reduce(setdiff, strsplit(c(a, b), split = ""))) 

but to no avail.

How can this be done?

P.S. I would particularly like to do this with dplyr if possible, though only for the sake of consistent formatting in my final output.


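Your example is not reproducible. Please consider using `dput`. That way we could see, for example, whether your columns actually contain character vectors or factors, which is a common source of confusion. – lmo

Answers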

2

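Assuming the data frame df shown in the note below, define a function Diff that takes two vectors of single characters, runs setdiff on them, and pastes the result together. Then use mapply to run it over the two columns after splitting each string into individual characters.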

Diff <- function(x, y) paste(setdiff(x, y), collapse = "") 
transform(df, diff = mapply(Diff, strsplit(a, ""), strsplit(b, ""))) 

giving:

a b diff 
1 cat car t 
2 dog ding o 
3 cow haw co 
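One thing worth keeping in mind, offered as a minimal sketch rather than as part of the answer above: `setdiff` treats its arguments as sets, so the result ignores both character position and repeated letters, and it depends on the order of the arguments. The helper name `char_diff` is hypothetical and used only for illustration.

```r
# Hypothetical helper: characters of x that do not occur anywhere in y.
char_diff <- function(x, y) {
  paste(setdiff(strsplit(x, "")[[1]], strsplit(y, "")[[1]]), collapse = "")
}

char_diff("cow", "haw")   # "co"
char_diff("aab", "ab")    # ""   (the extra "a" is not reported)
char_diff("abc", "cba")   # ""   (same letters, different order)
char_diff("ding", "dog")  # "in" (swapping the arguments changes the result)
```

If a position-by-position comparison is needed instead, a different approach than `setdiff` would be required.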

Note: the input df used above is:

Lines <- " 
a  b 
cat car 
dog ding 
cow haw" 
df <- read.table(text = Lines, header = TRUE, as.is = TRUE) 
0

Here is another base R approach, using Map.

diffList <- Map(setdiff, strsplit(dat[[1]], ""), strsplit(dat[[2]], "")) 
diffList 
[[1]] 
[1] "t" 

[[2]] 
[1] "o" 

[[3]] 
[1] "c" "o" 

You can wrap this in sapply to return a character vector for your data frame:

dat$charDiffs <-sapply(diffList, paste, collapse="") 

which returns

dat 
    a b charDiffs 
1 cat car   t 
2 dog ding   o 
3 cow haw  co 
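A slightly stricter variant, offered as a sketch: `vapply` with a declared return type of `character(1)` raises an error if any element does not collapse to a single string, whereas `sapply` returns an empty list rather than a character vector when given empty input. The data is recreated here so that the block runs on its own.

```r
dat <- data.frame(a = c("cat", "dog", "cow"),
                  b = c("car", "ding", "haw"),
                  stringsAsFactors = FALSE)

# One character vector of differing letters per row
diffList <- Map(setdiff, strsplit(dat$a, ""), strsplit(dat$b, ""))

# vapply checks that every result is a length-one character vector
dat$charDiffs <- vapply(diffList, paste, character(1), collapse = "")
dat$charDiffs
```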

Data (from dput)

dat <- 
structure(list(a = c("cat", "dog", "cow"), b = c("car", "ding", 
"haw")), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame") 
1

A tidyverse and stringr solution.

library(tidyverse) 
library(stringr) 

dt2 <- dt %>% 
    mutate(a_list = str_split(a, pattern = ""), b_list = str_split(b, pattern = "")) %>% 
    mutate(diff = map2(a_list, b_list, setdiff)) %>% 
    mutate(diff = map_chr(diff, ~paste(., collapse = ""))) %>% 
    select_if(~!is.list(.)) 
dt2 
# A tibble: 3 x 3 
     a  b diff 
    <chr> <chr> <chr> 
1 cat car  t 
2 dog ding  o 
3 cow haw co 
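The intermediate list columns can be avoided. As a sketch, assuming dplyr, purrr, and stringr are installed, `map2_chr` can compare the split strings and collapse the result in a single step. The name `dt3` is introduced here for illustration only, and the data is recreated so that the block runs on its own.

```r
library(dplyr)
library(purrr)
library(stringr)

dt <- data.frame(a = c("cat", "dog", "cow"),
                 b = c("car", "ding", "haw"),
                 stringsAsFactors = FALSE)

# Split both columns into characters, then setdiff and paste row by row
dt3 <- dt %>%
  mutate(diff = map2_chr(str_split(a, ""), str_split(b, ""),
                         ~ paste(setdiff(.x, .y), collapse = "")))
dt3$diff
```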

DATA

dt <- read.table(text = "a  b 
cat car 
       dog ding 
       cow haw", 
       header = TRUE, stringsAsFactors = FALSE) 
1

Using dplyr

library(dplyr) 
ff = data.frame(a = c("dog","chair","love"),b = c("dot","liar","over"),stringsAsFactors = F) 
st = ff %>% mutate(diff = sapply(Map(setdiff,strsplit(a,""),strsplit(b,"")),paste,collapse = "")) 

> st 
     a b diff 
1 dog dot g 
2 chair liar ch 
3 love over l 