2017-04-07

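R: pattern matching multiple combinations of rows in a column for replacement?

I am trying to figure out how to find any instance of the strings in one column of a data frame inside another column of the same data frame, so that I can replace them. In this case I have forum posts in which people quote other users' names, and I want to remove those names before analysis; otherwise they will be counted as high-frequency words. Here is the dput of the data frame: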

structure(list(uber_name = structure(c(9L, 2L, 1L, 2L, 3L, 10L, 
3L, 9L, 11L), .Label = c("aluber1968", "bigdreamslittlemoney", 
"FuberNYC", "JamesM", "jonnyplastic", "JustDre", "KING D", "klimarov", 
"NycGirl705", "shumacker", "spike69", "theitalian", "Uberman8263", 
"Ez2dj", "Manhmptn", "NYCDriver", "staytune", "UBS", "Ubured", 
"Jme10", "Lennyyellowcab", "Mir", "eagle88", "Ibuys4730", "NoUsername", 
"BathoTrask", "Douglas", "LGC", "Jakeinny098", "Rustyshackelford", 
"shabbyroch", "ubershiza", "drbrkln", "elys123", "bossdriver", 
"HerbyHerb", "Jim1985", "Malik38", "STIDRIVER", "vxlon7", "Waqar", 
"tohunt4me", "DogPound", "SuliB", "AlBrklyn", "John Cunningham", 
"MReeves", "PinkFoot", "alextheboss", "luisannalui", "censoredbytheFCC", 
"KONY", "cieru", "Jorlev", "Smooth954", "marcusguber", "nyc321", 
"Tony from New Jersey", "Vanstaal", "Bkrah", "brunoamat2", "gebbels6", 
"Kevin7889", "uanic", "Uber OG", "UberKilledMyMarriage", "ya mon its me", 
"HunkAWestchester", "Mr Affinito", "ninja warrior", "NoNonsense", 
"notacabdriver", "Notauberhater", "TwoFiddyMile", "bilyvh", "cybertec69", 
"JohnnyBlanco", "SOBE", "ubernyc"), class = "factor"), uber_write = c("I see people post about getting a w", 
"you have 2 choices either you drive", "More than a year ago I didnt drive ", 
"yeah i stopped driving for them for", "Ive been getting some promotions la", 
"FuberNYC saidIve been getting some ", "shumacker saidAnd You feel importan", 
"FuberNYC saidIve been getting some ", "They start coming after few months " 
), uber_date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L 
), .Label = c("Jan 19, 2017", "Mar 30, 2017", "Jan 23, 2017", 
"Jan 12, 2017", "Jan 9, 2017", "Jan 1, 2017", "Dec 31, 2016", 
"Nov 26, 2016", "Nov 3, 2016", "Dec 22, 2016", "Dec 13, 2016", 
"Dec 2, 2016", "Nov 15, 2016", "Oct 31, 2016", "Oct 20, 2016", 
"Mar 14, 2017", "Sep 1, 2016", "Jul 26, 2016", "Mar 1, 2017", 
"Feb 25, 2017", "Sep 8, 2016", "Sep 9, 2016", "Apr 21, 2015"), class = "factor")), .Names = c("uber_name", 
"uber_write", "uber_date"), class = c("data.table", "data.frame"
), row.names = c(NA, -9L))

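I have used gsub before, but I can't figure out how to apply it to this case. I want to take every name in the "uber_name" column and remove those user names from all of the posts in "uber_write".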

Answers

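You can take all user names from the uber_name column of the data.table (dt) as a vector, build a regular expression of the form (name1|name2|name3), and replace every matching user name with "", like this: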

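library(data.table)

# Collect the distinct user names as plain character strings
uber_names <- unique(as.character(dt$uber_name))

# Build one alternation pattern, (name1|name2|...), and delete every match
dt[, uber_write_filtered := gsub(
    pattern = paste0("(", paste(uber_names, collapse = "|"), ")"),
    replacement = "", x = uber_write)]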
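
The pattern above treats each user name as a regular expression, so a name containing a character such as . or ( would be read as regex syntax rather than literally. Below is a minimal sketch of a literal-matching variant, assuming that is the behaviour wanted. The helper remove_names and the sample posts are hypothetical and not part of the original answer.

```r
# remove_names() is a hypothetical helper, not part of the answer above.
# It deletes every literal occurrence of each name from a character vector.
remove_names <- function(text, names) {
  names <- unique(as.character(names))           # factors become plain strings
  names <- names[!is.na(names) & nzchar(names)]  # skip NA and empty names
  # Longest names first, so a name that contains a shorter one is removed whole
  names <- names[order(nchar(names), decreasing = TRUE)]
  for (nm in names) {
    # fixed = TRUE matches the name character for character, with no regex syntax
    text <- gsub(nm, "", text, fixed = TRUE)
  }
  text
}

# Illustrative posts: "a.b" must not match "axb" when matched literally
posts <- c("FuberNYC saidIve been getting some ", "a.b saidhello", "axb is not a user name")
remove_names(posts, c("FuberNYC", "a.b"))
```

With the question's data.table this could be applied as dt[, uber_write_filtered := remove_names(uber_write, uber_name)]. The loop makes one pass per name, which is slower than a single alternation pattern but avoids any escaping.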
This worked well for me, thanks! – LoF10

I wasn't able to recreate your data frame, but here is one that is close:

data <- 
structure(list(uber_name = c("aluber1968", "bigdreamslittlemoney", 
"FuberNYC", "JamesM", "jonnyplastic", "JustDre", "KING D", "klimarov", 
"NycGirl705", "shumacker", "spike69", "theitalian", "Uberman8263", 
"Ez2dj", "Manhmptn", "NYCDriver", "staytune", "UBS", "Ubured", 
"Jme10", "Lennyyellowcab", "Mir", "eagle88", "Ibuys4730", "NoUsername", 
"BathoTrask", "Douglas", "LGC", "Jakeinny098", "Rustyshackelford", 
"shabbyroch", "ubershiza", "drbrkln", "elys123", "bossdriver", 
"HerbyHerb", "Jim1985", "Malik38", "STIDRIVER", "vxlon7", "Waqar", 
"tohunt4me", "DogPound", "SuliB", "AlBrklyn", "John Cunningham", 
"MReeves", "PinkFoot", "alextheboss", "luisannalui", "censoredbytheFCC", 
"KONY", "cieru", "Jorlev", "Smooth954", "marcusguber", "nyc321", 
"Tony from New Jersey", "Vanstaal", "Bkrah", "brunoamat2", "gebbels6", 
"Kevin7889", "uanic", "Uber OG", "UberKilledMyMarriage", "ya mon its me", 
"HunkAWestchester", "Mr Affinito", "ninja warrior", "NoNonsense", 
"notacabdriver", "Notauberhater", "TwoFiddyMile", "bilyvh", "cybertec69", 
"JohnnyBlanco", "SOBE", "ubernyc"), uber_write = c("I see people post about getting a w", 
"you have 2 choices either you drive", "More than a year ago I didnt drive ", 
"yeah i stopped driving for them for", "Ive been getting some promotions la", 
"FuberNYC saidIve been getting some ", "shumacker saidAnd You feel importan", 
"FuberNYC saidIve been getting some ", "They start coming after few months ", 
"I see people post about getting a w", "you have 2 choices either you drive", 
"More than a year ago I didnt drive ", "yeah i stopped driving for them for", 
"Ive been getting some promotions la", "FuberNYC saidIve been getting some ", 
"shumacker saidAnd You feel importan", "FuberNYC saidIve been getting some ", 
"They start coming after few months ", "I see people post about getting a w", 
"you have 2 choices either you drive", "More than a year ago I didnt drive ", 
"yeah i stopped driving for them for", "Ive been getting some promotions la", 
"FuberNYC saidIve been getting some ", "shumacker saidAnd You feel importan", 
"FuberNYC saidIve been getting some ", "They start coming after few months ", 
"I see people post about getting a w", "you have 2 choices either you drive", 
"More than a year ago I didnt drive ", "yeah i stopped driving for them for", 
"Ive been getting some promotions la", "FuberNYC saidIve been getting some ", 
"shumacker saidAnd You feel importan", "FuberNYC saidIve been getting some ", 
"They start coming after few months ", "I see people post about getting a w", 
"you have 2 choices either you drive", "More than a year ago I didnt drive ", 
"yeah i stopped driving for them for", "Ive been getting some promotions la", 
"FuberNYC saidIve been getting some ", "shumacker saidAnd You feel importan", 
"FuberNYC saidIve been getting some ", "They start coming after few months ", 
"I see people post about getting a w", "you have 2 choices either you drive", 
"More than a year ago I didnt drive ", "yeah i stopped driving for them for", 
"Ive been getting some promotions la", "FuberNYC saidIve been getting some ", 
"shumacker saidAnd You feel importan", "FuberNYC saidIve been getting some ", 
"They start coming after few months ", "I see people post about getting a w", 
"you have 2 choices either you drive", "More than a year ago I didnt drive ", 
"yeah i stopped driving for them for", "Ive been getting some promotions la", 
"FuberNYC saidIve been getting some ", "shumacker saidAnd You feel importan", 
"FuberNYC saidIve been getting some ", "They start coming after few months ", 
"I see people post about getting a w", "you have 2 choices either you drive", 
"More than a year ago I didnt drive ", "yeah i stopped driving for them for", 
"Ive been getting some promotions la", "FuberNYC saidIve been getting some ", 
"shumacker saidAnd You feel importan", "FuberNYC saidIve been getting some ", 
"They start coming after few months ", "I see people post about getting a w", 
"you have 2 choices either you drive", "More than a year ago I didnt drive ", 
"yeah i stopped driving for them for", "Ive been getting some promotions la", 
"FuberNYC saidIve been getting some ", "shumacker saidAnd You feel importan" 
)), .Names = c("uber_name", "uber_write"), row.names = c(NA, 
-79L), class = "data.frame") 

And here is the answer:

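# Join all user names into a single alternation pattern: name1|name2|...
dont_want <- paste0(data$uber_name, collapse = "|")
# Remove every occurrence of any user name from the posts
data$uber_write2 <- gsub(pattern = dont_want, replacement = "", x = data$uber_write)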
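
A plain alternation like the one above also matches a name when it occurs inside a longer word, so a short name such as "Mir" would be cut out of "Mirrors". Here is a minimal sketch of one way to limit this with word boundaries, assuming the names consist of letters, digits and spaces as they do in the question. The toy data frame and its values are illustrative only.

```r
# Toy data in the shape of the question's data frame; the values are illustrative.
toy <- data.frame(
  uber_name  = c("Mir", "UBS"),
  uber_write = c("Mir saidI like Mirrors", "UBS saidthanks Mir, UBSfan agrees"),
  stringsAsFactors = FALSE
)

# \\b marks a word boundary, so a name only matches when it stands as a whole word
pattern <- paste0("\\b(", paste(unique(toy$uber_name), collapse = "|"), ")\\b")

# "Mirrors" and "UBSfan" are left intact; the standalone names are removed
toy$uber_write2 <- gsub(pattern, "", toy$uber_write, perl = TRUE)
toy$uber_write2
```

Word boundaries only work as intended when a name starts and ends with a letter, digit or underscore; names that begin or end with punctuation would need a different delimiter.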