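Extract data from multiple columns in an R data frame, then search another
I have a central data frame (df3) that I am trying to subset, and add a column to, based on data extracted from another data frame (df2), which is itself subset using a third (df1). I have got this far by searching the help pages and trying various functions, but I have hit an impasse. I hope you can help.
First, the three data frames are built as follows: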
#df1 - my initial search database
id <- c("id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8")
yesno <- c("Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No")
city <- c("London", "London", "Paris", "London", "Paris", "New York", "London", "London")
df1 <- cbind(id, yesno, city)
df1 <- as.data.frame(df1)
df1
#df2 - containing the data needed to search df3, but situated across columns
id <- c("id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8")
twitter <- c("@one","", "@three", "@four", "", "", "@seven", "")
email <- c("", "", "", "add4", "add5","", "add7", "")
mail <- c("", "postcode2", "", "","","","","postcode8")
df2 <- cbind(id, twitter, email, mail)
df2 <- as.data.frame(df2)
df2
#df3 - the central df containing the data I wish to extract
comms <- c("@one", "postcode2", "@three", "@four", "add4", "add5", "six", "@seven", "add7", "postcode2")
target <- c("text1", "text2", "text3", "text4.1", "text4.2", "text5", "text6", "text7.1","text7.2", "text8")
df3 <- cbind(comms,target)
df3 <- as.data.frame(df3)
df3
df1 and df2 are linked through the id column. So far I have been able to filter df1 and extract the ids, which I then use to subset df2.
library(dplyr)  # provides %>% and filter()
df_search <- df1 %>%
  filter(yesno == "Yes", city == "London")
df_search_ids <- df_search$id
df2_search <- df2 %>%
  filter(id %in% df_search_ids)
df2_search
   id twitter email      mail
1 id1    @one
2 id2               postcode2
3 id4   @four  add4
4 id7  @seven  add7
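My problems are these: the data shared between df2 and df3 is spread across three different columns of df2 (twitter, email and mail); those columns contain blank cells and other irrelevant information (e.g. 'I'm not on Twitter'); and finally, some entries in df2 (such as id4 and id7 above) have multiple entries in df3.
The solution I am trying to reach is this: I want to extract every value from the twitter, email and mail columns of df2 for the ids matched from df1, so that the extracted values can be used to subset df3, ultimately producing a new data frame (result_df, with the matched text in target_res) that looks like this: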
id_res <- c("id1", "id2", "id4", "id4", "id7", "id7")
comms_res <- c("@one", "postcode2", "@four", "add4", "@seven", "add7")
target_res <- c("text1", "text2", "text4.1", "text4.2", "text7.1", "text7.2")
result_df <- cbind(id_res, comms_res, target_res)
result_df <- as.data.frame(result_df)
result_df
  id_res comms_res target_res
1    id1      @one      text1
2    id2 postcode2      text2
3    id4     @four    text4.1
4    id4      add4    text4.2
5    id7    @seven    text7.1
6    id7      add7    text7.2
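For reference, one way a result like this might be produced is to reshape the twitter, email and mail columns of df2 into a single column and then join that column against the comms column of df3. The sketch below is only an illustration under some assumptions: it uses pivot_longer() from tidyr alongside dplyr, it rebuilds the example data with character (not factor) columns, and the column name source is introduced here purely for illustration. Because df3 as posted contains "postcode2" twice, the join returns seven rows rather than the six shown above, with an extra id2 row for text8.

```r
library(dplyr)
library(tidyr)

# Example data rebuilt with character columns (an assumption of this sketch)
df1 <- data.frame(
  id    = paste0("id", 1:8),
  yesno = c("Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No"),
  city  = c("London", "London", "Paris", "London", "Paris", "New York", "London", "London"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id      = paste0("id", 1:8),
  twitter = c("@one", "", "@three", "@four", "", "", "@seven", ""),
  email   = c("", "", "", "add4", "add5", "", "add7", ""),
  mail    = c("", "postcode2", "", "", "", "", "", "postcode8"),
  stringsAsFactors = FALSE
)
df3 <- data.frame(
  comms  = c("@one", "postcode2", "@three", "@four", "add4", "add5", "six", "@seven", "add7", "postcode2"),
  target = c("text1", "text2", "text3", "text4.1", "text4.2", "text5", "text6", "text7.1", "text7.2", "text8"),
  stringsAsFactors = FALSE
)

result_df <- df1 %>%
  filter(yesno == "Yes", city == "London") %>%  # ids of interest from df1
  select(id) %>%
  inner_join(df2, by = "id") %>%                # keep only those ids in df2
  pivot_longer(c(twitter, email, mail),         # stack the three contact columns
               names_to = "source",             # "source" is a hypothetical name
               values_to = "comms") %>%
  filter(comms != "") %>%                       # drop blank cells
  inner_join(df3, by = "comms") %>%             # every matching df3 row is kept
  select(id, comms, target)

result_df
```

Values in df2 that never appear in the comms column of df3 (free text such as 'I'm not on Twitter') simply drop out at the second inner join.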
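This is an operation I will be performing many times (based on different explorations of df1), so ideally it should be easy to repeat.
I hope this is a clear explanation of the problem.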
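Since the lookup is meant to be repeated for different explorations of df1, one option might be to wrap the steps in a function that takes the df1 filter conditions as arguments. The sketch below is an assumption-laden illustration rather than a definitive answer: the function name lookup_targets and its contact_cols argument are hypothetical, it relies on dplyr and tidyr, and it assumes character (not factor) columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: filter df1 with the conditions passed in `...`,
# collect the matching contact values from df2, and look them up in df3.
lookup_targets <- function(df1, df2, df3, ...,
                           contact_cols = c("twitter", "email", "mail")) {
  selected <- filter(df1, ...)                  # conditions are forwarded to filter()
  selected %>%
    select(id) %>%
    inner_join(df2, by = "id") %>%
    pivot_longer(all_of(contact_cols),
                 names_to = "source", values_to = "comms") %>%
    filter(comms != "") %>%                     # ignore blank cells
    inner_join(df3, by = "comms") %>%
    select(id, comms, target)
}

# Example data rebuilt with character columns (an assumption of this sketch)
df1 <- data.frame(
  id    = paste0("id", 1:8),
  yesno = c("Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No"),
  city  = c("London", "London", "Paris", "London", "Paris", "New York", "London", "London"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id      = paste0("id", 1:8),
  twitter = c("@one", "", "@three", "@four", "", "", "@seven", ""),
  email   = c("", "", "", "add4", "add5", "", "add7", ""),
  mail    = c("", "postcode2", "", "", "", "", "", "postcode8"),
  stringsAsFactors = FALSE
)
df3 <- data.frame(
  comms  = c("@one", "postcode2", "@three", "@four", "add4", "add5", "six", "@seven", "add7", "postcode2"),
  target = c("text1", "text2", "text3", "text4.1", "text4.2", "text5", "text6", "text7.1", "text7.2", "text8"),
  stringsAsFactors = FALSE
)

# Two different explorations of df1 using the same helper
london_res <- lookup_targets(df1, df2, df3, yesno == "Yes", city == "London")
paris_res  <- lookup_targets(df1, df2, df3, yesno == "Yes", city == "Paris")
```

Passing the conditions through `...` keeps each exploration to a single call; with the example data the Paris call matches only id3.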
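What about duplicates in df3? Your df3 has two rows with 'postcode2'. Do you want both, or just the first? – aichao
Thanks for your reply. I want all instances from df3 where the comms column matches something in the twitter, email or mail columns of df2. There are a lot of duplicates in the comms column, but the corresponding instances in target are unique, so I would want all of those duplicates. –
I have been playing with str_match, but can't seem to get it to work. –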
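str_match() from stringr extracts regular-expression matches and their capture groups, which may be more than is needed if the values in df2 have to equal the comms values exactly. As a hedged alternative, here is a base R sketch using %in%. It assumes exact matches are wanted, uses a hand-built copy of df2_search from the question, and the names keys and df3_hits are made up for illustration. It keeps every matching df3 row, duplicates included, but does not carry the id across.

```r
# Hand-built copy of the filtered df2 from the question (character columns)
df2_search <- data.frame(
  id      = c("id1", "id2", "id4", "id7"),
  twitter = c("@one", "", "@four", "@seven"),
  email   = c("", "", "add4", "add7"),
  mail    = c("", "postcode2", "", ""),
  stringsAsFactors = FALSE
)
df3 <- data.frame(
  comms  = c("@one", "postcode2", "@three", "@four", "add4", "add5", "six", "@seven", "add7", "postcode2"),
  target = c("text1", "text2", "text3", "text4.1", "text4.2", "text5", "text6", "text7.1", "text7.2", "text8"),
  stringsAsFactors = FALSE
)

# Flatten the three contact columns into one vector and drop the blanks
keys <- unlist(df2_search[c("twitter", "email", "mail")], use.names = FALSE)
keys <- keys[keys != ""]

# Keep every df3 row whose comms value is one of the keys
df3_hits <- df3[df3$comms %in% keys, ]
df3_hits
```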