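Returning the original search term with grep in R

I have a list of items and a list of search terms, and I am trying to do two things:
1. Search through the items for a match with any of the search terms, and return TRUE if a match is found.
2. For every item that returns TRUE (i.e., there is a match), also return the original search term that it matched in step 1.
So, given the following data frame: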
items
1 alex
2 alex is a person
3 this is a test
4 false
5 this is cathy
and the following list of search terms:
"alex" "bob" "cathy" "derrick" "erica" "ferdinand"
I would like to create the following output:
             items matches original
1             alex    TRUE     alex
2 alex is a person    TRUE     alex
3   this is a test   FALSE     <NA>
4            false   FALSE     <NA>
5    this is cathy    TRUE    cathy
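Step 1 is straightforward, but I am running into trouble with step 2. To create the "matches" column, I use grepl() to create a variable that is TRUE if a row of d$items contains one of the search terms, and FALSE otherwise.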
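As a minimal illustration of that step: grepl() with several terms joined by "|" returns one logical value per element. The vectors `x` and `terms` and the simplified pattern (no word-boundary handling) are assumptions made for this sketch; they are not the code used further down.

```r
# Hypothetical toy input, not the data frame from the question
x <- c("alex", "this is a test", "this is cathy")
terms <- c("alex", "cathy")

# Join the terms into a single alternation: "alex|cathy"
pattern <- paste(terms, collapse = "|")

# One TRUE/FALSE per element of x: does any term occur in it?
hits <- grepl(pattern, x, ignore.case = TRUE)
print(hits)
```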
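For step 2, my thinking was that I should be able to use grep() with value = T, as in the code below. However, this returns the wrong value: instead of the original search term that grep matched, it returns the value of the matching item. So I get the following output: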
             items matches         original
1             alex    TRUE             alex
2 alex is a person    TRUE alex is a person
3   this is a test   FALSE             <NA>
4            false   FALSE             <NA>
5    this is cathy    TRUE    this is cathy
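The behaviour in this output can be reproduced in isolation. The sketch below, using a hypothetical vector `x` and a simplified pattern, shows that grep() describes the elements of its input either way: without value it returns their positions, and with value = TRUE it returns the elements themselves. Neither form says which alternative of the pattern matched.

```r
# Hypothetical toy input for illustration only
x <- c("alex is a person", "this is a test", "this is cathy")
pattern <- "alex|cathy"

# Positions of the elements of x that match
idx <- grep(pattern, x)

# The matching elements of x themselves, not the matched search term
vals <- grep(pattern, x, value = TRUE)

print(idx)
print(vals)
```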
This is the code I am using now. Any ideas would be greatly appreciated!
# Dummy data and search terms
d <- data.frame(items = c("alex", "alex is a person", "this is a test", "false", "this is cathy"))
searchTerms <- c("alex", "bob", "cathy", "derrick", "erica", "ferdinand")
# Any character that is not an ASCII letter counts as a word boundary
nonLetter <- "[^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]"
# One alternation over all search terms, each delimited by start/end, a space, or a non-letter
pattern <- paste("(^| |", nonLetter, ")", searchTerms, "($| |", nonLetter, ")",
                 sep = "", collapse = "|")
# Return TRUE iff a search term is found in the items column, not between letters
d$matches <- grepl(pattern, d[, 1], ignore.case = TRUE)
# Subset data
dMatched <- d[d$matches, ]
# This is where the problem is: return the value that was originally matched with grepl above
dMatched$original <- grep(pattern, dMatched[, 1], ignore.case = TRUE, value = TRUE)
d$original[d$matches] <- dMatched$original
You could replace the long string of letters with '[:alpha:]'. – Thomas 2013-05-09 19:08:52
You might want to look at the 'regmatches' function. – Dason 2013-05-09 19:09:54
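A minimal sketch of how that suggestion might be applied, under stated assumptions: it uses `\\b` with perl = TRUE as the word boundary instead of the explicit letter list from the question (so digits and underscores count as word characters), and the names `items`, `m`, and `original` are introduced here for illustration. regexpr() locates the first match in each item and regmatches() extracts the matched text.

```r
items <- c("alex", "alex is a person", "this is a test", "false", "this is cathy")
searchTerms <- c("alex", "bob", "cathy", "derrick", "erica", "ferdinand")

# Assumption: \b is an acceptable word boundary for this data
pattern <- paste0("\\b(", paste(searchTerms, collapse = "|"), ")\\b")

# Position of the first match per item, or -1 when nothing matches
m <- regexpr(pattern, items, ignore.case = TRUE, perl = TRUE)

# regmatches() returns text only for the items that matched,
# so fill those positions and leave the rest as NA
original <- rep(NA_character_, length(items))
original[m != -1] <- regmatches(items, m)

print(data.frame(items = items, matches = m != -1, original = original))
```

Note that this returns the text as it appears in the item, so with ignore.case = TRUE its capitalisation may differ from the search term.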
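@Thomas: Thanks for the tip. However, [:alpha:] and the other predefined character classes do not seem to work for me. It must have something to do with my locale. From the regular expression documentation on character classes: "(Because their interpretation is locale- and implementation-dependent, they are best avoided.) The only portable way to specify all ASCII letters is to list them all as the character class [ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]." – Steve 2013-05-09 19:15:11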
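One possibility worth checking, offered as an assumption rather than a diagnosis of the setup above: in R, the POSIX classes are only recognised inside a bracket expression, so the class is written [[:alpha:]], or [^[:alpha:]] when negated. A small sketch with a hypothetical vector `x`:

```r
# Hypothetical toy input for illustration only
x <- c("alex", "1234", "a1")

# TRUE where the element contains at least one letter
has_letter <- grepl("[[:alpha:]]", x)

# TRUE where the element contains at least one non-letter
has_nonletter <- grepl("[^[:alpha:]]", x)

print(has_letter)
print(has_nonletter)
```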