2015-01-14

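R: apply a user-defined function to data frame columns

In R, I have a function that computes the intersection between 2 strings (the number of words they have in common):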

containedin <- function(t1,t2){ 
    return length(Reduce(intersect, strsplit(c(t1,t2), "\\s+"))) 
} 

I want to apply this function to a data frame that contains 2 string columns, data.selected[c("keywords","title")]:

   keywords               title
1  Samsung UN48H6350 48"  Samsung UN48H6350 48" Full 1080p Smart HDTV 120Hz with Wi-Fi +$50 Visa Gift Card
2  Samsung UN48H6350 48"  Samsung UN48H6350 48" Full HD Smart LED TV -Bundle- (See Below for Contents)
3  Samsung UN48H6350 48"  Samsung UN48H6350 48" Class Full HD Smart LED TV -BUNDLE- See below Details
4  Samsung UN48H6350 48"  Samsung UN48H6350 48" Full HD Smart LED TV With BD-H5100 Blu-ray Disc Player
5  Samsung UN48H6350 48"  Samsung UN48H6350 48" Smart 1080p Clear Motion Rate 240 LED HDTV
6  Samsung UN48H6350 48"  Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi
7  Samsung UN48H6350 48"  Samsung 6350 Series UN48H6350 48" 1080p HD LED LCD Internet TV NEW
8  Samsung UN48H6350 48"  Samsung Un48h6350af 75" 1080p Led-lcd Tv - 16:9 - Hdtv 1080p - (un75h6350afxza)
9  Samsung UN48H6350 48"  Samsung UN48H6350 - 48" HD 1080p Smart HDTV 120Hz Bundle
10 Samsung UN48H6350 48"  Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi, (R#416)

How can I use an apply function so that it is applied to these 2 columns and the result is returned as a new column?


Try data.selected$newcol <- apply(data.selected[c('keywords','title')], 1, function(x) containedin(x[1], x[2])). – lukeA


Thanks, this works. – user3628777
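A minimal sketch of the apply() approach from the comment, assuming the corrected containedin() (without the bare return) and a hypothetical two-row sample copied from the question's data. apply() first converts the selected columns to a character matrix, so each row reaches the anonymous function as a character vector, with x[1] holding the keywords and x[2] the title.

```r
# Corrected version of the question's function (no bare `return`)
containedin <- function(t1, t2) {
  # Split both strings on runs of whitespace and count the shared tokens
  length(Reduce(intersect, strsplit(c(t1, t2), "\\s+")))
}

# Hypothetical two-row sample taken from rows 1 and 6 of the question's data
data.selected <- data.frame(
  keywords = c('Samsung UN48H6350 48"', 'Samsung UN48H6350 48"'),
  title = c('Samsung UN48H6350 48" Full 1080p Smart HDTV 120Hz with Wi-Fi +$50 Visa Gift Card',
            'Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi'),
  stringsAsFactors = FALSE
)

# MARGIN = 1 runs the function once per row of the character matrix
data.selected$newcol <- apply(data.selected[c("keywords", "title")], 1,
                              function(x) containedin(x[1], x[2]))
print(data.selected$newcol)  # 3 2
```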

Answer


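First, your return statement should give you an error (in R, return is a function call and needs parentheses). You probably meant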

containedin <- function(t1,t2){ 
    length(Reduce(intersect, strsplit(c(t1,t2), "\\s+"))) 
} 
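To illustrate what the corrected function returns, here is a small check on two keyword/title pairs from the question (kw, title7 and title8 are illustrative names, not from the original). It counts the distinct whitespace-separated tokens the two strings share; matching is exact and case-sensitive, so Un48h6350af is not counted as a match for UN48H6350.

```r
containedin <- function(t1, t2) {
  # strsplit() returns a list of two token vectors;
  # Reduce(intersect, ...) keeps the distinct tokens present in both
  length(Reduce(intersect, strsplit(c(t1, t2), "\\s+")))
}

kw     <- 'Samsung UN48H6350 48"'
title7 <- 'Samsung 6350 Series UN48H6350 48" 1080p HD LED LCD Internet TV NEW'
title8 <- 'Samsung Un48h6350af 75" 1080p Led-lcd Tv - 16:9 - Hdtv 1080p - (un75h6350afxza)'

containedin(kw, title7)  # 3: "Samsung", "UN48H6350" and "48\"" all match
containedin(kw, title8)  # 1: only "Samsung" matches exactly
```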

Either way, you can solve your problem with mapply:

mapply(containedin, 
     as.character(data.selected[, 'keywords']), 
     as.character(data.selected[, 'title'])) 
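A sketch of storing the mapply() result as a new column, as the question asks, assuming a hypothetical three-row sample built from rows 1, 6 and 8 of the question's data and an illustrative column name newcol. USE.NAMES = FALSE is an addition here: without it, mapply() names each result after the corresponding keywords string.

```r
containedin <- function(t1, t2) {
  length(Reduce(intersect, strsplit(c(t1, t2), "\\s+")))
}

# Hypothetical sample built from rows 1, 6 and 8 of the question's data
data.selected <- data.frame(
  keywords = rep('Samsung UN48H6350 48"', 3),
  title = c('Samsung UN48H6350 48" Full 1080p Smart HDTV 120Hz with Wi-Fi +$50 Visa Gift Card',
            'Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi',
            'Samsung Un48h6350af 75" 1080p Led-lcd Tv - 16:9 - Hdtv 1080p - (un75h6350afxza)'),
  stringsAsFactors = FALSE
)

# mapply() calls containedin(keywords[i], title[i]) for each row i
# and simplifies the results into one integer vector
data.selected$newcol <- mapply(containedin,
                               as.character(data.selected[, "keywords"]),
                               as.character(data.selected[, "title"]),
                               USE.NAMES = FALSE)
data.selected$newcol  # 3 2 1
```

Unlike apply(), mapply() walks the two vectors in parallel without first converting the data frame to a matrix, which is why the explicit as.character() calls matter when the columns are factors.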

as.character is only necessary if class(data.selected[, 'keywords']) is factor (rather than character).
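A quick way to see why the conversion matters, using a hypothetical one-row data frame built with stringsAsFactors = TRUE (data.frame() and read.csv() have defaulted to stringsAsFactors = FALSE only since R 4.0.0). strsplit() rejects non-character input, so passing factor columns straight to mapply() fails, while calling as.character() on a column that is already character is harmless.

```r
containedin <- function(t1, t2) {
  length(Reduce(intersect, strsplit(c(t1, t2), "\\s+")))
}

# Hypothetical one-row data frame whose string columns are stored as factors
df <- data.frame(keywords = 'Samsung UN48H6350 48"',
                 title = 'Samsung UN48H6350 48" Smart 1080p Clear Motion Rate 240 LED HDTV',
                 stringsAsFactors = TRUE)
class(df[, "keywords"])  # "factor"

# Passing the factors directly makes strsplit() fail with an error,
# captured here as its message
res <- tryCatch(mapply(containedin, df[, "keywords"], df[, "title"]),
                error = function(e) conditionMessage(e))

# Converting to character first gives the expected count
ok <- mapply(containedin,
             as.character(df[, "keywords"]),
             as.character(df[, "title"]),
             USE.NAMES = FALSE)
ok  # 3
```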