R substr over two lists

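I have two lists. Each element of the first list is a string. Each element of the second list is a data frame. Every data frame has a "start" column and an "end" column, plus other information.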

text<-'this is a long text. its not an email' 
text0<-'another piece of text' 
text1<-'last sentence of nonsense' 
all.text<-list(text,text0,text1) 
features1<-data.frame(start=c(1,3,5,7),end=c(2,5,9,12),type=c('na','person','person','location')) 
features2<-data.frame(start=c(1,3,5,7),end=c(2,5,9,12),type=c('na','person','person','location')) 
features3<-data.frame(start=c(7,8,10,12),end=c(9,9,11,15),type=c('na','person','person','location')) 
all.features<-list(features1,features2, features3)

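I want to pair the first text element with the first data frame, the second with the second, and so on. The start and end columns of each data frame can be passed to substr to extract the text.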

For a single text element I can use the loop below and then add the result to the features data frame.

one.text<-NULL 
for (i in 1:nrow(features1)) one.text[i]<-((substr(text,features1[i,1],features1[i,2]))) 
features1$word<-one.text

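But I can't find a way to do this with lapply or a nested loop. If possible I would rather avoid loops, since I believe they are inefficient. Some things I have already tried: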

named.get<-function(text.list,features.list){ 
    named.entities<-substr(text.list,features.list[,1],features.list[,2]) 
} 
all<-sapply(all.text,named.get,all.features)

Or a nested loop:

one.obj<-NULL 
two.obj<-NULL 
for(i in 1:length(all.text)){ 
    for (j in 1:length(all.features)){ 
    one.obj[j]<-list([i]<-((substr(all.text[i],all.features[[i]][j,1],all.features[[i]][j,2])))) 
    } 
}

But this doesn't work either. I have read the substr documentation and several Stack Overflow questions, and can't seem to find a way forward.

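The goal is to end up with a list of feature data frames, each with the extracted terms attached, just as I did with the single loop above. Thanks for your help.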

2016-07-29 user1370741

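The equivalent of the double loop is to use Map, passing the two corresponding lists as arguments. You can then take advantage of the fact that substring is vectorized to do the final extraction.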

Map(function(tex,fea) substring(tex, fea$start, fea$end), all.text, all.features) 
#[[1]] 
#[1] "th"  "is " " is a" "s a lo" 
# 
#[[2]] 
#[1] "an"  "oth" "her p" "r piec" 
# 
#[[3]] 
#[1] "ent" "nt" "en" "ce o"
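
As a minimal sketch of how the same Map call could return the feature data frames with the extracted terms attached (the stated goal of the question), the function body can add a column and return the data frame. The sample data is copied from the question; the result name with.words is hypothetical, and word mirrors the column name used in the question's single-text loop.

```r
# Sample data copied from the question.
all.text <- list('this is a long text. its not an email',
                 'another piece of text',
                 'last sentence of nonsense')
all.features <- list(
  data.frame(start = c(1, 3, 5, 7), end = c(2, 5, 9, 12),
             type = c('na', 'person', 'person', 'location')),
  data.frame(start = c(1, 3, 5, 7), end = c(2, 5, 9, 12),
             type = c('na', 'person', 'person', 'location')),
  data.frame(start = c(7, 8, 10, 12), end = c(9, 9, 11, 15),
             type = c('na', 'person', 'person', 'location'))
)

# Map walks both lists in parallel: tex is one string, fea its data frame.
# substring() is vectorized over start/end, so no inner loop is needed.
# with.words is a hypothetical name for the result.
with.words <- Map(function(tex, fea) {
  fea$word <- substring(tex, fea$start, fea$end)
  fea
}, all.text, all.features)

with.words[[1]]$word
#[1] "th"     "is "    " is a"  "s a lo"
```

Each element of with.words is the original data frame plus a word column, so the list keeps the same shape as all.features.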

2016-07-29 05:28:54 thelatemail

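Out of curiosity, is it possible to add these outputs to the corresponding data frames? I mean, can we add '#[1] "th" "is " " is a" "s a lo"' to 'all.features[[1]]', and likewise for the others? – user2100721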

@user2100721 - sure - something like 'Map(function(tex,fea) cbind(fea, string=substring(tex, fea$start, fea$end)), all.text, all.features)' – thelatemail

Excellent! Thank you. – user1370741
