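Matching two vectors of strings against a vector of text, limited by the distance between the two matches
I am trying to find the most efficient way to match two vectors of strings against a third string. I want to constrain the second match relative to the first, so that it falls within a limited number of words or characters.
Say I have a data frame of names like this: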
signers <- data.frame(
first =
c("Benjamin","Thomas","Robert","George","Thomas","Jared","James","John","James","George","George","James","Edmund","George") ,
last =
c("Franklin","Mifflin","Morris","Clymer","Fitzsimons","Ingersoll","Wilson","Blair","Madison","Washington","Mason","McClurg","Randolph","Wythe")
)
and I have some text like this:
text <-
"A lot of people attended the Constitutional Convention in Philadephia, including Alexander Hamilton, Benjamin Franklin and John Adams.
Not everyone who attended the convention ended up signing the Constitution, including George Wythe, John F. Mercer and Edmund Jennings Randolph who abstained."
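I want to search for each name in the `signers` data frame and flag whether it appears in the text.
In the case of Benjamin Franklin and George Wythe, the name appears in the text exactly. In the case of Edmund Randolph, there is one word, or 10 characters counting the spaces, between his first and last name.
So what I am looking for is something like this: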
      first       last inparagraph
1  Benjamin   Franklin           1
2    Thomas    Mifflin
3    Robert     Morris
4    George     Clymer
5    Thomas Fitzsimons
6     Jared  Ingersoll
7     James     Wilson
8      John      Blair
9     James    Madison
10   George Washington
11   George      Mason
12    James    McClurg
13   Edmund   Randolph           1
14   George      Wythe           1
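I thought of using the `lapply` function to find where each first name is located, but I am not sure how to then search within the vicinity of where the first name was found.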
namesfinds <- lapply(signers$first , grep, text)
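For illustration only, here is a minimal sketch of one way the proximity constraint could be expressed as a single regular expression per name, rather than as a second search around each `grep` hit. The helper `name_in_text` and its `max_words` parameter are hypothetical names introduced here, not part of the original post. It assumes the names contain no regex metacharacters and that "distance" means whitespace-separated tokens. The example uses a four-row subset of the signers and a shortened copy of the text.

```r
# Hypothetical helper (not from the original post): TRUE for each first/last
# pair that occurs in `text` with at most `max_words` tokens in between.
# Assumes the names contain no regex metacharacters.
name_in_text <- function(first, last, text, max_words = 1) {
  # first name, then 0..max_words intervening tokens, then the last name
  pattern <- sprintf("\\b%s\\s+(?:\\S+\\s+){0,%d}%s\\b",
                     first, as.integer(max_words), last)
  # a hit in any element of `text` counts as a match
  vapply(pattern, function(p) any(grepl(p, text, perl = TRUE)),
         logical(1), USE.NAMES = FALSE)
}

# Subset of the signers, for brevity
signers <- data.frame(
  first = c("Benjamin", "Edmund", "George", "James"),
  last  = c("Franklin", "Randolph", "Wythe", "Madison"),
  stringsAsFactors = FALSE
)

# Shortened copy of the example text
text <- "Attendees included Alexander Hamilton, Benjamin Franklin and John Adams.
Not everyone signed, including George Wythe, John F. Mercer and Edmund Jennings Randolph who abstained."

# 1 when the name is found within the allowed distance, 0 otherwise
signers$inparagraph <- as.integer(
  name_in_text(signers$first, signers$last, text, max_words = 1)
)
print(signers)
```

Because an intervening token is anything non-blank, a first name can pair with a last name across a comma (for example "Alexander Hamilton, Benjamin" would count as Alexander ... Benjamin), so a stricter token class such as `[A-Za-z.]+` may be preferable. A character-based limit could be sketched the same way by replacing the middle group with something like `.{0,10}`.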
I know it has been two years, but I really appreciate this answer! – MatthewR
@MatthewR, no problem, I appreciate the appreciation ;) – BrodieG