Regex pattern - get the number before a specific word - gsub

I have just started learning regular expressions and am stuck on a problem. I was given a dataset containing movie award information.

**Award** 
    Won 2 Oscars. Another 7 wins & 37 nominations. 
    6 wins& 30 nominations 
    5 wins 
    Nominated for 1 BAFTA Film Award. Another 1 win & 3 nominations.

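I want to pull out the numbers before "wins" and "nominations" and add a column for each. For example, for the first row this would be 6 in the wins column and 37 in the nominations column.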

The pattern I used is

df2$nomination <- gsub(".*win[s]?|[[:punct:]]? | nomination.*", "",df2$Awards)

but it does not work well. I don't know how to write the pattern for "wins". :( Can anyone please help?

Thanks a lot!

Source

2017-10-14 Jianan He

Sorry, for the first row the wins column should be 7.

We can extract the numbers into a list, pad with NAs the cases where there is only a single element, and then rbind the result

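# Extract every number directly followed by "win(s)" or "nomination(s)"
lst <- regmatches(df2$Award, gregexpr("\\d+(?= \\b(wins?|nominations?)\\b)",
       df2$Award, perl = TRUE))
# Pad each match vector with NA to a common length, convert to numeric, bind as rows
df2[c('new1', 'new2')] <- do.call(rbind, lapply(lapply(lst, `length<-`,
          max(lengths(lst))), as.numeric))
df2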
#                                                             Award new1 new2
#1                   Won 2 Oscars. Another 7 wins & 37 nominations.    7   37
#2                                           6 wins& 30 nominations    6   30
#3                                                           5 wins    5   NA
#4 Nominated for 1 BAFTA Film Award. Another 1 win & 3 nominations.    1    3
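
One caveat with the positional padding: a row that mentions only nominations would have its single number placed in new1. A minimal base R sketch that looks up each keyword separately may be more robust in that case. Here `extract_before` is a hypothetical helper name, and the fifth sample row is added purely for illustration.

```r
# Hypothetical helper: return the number immediately before the first
# occurrence of `word`, or NA when `word` does not appear in the string.
extract_before <- function(x, word) {
  pattern <- paste0("\\d+(?= ", word, ")")
  m <- regexpr(pattern, x, perl = TRUE)   # -1 where there is no match
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)        # regmatches() drops non-matches
  as.numeric(out)
}

awards <- c("Won 2 Oscars. Another 7 wins & 37 nominations.",
            "6 wins& 30 nominations",
            "5 wins",
            "Nominated for 1 BAFTA Film Award. Another 1 win & 3 nominations.",
            "2 nominations")  # illustrative row: nominations only

df2 <- data.frame(Award = awards, stringsAsFactors = FALSE)
df2$wins <- extract_before(df2$Award, "wins?\\b")
df2$nominations <- extract_before(df2$Award, "nominations?\\b")
```

With this layout the last row gets NA for wins and 2 for nominations, rather than having the 2 shifted into the first column.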

Source

2017-10-14 04:52:16 akrun

We can use str_extract with a regular expression to get the values

library(stringr) 
text <- c("Won 2 Oscars. Another 7 wins & 37 nominations.", 
      "6 wins& 30 nominations", 
      "5 wins", 
      "Nominated for 1 BAFTA Film Award. Another 1 win & 3 nominations.") 
df <- data.frame(text = text) 

df$value1 <- str_extract(string = df$text, "\\d+\\b(?=\\swin)") 
df$value2 <- str_extract(string = df$text, "\\d+\\b(?=\\snomination)") 

> df
                                                              text value1 value2
1                   Won 2 Oscars. Another 7 wins & 37 nominations.      7     37
2                                           6 wins& 30 nominations      6     30
3                                                           5 wins      5   <NA>
4 Nominated for 1 BAFTA Film Award. Another 1 win & 3 nominations.      1      3
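
Note that str_extract returns character vectors, so value1 and value2 above are strings (hence the `<NA>`). If numeric columns are wanted, a small variation could look like the sketch below. It assumes the stringr package is installed; the column names `wins` and `nominations` are illustrative, and tightening the lookaheads to `wins?\\b` and `nominations?\\b` is an assumption about the desired matching.

```r
library(stringr)

text <- c("Won 2 Oscars. Another 7 wins & 37 nominations.",
          "6 wins& 30 nominations",
          "5 wins",
          "Nominated for 1 BAFTA Film Award. Another 1 win & 3 nominations.")
df <- data.frame(text = text, stringsAsFactors = FALSE)

# str_extract() returns the first match per string (NA when there is none);
# as.integer() converts the matched digits to numbers and keeps the NAs.
df$wins <- as.integer(str_extract(df$text, "\\d+(?=\\swins?\\b)"))
df$nominations <- as.integer(str_extract(df$text, "\\d+(?=\\snominations?\\b)"))
```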

Source

2017-10-14 05:39:25
