2014-12-03 35 views
0

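Thank you very much for reading this question. I am trying to write a general, multi-purpose function in R that processes text with a specific structure. Let me describe the text syntax I have invented with an example (the function I want to write is called ReadDB here):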

ReadDB <- function(query, ...){ 
... 
} 
text = "I'm Mahdi; {[What's your name?] Nice to see you <name>.}" 

ReadDB(query = text, name = "Mark") 
# output is : I'm Mahdi; Nice to see you Mark. 

ReadDB(query = text) 
# output is : I'm Mahdi; What's your name? 

ReadDB(query = text, Age = 22) 
# warning is : Age Argument is not used! 
# output is : I'm Mahdi; What's your name? 

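Description:

  1. Each block in the text is enclosed in {}.
  2. Each argument is marked with <> on both sides.
  3. If the argument defined in a block is supplied by the user in the function call, the phrase between [] is removed from the block and <arg> is replaced with the supplied value. Otherwise, everything in the block except the phrase inside [] is removed.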
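
As a rough illustration of rule 3 only, here is a minimal sketch for a single block with no nesting. The helper name resolve_block and its args parameter are made up for this example; it is not the ReadDB function I am asking for, and it uses base R only.

```r
# Hypothetical helper (not from any package): resolve one block that contains
# no nested blocks. `block` is the text between { and }, `args` a named list.
resolve_block <- function(block, args = list()) {
  # Names of the <arg> placeholders that appear in the block.
  m <- regmatches(block, gregexpr("<[^<>]+>", block))[[1]]
  needed <- gsub("[<>]", "", m)

  # Default text between [ and ], or "" when no default is given.
  default <- ""
  if (grepl("\\[[^]]*\\]", block)) {
    default <- sub(".*\\[([^]]*)\\].*", "\\1", block)
  }

  # If any placeholder has no supplied value, only the default survives.
  if (!all(needed %in% names(args))) return(default)

  # Otherwise drop the default and fill in each placeholder
  # (the placeholder is matched as a fixed string, not as a regex).
  out <- sub("\\[[^]]*\\]", "", block)
  for (nm in needed) {
    out <- gsub(paste0("<", nm, ">"), as.character(args[[nm]]), out, fixed = TRUE)
  }
  out
}

resolve_block("[What's your name?] Nice to see you <name>.", list(name = "Mark"))
# " Nice to see you Mark."   (the leading space is kept)
resolve_block("[What's your name?] Nice to see you <name>.")
# "What's your name?"
```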

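A more complex case is a block nested inside another block; there we expect the same precedence as the () operator has in an equation, so the inner block is resolved first.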
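
One way to picture that precedence, as a sketch only: an innermost block is one with no braces inside it, so it can be located first with a base R regular expression. The variable name innermost is purely illustrative.

```r
text <- "I'm Mahdi; {[What's your name?] Nice to see you <name>.{I live in <city>.}}"

# A block whose body contains neither { nor } has nothing nested inside it,
# so the first such match is a block that can be resolved before its parent.
innermost <- regmatches(text, regexpr("\\{[^{}]*\\}", text))
innermost
# "{I live in <city>.}"
```

Resolving that block, splicing the result back into the text, and repeating until no braces remain would process blocks from the inside out.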

Update:

I have added a more complex case with nested {{, as follows:

text = "I'm Mahdi; {[What's your name?] Nice to see you <name>.{I live in <city>.}}" 
ReadDB(query = text, name = "Mark") 
# output is : I'm Mahdi; Nice to see you Mark. 

ReadDB(query = text) 
# output is : I'm Mahdi; What's your name? 

ReadDB(query = text, city = "St. Louis", name="Mark") 
# output is : I'm Mahdi; Nice to see you Mark. I live in St. Louis. 

ReadDB(query = text, city = "St. Louis") 
# output is : I'm Mahdi; What's your name? 

Note that when no default is given inside {}, the default is empty. So text = {[]I live in <city>.} is the same as text = {I live in <city>.}.

+1

'sprintf' is of use here. Use '%s' substrings. – 2014-12-03 00:20:41
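
A minimal sketch of that suggestion, assuming the template is rewritten with '%s' in place of <name>. The variable names template and greeting are hypothetical, and this covers substitution only, not the {} and [] rules.

```r
# Assumption: the placeholder <name> has been replaced by %s in the template.
template <- "I'm Mahdi; Nice to see you %s."

# sprintf fills each %s with the corresponding argument, in order.
greeting <- sprintf(template, "Mark")
greeting
# "I'm Mahdi; Nice to see you Mark."
```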

Answers

3

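The description is not clear to me without more illustrative examples, but this works for the examples shown. It extracts the default string into default, then removes { and }, as well as [ and ] and everything between them. Next it extracts the names in the query and determines which argument names are not used; for those it issues a warning. Finally it determines which names in the query have not been substituted. If there are any, it returns query with { and everything after it replaced by default; otherwise, it returns the query with the names substituted.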

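library(gsubfn) 

ReadDB <- function(query, ...) { 

    L <- list(...) 
    # default text: whatever sits between [ and ] 
    default <- strapplyc(query, "\\[(.*)\\]", simplify = TRUE) 

    # drop the braces, then drop [ ... ] together with its contents 
    query2 <- gsub("[{}]", "", query) 
    query3 <- gsub("\\[[^]]*\\]", "", query2) 

    # names that appear as <name> in the query; plain < and > are used because 
    # \\< and \\> are word boundaries in R's default regex engine 
    pat <- "<([^>]*)>" 
    names_in_query <- strapplyc(query3, pat)[[1]] 

    # warn about supplied arguments that the query never mentions 
    args_not_used <- setdiff(names(L), names_in_query) 
    for(nm in args_not_used) warning(nm, " not used\n") 

    # if any name is left unfilled, fall back to the default text 
    names_not_substituted <- setdiff(names_in_query, names(L)) 
    if (length(names_not_substituted)) sub("\\{.*", default, query) 
    else gsubfn(pattern = pat, L, x = query3) 
} 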

which gives:

> ReadDB(text) 
[1] "I'm Mahdi; What's your name?" 
> ReadDB(query = text, name = "Mark") 
[1] "I'm Mahdi; Nice to see you Mark." 
> ReadDB(query = text, Age = 22) 
[1] "I'm Mahdi; What's your name?" 
Warning message: 
In ReadDB(query = text, Age = 22) : Age not used 

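The purpose of SO is not to write code for posters; it is to answer programming questions. Next time, please provide your own code, and if it is too long, the question is not suitable and needs to be cut down in size.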

+0

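This is a very clever use of Perl syntax. I like it a lot. Can we change the code to handle nested '{{}}'? For example: 'text = "I'm Mahdi; {[What's your name?] Nice to see you <name>.{I live in <city>.}}"' – 2014-12-03 01:34:56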

0

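First, let me thank G. Grothendieck for his clever answer. However, his answer still cannot handle nested {{, so I decided to post my own implementation for this problem. I hope others can use it as well: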

ReadDB <- function(query, ...) { 
  arg = list(...) 
  query.string = query 

  # Replace each supplied argument in the text; warn if it does not occur. 
  for (query.arg in names(arg)) { 
    query.arg_ = paste("<", query.arg, ">", sep = "") 
    if (grepl(query.arg_, query.string, ignore.case = TRUE)) { 
      query.string = gsub(query.arg_, arg[[query.arg]], query.string) 
    } else { 
      warning(paste(query.arg, " argument is not filtered in query!", sep = "")) 
    } 
  } 

  # Find the positions of { } [ ] < > in x. Each opening delimiter is 
  # reordered so that it lines up with its matching closing delimiter. 
  find_period = function(x) { 
    ch = c("\\{", "\\}", "\\[", "\\]", "<", ">") 
    A = lapply(ch, function(ch) { 
      unlist(ifelse(grepl(ch, x, perl = F), gregexpr(ch, x, perl = F), NA)) 
    }) 
    ind = 2 
    while (ind <= length(A)) { 
      tmp = NULL 
      for (xind in A[[ind]]) 
        tmp = c(tmp, max(setdiff(A[[ind - 1]][A[[ind - 1]] < xind], tmp))) 
      A[[ind - 1]] = tmp 
      ind = ind + 2 
    } 
    names(A) <- ch 
    return(A) 
  } 
  p = find_period(query.string) 

  # Resolve one block per pass, starting with a block that contains no other 
  # block, until no { } pair is left. 
  while (!is.na(p[[1]][1] + p[[2]][1])) { 
    Block.text = substr(x = query.string, p[[1]][1] + 1, p[[2]][1] - 1) 
    p2 = find_period(Block.text) 
    # Only the first position is tested, so the condition has length one 
    # even when several <arg> or [ ] occur in the block. 
    if (!is.na(p2[[5]][1])) { 
      # An unfilled <arg> remains: keep only the [default] text, if any. 
      Block.text = ifelse(is.na(p2[[3]][1]), "", substr(Block.text, p2[[3]][1] + 1, p2[[4]][1] - 1)) 
    } else { 
      # All arguments were filled: drop the [default] text. 
      Block.text = gsub(pattern = "\\[.*\\]", replacement = "", x = Block.text) 
    } 
    query.string = paste(ifelse(p[[1]][1] == 1, "", substr(x = query.string, 1, p[[1]][1] - 1)), Block.text, 
                         ifelse(p[[2]][1] == nchar(query.string), "", substr(x = query.string, p[[2]][1] + 1, nchar(query.string))), sep = "") 
    p = find_period(query.string) 
  } 

  query.string = gsub(pattern = " {2,}", replacement = " ", x = query.string) # collapse repeated spaces 
  return(query.string) 
} 

Here are the tests:

> text = "I'm Mahdi; {[What's your name?] Nice to see you <name>.{I live in <city>.}}" 
> ReadDB(query = text, city = "St. Louis", name="Mike") 
[1] "I'm Mahdi; Nice to see you Mike.I live in St. Louis." 

> ReadDB(query = text, city = "St. Louis") 
[1] "I'm Mahdi; What's your name?" 

> ReadDB(query = text, name="Mike") 
[1] "I'm Mahdi; Nice to see you Mike." 

> ReadDB(query = text, name="Mahdi", Age = 22) 
[1] "I'm Mahdi; Nice to see you Mahdi." 
Warning message: 
In ReadDB(query = text, name = "Mahdi", Age = 22) : 
    Age argument is not filtered in query!