Collapse multi-line strings into a single line based on a condition


df <- data.frame(
    text = c("Treatment1: This text is","on two lines","","Treatment2:This text","has","three lines","","Treatment3: This has one") 
       ) 
df 
         text 
1 Treatment1: This text is 
2    on two lines 
3       
4  Treatment2:This text 
5      has 
6    three lines 
7       
8 Treatment3: This has one

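How can I parse this text so that each "Treatment" starts its own row, with all of the text that follows it collapsed onto that same row?

For example, this is the desired output: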

text 
1 Treatment1: This text is on two lines 
2 Treatment2: This text has three lines     
3 Treatment3: This has one

Can anyone recommend a way to do this?

Source

2017-10-15 boshek

Maybe something like the following.
First, here is the data in dput format, which is the best format for sharing a dataset in a post.

df <- 
structure(list(text = c("Treatment1: This text is", "on two lines", 
"", "Treatment2:This text", "has", "three lines", "", "Treatment3: This has one" 
)), .Names = "text", class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))

Now the base R code.

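# Group key: increases by one at every line that mentions a treatment
fact <- cumsum(grepl("treatment", df$text, ignore.case = TRUE))
# Paste the lines of each group together and stack the results
result <- do.call(rbind, lapply(split(df, fact), function(x)
        trimws(paste(x$text, collapse = " "))))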
result <- as.data.frame(result) 
names(result) <- "text" 
result 
#         text 
#1 Treatment1: This text is on two lines 
#2 Treatment2:This text has three lines 
#3    Treatment3: This has one
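
To see why this works, it may help to look at the grouping key on its own. The sketch below rebuilds the text column as a plain vector (the names `txt`, `starts` and `grp` are only for illustration) and prints the intermediate values: `grepl()` flags each line that mentions a treatment, and `cumsum()` turns those flags into a running group number.

```r
# Illustrative vector holding the same values as df$text
txt <- c("Treatment1: This text is", "on two lines", "",
         "Treatment2:This text", "has", "three lines", "",
         "Treatment3: This has one")

# TRUE where a line starts a new treatment block
starts <- grepl("treatment", txt, ignore.case = TRUE)
starts
# [1]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE

# Running count of the TRUEs: each line gets the number of its block
grp <- cumsum(starts)
grp
# [1] 1 1 1 2 2 2 2 3
```

Every line that shares a group number is then pasted together, which is what `split()` followed by `paste(..., collapse = " ")` does in the answer.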

Edit.
As Rich Scriven points out in his comment, tapply can greatly simplify the code above. (I didn't see that; I sometimes overcomplicate things.)

result2 <- data.frame(
    text = tapply(df$text, fact, function(x) trimws(paste(x, collapse = " "))) 
) 

all.equal(result, result2) 
#[1] "Component “text”: 'current' is not a factor"
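
The `all.equal()` message above points at a difference in column type (factor versus non-factor), not at a difference in the collapsed strings. A minimal sketch, assuming the same input values and using illustrative names (`txt`, `grp`, `collapse`, `via_split`, `via_tapply`), that compares the two approaches as plain character vectors:

```r
# Illustrative vector holding the same values as df$text
txt <- c("Treatment1: This text is", "on two lines", "",
         "Treatment2:This text", "has", "three lines", "",
         "Treatment3: This has one")
grp <- cumsum(grepl("treatment", txt, ignore.case = TRUE))

# Helper (hypothetical name): join the lines of one group and trim whitespace
collapse <- function(x) trimws(paste(x, collapse = " "))

# split()/lapply() route, flattened to an unnamed character vector
via_split <- unlist(lapply(split(txt, grp), collapse), use.names = FALSE)

# tapply() route, with the array attributes stripped
via_tapply <- as.vector(tapply(txt, grp, collapse))

identical(via_split, via_tapply)
# [1] TRUE
```

Working on character vectors sidesteps the factor question entirely; whether a data frame column ends up as a factor depends on the R version and the `stringsAsFactors` setting.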

Source

2017-10-15 21:54:49

Take a look at `tapply()`. It can replace `do.call(rbind, lapply(split(...), ...))`.

@RichScriven Thank you, the answer has been edited with your suggestion.

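# Join all lines into one string, then mark each "Treatment" boundary
x <- gsub("\\s+Treatment", "*BREAK*Treatment",
          paste(df[[1]], collapse = " "))
# Split at the markers to get one row per treatment
data.frame(text = unlist(strsplit(x, "\\*BREAK\\*")))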
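
This approach joins everything into one long string and then cuts it again in front of each "Treatment". The sketch below shows the intermediate string, using illustrative names (`txt`, `joined`, `marked`, `pieces`). It assumes that the marker `*BREAK*` never occurs in the data and that the word "Treatment" only appears at the start of a block.

```r
# Illustrative vector holding the same values as df$text
txt <- c("Treatment1: This text is", "on two lines", "",
         "Treatment2:This text", "has", "three lines", "",
         "Treatment3: This has one")

# One long string; the empty lines leave double spaces behind
joined <- paste(txt, collapse = " ")

# Replace the whitespace in front of each later "Treatment" with a marker
marked <- gsub("\\s+Treatment", "*BREAK*Treatment", joined)
marked
# [1] "Treatment1: This text is on two lines*BREAK*Treatment2:This text has three lines*BREAK*Treatment3: This has one"

# Split at the literal marker (fixed = TRUE avoids escaping the asterisks)
pieces <- strsplit(marked, "*BREAK*", fixed = TRUE)[[1]]
length(pieces)
# [1] 3
```

Because `\\s+` consumes all of the whitespace before "Treatment", no separate `trimws()` step is needed here.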

Source

2017-10-15 21:56:54
