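Collapse columns by grouping variable (in base)
I have a text variable and a grouping variable. I'd like to collapse the text variable into one string per row (combine). So as long as the group column says m, I want to group the text together, and so on. I've provided a sample data set showing the data before and after. I am writing this for a package and have so far avoided all dependencies on other packages except wordcloud, and I would like to keep it that way.
I suspect rle may be useful together with cumsum, but I haven't been able to figure this one out.
Thank you in advance.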
What the data looks like
  text                                group
1 Computer is fun. Not too fun.       m
2 No its not, its dumb.               m
3 How can we be certain?              f
4 There is no way.                    m
5 I distrust you.                     m
6 What are you talking about?         f
7 Shall we move on? Good then.        f
8 Im hungry. Lets eat. You already?   m
What I'd like the data to look like
  text                                                       group
1 Computer is fun. Not too fun. No its not, its dumb.        m
2 How can we be certain?                                     f
3 There is no way. I distrust you.                           m
4 What are you talking about? Shall we move on? Good then.   f
5 Im hungry. Lets eat. You already?                          m
The data
dat <- structure(list(text = c("Computer is fun. Not too fun.", "No its not, its dumb.",
"How can we be certain?", "There is no way.", "I distrust you.",
"What are you talking about?", "Shall we move on? Good then.",
"Im hungry. Lets eat. You already?"), group = structure(c(2L,
2L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("f", "m"), class = "factor")), .Names = c("text",
"group"), row.names = c(NA, 8L), class = "data.frame")
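EDIT: I found that I can add a unique id column for each run of the group variable with:
# Lengths of each run of consecutive identical group values
x <- rle(as.character(dat$group))$lengths
# One id per run, repeated once for every row in that run
dat$new <- as.factor(rep(seq_along(x), x))
Yields: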
  text                                group new
1 Computer is fun. Not too fun.       m     1
2 No its not, its dumb.               m     1
3 How can we be certain?              f     2
4 There is no way.                    m     3
5 I distrust you.                     m     3
6 What are you talking about?         f     4
7 Shall we move on? Good then.        f     4
8 Im hungry. Lets eat. You already?   m     5
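With a run id like that in hand, the collapse itself could be done in base R along the following lines. This is only a sketch under my own assumptions, not a tested package function: the names `runs`, `run_id`, `run_id_alt` and `collapsed` are made up for illustration, and the sample data is rebuilt with `data.frame()` so the block runs on its own.

```r
# Sample data from the question, rebuilt so this block is self-contained
dat <- data.frame(
  text = c("Computer is fun. Not too fun.", "No its not, its dumb.",
           "How can we be certain?", "There is no way.", "I distrust you.",
           "What are you talking about?", "Shall we move on? Good then.",
           "Im hungry. Lets eat. You already?"),
  group = factor(c("m", "m", "f", "m", "m", "f", "f", "m")),
  stringsAsFactors = FALSE
)

# Run-length encode the grouping column: one entry per run of equal values
runs <- rle(as.character(dat$group))

# Give every run its own id, repeated once per row in that run
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Alternative with cumsum: start a new id whenever the group changes
g <- as.character(dat$group)
run_id_alt <- cumsum(c(TRUE, g[-1] != g[-length(g)]))

# Paste the text of each run into a single string
collapsed <- data.frame(
  text = vapply(split(dat$text, run_id), paste, character(1),
                collapse = " ", USE.NAMES = FALSE),
  group = runs$values,
  stringsAsFactors = FALSE
)

print(collapsed)
```

Both `run_id` and `run_id_alt` label the runs the same way here, so either the rle or the cumsum route should work; only base functions are used, which keeps the package free of extra dependencies.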
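I don't believe you need "seq(length(k$len))", since seq will treat the k$length vector like "seq_along" does, giving the appropriate number sequence: id <- rep(seq(k$length), k$length) – 2012-03-25 05:04:28
@BryanGoodrich Good catch. Originally I was just going to do 1:length(k$len), but lately I've been using seq and seq_along more, and I guess that ended up causing a mix-up of the two methods. – Dason 2012-03-25 05:28:35
I usually just stick with seq, but for clarity I can see how seq_along makes it explicit that you are stepping numerically through a vector of values. I often tend to take that clearer route when I deal with redundancies on boolean vectors using x[(some logic here...)]. It isn't necessary, but it does give me the linguistic clarity I prefer in my code. – 2012-03-26 07:16:24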
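For anyone comparing the two functions discussed in these comments, here is a small illustration of where seq and seq_along agree and where they differ. The vectors `lens` and `one_run` are made-up examples, not taken from the answer's code.

```r
# Hypothetical run lengths with more than one run
lens <- c(2L, 1L, 2L)
seq(lens)        # indexes the vector, same result as seq_along(lens)
seq_along(lens)

# Hypothetical case with a single run of length 5
one_run <- 5L
seq(one_run)        # counts from 1 up to the value: 1 2 3 4 5
seq_along(one_run)  # indexes the vector: 1
```

The two agree whenever the vector has more than one element, but if the grouping column happened to contain a single run, seq would count up to the run length instead of returning one id, so seq_along looks like the safer choice for building run ids.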