tidyr: separate only the first n instances

I have a data.frame in R which, for simplicity, has one column that I want to separate. It looks like this:

V1 
Value_is_the_best_one 
This_is_the_prettiest_thing_I've_ever_seen 
Here_is_the_next_example_of_what_I_want

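My real data is very large (millions of rows), so I would like to use tidyr's separate function (because it is amazingly fast) to split off only the first few instances. The result I want is the following: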

V1     V2  V3   V4
Value  is  the  best_one
This   is  the  prettiest_thing_I've_ever_seen
Here   is  the  next_example_of_what_I_want

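As you can see, the separator is _, and the V4 column can contain a varying number of separators. I want to keep V4 (not discard it), without having to worry about how many pieces it contains. There will always be four columns (i.e. none of my rows have only V1-V3).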

Here is the tidyr command I have been starting from:

separate(df, V1, c("V1", "V2", "V3", "V4"), sep="_")

This gets rid of the rest of V4 (and spits out warnings, which isn't the biggest deal).

Source

2016-05-09 Gaius Augustus

Do you just need the 'extra = "merge"' option to limit the number of splits? – aosmith

@aosmith Yes, thank you. I read the documentation 10 times and somehow misunderstood it! Please write it up as an answer! –

You need the extra argument with the "merge" option. This allows only as many splits as you have new columns defined.

separate(df, V1, c("V1", "V2", "V3", "V4"), extra = "merge") 

     V1 V2  V3                             V4
1 Value is the                       best_one
2  This is the prettiest_thing_I've_ever_seen
3  Here is the    next_example_of_what_I_want
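
For anyone who wants to try this, here is a minimal self-contained sketch. It assumes the three example rows from the question are in a data frame named df. The sep = "_" argument is an addition here, only to make the separator explicit.

```r
library(tidyr)

# Example rows copied from the question
df <- data.frame(
  V1 = c("Value_is_the_best_one",
         "This_is_the_prettiest_thing_I've_ever_seen",
         "Here_is_the_next_example_of_what_I_want"),
  stringsAsFactors = FALSE
)

# extra = "merge" splits at most length(into) - 1 times, so everything
# after the third "_" stays together in V4 and no warning is raised
res <- separate(df, V1, into = c("V1", "V2", "V3", "V4"),
                sep = "_", extra = "merge")
print(res)
```

The number of names passed to into is what controls how many splits happen, so adding or removing a name changes where the remainder starts.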

Source

2016-05-09 23:09:07 aosmith

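What if you want to merge the other way? For example, suppose you have "John Q Public" and I want to split it into two strings: "John Q" and "Public". Is there a simple way to do that, other than splitting and subsetting manually? –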

@DavidBruceBorenstein That sounds like you need to set the 'sep' argument so that you split only on the last space. – aosmith
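
One way to read that suggestion, as a sketch rather than the commenter's exact code: a regular expression with a lookahead can match only the last space in the string. The data frame people and the column names name, first and last are made up for illustration.

```r
library(tidyr)

# Hypothetical example data (not from the original thread)
people <- data.frame(name = c("John Q Public", "Jane Doe"),
                     stringsAsFactors = FALSE)

# " (?=[^ ]+$)" matches a space only when nothing but non-space
# characters follows it up to the end, i.e. the last space
res2 <- separate(people, name, into = c("first", "last"),
                 sep = " (?=[^ ]+$)")
print(res2)
```

Because the pattern matches at most once per string, each row yields exactly two pieces and the extra argument is not needed.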

Here is an option using extract

library(tidyr) 
extract(df1, V1, into = paste0("V", 1:4), "([^_]+)_([^_]+)_([^_]+)_(.*)") 
#      V1 V2  V3                             V4
# 1 Value is the                       best_one
# 2  This is the prettiest_thing_I've_ever_seen
# 3  Here is the    next_example_of_what_I_want

Another option is stri_split from library(stringi), where we can specify the maximum number of pieces with n

library(stringi) 
do.call(rbind, stri_split(df1$V1, fixed="_", n=4))
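
Note that the rbind call returns a character matrix rather than a data frame. As a sketch of one way to get back to a data frame, stri_split_fixed also accepts simplify = TRUE, which returns the matrix directly. Here df1 is rebuilt from the question's example rows, and the column names are chosen to match the desired output.

```r
library(stringi)

# Example rows copied from the question
df1 <- data.frame(
  V1 = c("Value_is_the_best_one",
         "This_is_the_prettiest_thing_I've_ever_seen",
         "Here_is_the_next_example_of_what_I_want"),
  stringsAsFactors = FALSE
)

# n = 4 caps the number of pieces; the 4th piece keeps the rest of the string
m <- stri_split_fixed(df1$V1, "_", n = 4, simplify = TRUE)

# Convert the character matrix to a data frame with the desired names
out <- as.data.frame(m, stringsAsFactors = FALSE)
names(out) <- paste0("V", 1:4)
print(out)
```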

Source

2016-05-10 03:42:49 akrun
