2015-04-23

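Split and replace character variables in a data frame in R

I have a data frame containing several character variables of differing lengths. I would like to convert each variable into a list in which every element holds the individual words of the string, split on spaces.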

Say my data looks like this:

char <- c("This is a string of text", "So is this") 
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this") 

df <- data.frame(char, char2) 

# Convert factors to character 
df <- lapply(df, as.character) 

> df 
$char 
[1] "This is a string of text" "So is this"    

$char2 
[1] "Text is pretty sweet"    "Bet you wish you had text like this" 

Now I can use strsplit() to split each column into individual words:

df <- transform(df, "char" = strsplit(df[, "char"], " ")) 
> df$char 
[[1]] 
[1] "This" "is"  "a"  "string" "of"  "text" 

[[2]] 
[1] "So" "is" "this" 

What I would like to do is create a loop or function that lets me do this for both columns at once, for example:

for (i in colnames(df)) { 
    df <- transform(df, i = strsplit(df[, i], " ")) 
} 

However, this produces an error:

Error in data.frame(list(char = c("This is a string of text", "So is this", : 
    arguments imply differing number of rows: 6, 8 

I have also tried:

splitter <- function(colname) { 
    df <- transform(df, colname = strsplit(df[, colname], " ")) 
} 

splitter(colnames(df))

which also gives me an error:

Error in strsplit(df[, colname], " ") : non-character argument 

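I am confused as to why the call to transform works for a single column but not when it is applied inside a loop or function. Any help would be much appreciated!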

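It is not clear what you are trying to do here. To keep the strings as strings, just use 'df <- data.frame(char, char2, stringsAsFactors = FALSE)'. More importantly, are you aware that 'lapply(df, as.character)' returns a list rather than a data frame? 'transform' works on data frames, not on lists. Finally, what is your expected result? Do you want the result as a 'data.frame' or as a 'list'? The question is confusing.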
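To make the point in this comment concrete, here is a minimal sketch using the question's own data. The name as_list is only illustrative and does not appear in the original post.

```r
char <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")

# stringsAsFactors = FALSE keeps the columns as character vectors,
# so no separate conversion step is needed
df <- data.frame(char, char2, stringsAsFactors = FALSE)
is.data.frame(df)       # the object is a data frame
is.character(df$char)   # the column is character, not factor

# lapply() always returns a plain list, so the data frame class is lost
# (as_list is a hypothetical name used only for this illustration)
as_list <- lapply(df, as.character)
is.data.frame(as_list)
is.list(as_list)
```

Once the object is a plain list, two-index subsetting such as df[, "char"] no longer applies, because a list has no row and column dimensions.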

Answer

I would not use transform here. Starting from

char <- c("This is a string of text", "So is this") 
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this") 
df <- data.frame(char, char2) 
# Convert factors to character 
df <- lapply(df, as.character) 

to produce the desired output, I use

lapply(df, strsplit, split= " ") 

to get

$char 
$char[[1]] 
[1] "This" "is"  "a"  "string" "of"  "text" 

$char[[2]] 
[1] "So" "is" "this" 


$char2 
$char2[[1]] 
[1] "Text" "is"  "pretty" "sweet" 

$char2[[2]] 
[1] "Bet" "you" "wish" "you" "had" "text" "like" "this" 

And as Alex mentioned: the first lapply in your code, df <- lapply(df, as.character), can be eliminated by changing the df <- data.frame(char, char2) call as suggested in the comment below.
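Putting these suggestions together, a rough sketch might look like the following. The names words and split_df are illustrative assumptions, and the second option (keeping a data frame with list columns) is one possible reading of what the question asks for, not something the original answer shows.

```r
char <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")
df <- data.frame(char, char2, stringsAsFactors = FALSE)

# Option 1: a plain named list with one element per column
words <- lapply(df, strsplit, split = " ")

# Option 2 (assumption: a data frame result is wanted): replace each
# column with a list column. df[[i]] uses the *value* of i as the column
# name, whereas transform(df, i = ...) treats i as a literal column
# name "i", which is one reason the loop in the question fails.
split_df <- df
for (i in colnames(split_df)) {
  split_df[[i]] <- strsplit(split_df[[i]], " ")
}

words$char[[2]]        # words of the second string in column char
split_df$char2[[1]]    # words of the first string in column char2
```

Option 1 matches the lapply() call in this answer. Option 2 keeps the two-row shape of the data, which may be more convenient if other columns need to stay aligned with the split text.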

You could simplify this to 'lapply(df, strsplit, split = " ")'. Also, there is no need for 'lapply()' to get characters; just use 'df <- data.frame(char, char2, stringsAsFactors = FALSE)'.

Good idea! I will add it.