Can the code below be made more "R-like"? How can it be condensed?
Given the data.frame inDF:
V1 V2 V3 V4
1 a ha 1;2;3 A
2 c hb 4 B
3 d hc 5;6 C
4 f hd 7 D
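In this data.frame I want to
- find all rows whose "V3" column contains multiple values separated by ";"
- then replicate each such row as many times as there are individual values in its "V3" cell,
- so that each replicated row receives exactly one of the original values in its "V3" column.
In short, the output data.frame (= outDF) should look like: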
V1 V2 V3 V4
1 a ha 1 A
1 a ha 2 A
1 a ha 3 A
2 c hb 4 B
3 d hc 5 C
3 d hc 6 C
4 f hd 7 D
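For reference, the three steps listed above can be sketched in a few lines of base R. This is only a minimal sketch, not the code from the question: the example data is rebuilt inline instead of being read from a CSV file, and the helper names `pieces` and `idx` are made up for illustration.

```r
# Rebuild the example input inline (the question reads it from a CSV file).
inDF <- data.frame(
  V1 = c("a", "c", "d", "f"),
  V2 = c("ha", "hb", "hc", "hd"),
  V3 = c("1;2;3", "4", "5;6", "7"),
  V4 = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Step 1: split every V3 cell on ";" (a cell without ";" yields one piece).
pieces <- strsplit(inDF$V3, ";", fixed = TRUE)

# Step 2: repeat each row index once per piece found in its V3 cell.
idx <- rep(seq_len(nrow(inDF)), times = lengths(pieces))
outDF <- inDF[idx, ]

# Step 3: give each replicated row exactly one of the original values.
outDF$V3 <- unlist(pieces)

print(outDF)
```

In this sketch V3 stays a character column, and the row names of the replicated rows are made unique by R rather than repeated as in the table above. A row whose V3 cell is an empty string would be dropped, so that case would need separate handling.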
So, to get from inDF to outDF, I wrote the following code:
# str_count() and str_split() come from the stringr package
library(stringr)
# load inDF from a CSV file
inDF <- read.csv(file = "example.csv", header = FALSE, sep = ",", fill = TRUE)
# find the rows of inDF whose V3 cell holds multiple values
rowlist <- grep(";", inDF[, 3])
# create an empty data.frame with the same column names as inDF
xDF <- data.frame(matrix(0, nrow = 0, ncol = 4))
colnames(xDF) <- colnames(inDF)
# take every row of inDF with multiple values in column 3 and break it into several rows with one value each
for (i in rowlist)
{
  # count the individual values in the cell (number of separators plus one)
  value_nr <- str_count(inDF[i, 3], ";") + 1
  # replicate the row once per value and convert every column to character
  extracted_inDF <- inDF[rep(i, times = value_nr), ]
  extracted_inDF <- data.frame(lapply(extracted_inDF, as.character), stringsAsFactors = FALSE)
  # split the V3 cell into individual values; str_split() returns a list
  value_ls <- str_split(inDF[i, 3], ";")
  # f indexes both the row number and the element in the list of values
  f <- 1
  # replace the multiple values with individual values
  for (j in extracted_inDF[, 3])
  {
    extracted_inDF[f, 3] <- value_ls[[1]][f]
    f <- f + 1
  }
  # collect all the "demultiplied" rows in xDF
  xDF <- merge(extracted_inDF, xDF, all = TRUE)
}
# delete the rows with multiple values from inDF (skipped when nothing matched, since inDF[-integer(0), ] would drop every row)
if (length(rowlist) > 0) inDF <- inDF[-rowlist, ]
# create outDF
outDF <- merge(inDF, xDF, all = TRUE)
Could you please advise?
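From the rules of proper coding: never reinvent the wheel. It wastes time, and you are likely to make mistakes or at least end up with a suboptimal solution.
I agree about not reinventing the wheel. I am just not sure I know exactly how to use the wheel. For the example above, do you have any suggestions on the proper way to use R? – CLM
Take a look at str_split and strsplit. In general, base R has some useful string functions, and the 'stringr' package has more.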
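To illustrate the two functions mentioned in the last comment, here is a small sketch on a made-up vector `x`. It assumes nothing about the question's CSV file, and the stringr part runs only if that package happens to be installed.

```r
x <- c("1;2;3", "4", "5;6")

# Base R: strsplit() returns a list with one character vector per input element.
base_pieces <- strsplit(x, ";", fixed = TRUE)

# lengths() counts the pieces per element, which can stand in for
# str_count(x, ";") + 1.
n_values <- lengths(base_pieces)
print(n_values)

# stringr counterpart, guarded so the sketch also runs without the package.
if (requireNamespace("stringr", quietly = TRUE)) {
  stringr_pieces <- stringr::str_split(x, ";")
  print(stringr_pieces)
}
```

Both functions are vectorised over their input, so they can be applied to the whole V3 column at once instead of one cell at a time inside a loop.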