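Why do I get numbers instead of names after processing my data in R?
I have a very large dataset and need to do some preprocessing on it. I run the steps below, but the second column ends up containing numbers instead of the gene names. When I run the same code on a simple dataset, it works fine. Does anyone know what the problem is? Also, how can I remove the quotation marks ("") from the output?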
Part of my dataset:
> tars.hsa.miRBase[1:4,]
miRBaseid
1 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98/hsa-let-7g/hsa-let-7i/hsa-miR-4458/hsa-miR-4500
2 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98/hsa-let-7g/hsa-let-7i/hsa-miR-4458/hsa-miR-4500
3 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98/hsa-let-7g/hsa-let-7i/hsa-miR-4458/hsa-miR-4500
4 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98/hsa-let-7g/hsa-let-7i/hsa-miR-4458/hsa-miR-4500
Gene.Symbol Transcript.ID
1 SCARA3 NM_016240
2 IGLON5 NM_001101372
3 IRF5 NM_001098630
4 ELL2 NM_012081
My code:
ind.mirs <- strsplit(tars.hsa.miRBase[, "miRBaseid"], split="/")
lclus <- (sapply(ind.mirs, length))
new.tars <- matrix(NA,sum(lclus),2)
new.tars[,1] <- do.call(c,ind.mirs)
new.tars[,2] <- rep(tars.hsa.miRBase$Gene.Symbol, time=lclus)
Part of the output:
[,1] [,2]
[1,] "hsa-let-7a" "13883"
[2,] "hsa-let-7b" "13883"
[3,] "hsa-let-7c" "13883"
[4,] "hsa-let-7d" "13883"
What I expect:
miRBaseid Gene.Symbol
[1,] hsa-let-7a SCARA3
[2,] hsa-let-7b SCARA3
[3,] hsa-let-7c SCARA3
[4,] hsa-let-7d SCARA3
.
.
.
.
How it works on simple data:
tars.hsa <- data.frame(miR.Family=c("a","b/b","c/c","d/d/d"), Gene.Symbol=paste0("A",1:4,"BG"),stringsAsFactors=FALSE)
ind.mirs <- strsplit(tars.hsa[, "miR.Family"], split="/")
lclus <- sapply(ind.mirs, length)
new.tars <- matrix(NA,sum(lclus),2)
new.tars[,1] <- do.call(c,ind.mirs)
new.tars[,2] <- rep(tars.hsa$Gene.Symbol, time=lclus)
Output:
[,1] [,2]
[1,] "a" "A1BG"
[2,] "b" "A2BG"
[3,] "b" "A2BG"
[4,] "c" "A3BG"
[5,] "c" "A3BG"
[6,] "d" "A4BG"
[7,] "d" "A4BG"
[8,] "d" "A4BG"
>
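One likely explanation, offered as an assumption since the post does not show `str(tars.hsa.miRBase)`: `Gene.Symbol` in the large dataset is a factor (for example because the file was read with `stringsAsFactors = TRUE`, the default before R 4.0). Assigning a factor into a character matrix stores its integer level codes rather than its labels, which would produce values like "13883". The simple example sets `stringsAsFactors=FALSE`, so it never hits this. The sketch below uses made-up toy data (the `tars`, `bad`, and `good` names are hypothetical) to reproduce the symptom and then avoid it with `as.character()`:

```r
# Toy data (hypothetical); Gene.Symbol is deliberately a factor here,
# which is an assumption about what the real dataset looks like.
tars <- data.frame(miRBaseid = c("a", "b/b", "c/c/c"),
                   stringsAsFactors = FALSE)
tars$Gene.Symbol <- factor(c("SCARA3", "IGLON5", "IRF5"))

ind.mirs <- strsplit(tars$miRBaseid, split = "/")
lclus <- sapply(ind.mirs, length)

# Reproduce the problem: the factor's integer codes end up in the matrix.
bad <- matrix(NA, sum(lclus), 2)
bad[, 1] <- unlist(ind.mirs)
bad[, 2] <- rep(tars$Gene.Symbol, times = lclus)

# Possible fix: convert the factor to character before repeating it.
good <- matrix(NA, sum(lclus), 2)
good[, 1] <- unlist(ind.mirs)
good[, 2] <- rep(as.character(tars$Gene.Symbol), times = lclus)
```

Running `is.factor(tars.hsa.miRBase$Gene.Symbol)` on the real data would confirm or rule out this explanation.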
When I use a data frame, I get errors and warnings. Could you spell out your solution? – user2806363
I'll see if I can put together an example. – TARehman
Does this make sense? – TARehman
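For reference, a data-frame version of the same reshaping could look like the sketch below. It is not necessarily the approach discussed in these comments; the toy data and the names `tars` and `new.tars` are illustrative assumptions. Both columns are passed through `as.character()` so that factors cannot leak in as integer codes, and the last lines show two ways to print a character matrix without quotation marks.

```r
# Toy stand-in for tars.hsa.miRBase (hypothetical values).
tars <- data.frame(miRBaseid = c("a", "b/b"), stringsAsFactors = FALSE)
tars$Gene.Symbol <- factor(c("SCARA3", "IGLON5"))

# as.character() guards against either column being a factor.
ind.mirs <- strsplit(as.character(tars$miRBaseid), split = "/")
lclus <- sapply(ind.mirs, length)

# Build the long table directly as a data frame, one row per miRNA.
new.tars <- data.frame(
  miRBaseid   = unlist(ind.mirs),
  Gene.Symbol = rep(as.character(tars$Gene.Symbol), times = lclus),
  stringsAsFactors = FALSE
)
print(new.tars)  # data frames are printed without quotes

# If a matrix is preferred, the quotes can be suppressed when printing.
m <- as.matrix(new.tars)
print(m, quote = FALSE)
noquote(m)
```

The quotes are only part of how R prints character matrices; they are not stored in the values themselves.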