2017-05-05

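Convert loop output in a list to a data frame in R

I'm trying to ask this question with simplified code (the logic is a bit strange, but it closely mirrors my situation). The code I'm actually working with is long and probably too wordy to be useful here. I'm happy to add whatever is needed to answer the question.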

I have a situation with a for loop, for example:

data2 <- data.frame("Chocolate" = c(0.25), "Strawberry" = c(.16), 
       "Vanilla" = c(0.64), "Blueberry" = c(.75)) 

for (i in 1:4) { 
    freqSim <- data.frame(sample(0:1, length(1:100), replace=T, prob = c(1-data2[i],data2[i]))) 

    lossCol <- freqSim*(runif(n=100, min=0, max=7000)) 

    costAvg <- mean(as.numeric(unlist(lossCol))) 
    costSD <- sd(as.numeric(unlist(lossCol))) 

    costAvg <- formatC(costAvg, format='d', big.mark=",") 
    costSD <- formatC(costSD, format='d', big.mark= ",") 

    stats <- list() 
    stats[[i]] <- list(costAvg,costSD) 

    print(stats[[i]]) 
} 

which prints output like this:

[[1]] 
[1] "1,261" 

[[2]] 
[1] "2,103" 

[[1]] 
[1] "313" 

[[2]] 
[1] "1,165" 

[[1]] 
[1] "2,073" 

[[2]] 
[1] "2,206" 

[[1]] 
[1] "2,417" 

[[2]] 
[1] "2,258" 

Ideally I'd like something that looks like a matrix:

        Chocolate Strawberry Vanilla Blueberry
Label A     1,261        313   2,073     2,417
Label B     2,103      1,165   2,206     2,258

Is there a way to do this without throwing myself off a cliff?

Answers


Here is a simple solution:

data2 <- data.frame("Chocolate" = c(0.25), "Strawberry" = c(.16), 
     "Vanilla" = c(0.64), "Blueberry" = c(.75)) 

stats <- data.frame(row.names = c("Label A", "Label B")) 

for (i in 1:4) { 
    freqSim <- data.frame(sample(0:1, length(1:100), replace=T, 
      prob = c(1-data2[i],data2[i]))) 

    lossCol <- freqSim*(runif(n=100, min=0, max=7000)) 

    costAvg <- mean(as.numeric(unlist(lossCol))) 
    costSD <- sd(as.numeric(unlist(lossCol))) 

    costAvg <- formatC(costAvg, format='d', big.mark=",") 
    costSD <- formatC(costSD, format='d', big.mark= ",") 

    stats["Label A", i] <- costAvg 
    stats["Label B", i] <- costSD 
} 

colnames(stats) <- colnames(data2) 

Result:

        Chocolate Strawberry Vanilla Blueberry
Label A       764        470   2,003     2,932
Label B     1,674      1,418   2,202     2,315

I'd encourage you to look into tidyr for this kind of operation instead of doing it in base R, if you can.


We can do this with simplify2array:

res <- simplify2array(stats) 
dimnames(res) <- list(paste("Label", c("A", "B")), names(data2)) 

N.B. Make sure to define

stats <- list()

outside the for loop. A better option is to pre-allocate it outside the loop with the right length, i.e.

stats <- vector("list", length(data2))
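A self-contained sketch of this approach, assuming the same data2 as in the question. set.seed(1) is added only so the run is reproducible, stats is pre-allocated outside the loop, and data2[[i]] is used (an adjustment, not part of the original code) so each probability is a plain number rather than a one-column data frame.

```r
# Same input as in the question
data2 <- data.frame("Chocolate" = 0.25, "Strawberry" = 0.16,
                    "Vanilla" = 0.64, "Blueberry" = 0.75)

set.seed(1)  # added only for reproducibility

# Pre-allocate one slot per flavour, outside the loop
stats <- vector("list", length(data2))

for (i in seq_along(data2)) {
  p <- data2[[i]]  # a single number, not a one-column data frame
  freqSim <- sample(0:1, 100, replace = TRUE, prob = c(1 - p, p))
  lossCol <- freqSim * runif(n = 100, min = 0, max = 7000)

  # Store the formatted mean and SD for this flavour
  stats[[i]] <- list(formatC(mean(lossCol), format = "d", big.mark = ","),
                     formatC(sd(lossCol), format = "d", big.mark = ","))
}

# Each element of stats becomes one column of a 2 x 4 (list) matrix
res <- simplify2array(stats)
dimnames(res) <- list(paste("Label", c("A", "B")), names(data2))
res
```

Because every element of stats is itself a list of two strings, simplify2array returns a list matrix rather than a plain character matrix; it prints in the requested layout either way.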

To get exactly the table you showed as output, try this. I didn't have time to apply proper naming conventions, so bear with me.

data2 <- data.frame("Chocolate" = c(0.25), "Strawberry" = c(.16), 
        "Vanilla" = c(0.64), "Blueberry" = c(.75)) 
x = c("Chocolate", "Strawberry", "Vanilla", "Blueberry") 
y = c("Label A", "Label B") 

data3 = matrix(nrow = 2, ncol = 4) 
colnames(data3) = x 
row.names(data3) = y 

for (i in 1:4) { 
    freqSim <- data.frame(sample(0:1, length(1:100), replace = T, prob = c(1-data2[i],data2[i]))) 

    lossCol <- freqSim*(runif(n=100, min=0, max=7000)) 

    costAvg <- mean(as.numeric(unlist(lossCol))) 
    costSD <- sd(as.numeric(unlist(lossCol))) 

    costAvg <- formatC(costAvg, format='d', big.mark=",") 
    costSD <- formatC(costSD, format='d', big.mark= ",") 

    data3[1, i] = costAvg 
    data3[2, i] = costSD 
} 
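The same character matrix can also be built without pre-allocating it and filling it by index. A minimal sketch using base R's vapply, which returns one column per flavour; the helper name summarise_flavour is hypothetical, and set.seed(1) is there only for reproducibility.

```r
data2 <- data.frame("Chocolate" = 0.25, "Strawberry" = 0.16,
                    "Vanilla" = 0.64, "Blueberry" = 0.75)

set.seed(1)  # added only for reproducibility

# Hypothetical helper: simulate 100 losses for one probability p and
# return the formatted mean and SD as a named character vector
summarise_flavour <- function(p) {
  lossCol <- sample(0:1, 100, replace = TRUE, prob = c(1 - p, p)) *
    runif(n = 100, min = 0, max = 7000)
  c("Label A" = formatC(mean(lossCol), format = "d", big.mark = ","),
    "Label B" = formatC(sd(lossCol), format = "d", big.mark = ","))
}

# vapply calls the helper on each column of data2 and checks that every
# call returns exactly two strings; the result is a 2 x 4 character matrix
# whose row names come from the first result and column names from data2
data3 <- vapply(data2, summarise_flavour, character(2))
data3
```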

Here is an example with dplyr. It doesn't give you the matrix you want, but it avoids the for loop, which is a more suitable approach:

library(dplyr)  # %>%, group_by, summarise, mutate_at
library(tidyr)  # gather

freqSim <- lapply(names(data2), function(x) 
        sample(0:1, length(1:100), replace=T, 
        prob=c(1-data2[x], data2[x]))) 
names(freqSim) <- names(data2) 

lossCol <- lapply(freqSim, function(x) x*(runif(n=100, min=0, max=7000))) 

do.call(data.frame, lossCol) %>% 
    gather(type, val) %>% 
    group_by(type) %>% 
    summarise(mean=mean(val), sd=sd(val)) %>% 
    mutate_at(.cols=vars(mean, sd), .funs = funs(format(., format="d", big.mark=","))) 

# A tibble: 4 × 3
        type       mean        sd
       <chr>      <chr>     <chr>
1  Blueberry 2,911.8587 2,481.310
2  Chocolate   810.6141 1,820.357
3 Strawberry   680.2027 1,659.491
4    Vanilla 2,302.0011 2,305.148

If you really want matrix-format output, you can do this in base R with outer. For example, to compute the mean and median of each column of mtcars, you can do:

> outer(list(mean=mean, median=median), as.data.frame(mtcars), Vectorize(function(f,y) f(y))) 
             mpg    cyl       disp       hp      drat      wt     qsec     vs      am   gear   carb
mean   20.090625 6.1875 230.721875 146.6875 3.5965625 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125
median 19.200000 6.0000 196.300000 123.0000 3.6950000 3.32500 17.71000 0.0000 0.00000 4.0000 2.0000

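The first argument to outer is a named list of the functions to apply, the second is the columns to iterate over, and the last argument is a function that evaluates each function on a column. Vectorize is needed here because outer calls that last function only once, passing all (function, column) pairs together as two expanded lists, so it has to work element-wise; Vectorize achieves this by wrapping it in mapply.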
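A small sketch of why Vectorize matters: the first part counts how often outer actually calls its function, and the second repeats the pattern on toy data. The names calls, apply_fun and the lists a and b are made up for illustration.

```r
# Count how often outer() calls its function
calls <- 0
res <- outer(1:2, 1:3, function(x, y) {
  calls <<- calls + 1
  x * 10 + y
})
calls      # 1: a single call, with x and y expanded to length-6 vectors
res[2, 3]  # 23, i.e. row value 2 combined with column value 3

# With lists of functions and data the function must work element-wise,
# which Vectorize() provides by running it through mapply()
apply_fun <- function(f, y) f(y)
res2 <- outer(list(mean = mean, max = max),
              list(a = 1:3, b = 4:6),
              Vectorize(apply_fun))
res2  # rows mean/max, columns a/b
```

Calling apply_fun directly on the expanded lists would fail, because f would be a whole list rather than a single function.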

In your case, I would split your code into three parts:

Generate the samples:

>  freqSim <- lapply(data2, function(x) sample(0:1, length(1:100), replace=T, prob=c(1-x,x)) *(runif(n=100, min=0, max=7000))) 

This looks like:

> str(freqSim) 
List of 4 
$ Chocolate : num [1:100] 0 0 0 0 0 ... 
$ Strawberry: num [1:100] 0 0 0 0 0 0 0 0 0 0 ... 
$ Vanilla : num [1:100] 4175 1456 0 1201 852 ... 
$ Blueberry : num [1:100] 0 3896 3794 5096 2901 ... 

Declare your functions:

> funs <- list(`Label A`=function(x) formatC(mean(x), format='d', big.mark=","), 
       `Label B`=function(x) formatC(sd(x), format='d', big.mark=",")) 

Use outer:

> outer(funs, freqSim, Vectorize(function(f,y) f(y))) 
        Chocolate Strawberry Vanilla Blueberry
Label A "518"     "427"      "2,044" "2,441"
Label B "1,417"   "1,290"    "2,250" "2,259"
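If you still need the numbers for further calculations, one option is to keep the summary matrix numeric and format it only for display. A minimal sketch under the same assumptions as the question (set.seed(1) added for reproducibility); it swaps outer for sapply, so it is an alternative rather than the approach shown in this answer.

```r
data2 <- data.frame("Chocolate" = 0.25, "Strawberry" = 0.16,
                    "Vanilla" = 0.64, "Blueberry" = 0.75)

set.seed(1)  # added only for reproducibility

# One vector of 100 simulated losses per flavour
freqSim <- lapply(data2, function(x)
  sample(0:1, 100, replace = TRUE, prob = c(1 - x, x)) *
    runif(n = 100, min = 0, max = 7000))

# Numeric 2 x 4 matrix: rows are the statistics, columns the flavours
num <- sapply(freqSim, function(x) c("Label A" = mean(x), "Label B" = sd(x)))

# Format a copy for printing; num stays available for arithmetic
out <- matrix(formatC(num, format = "d", big.mark = ","),
              nrow = nrow(num), dimnames = dimnames(num))
out
```

Keeping num numeric means values can still be sorted or compared; only out carries the comma-formatted strings.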