Using an "apply" function across multiple data frames

I'm having trouble using an apply function (which I believe is the right way to do the following) across multiple data frames.

Some sample data (3 different data frames, but the problem I'm working on has more than 50):

biz <- data.frame(
    country = c("england","canada","australia","usa"), 
    businesses = sample(1000:2500,4)) 

pop <- data.frame(
    country = c("england","canada","australia","usa"), 
    population = sample(10000:20000,4)) 

restaurants <- data.frame(
    country = c("england","canada","australia","usa"), 
    restaurants = sample(500:1000,4))

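Here is what I ultimately want to do:

1) Sort each data frame from largest to smallest, according to its randomly assigned variable:

dataframe <- dataframe[order(dataframe$VARIABLE, decreasing = TRUE), ]

2) Then create a rank variable that gives me the rank of each row:

dataframe$rank <- 1:nrow(dataframe)

3) Then create another data frame that contains one column of countries, with the rank for each variable of interest as the other columns. It would look something like this (the ranks here are not real):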

country.rankings <- structure(list(country = structure(c(5L, 1L, 6L, 2L, 3L, 4L), .Label = c("brazil", 
"canada", "england", "france", "ghana", "usa"), class = "factor"), 
    restaurants = 1:6, businesses = c(4L, 5L, 6L, 3L, 2L, 1L), 
    population = c(4L, 6L, 3L, 2L, 5L, 1L)), .Names = c("country", 
"restaurants", "businesses", "population"), class = "data.frame", row.names = c(NA, 
-6L))
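For a single data frame, steps 1 and 2 might be sketched as follows. This is only a minimal illustration: the `set.seed()` call is an assumption added here so the random sample is reproducible, and `biz` stands in for any of the data frames.

```r
# Minimal sketch for one data frame; set.seed() is added only for reproducibility.
set.seed(1)
biz <- data.frame(
    country = c("england", "canada", "australia", "usa"),
    businesses = sample(1000:2500, 4))

# 1) sort from largest to smallest on the value column
biz <- biz[order(biz$businesses, decreasing = TRUE), ]

# 2) after sorting, the row position is the rank
biz$rank <- seq_len(nrow(biz))
biz
```

The open question is how to repeat this for every data frame when the value column has a different name in each.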

So I'm guessing there is a way to put all the data frames into a list, something like this:

lib <- c(biz, pop, restaurants)

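Then run lapply across it to 1) sort, 2) create the rank variable, and 3) create a matrix or data frame of ranks for each variable (number of businesses, population size, number of restaurants) for each country. The problem I'm running into is writing the lapply function: when I try to sort each data frame by its variable, I run into trouble: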

sort <- lapply(lib, 
    function(x){ 
     x <- x[order(x[,2]),] 
     })

This returns the error message:

Error in `[.default`(x, , 2) : incorrect number of dimensions

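because I'm trying to apply column indexing to a list. But how else would I tackle this when the variable name is different in every data frame (bearing in mind that the country names are kept consistent)?

(I would also love to know how to do this using plyr.)

Source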

2013-12-10 Marc Tulla

I believe it should be 'lib <- list(biz, pop, restaurants)'. And maybe something like 'cbind(as.character(biz[, 1]), do.call(cbind, lapply(lib, function(x) order(x[, 2]))))'?
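To illustrate the point of this comment, here is a small sketch of the difference between c() and list() on data frames. The two-row data frames and the names `flat`, `nested` and `sorted` are made up for the example.

```r
# Tiny illustrative data frames (values are invented for the example)
biz <- data.frame(country = c("england", "canada"), businesses = c(10, 20))
pop <- data.frame(country = c("england", "canada"), population = c(300, 100))

flat <- c(biz, pop)       # c() flattens the data frames into a plain list of columns
nested <- list(biz, pop)  # list() keeps each data frame intact

length(flat)                # 4: one element per column
length(nested)              # 2: one element per data frame
is.data.frame(flat[[1]])    # FALSE: each element is a single column vector
is.data.frame(nested[[1]])  # TRUE

# With list(), x[, 2] inside lapply() works because x is a whole data frame
sorted <- lapply(nested, function(x) x[order(x[, 2], decreasing = TRUE), ])
```

With the flattened version, lapply() hands the function one column at a time, so `x[, 2]` is asking for a second dimension that a vector does not have, which is where "incorrect number of dimensions" comes from.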

totaldatasets <- c('biz','pop','restaurants') 
totaldatasetslist <- vector(mode = "list",length = length(totaldatasets)) 
for (i in seq(length(totaldatasets))) 
{ 
    totaldatasetslist[[i]] <- get(totaldatasets[i]) 
} 

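totaldatasetslist2 <- lapply(
    totaldatasetslist,
    function(x)
    {
        # use x, the data frame lapply passes in; indexing with the stale
        # loop variable i would rank the same data frame every time
        temp <- data.frame(
            country = x[,1],
            countryrank = rank(x[,2])
        )

        # name the rank column after the original value column
        colnames(temp) <- c('country', colnames(x)[2])

        return(temp)
    }
)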


Reduce(
    merge, 
    totaldatasetslist2 
)

Output (the sample data is random, so your ranks will differ):

    country businesses population restaurants
1 australia          3          3           3
2    canada          2          2           2
3   england          1          1           1
4       usa          4          4           4
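A loop-free variant of the same idea might look like the sketch below. It is not the answer's original code: the `set.seed()` call, the `datasets` and `ranked` names, and the use of `rank(-x)` (so that the largest value gets rank 1, as the question asks) are assumptions made for illustration.

```r
# Sample data as in the question; set.seed() is added only for reproducibility
set.seed(42)
countries <- c("england", "canada", "australia", "usa")
biz <- data.frame(country = countries, businesses = sample(1000:2500, 4))
pop <- data.frame(country = countries, population = sample(10000:20000, 4))
restaurants <- data.frame(country = countries, restaurants = sample(500:1000, 4))

# Keep the data frames intact by collecting them with list(), not c()
datasets <- list(biz, pop, restaurants)

# Replace each value column by its rank; rank(-x) gives rank 1 to the largest value.
# The column is addressed by position, so its name can differ per data frame.
ranked <- lapply(datasets, function(x) {
    x[[2]] <- rank(-x[[2]])
    x
})

# Merge the ranked data frames pairwise on the shared "country" column
country.rankings <- Reduce(function(a, b) merge(a, b, by = "country"), ranked)
country.rankings
```

Because each value column keeps its own name, the merged result has one rank column per variable without any renaming step.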

Source

2013-12-10 03:30:11 TheComeOnMan

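Thanks, Codo! This definitely works, but I'm looking for something that is, as you mentioned, a bit more efficient. Most of the time, when I run into a place where a 'for' loop seems to make sense, R has a more efficient way of handling it through the "apply" or "plyr" functions...

Updated........ – TheComeOnMan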

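Ideally I would suggest data.table for this. However, here is a quick solution using data.frame. Try this:

Step 1: Create a list of all the data.frames

varList <- list(biz,pop,restaurants)

Step 2: Combine all of them into one data.frame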

temp <- varList[[1]] 
for(i in 2:length(varList)) temp <- merge(temp,varList[[i]],by = "country")

Step 3: Get the ranks:

cbind(temp,apply(temp[,-1],2,rank))

And, if you wish, you can remove the unwanted columns!

cbind(temp[,1:2],apply(temp[,-1],2,rank))[,-2]
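The three steps can also be written without the explicit loop and the column juggling. The sketch below is one possible variant, not part of the original answer: `set.seed()`, the `ranks` name, and the use of `Reduce()` are assumptions made for illustration.

```r
# Sample data as in the question; set.seed() is added only for reproducibility
set.seed(7)
countries <- c("england", "canada", "australia", "usa")
biz <- data.frame(country = countries, businesses = sample(1000:2500, 4))
pop <- data.frame(country = countries, population = sample(10000:20000, 4))
restaurants <- data.frame(country = countries, restaurants = sample(500:1000, 4))

# Step 1: list of all the data.frames
varList <- list(biz, pop, restaurants)

# Step 2 without an explicit loop: fold merge() over the list
temp <- Reduce(function(a, b) merge(a, b, by = "country"), varList)

# Step 3: overwrite every value column with its rank in one assignment
ranks <- temp
ranks[-1] <- lapply(temp[-1], rank)
ranks
```

As in the steps above, rank() gives rank 1 to the smallest value; passing the negated column, for example `function(v) rank(-v)`, would rank from largest to smallest instead.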

Hope this helps!

Source

2014-08-13 21:40:31 Shambho
