Returning tapply results to the original data frame in R

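I have a data frame of firms exporting to different countries in different years. My problem is that I need to create a variable that says how many firms there are in each country in each year. I can do this perfectly well with the "tapply" command, like this: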

incumbents <- tapply(id, `destination-year`, function(x) length(unique(x)))

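It works fine. My problem is that incumbents has length length(destination-year), i.e. one entry per destination-year group, and I need it to have length length(id), since there are many firms in each destination in each year, so that I can use it in a subsequent regression (matched by year and destination, of course). A "for" loop can do this, but it is very time-consuming because the database is huge.

Any suggestions?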

2012-02-13 Francisco Roldán

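Sorry for not providing example data... rookie mistake – 2012-02-13 23:45:54

You didn't provide a reproducible example, so I can't test this, but you should be able to use ave: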

incumbents <- ave(id, `destination-year`, FUN=function(x) length(unique(x)))
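To make the idea concrete, here is a minimal sketch on made-up toy data (the values and the data frame name `dat` are assumptions, not the asker's data). It also assumes `id` is numeric, because ave() writes the result back into a copy of its first argument. Passing `destination` and `year` as two separate grouping vectors avoids the need for a pasted key:

```r
# Toy data (hypothetical): firm ids exporting to destinations by year
dat <- data.frame(
  id          = c(1, 2, 2, 3, 1, 1),
  destination = c("A", "A", "A", "B", "B", "B"),
  year        = c(2000, 2000, 2000, 2000, 2001, 2001)
)

# ave() returns a vector as long as its first argument, so the
# group-level count is repeated on every row of that group
dat$incumbents <- ave(dat$id, dat$destination, dat$year,
                      FUN = function(x) length(unique(x)))

print(dat)
```

The new column has one value per row of `dat`, so it can go straight into a regression alongside the other columns.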

2012-02-13 21:11:05

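Works great. Thanks!! – 2012-02-13 23:37:35

Just merge the tapply summary back into the original data frame with merge.

Since you didn't provide example data, I made some up. Modify accordingly.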

n           = 1000
id          = sample(1:10, n, replace=TRUE)
year        = sample(2000:2011, n, replace=TRUE)
destination = sample(LETTERS[1:6], n, replace=TRUE)

`destination-year` = paste(destination, year, sep='-')

# data.frame() renames the non-syntactic column to destination.year
dat = data.frame(id, year, destination, `destination-year`)

Now build your summary. Note how I reformat it as a data frame and make the names match the original data.

incumbents = tapply(id, `destination-year`, function(x) length(unique(x))) 
incumbents = data.frame(`destination-year`=names(incumbents), incumbents)

Finally, merge it back with the original data:

merge(dat, incumbents)
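As a rough end-to-end sketch of the same summarise-then-merge idea, the version below swaps tapply for aggregate() (my substitution, not part of the answer above) so that the summary is already a data frame keyed on both columns. The seed and the names `counts` and `merged` are arbitrary choices for illustration:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 1000
dat <- data.frame(
  id          = sample(1:10, n, replace = TRUE),
  year        = sample(2000:2011, n, replace = TRUE),
  destination = sample(LETTERS[1:6], n, replace = TRUE)
)

# One row per destination-year with the number of distinct firms
counts <- aggregate(id ~ destination + year, data = dat,
                    FUN = function(x) length(unique(x)))
names(counts)[names(counts) == "id"] <- "incumbents"

# Join on both keys explicitly; merge() sorts by the keys by default,
# so the row order of the result differs from the row order of dat
merged <- merge(dat, counts, by = c("destination", "year"))
```

The merged result keeps every original row, but because of the sorting it should be used in place of `dat` rather than having its column copied back by position.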

By the way, instead of combining destination and year into a third variable, as it looks like you have done, tapply can handle two variables directly as a list:

library(reshape2)  # provides melt()
incumbents = melt(tapply(id, list(destination=destination, year=year), function(x) length(unique(x))))
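If you would rather not load an extra package, a base R sketch of the same reshaping step could look like this. The toy vectors and the names `tab` and `long` are assumptions for illustration. It uses as.table() and as.data.frame() in place of melt(), which is a substitution on my part:

```r
# Toy vectors (hypothetical)
id          <- c(1, 2, 2, 3, 1, 1)
destination <- c("A", "A", "A", "B", "B", "B")
year        <- c(2000, 2000, 2000, 2000, 2001, 2001)

# Two-way table of distinct-firm counts; combinations that never
# occur (here A in 2001) come back as NA
tab <- tapply(id, list(destination = destination, year = year),
              function(x) length(unique(x)))

# Long format: one row per destination-year cell
long <- as.data.frame(as.table(tab), responseName = "incumbents")
long <- long[!is.na(long$incumbents), ]

# Note: destination and year are factors in `long`, so year would need
# converting back to numeric before merging on a numeric year column
print(long)
```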

2012-02-13 20:30:12

Using @JohnColby's excellent example data, I was thinking of something more along these lines:

# I prefer not to deal with the pesky '-' in a variable name
destinationYear = paste(destination, year, sep='-')

dat = data.frame(id, year, destination, destinationYear)

library(plyr)  # provides ddply() and .()
dat <- ddply(dat, .(destinationYear), transform, newCol = length(unique(id)))

# Or if more speed is required, use data.table
library(data.table)
datTable <- data.table(dat)

# := adds newCol by reference, one group at a time
datTable[, newCol := length(unique(id)), by = destinationYear]

2012-02-13 21:06:00 joran
