Complex subsetting of a data set into a data frame

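1) In GNU R, I want to subset the data set so that it contains only Brazil, Time, and all the series names about income share (such as "Income share held by lowest 10%", "Income share held by lowest 20%", etc.). There are 7 income-share series names in total.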

I tried the following command, but I cannot subset more than one "Series.Name":

test <- melt(subset(WDI, subset = Series.Name == "Income share held by lowest 10%", select = -c(Time.Code, Series.Code, Argentina, Canada, Chile, Colombia, Mexico, USA, Venezuela)), id.vars = c("Series.Name", "Time"))

2) As a second step, I want to remove all rows with NA values.

The complete code I am using is the following:

WDI <- read.csv("https://dl.dropboxusercontent.com/u/109495328/WDI_Data_final.csv", na.strings = "..")
library(reshape) 
library(reshape2) 
WDI <- rename(WDI, (c(Argentina..ARG.="Argentina", Brazil..BRA.="Brazil", Canada..CAN.="Canada", Chile..CHL.="Chile", Colombia..COL.="Colombia", Mexico..MEX.="Mexico", United.States..USA.="USA", Venezuela..RB..VEN.="Venezuela"))) 
income_brazil_long <- melt(subset(WDI, subset = Series.Name == "Income share held by lowest 10%", select = -c(Time.Code, Series.Code, Argentina, Canada, Chile, Colombia, Mexico, USA, Venezuela)), id.vars = c("Series.Name", "Time"))

2015-01-14 Til Hund

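The data set in question is 128 KB in size. I thought it would be better to provide the original data rather than make up some random data, to avoid misunderstandings. –

See also http://stackoverflow.com/questions/4862178/remove-rows-with-nas-in-data-frame, just in case. –

Thanks, that's the solution. –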

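Looking at your data, it is probably easiest to do the subsetting with the help of grepl.

We use grepl to search the "Series.Name" column for all rows containing the string "Income share held". This creates a logical vector indicating the rows we want. The columns we want are the first, third and sixth.

Wrap all of this in na.omit to drop any rows with NA values.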

WDI_Brazil <- na.omit(WDI[grepl("Income share held", WDI$Series.Name), 
          c(1, 3, 6)])
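The same idea can be checked on a small scale without the original CSV. The sketch below uses a hypothetical toy data frame (`toy`, with made-up values) laid out like the WDI file, so that columns 1, 3 and 6 are `Series.Name`, `Time` and `Brazil`; it shows the logical vector that `grepl` produces and what `na.omit` then removes.

```r
# Hypothetical toy data laid out like the WDI file (values are made up):
# column 1 = Series.Name, column 3 = Time, column 6 = Brazil.
toy <- data.frame(
  Series.Name = c("Income share held by lowest 10%",
                  "GDP (current US$)",
                  "Income share held by highest 10%",
                  "Income share held by lowest 10%"),
  Series.Code = c("S1", "S2", "S3", "S1"),
  Time        = c(2000, 2000, 2000, 2001),
  Time.Code   = c("YR2000", "YR2000", "YR2000", "YR2001"),
  Argentina   = c(1.1, 2.2, 3.3, 4.4),
  Brazil      = c(0.8, 100, 45.1, NA),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per row: TRUE where the text occurs.
keep <- grepl("Income share held", toy$Series.Name)
print(keep)  # TRUE FALSE TRUE TRUE

# Keep the matching rows and columns 1, 3 and 6, then drop rows with any NA
# (row 4 goes because its Brazil value is missing).
toy_brazil <- na.omit(toy[keep, c(1, 3, 6)])
print(toy_brazil)
```

Because the search string contains no regular-expression metacharacters, `fixed = TRUE` could be passed to `grepl` to match it as a literal string; the result is the same here.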

The data is already "long", so there is no need to melt. What does the data.frame look like?

summary(WDI_Brazil)
#                            Series.Name      Time       Brazil..BRA.
#  Income share held by fourth 20% :28   Min.   :1981   Min.   : 0.600
#  Income share held by highest 10%:28   1st Qu.:1988   1st Qu.: 2.895
#  Income share held by highest 20%:28   Median :1996   Median :10.320
#  Income share held by lowest 10% :28   Mean   :1996   Mean   :20.948
#  Income share held by lowest 20% :28   3rd Qu.:2004   3rd Qu.:43.797
#  Income share held by second 20% :28   Max.   :2012   Max.   :67.310
#  (Other)                         :28
table(droplevels(WDI_Brazil$Series.Name))
#
#  Income share held by fourth 20% Income share held by highest 10% Income share held by highest 20%
#                               28                               28                               28
#  Income share held by lowest 10%  Income share held by lowest 20%  Income share held by second 20%
#                               28                               28                               28
#   Income share held by third 20%
#                               28

Note that, as expected, there are seven factor levels in "Series.Name".
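The `droplevels` call matters when `Series.Name` is a factor (the default for `read.csv` before R 4.0.0, when `stringsAsFactors` defaulted to `TRUE`): subsetting a factor keeps all of its original levels, so `table` would also list every non-income series with a count of 0. A minimal illustration with a hypothetical factor `f`:

```r
# Hypothetical factor with one non-income series among the income series.
f <- factor(c("Income share held by lowest 10%",
              "GDP (current US$)",
              "Income share held by lowest 10%"))

# Subsetting keeps every original level, even ones no longer present.
sub <- f[grepl("Income share held", f)]
print(levels(sub))             # both levels are still there
print(table(sub))              # "GDP (current US$)" is listed with a count of 0

# droplevels() removes the levels that no longer occur.
print(table(droplevels(sub)))  # only the income-share level remains
```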

2015-01-14 17:17:46 A5C1D2H2I1M1N2O1R2T1

Great! Love its clarity. –

Well, you can do what you are looking for with base functions.

WDI <- read.csv("WDI_Data_final.csv", header = TRUE, na.strings = "..")

# The column names in the file are awkward, so reset them for clarity
colnames(WDI) <- c("Series.Name", "Series.Code", "Time","Time.Code","Argentina", 
        "Brazil", "Canada", "Chile", "Colombia","Mexico", 
        "USA", "Venezuela") 

# do the subsetting 
test <- with(WDI, 
      WDI[Series.Name=="Income share held by lowest 10%", 
       c("Brazil","Time", "Series.Name")]) 

# if you want more, use %in% and specify the Series.Names you care about 
test <- with(WDI, 
      WDI[Series.Name %in% c("Income share held by lowest 10%", 
            "Income share held by lowest 20%"), 
       c("Brazil","Time", "Series.Name")]) 

# if you want all the 'income shares', the grepl solution above by 
# Ananda is the most concise. 

# you can then use reshape2::melt
library(reshape2)
melted_test <- melt(test, id.vars=c("Series.Name", "Time"))
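Why `%in%` rather than `==` for several series names: `==` compares element by element and recycles the shorter vector, so each row is silently tested against only one of the names. A small sketch with hypothetical values (`series` and `wanted` are illustrative names, not columns from the file):

```r
# Hypothetical Series.Name values in the order they might appear in a file.
series <- c("Income share held by lowest 10%",
            "Income share held by lowest 20%",
            "Income share held by lowest 20%",
            "Income share held by lowest 10%")
wanted <- c("Income share held by lowest 10%",
            "Income share held by lowest 20%")

# %in% asks, for each element, "is it anywhere in wanted?"
by_in <- series %in% wanted
print(by_in)  # TRUE TRUE TRUE TRUE

# == recycles 'wanted' to length 4 and compares position by position,
# so rows 3 and 4 are compared against the wrong name.
by_eq <- series == wanted
print(by_eq)  # TRUE TRUE FALSE FALSE
```

No warning is raised in the `==` case because 4 is a multiple of 2, which is what makes this mistake easy to miss.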

To remove NAs, just use complete.cases:

test[complete.cases(test),]
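`complete.cases` returns one logical value per row (TRUE when the row has no missing values), which is then used as a row index; `na.omit`, used in the grepl answer, keeps the same rows. A small comparison on a hypothetical data frame `df`:

```r
# Hypothetical long-format data with one missing value.
df <- data.frame(Series.Name = c("lowest 10%", "lowest 10%", "lowest 20%"),
                 Time        = c(2000, 2001, 2000),
                 Brazil      = c(0.8, NA, 2.9))

ok <- complete.cases(df)
print(ok)                    # TRUE FALSE TRUE

a <- df[ok, ]                # index rows with the logical vector
b <- na.omit(df)             # same rows, plus an "na.action" attribute

print(rownames(a))           # "1" "3"
print(attr(b, "na.action"))  # records which row was dropped
```

The only practical difference is that `na.omit` records the dropped rows in its `na.action` attribute, while the `complete.cases` version leaves the result's attributes alone.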

2015-01-14 17:13:47 cdeterman

@ColonelBeauvel Good eye, fixed. – cdeterman

Thank you very much for your answer, cdeterman. I see now that my code was far too complicated. I didn't know 'colnames' existed. Thumbs up, man! –
