2016-09-01

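Problem combining mean() and subset()

I have a fairly specific problem. I have a CSV table from which I want to extract the values that meet certain conditions and then take their mean(). My code is: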

GDP <- mean(subset(World,World$Year==2013)$GDP_in_USD,na.rm=TRUE) 

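World is my CSV table. In it I have collected data for every country in the world from 1960 to 2015, spread over different columns. I want all values of the column GDP_in_USD for the year 2013 (so basically one cell per country).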

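When I run this, I get a message saying that the argument is neither numeric nor logical. Oddly, a friend gave me the code and it works on his computer. When I try to reproduce it, I get the error. To read the CSV table, I use: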

World <- read.csv("World2.csv", header=TRUE, sep=delim, dec=dec, stringsAsFactors=FALSE) 

What could be causing the problem? Let me know if you need further information.

structure(list(Country.Year.Zeitraum_NR.Agriculture_value_added_percent_of_GDP.Central_government_debt_total_percent_of_GDP.Cost_to_export_USD_per_container.Cost_to_import_USD_per_container.Employment_in_agriculture_percent_of_total_employment.Employment_in_industry_percent_of_total_employment.Employment_in_services_percent_of_total_employment.Exports_of_goods_and_services_percent_of_GDP.Final_consumption_expenditure_etc_percent_of_GDP.Foreign_direct_investment_net_inflows_percent_of_GDP.Foreign_direct_investment_net_outflows_percent_of_GDP.General_government_final_consumption_expenditure_._of_GDP.GDP_growth_annual_percent.Government_expenditure_on_education_total_percent_of_GDP.Household_final_consumption_expenditure_etc_percent_of_GDP.Imports_of_goods_and_services_percent_of_GDP.Industry_value_added_percent_of_GDP.Inflation_consumer_prices_annual_percent.Lending_interest_rate_percent.Patent_applications_residents_._nonresidents.Research_and_development_expenditure_percent_of_GDP.Services_etc_value_added_percent_of_GDP.Subsidies_and_other_transfers_percent_of_expense.Tariff_rate_applied_simple_mean_all_products_percent.Taxes_on_exports_percent_of_tax_revenue.Taxes_on_goods_and_services_percent_of_revenue.Taxes_on_income_profits_and_capital_gains_percent_of_revenue.Taxes_on_international_trade_percent_of_revenue.Total_tax_rate_percent_of_commercial_profits.Trade_percent_of_GDP.Unemployment_total_percent_of_total_labor_force_national_estimate.GDP_in_USD = c("Afghanistan;1960;1;..;..;..;..;..;..;..;4.132233258;86.77685029;..;..;..;..;..;..;7.024793471;..;..;..;..;..;..;..;..;..;..;..;..;..;11.15702673;..;537777811.91", 
"Afghanistan;1961;1;..;..;..;..;..;..;..;4.453443322;87.0445247;..;..;..;..;..;..;8.097166426;..;..;..;..;..;..;..;..;..;..;..;..;..;12.55060975;..;548888894.58", 
"Afghanistan;1962;1;..;..;..;..;..;..;..;4.878051281;85.36583991;..;..;..;..;..;..;9.349593301;..;..;..;..;..;..;..;..;..;..;..;..;..;14.22764458;..;546666678.04", 
"Afghanistan;1963;1;..;..;..;..;..;..;..;9.171601205;93.49111965;..;..;..;..;..;..;16.86391035;..;..;..;..;..;..;..;..;..;..;..;..;..;26.03551156;..;751111190.76", 
"Afghanistan;1964;1;..;..;..;..;..;..;..;8.88889265;95.2777688;..;..;..;..;..;..;18.05555524;..;..;..;..;..;..;..;..;..;..;..;..;..;26.94444789;..;800000045.51", 
"Afghanistan;1965;1;..;..;..;..;..;..;..;11.25827903;98.89624551;..;..;..;..;..;..;21.41280357;..;..;..;..;..;..;..;..;..;..;..;..;..;32.6710826;..;1006666638.22" 
)), .Names = "Country.Year.Zeitraum_NR.Agriculture_value_added_percent_of_GDP.Central_government_debt_total_percent_of_GDP.Cost_to_export_USD_per_container.Cost_to_import_USD_per_container.Employment_in_agriculture_percent_of_total_employment.Employment_in_industry_percent_of_total_employment.Employment_in_services_percent_of_total_employment.Exports_of_goods_and_services_percent_of_GDP.Final_consumption_expenditure_etc_percent_of_GDP.Foreign_direct_investment_net_inflows_percent_of_GDP.Foreign_direct_investment_net_outflows_percent_of_GDP.General_government_final_consumption_expenditure_._of_GDP.GDP_growth_annual_percent.Government_expenditure_on_education_total_percent_of_GDP.Household_final_consumption_expenditure_etc_percent_of_GDP.Imports_of_goods_and_services_percent_of_GDP.Industry_value_added_percent_of_GDP.Inflation_consumer_prices_annual_percent.Lending_interest_rate_percent.Patent_applications_residents_._nonresidents.Research_and_development_expenditure_percent_of_GDP.Services_etc_value_added_percent_of_GDP.Subsidies_and_other_transfers_percent_of_expense.Tariff_rate_applied_simple_mean_all_products_percent.Taxes_on_exports_percent_of_tax_revenue.Taxes_on_goods_and_services_percent_of_revenue.Taxes_on_income_profits_and_capital_gains_percent_of_revenue.Taxes_on_international_trade_percent_of_revenue.Total_tax_rate_percent_of_commercial_profits.Trade_percent_of_GDP.Unemployment_total_percent_of_total_labor_force_national_estimate.GDP_in_USD", row.names = c(NA, 
6L), class = "data.frame") 
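
The sample above hints at the cause: it has a single column whose name and values hold everything. A minimal sketch of that situation, assuming a shortened, made-up column name and two made-up rows (`demo` is hypothetical, not the real data), suggests why mean() complains:

```r
# Hypothetical miniature of the data frame above: one character column
# that holds all fields, separated by ";" (name and values are made up).
demo <- data.frame(
  Country.Year.GDP_in_USD = c("A;2013;100", "B;2013;.."),
  stringsAsFactors = FALSE
)

ncol(demo)          # 1: each whole row landed in a single column
is.null(demo$Year)  # TRUE: there is no separate Year column

# Same pattern as in the question. The subset has no rows and no
# GDP_in_USD column, so mean() receives NULL, warns that the argument
# is not numeric or logical, and returns NA.
result <- mean(subset(demo, demo$Year == 2013)$GDP_in_USD, na.rm = TRUE)
result
```

If this is what happens with the real file, the separator passed to read.csv() is the first thing to check.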

Please provide a sample of your data by posting the output of 'dput(head(World))'. See the [posting guidelines](http://stackoverflow.com/tags/r/info). –

Most likely the ".." is being interpreted as a character. Run str(World) to verify that the columns have the classes you expect. – Dave2e
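
A check of this kind might look like the sketch below. The rows are made up for illustration and `check` is a hypothetical name; only the ".." marker and the column names are taken from the question.

```r
# Illustrative, made-up rows; ".." marks a missing value as in the question.
csv_text <- "Country;Year;GDP_in_USD\nA;2013;100\nB;2013;.."
check <- read.csv(text = csv_text, sep = ";", stringsAsFactors = FALSE)

# Inspect the class of every column
str(check)
col_classes <- sapply(check, class)
col_classes

# GDP_in_USD is read as character because of the ".." entry;
# as.numeric() turns ".." into NA (normally with a coercion warning)
gdp <- suppressWarnings(as.numeric(check$GDP_in_USD))
gdp
```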

Yes, you could try mean(as.numeric(World$GDP_in_USD[World$Year == 2013]), na.rm = T) and see if that fixes it. – gfgm

Answer

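Your data is a mess: everything sits in a single column, with the column names separated by "." and the values separated by ";". Assuming your data frame is called World, as in the question, here is one possible solution.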

# your data from above
# World <- structure(list(Country.Year. ......

# get the column names by splitting the single header string on "."
col_names <- strsplit(names(World), ".", fixed = TRUE)[[1]]
# this creates 37 names, but only 35 columns of data exist;
# two names contain a literal "." and were split in error, so rejoin them
col_names[15] <- paste0(col_names[15], col_names[16])
col_names[24] <- paste0(col_names[24], col_names[25])
col_names <- col_names[-c(16, 25)]

# now split the main body of the table on ";"
temp <- sapply(World, function(x) { strsplit(x, ";", fixed = TRUE) })
newdf <- as.data.frame(matrix(unlist(temp), ncol = 35, byrow = TRUE))
# rename the columns
names(newdf) <- col_names
# convert the strings to numbers; ".." becomes NA (with a coercion warning)
newdf[, 2:35] <- apply(newdf[, 2:35], 2, function(x) { as.numeric(as.character(x)) })

Not the most elegant code, but it should get you heading in the right direction.
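
An alternative to repairing the data frame afterwards is to read the file with the right settings in the first place. The sketch below assumes that the file is separated by ";", uses "." as the decimal mark and ".." for missing values, which is what the sample in the question suggests but has not been confirmed. The inline rows and the name `World_demo` are made up for illustration.

```r
# Made-up stand-in for the CSV file; the real call would pass the file
# name instead of text = csv_text.
csv_text <- "Country;Year;GDP_in_USD
A;2012;100
A;2013;200
B;2013;..
C;2013;400"

# Assumed settings: ";" separator, "." decimal mark, ".." read as NA
World_demo <- read.csv(text = csv_text, header = TRUE, sep = ";",
                       dec = ".", na.strings = "..",
                       stringsAsFactors = FALSE)

# GDP_in_USD is now numeric, so the original approach works
is.numeric(World_demo$GDP_in_USD)
gdp_2013 <- mean(subset(World_demo, Year == 2013)$GDP_in_USD, na.rm = TRUE)
gdp_2013  # 300: the mean of 200 and 400, with the missing value dropped
```

Different values of delim and dec in the read.csv() call might also explain why the same code behaves differently on a friend's computer, though that is only a guess.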