2017-10-14

Creating a series of new variables using a loop over repeated values

I have a dataset that I load and preview like this:

import pandas as pd

gdp = pd.read_csv(r"gdpproject.csv", encoding="ISO-8859-1")
gdp.head(2)
gdp.tail(2)

This gives me the following output:

     Country.Name  Indicator.Name          2004          2005
0           World             GDP  5.590000e+13  5.810000e+13
1           World          Health  5.590000e+13  5.810000e+13
086      Zimbabwe  GDP per capita  8.681564e+02  8.082944e+02
089      Zimbabwe      Population  1.277751e+07  1.294003e+07

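So you can see right away that each country has multiple indicators.

What I want to do is create a new indicator from two of the existing indicators, and create it for each unique country.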

for i in series(gdp['Country.Name']):
    gdp['Military Spending'] = 100/gdp['Military percent of GDP'] * gdp['GDP']

It gives me this error message:

NameError         Traceback (most recent call last) 
<ipython-input-37-d817ea1522fc> in <module>() 
----> 1 for i in series(gdp1['Country.Name']): 
    2  gdp1['Military Spending'] = 100/gdp1['Military percent of GDP'] * 
gdp1['GDP'] 

NameError: name 'series' is not defined 

How do I get this series to work? I have also tried simply

for i in gdp['Country.Name'] 

but I still get an error message.

Please help!

Answer


Let's say you have the following input DataFrame (note that Military percent of GDP does not appear in your sample data):

>>> gdp
  Country.Name           Indicator.Name          2004          2005
0        World                      GDP  5.590000e+13  5.810000e+13
1        World  Military percent of GDP  2.100000e+00  2.300000e+00
2     Zimbabwe                      GDP  1.628900e+10  1.700000e+10
3     Zimbabwe  Military percent of GDP  2.000000e+00  2.100000e+00

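You can then create two auxiliary DataFrames, df_gdp and df_mpgdp, holding the 2004 and 2005 values for GDP and Military percent of GDP respectively. From these you can build df_msp, which contains the rows for a new Indicator.Name called Military Spending, and finally append that result to the original DataFrame. Note that in some places we need reset_index so that the calculation lines up on the expected row index: pandas matches rows by index label, and the two filtered subsets keep their original, different labels.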
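To illustrate why reset_index matters here, the following is a minimal sketch with made-up numbers (the Series names and values are hypothetical, not taken from the question's data). The row labels 0, 2 and 1, 3 mimic what you get after filtering alternating rows out of a larger DataFrame.

```python
import pandas as pd

# Hypothetical toy values; the index labels mimic rows filtered from a larger frame.
gdp_vals = pd.Series([100.0, 200.0], index=[0, 2])
pct_vals = pd.Series([2.0, 4.0], index=[1, 3])

# pandas aligns on index labels, so no label matches and every result is NaN.
misaligned = gdp_vals * pct_vals

# After reset_index(drop=True) both Series are labelled 0, 1
# and are multiplied row by row.
aligned = gdp_vals.reset_index(drop=True) * pct_vals.reset_index(drop=True)

print(misaligned)
print(aligned)
```

Resetting the index only gives the intended pairing if both subsets list the countries in the same order, which is an assumption worth checking on real data.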

The following code should do what you are aiming for:

import pandas as pd

# Sample input with one row per (country, indicator) pair.
gdp = pd.DataFrame([
    ["World", "GDP", 5.590000e+13, 5.810000e+13],
    ["World", "Military percent of GDP", 2.1, 2.3],
    ["Zimbabwe", "GDP", 16289e6, 17000e6],
    ["Zimbabwe", "Military percent of GDP", 2, 2.1]])
gdp.columns = ["Country.Name", "Indicator.Name", "2004", "2005"]

# Split the rows by indicator.
df_gdp = gdp[gdp["Indicator.Name"] == "GDP"]
df_mpgdp = gdp[gdp["Indicator.Name"] == "Military percent of GDP"]

# Build the new indicator. reset_index(drop=True) lines the two subsets up row by row,
# which assumes both list the countries in the same order.
df_msp = pd.DataFrame()
df_msp["Country.Name"] = df_gdp["Country.Name"].reset_index(drop=True)
df_msp["Indicator.Name"] = "Military Spending"
# The formula is kept exactly as written in the question.
df_msp["2004"] = 100/df_mpgdp["2004"].reset_index(drop=True) * df_gdp["2004"].reset_index(drop=True)
df_msp["2005"] = 100/df_mpgdp["2005"].reset_index(drop=True) * df_gdp["2005"].reset_index(drop=True)

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
gdp_out = pd.concat([gdp, df_msp])
gdp_out = gdp_out.sort_values(["Country.Name", "Indicator.Name"])
gdp_out = gdp_out.reset_index(drop=True)

The final output DataFrame is:

>>> gdp_out
  Country.Name           Indicator.Name          2004          2005
0        World                      GDP  5.590000e+13  5.810000e+13
1        World        Military Spending  2.661905e+15  2.526087e+15
2        World  Military percent of GDP  2.100000e+00  2.300000e+00
3     Zimbabwe                      GDP  1.628900e+10  1.700000e+10
4     Zimbabwe        Military Spending  8.144500e+11  8.095238e+11
5     Zimbabwe  Military percent of GDP  2.000000e+00  2.100000e+00
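As a possible alternative, the same result can be sketched without relying on row order by indexing on country and indicator, so that pandas aligns the arithmetic by country name. This is only a sketch under assumptions: it reuses the sample data from above, assumes each country has exactly one row per indicator, and the names years, indexed, gdp_rows, pct_rows and spending are introduced here for illustration. The formula is the one from the question.

```python
import pandas as pd

gdp = pd.DataFrame(
    [
        ["World", "GDP", 5.59e13, 5.81e13],
        ["World", "Military percent of GDP", 2.1, 2.3],
        ["Zimbabwe", "GDP", 16289e6, 17000e6],
        ["Zimbabwe", "Military percent of GDP", 2.0, 2.1],
    ],
    columns=["Country.Name", "Indicator.Name", "2004", "2005"],
)
years = ["2004", "2005"]

# Index by (country, indicator) so each indicator can be selected with xs().
indexed = gdp.set_index(["Country.Name", "Indicator.Name"])[years]
gdp_rows = indexed.xs("GDP", level="Indicator.Name")
pct_rows = indexed.xs("Military percent of GDP", level="Indicator.Name")

# Both frames are now indexed by country, so the arithmetic aligns per country.
# The formula is kept as written in the question.
spending = 100 / pct_rows * gdp_rows
spending["Indicator.Name"] = "Military Spending"
spending = spending.reset_index()

# Stack the new rows under the original ones and sort as before.
gdp_out = (
    pd.concat([gdp, spending], ignore_index=True)
    .sort_values(["Country.Name", "Indicator.Name"])
    .reset_index(drop=True)
)
print(gdp_out)
```

With this layout, a country that lacks one of the two indicators ends up with NaN for Military Spending instead of being silently paired with another country's row.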