2016-11-05 93 views
0

年我TOT他一個類似的一個數據幀:熊貓KeyError異常:爲csv文件數據框

BirthYear Sex Area Count 
2015   W  Dhaka 6 
2015   M  Dhaka 3 
2015   W  Khulna 1 
2015   M  Khulna 8 
2014   M  Dhaka 13 
2014   W  Dhaka 20 
2014   M  Khulna 9 
2014   W  Khulna 6 
2013   W  Dhaka 11 
2013   M  Dhaka 2 
2013   W  Khulna 8 
2013   M  Khulna 5 
2012   M  Dhaka 12 
2012   W  Dhaka 4 
2012   W  Khulna 7 
2012   M  Khulna 1 

現在我想創建一個大熊貓條形圖,其中只有男&女出生於2015年將被顯示。 代碼:

df = pd.read_csv('out.csv') 
df=df.reset_index() 
df=df.loc[df["BirthYear"]==2015] 
agg_df = df.groupby(['Sex']).sum() 
agg_df.reset_index(inplace=True) 
piv_df = agg_df.pivot(columns='Sex', values='Count') 
piv_df.plot.bar(stacked=True) 
plt.show() 

和執行後,IDLE顯示了這個錯誤:

Traceback (most recent call last): 
    File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\indexes\base.py", line 1945, in get_loc 
    return self._engine.get_loc(key) 
    File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066) 
    File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930) 
    File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408) 
    File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359) 
KeyError: 'BirthYear' 

During handling of the above exception, another exception occurred: 

Traceback (most recent call last): 
    File "C:/Users/sabid/Dropbox/Freelancing/data visualization python/pie.py", line 8, in <module> 
    df=df.loc[df["StichtagDatJahr"]==2015] 
    File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\core\frame.py", line 1997, in __getitem__ 
    return self._getitem_column(key) 
    File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\core\frame.py", line 2004, in _getitem_column 
    return self._get_item_cache(key) 
    File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\core\generic.py", line 1350, in _get_item_cache 
    values = self._data.get(item) 
    File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\core\internals.py", line 3290, in get 
    loc = self.items.get_loc(item) 
    File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\indexes\base.py", line 1947, in get_loc 
    return self._engine.get_loc(self._maybe_cast_indexer(key)) 
    File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066) 
    File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930) 
    File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408) 
    File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359) 
KeyError: 'BirthYear' 

我來自this link知道,這是因爲在「BirthYear」列名,收到了一些頭。 但我不知道如何刪除標題,並使代碼工作。 這是否有任何富有成效的解決方案?

+0

[解決方案](http://stackoverflow.com/a/23733522/5741205)已經在提供的鏈接,你已經張貼 - 沒有你嘗試一下? – MaxU

+0

你的意思是「之前的一些標題?」如果你的意思是在字符串的開頭有一個空格? – Batman

+0

@MaxU我試過了,但那並沒有真正起作用......錯誤一次又一次地出現 –

回答

1

您可以重命名列。

df.rename(columns=["BirthYear", "Sex", "Area", "Count"], inplace=True) 
+2

我也想用'df.reset_index(drop = True,inplace = True)'替換'df = df.reset_index()''。將drop設置爲false將使前一個索引顯示爲一個名爲index的新列。 – Jakub

+0

是的。好點子。 – Batman

+0

@Batman,'df.rename(columns = list)'將產生'TypeError:'list'對象不可調用。測試這個:'df = pd.DataFrame(np.random.rand(3,2),columns = list('ab')); df.rename(columns = ['X','Y'])' – MaxU

0

我假設你想輸出是這樣的:

Barplot

我不知道這一點,但我認爲使用pivot方法搞砸你。您不需要使用pivot,因爲agg_df基本上是一個數據透視表。下面是我用來創建圖形代碼:

import pandas as pd 

# I made this to approximate your CSV file. 
table = { 
    'BirthYear': [2015, 2015, 2015, 2015, 2014, 2014,], 
    'Sex': ['W', 'M', 'W', 'M', 'M', 'W',], 
    'Area': ['Dhaka', 'Dhaka', 'Khulna', 'Khulna', 'Dhaka', 'Dhaka',], 
    'Count': [6, 3, 1, 8, 13, 20] 
} 

df = pd.DataFrame(table) 
df = df.reset_index(drop=True) 

# Select people born in 2015. 
df = df.loc[df["BirthYear"] == 2015] 

# This is basically a pivot table. 
agg_df = df.groupby(['Sex']).sum() 

# Make the plot. 
agg_df['Count'].plot.bar(stacked=True) 
+0

這裏的訣竅是指定正確的BOM編碼。 –

+0

感謝您的努力。 –