2016-12-27

Parsing an XLSX file (BP Statistical Review of World Energy)

Hi, I want to parse an XLSX file, and this is what I have so far:

import pandas as pd 
from pandas import DataFrame, read_csv 

path = 'bp-statistical-review-of-world-energy-2015-workbook.xlsx' 
xls = pd.ExcelFile(path) 
df = pd.read_excel(xls, 'Oil Production – Tonnes', index_col=0, na_values=['NA']) 

df.index.name = None 
#df.drop([0], axis=0, inplace=True) 
#df.drop((['Change']), axis=1, inplace=True) 
df.drop(df.columns[[50, 51]], axis=1, inplace=True) 
df.drop(df.index[[0, 77, 78, 79, 80, 81, 82, 83]], axis=0, inplace=True) 

My result

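The result has the following problems: the dates are in the second row, and not all of them are displayed correctly; some show up as 2015.00000. I also can't move the row with the dates up to the top. Please help me.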

Data

Answer

Is this what you want?

>>> import pandas as pd 
>>> path = 'bp-statistical-review-of-world-energy-2015-workbook.xlsx' 
>>> df = pd.read_excel(path, 'Oil Production – Tonnes', skiprows=2) 
>>> df.head() 
     Million tonnes  1965  1966  1967  1968  1969  1970 \ 
0     NaN  NaN  NaN  NaN  NaN  NaN  NaN 
1     US 427.694 454.539 484.222 502.88 511.352 533.49 
2    Canada 43.8742 48.2122 52.7011 57.1193 62.218 70.0679 
3    Mexico 18.0539 18.4895 20.4638 21.9007 22.965 24.179 
4 Total North America 489.623 521.241 557.387 581.9 596.535 627.737 

     1971  1972  1973 ...   2007  2008  2009 \ 
0  NaN  NaN  NaN ...   NaN   NaN   NaN 
1 525.888 527.888 514.652 ...  305.153524 302.254906 322.267908 
2 75.1638 86.7131 100.315 ...  155.286457 152.875890 152.805930 
3 24.1073 25.0976 25.8594 ...  172.231281 156.896182 146.664163 
4 625.159 639.699 640.826 ...  632.671262 612.026978 621.738001 

     2010  2011  2012  2013  2014  2013.1 \ 
0   NaN   NaN   NaN   NaN   NaN  NaN 
1 333.128080 345.352788 394.732788 448.494835 519.944404 0.15931 
2 160.293484 169.801471 182.586206 194.379612 209.800775 0.0793353 
3 145.600519 144.518511 143.857291 141.845640 137.097698 -0.0334726 
4 639.022083 659.672770 721.176285 784.720088 866.842877 0.104652 

    of total 
0  NaN 
1 0.123193 
2 0.049709 
3 0.032483 
4 0.205386 

[5 rows x 53 columns]
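
If some year labels still show up as 2015.00000, one possible cause is that those header cells were read as floats rather than integers. That is an assumption about the workbook, not something the output above confirms. Under that assumption, a minimal sketch of a label cleaner could look like this (`clean_label` and the sample `labels` list are hypothetical names made up for illustration):

```python
def clean_label(label):
    """Turn whole-number float labels (2015.0) into ints (2015).

    Any other label (text, NaN, non-whole floats) is returned unchanged.
    """
    if isinstance(label, float) and label.is_integer():
        return int(label)
    return label


# Hypothetical sample of column labels, assuming some years arrive as floats.
labels = ['Million tonnes', 1965.0, 2014.0, '2013.1', 'of total']
cleaned = [clean_label(x) for x in labels]
print(cleaned)
```

On the DataFrame itself, the same function could be applied with df.rename(columns=clean_label). The all-NaN row at index 0 could probably be dropped with df.dropna(how='all').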