2017-03-22

How do I do a multi-index pivot when the index labels and the values are in the same column? I have a DataFrame like this:

```python
import pandas as pd

regions = pd.read_html('http://www.mapsofworld.com/usa/usa-maps/united-states-regional-maps.html')
messy_regions = regions[8]
```

The output looks like this:

|    | 0                               | 1                               |
|---:|---------------------------------|---------------------------------|
|  0 | Region 1 (The Northeast)        | NaN                             |
|  1 | Division 1 (New England)        | Division 2 (Middle Atlantic)    |
|  2 | Maine                           | New York                        |
|  3 | New Hampshire                   | Pennsylvania                    |
|  4 | Vermont                         | New Jersey                      |
|  5 | Massachusetts                   | NaN                             |
|  6 | Rhode Island                    | NaN                             |
|  7 | Connecticut                     | NaN                             |
|  8 | Region 2 (The Midwest)          | NaN                             |
|  9 | Division 3 (East North Central) | Division 4 (West North Central) |
| 10 | Wisconsin                       | North Dakota                    |
| 11 | Michigan                        | South Dakota                    |
| 12 | Illinois                        | Nebraska                        |
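The page layout may have changed since 2017, so `regions[8]` is not guaranteed to be the same table today. To experiment offline, here is a sketch that rebuilds only the 13 rows shown above as a DataFrame. The values are copied from the table; treating each blank cell as `np.nan` is an assumption about what `read_html` returned.

```python
import numpy as np
import pandas as pd

# Offline stand-in for regions[8]: only the 13 rows shown in the question.
# Assumption: empty cells on the page came through as NaN.
messy_regions = pd.DataFrame({
    0: ['Region 1 (The Northeast)', 'Division 1 (New England)', 'Maine',
        'New Hampshire', 'Vermont', 'Massachusetts', 'Rhode Island',
        'Connecticut', 'Region 2 (The Midwest)',
        'Division 3 (East North Central)', 'Wisconsin', 'Michigan', 'Illinois'],
    1: [np.nan, 'Division 2 (Middle Atlantic)', 'New York', 'Pennsylvania',
        'New Jersey', np.nan, np.nan, np.nan, np.nan,
        'Division 4 (West North Central)', 'North Dakota', 'South Dakota',
        'Nebraska'],
})
print(messy_regions)
```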

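The goal is to turn this into a tidy DataFrame. I think I need to pivot so that each Region and Division becomes a column, with each state listed under its correct Region/Division. Once it's in that shape, I can melt it into the shape I want. I can't figure out how to extract these column headers. Any help is appreciated, even just a pointer in the right direction.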
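A common pandas idiom for header labels mixed into the data is a forward fill rather than a pivot: keep only the header rows with `Series.where`, propagate them downward with `ffill`, then drop the header rows themselves. A minimal sketch on a made-up five-row Series (values taken from the table above, chosen only for illustration):

```python
import pandas as pd

# Illustrative column where "Division ..." header rows sit among the data rows.
s = pd.Series(['Division 1 (New England)', 'Maine', 'Vermont',
               'Division 2 (Middle Atlantic)', 'New York'])

# Non-header rows become NaN; ffill copies the nearest header above into them.
division = s.where(s.str.contains('Division', na=False)).ffill()

tidy = pd.DataFrame({'Division': division, 'Names': s})
# The header rows are now redundant (Division == Names), so drop them.
tidy = tidy[tidy['Division'] != tidy['Names']].reset_index(drop=True)
print(tidy)
```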

Answer


You can use:

```python
import pandas as pd

url = 'http://www.mapsofworld.com/usa/usa-maps/united-states-regional-maps.html'
# input DataFrame; name its two columns a and b
df = pd.read_html(url)[8]
df.columns = ['a', 'b']

# extract the Region headers into a new column and forward-fill them
df['Region'] = df['a'].where(df['a'].str.contains('Region', na=False)).ffill()
# reshape to long form, remove rows with NaNs, remove the helper column 'variable'
df = (pd.melt(df, id_vars='Region', value_name='Names')
        .sort_values(['Region', 'variable'])
        .dropna()
        .drop('variable', axis=1))
# extract the Division headers into a new column and forward-fill them
df['Division'] = df['Names'].where(df['Names'].str.contains('Division', na=False)).ffill()
# remove the header rows duplicated in column Names, then reorder the columns
df = (df[(df.Division != df.Names) & (df.Region != df.Names)]
        .reset_index(drop=True)
        .reindex(columns=['Region', 'Division', 'Names']))
# temporarily display all columns on one line
with pd.option_context('display.expand_frame_repr', False):
    print(df)
```

```
                      Region                         Division          Names
0   Region 1 (The Northeast)         Division 1 (New England)          Maine
1   Region 1 (The Northeast)         Division 1 (New England)  New Hampshire
2   Region 1 (The Northeast)         Division 1 (New England)        Vermont
3   Region 1 (The Northeast)         Division 1 (New England)  Massachusetts
4   Region 1 (The Northeast)         Division 1 (New England)   Rhode Island
5   Region 1 (The Northeast)         Division 1 (New England)    Connecticut
6   Region 1 (The Northeast)     Division 2 (Middle Atlantic)       New York
7   Region 1 (The Northeast)     Division 2 (Middle Atlantic)   Pennsylvania
8   Region 1 (The Northeast)     Division 2 (Middle Atlantic)     New Jersey
9     Region 2 (The Midwest)  Division 3 (East North Central)      Wisconsin
10    Region 2 (The Midwest)  Division 3 (East North Central)       Michigan
11    Region 2 (The Midwest)  Division 3 (East North Central)       Illinois
12    Region 2 (The Midwest)  Division 3 (East North Central)        Indiana
13    Region 2 (The Midwest)  Division 3 (East North Central)           Ohio
...
```
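To check the reshaping without network access, here is a sketch that runs the same steps on an in-memory copy of the 13 sample rows from the question. It is an assumption that these rows match the page; the real table has more rows, which is why the output above continues with Indiana and Ohio while this sample moves on to North Dakota.

```python
import numpy as np
import pandas as pd

# Offline stand-in for pd.read_html(url)[8]: only the question's 13 sample rows.
df = pd.DataFrame({
    'a': ['Region 1 (The Northeast)', 'Division 1 (New England)', 'Maine',
          'New Hampshire', 'Vermont', 'Massachusetts', 'Rhode Island',
          'Connecticut', 'Region 2 (The Midwest)',
          'Division 3 (East North Central)', 'Wisconsin', 'Michigan', 'Illinois'],
    'b': [np.nan, 'Division 2 (Middle Atlantic)', 'New York', 'Pennsylvania',
          'New Jersey', np.nan, np.nan, np.nan, np.nan,
          'Division 4 (West North Central)', 'North Dakota', 'South Dakota',
          'Nebraska'],
})

# Same steps as the answer: tag regions, melt, tag divisions, drop header rows.
df['Region'] = df['a'].where(df['a'].str.contains('Region', na=False)).ffill()
df = (pd.melt(df, id_vars='Region', value_name='Names')
        .sort_values(['Region', 'variable'])
        .dropna()
        .drop('variable', axis=1))
df['Division'] = df['Names'].where(df['Names'].str.contains('Division', na=False)).ffill()
df = (df[(df.Division != df.Names) & (df.Region != df.Names)]
        .reset_index(drop=True)
        .reindex(columns=['Region', 'Division', 'Names']))
print(df)
```

Sorting by `['Region', 'variable']` places column `a` before column `b` within each region, so every division header ends up directly above its own states; that ordering is what lets the second `ffill` assign the right division to each state.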

Awesome! You made this look easy. I'll go through it step by step to better understand each part. – Nick


Glad I could help. Also, if you need the values in the parentheses, [this answer](http://stackoverflow.com/q/41386443/2901002) may help. Have a nice day. – jezrael
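The linked answer is not reproduced here. As an independent sketch, `Series.str.extract` with a single capture group can keep only the text inside the parentheses; the two-row frame below is hypothetical, shaped like the output of the answer's code.

```python
import pandas as pd

# Hypothetical rows in the same shape as the answer's tidy output.
df = pd.DataFrame({
    'Region': ['Region 1 (The Northeast)', 'Region 2 (The Midwest)'],
    'Division': ['Division 1 (New England)', 'Division 3 (East North Central)'],
    'Names': ['Maine', 'Wisconsin'],
})

# One capture group plus expand=False returns a Series of the captured text.
for col in ['Region', 'Division']:
    df[col] = df[col].str.extract(r'\((.*)\)', expand=False)

print(df)
```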