You can use:
import pandas as pd

url = 'http://www.mapsofworld.com/usa/usa-maps/united-states-regional-maps.html'
#input dataframe with columns a, b
df = pd.read_html(url)[8]
df.columns = ['a','b']
#extract Region data to new column
df['Region'] = df['a'].where(df['a'].str.contains('Region', na=False)).ffill()
#reshaping, remove rows with NaNs, remove column variable
df = (pd.melt(df, id_vars='Region', value_name='Names')
        .sort_values(['Region', 'variable'])
        .dropna()
        .drop('variable', axis=1))
#extract Division data to new column
df['Division'] = df['Names'].where(df['Names'].str.contains('Division', na=False)).ffill()
#remove the Region and Division header rows from column Names, change order of columns
df = (df[(df.Division != df.Names) & (df.Region != df.Names)]
        .reset_index(drop=True)
        .reindex(columns=['Region','Division','Names']))
#temporarily display all columns
with pd.option_context('display.expand_frame_repr', False):
    print(df)
    Region                    Division                         Names
0   Region 1 (The Northeast)  Division 1 (New England)         Maine
1   Region 1 (The Northeast)  Division 1 (New England)         New Hampshire
2   Region 1 (The Northeast)  Division 1 (New England)         Vermont
3   Region 1 (The Northeast)  Division 1 (New England)         Massachusetts
4   Region 1 (The Northeast)  Division 1 (New England)         Rhode Island
5   Region 1 (The Northeast)  Division 1 (New England)         Connecticut
6   Region 1 (The Northeast)  Division 2 (Middle Atlantic)     New York
7   Region 1 (The Northeast)  Division 2 (Middle Atlantic)     Pennsylvania
8   Region 1 (The Northeast)  Division 2 (Middle Atlantic)     New Jersey
9   Region 2 (The Midwest)    Division 3 (East North Central)  Wisconsin
10  Region 2 (The Midwest)    Division 3 (East North Central)  Michigan
11  Region 2 (The Midwest)    Division 3 (East North Central)  Illinois
12  Region 2 (The Midwest)    Division 3 (East North Central)  Indiana
13  Region 2 (The Midwest)    Division 3 (East North Central)  Ohio
...
Awesome! You make this look easy. I'll work through it step by step to get a better understanding of each stage. – Nick
Glad I could help. Also, if you need the values in the parentheses, [this answer](http://stackoverflow.com/q/41386443/2901002) may help. Have a nice day. – jezrael
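For the values in parentheses, one possible approach (not necessarily the one used in the linked answer) is `Series.str.extract` with a capture group. A minimal sketch on made-up sample strings:

```python
import pandas as pd

# Hypothetical sample values in the same "Label N (Name)" shape as the output above
labels = pd.Series(['Region 1 (The Northeast)', 'Division 2 (Middle Atlantic)'])

# Capture the text between the first '(' and the next ')'
inside = labels.str.extract(r'\(([^)]*)\)', expand=False)
print(inside.tolist())
```

With `expand=False` and a single capture group, the result is a Series, so it can be assigned directly to a new column.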