I'd suggest not relying on pandas directly here for the parsing. Instead, open the file, process it line by line to build a list of dicts, and use that to create the DataFrame:
import pandas as pd

with open('yourfile.txt', 'r') as f:
    content = f.read().splitlines()

state = None
l_dict = []
for line in content:
    if '[edit]' in line:  # state header lines carry an "[edit]" suffix
        state = line.split('[')[0]
    else:  # every other line is a region belonging to the current state
        l_dict.append({'St. Name': state, 'Region': line})

df = pd.DataFrame(l_dict)
df.set_index('St. Name', inplace=True)
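As a quick sanity check, here is the same loop run on a couple of inline sample lines (the state names and regions below are hypothetical, just mimicking the `Name[edit]` layout described above):

```python
import pandas as pd

# Hypothetical sample in the same layout as the file: state headers end
# in "[edit]", and the lines below each header are that state's regions.
content = [
    'Alabama[edit]',
    'Auburn (Auburn University)',
    'Florence (University of North Alabama)',
    'Alaska[edit]',
    'Fairbanks (University of Alaska Fairbanks)',
]

state = None
l_dict = []
for line in content:
    if '[edit]' in line:
        state = line.split('[')[0]
    else:
        l_dict.append({'St. Name': state, 'Region': line})

df = pd.DataFrame(l_dict)
df.set_index('St. Name', inplace=True)
print(df)
```

Each region row picks up whichever state header was seen most recently, so the index repeats the state name once per region.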
If you really want to do it all in pandas, I think you could process the states and the regions separately and forward-fill the NaNs (`DataFrame.ffill`, which is the same as `fillna(method='ffill')`, or `pad`):
with open('yourfile.txt', 'r') as f:
    df = pd.DataFrame(f.read().splitlines(), columns=['txt'])
# Create a column that'll serve as a filter IsState
df['IsState'] = df['txt'].str.contains(r'\[edit\]')
# Split and get first item of split
df.loc[df.IsState, 'St. Name'] = df.loc[df.IsState, 'txt'].str.split('[').str.get(0)
# the `~` means "not": select the non-state (region) rows
df.loc[~df.IsState, 'Region'] = df.loc[~df.IsState, 'txt']
# Forward fill the NaNs
df['St. Name'] = df['St. Name'].ffill()
# Select what you truly want and set index
df = df.loc[~df.IsState, ['St. Name', 'Region']]
df.set_index('St. Name', inplace=True)
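Run end to end on the same hypothetical sample data (swapping the file read for an inline list so it's self-contained), the pandas-only version gives the same result:

```python
import pandas as pd

# Hypothetical sample in the same "Name[edit]" layout as the file.
df = pd.DataFrame({'txt': [
    'Alabama[edit]',
    'Auburn (Auburn University)',
    'Alaska[edit]',
    'Fairbanks (University of Alaska Fairbanks)',
]})

# Flag the state header rows, extract names and regions separately,
# then forward-fill so every region row carries its state.
df['IsState'] = df['txt'].str.contains(r'\[edit\]')
df.loc[df.IsState, 'St. Name'] = df.loc[df.IsState, 'txt'].str.split('[').str.get(0)
df.loc[~df.IsState, 'Region'] = df.loc[~df.IsState, 'txt']
df['St. Name'] = df['St. Name'].ffill()
df = df.loc[~df.IsState, ['St. Name', 'Region']]
df.set_index('St. Name', inplace=True)
print(df)
```

The forward fill is what ties each region back to the last state header seen above it; everything before that step just splits the one text column into two.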
This does look a lot like Week 4 of [Introduction to Data Science in Python](https://www.coursera.org/learn/python-data-analysis), of course :) –
Yes. They encourage you to ask questions on Stack Overflow, so I did :) – minibuffer