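This question follows on from this SO Question. How can I keep track of a column's value from the previous date's record in a pandas DataFrame?
I want to perform some data analysis on a pandas DataFrame. I have a DataFrame like this: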
derived_symbol sport_name person_name city \
0 football.RAM.mumbai.ram_count football RAM mumbai
1 football.RAM.mumbai.mum_count football RAM mumbai
2 football.RAM.delhi.mum_count football RAM delhi
3 football.RAM.delhi.ram_count football RAM delhi
4 football.RAM.mumbai.ram_count football RAM mumbai
5 football.RAM.mumbai.mum_count football RAM mumbai
6 football.RAM.delhi.mum_count football RAM delhi
7 football.RAM.delhi.ram_count football RAM delhi
8 basketball.MAH.pune.mah_count basketball MAH pune
9 basketball.MAH.nagpur.mah_count basketball MAH nagpur
10 basketball.MAH.TOTAL.mah_count basketball MAH No Entry
11 basketball.MAH.TOTAL.nagpur_count basketball MAH nagpur
12 basketball.MAH.TOTAL.pune_count basketball MAH pune
13 football.RAM.TOTAL.delhi_count football RAM delhi
14 football.RAM.TOTAL.delhi_count football RAM delhi
15 football.RAM.TOTAL.mum_count football RAM No Entry
16 football.RAM.TOTAL.mum_count football RAM No Entry
17 football.RAM.TOTAL.mumbai_count football RAM mumbai
18 football.RAM.TOTAL.mumbai_count football RAM mumbai
19 football.RAM.TOTAL.ram_count football RAM No Entry
20 football.RAM.TOTAL.ram_count football RAM No Entry
person_symbol month sir person_count
0 ram 2017-01-23 a 10
1 mum 2017-01-23 a 14
2 mum 2017-01-23 a 25
3 ram 2017-01-23 a 20
4 ram 2017-02-22 b 34
5 mum 2017-02-22 b 23
6 mum 2017-02-22 b 43
7 ram 2017-02-22 b 34
8 mah 2017-03-03 c 10
9 mah 2017-03-03 c 20
10 mah 2017-03-03 c 30
11 No Entry 2017-03-03 c 20
12 No Entry 2017-03-03 c 10
13 No Entry 2017-01-23 a 45
14 No Entry 2017-02-22 b 77
15 mum 2017-01-23 a 39
16 mum 2017-02-22 b 66
17 No Entry 2017-01-23 a 24
18 No Entry 2017-02-22 b 57
19 ram 2017-01-23 a 30
20 ram 2017-02-22 b 68
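I want to add a previous_person_count column to this DataFrame. The month column holds dates in the format yyyy-mm-dd, so we need to look at the month, i.e. the mm field, to determine which month a row belongs to.
Based on that month, each row's person_count value should be placed in the previous_person_count value of the matching row (same derived_symbol) in the following month. Rows with no record in the preceding month get 0.
Expected output: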
derived_symbol sport_name person_name city \
0 football.RAM.mumbai.ram_count football RAM mumbai
1 football.RAM.mumbai.mum_count football RAM mumbai
2 football.RAM.delhi.mum_count football RAM delhi
3 football.RAM.delhi.ram_count football RAM delhi
4 football.RAM.mumbai.ram_count football RAM mumbai
5 football.RAM.mumbai.mum_count football RAM mumbai
6 football.RAM.delhi.mum_count football RAM delhi
7 football.RAM.delhi.ram_count football RAM delhi
8 basketball.MAH.pune.mah_count basketball MAH pune
9 basketball.MAH.nagpur.mah_count basketball MAH nagpur
10 basketball.MAH.TOTAL.mah_count basketball MAH No Entry
11 basketball.MAH.TOTAL.nagpur_count basketball MAH nagpur
12 basketball.MAH.TOTAL.pune_count basketball MAH pune
13 football.RAM.TOTAL.delhi_count football RAM delhi
14 football.RAM.TOTAL.delhi_count football RAM delhi
15 football.RAM.TOTAL.mum_count football RAM No Entry
16 football.RAM.TOTAL.mum_count football RAM No Entry
17 football.RAM.TOTAL.mumbai_count football RAM mumbai
18 football.RAM.TOTAL.mumbai_count football RAM mumbai
19 football.RAM.TOTAL.ram_count football RAM No Entry
20 football.RAM.TOTAL.ram_count football RAM No Entry
person_symbol month sir person_count previous_person_count
0 ram 2017-01-23 a 10 0
1 mum 2017-01-23 a 14 0
2 mum 2017-01-23 a 25 0
3 ram 2017-01-23 a 20 0
4 ram 2017-02-22 b 34 10
5 mum 2017-02-22 b 23 14
6 mum 2017-02-22 b 43 25
7 ram 2017-02-22 b 34 20
8 mah 2017-03-03 c 10 0
9 mah 2017-03-03 c 20 0
10 mah 2017-03-03 c 30 0
11 No Entry 2017-03-03 c 20 0
12 No Entry 2017-03-03 c 10 0
13 No Entry 2017-01-23 a 45 0
14 No Entry 2017-02-22 b 77 45
15 mum 2017-01-23 a 39 0
16 mum 2017-02-22 b 66 39
17 No Entry 2017-01-23 a 24 0
18 No Entry 2017-02-22 b 57 24
19 ram 2017-01-23 a 30 0
20 ram 2017-02-22 b 68 30
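The rule described above can be sketched as a self-merge. This is only a minimal sketch, not a definitive answer: the helper name add_previous_person_count and the temporary _month_no column are hypothetical, it uses just five rows taken from the table above, and it assumes each derived_symbol appears at most once per calendar month (true for the sample data; duplicates would multiply rows in the merge).

```python
import pandas as pd

def add_previous_person_count(df):
    """Hypothetical helper: attach the count recorded one calendar month earlier."""
    out = df.copy()
    dates = pd.to_datetime(out["month"])
    # Running month number, so that December -> January is also "one month later".
    out["_month_no"] = dates.dt.year * 12 + dates.dt.month
    # Lookup table: every count re-keyed to the month it should be reported in.
    prev = pd.DataFrame({
        "derived_symbol": out["derived_symbol"],
        "_month_no": out["_month_no"] + 1,
        "previous_person_count": out["person_count"],
    })
    out = out.merge(prev, on=["derived_symbol", "_month_no"], how="left")
    # No record in the preceding month -> 0, as in the expected output.
    out["previous_person_count"] = out["previous_person_count"].fillna(0).astype(int)
    return out.drop(columns="_month_no")

# Rows 0, 4, 8, 13 and 14 of the table above.
sample = pd.DataFrame({
    "derived_symbol": [
        "football.RAM.mumbai.ram_count",
        "football.RAM.mumbai.ram_count",
        "basketball.MAH.pune.mah_count",
        "football.RAM.TOTAL.delhi_count",
        "football.RAM.TOTAL.delhi_count",
    ],
    "month": ["2017-01-23", "2017-02-22", "2017-03-03", "2017-01-23", "2017-02-22"],
    "person_count": [10, 34, 10, 45, 77],
})

result = add_previous_person_count(sample)
print(result)
```

Merging on an explicit month number, rather than shifting within each group, means a missing month yields 0 instead of silently picking up an older record.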
Edit: code for reference:
import pandas as pd

df = pd.DataFrame({'sport_name': ['football','football','football','football','football','football','football','football','basketball','basketball'],
'person_name': ['ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','mahesh','mahesh'],
'city': ['mumbai', 'mumbai','delhi','delhi','mumbai', 'mumbai','delhi','delhi','pune','nagpur'],
'person_symbol': ['ram','mum','mum','ram','ram','mum','mum','ram','mah','mah'],
'person_count': ['10','14','25','20','34','23','43','34','10','20'],
'month': ['2017-01-23','2017-01-23','2017-01-23','2017-01-23','2017-02-22','2017-02-22','2017-02-22','2017-02-22','2017-03-03','2017-03-03'],
'sir': ['a','a','a','a','b','b','b','b','c','c']})
df = df[['sport_name','person_name','city','person_symbol','person_count','month','sir']]
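# symbology is my own helper (definition not shown here); judging by the output above it maps e.g. 'ramesh' to 'RAM'
df['person_name'] = df['person_name'].apply(symbology)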
df['person_count'] = df['person_count'].astype(int)
print(df)
df1=df.set_index(['sport_name','person_name','person_count','month','sir']).stack().reset_index(name='val')
df1['derived_symbol'] = df1['sport_name'] + '.' + df1['person_name'] + '.TOTAL.' + df1['val'] + '_count'
df2 = df1.groupby(['derived_symbol','month','sir','sport_name','person_name','level_5','val'])['person_count'].sum().reset_index(name='person_count')
df3 = df2.set_index(['derived_symbol','month','sir','sport_name','person_name','person_count','level_5'])['val'].unstack().fillna('No Entry').rename_axis(None, axis=1).reset_index()
df['derived_symbol'] = df['sport_name'] + '.' + df['person_name'] + '.' + df['city'] + "."+ df['person_symbol'] + '_count'
# Combine the per-city rows with the TOTAL rows and renumber the index
df4 = pd.concat([df, df3]).reset_index(drop=True)
print(df3)
df4 = df4[['derived_symbol','sport_name','person_name','city','person_symbol','month','sir','person_count']]
print(df4)
For convenience, the same data as a dict:
d = {'city': {0: 'mumbai',
1: 'mumbai',
2: 'delhi',
3: 'delhi',
4: 'mumbai',
5: 'mumbai',
6: 'delhi',
7: 'delhi',
8: 'pune',
9: 'nagpur',
10: 'No Entry',
11: 'nagpur',
12: 'pune',
13: 'delhi',
14: 'delhi',
15: 'No Entry',
16: 'No Entry',
17: 'mumbai',
18: 'mumbai',
19: 'No Entry',
20: 'No Entry'},
'derived_symbol': {0: 'football.RAM.mumbai.ram_count',
1: 'football.RAM.mumbai.mum_count',
2: 'football.RAM.delhi.mum_count',
3: 'football.RAM.delhi.ram_count',
4: 'football.RAM.mumbai.ram_count',
5: 'football.RAM.mumbai.mum_count',
6: 'football.RAM.delhi.mum_count',
7: 'football.RAM.delhi.ram_count',
8: 'basketball.MAH.pune.mah_count',
9: 'basketball.MAH.nagpur.mah_count',
10: 'basketball.MAH.TOTAL.mah_count',
11: 'basketball.MAH.TOTAL.nagpur_count',
12: 'basketball.MAH.TOTAL.pune_count',
13: 'football.RAM.TOTAL.delhi_count',
14: 'football.RAM.TOTAL.delhi_count',
15: 'football.RAM.TOTAL.mum_count',
16: 'football.RAM.TOTAL.mum_count',
17: 'football.RAM.TOTAL.mumbai_count',
18: 'football.RAM.TOTAL.mumbai_count',
19: 'football.RAM.TOTAL.ram_count',
20: 'football.RAM.TOTAL.ram_count'},
'month': {0: '2017-01-23',
1: '2017-01-23',
2: '2017-01-23',
3: '2017-01-23',
4: '2017-02-22',
5: '2017-02-22',
6: '2017-02-22',
7: '2017-02-22',
8: '2017-03-03',
9: '2017-03-03',
10: '2017-03-03',
11: '2017-03-03',
12: '2017-03-03',
13: '2017-01-23',
14: '2017-02-22',
15: '2017-01-23',
16: '2017-02-22',
17: '2017-01-23',
18: '2017-02-22',
19: '2017-01-23',
20: '2017-02-22'},
'person_count': {0: 10,
1: 14,
2: 25,
3: 20,
4: 34,
5: 23,
6: 43,
7: 34,
8: 10,
9: 20,
10: 30,
11: 20,
12: 10,
13: 45,
14: 77,
15: 39,
16: 66,
17: 24,
18: 57,
19: 30,
20: 68},
'person_name': {0: 'RAM',
1: 'RAM',
2: 'RAM',
3: 'RAM',
4: 'RAM',
5: 'RAM',
6: 'RAM',
7: 'RAM',
8: 'MAH',
9: 'MAH',
10: 'MAH',
11: 'MAH',
12: 'MAH',
13: 'RAM',
14: 'RAM',
15: 'RAM',
16: 'RAM',
17: 'RAM',
18: 'RAM',
19: 'RAM',
20: 'RAM'},
'person_symbol': {0: 'ram',
1: 'mum',
2: 'mum',
3: 'ram',
4: 'ram',
5: 'mum',
6: 'mum',
7: 'ram',
8: 'mah',
9: 'mah',
10: 'mah',
11: 'No Entry',
12: 'No Entry',
13: 'No Entry',
14: 'No Entry',
15: 'mum',
16: 'mum',
17: 'No Entry',
18: 'No Entry',
19: 'ram',
20: 'ram'},
'sir': {0: 'a',
1: 'a',
2: 'a',
3: 'a',
4: 'b',
5: 'b',
6: 'b',
7: 'b',
8: 'c',
9: 'c',
10: 'c',
11: 'c',
12: 'c',
13: 'a',
14: 'b',
15: 'a',
16: 'b',
17: 'a',
18: 'b',
19: 'a',
20: 'b'},
'sport_name': {0: 'football',
1: 'football',
2: 'football',
3: 'football',
4: 'football',
5: 'football',
6: 'football',
7: 'football',
8: 'basketball',
9: 'basketball',
10: 'basketball',
11: 'basketball',
12: 'basketball',
13: 'football',
14: 'football',
15: 'football',
16: 'football',
17: 'football',
18: 'football',
19: 'football',
20: 'football'}}
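@3kt - Thanks for your help. I have one question: can we do the same thing for weeks? If the data is available weekly, can we compute the previous week's count from that data? – kit
@3kt - I did it. Thanks – kit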
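For the weekly case asked about in the comments, one possible sketch (not necessarily what kit ended up doing) keys each row by the Monday that starts its week and looks up the record from seven days earlier. The sample rows, the date column name and the temporary _week_start column are all made up for illustration, and the sketch assumes at most one row per derived_symbol per week.

```python
import pandas as pd

# Made-up weekly data; "date" is a hypothetical column name.
weekly = pd.DataFrame({
    "derived_symbol": ["a.x_count", "a.x_count", "a.x_count", "b.y_count"],
    "date": ["2017-01-02", "2017-01-11", "2017-01-25", "2017-01-12"],
    "person_count": [5, 8, 13, 21],
})

# Start (Monday 00:00) of the Monday-to-Sunday week containing each date.
week_start = pd.to_datetime(weekly["date"]).dt.to_period("W").dt.start_time

# Lookup table: every count re-keyed to the week in which it counts as "previous".
prev = pd.DataFrame({
    "derived_symbol": weekly["derived_symbol"],
    "_week_start": week_start + pd.Timedelta(weeks=1),
    "previous_person_count": weekly["person_count"],
})

weekly_result = (
    weekly.assign(_week_start=week_start)
    .merge(prev, on=["derived_symbol", "_week_start"], how="left")
    .drop(columns="_week_start")
)
# Weeks with no record in the week before get 0.
weekly_result["previous_person_count"] = (
    weekly_result["previous_person_count"].fillna(0).astype(int)
)
print(weekly_result)
```

The third a.x_count row gets 0 because the week immediately before it has no record, even though older weeks do.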