2017-12-27 1346 views
2

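Python: line plot of values grouped by multiple columns

I have a DataFrame with two columns: genre and release_year. Each year has multiple genres. The format is as follows: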

genre release_year 
Action 2015 
Action 2015 
Adventure 2015 
Action 2015 
Action 2015 

I need to plot how all the genres change over time using Pandas/Python.

import pandas as pd

df = pd.read_csv('genres.csv')

df.shape 
(53975, 2) 


df_new = df.groupby(['release_year', 'genre'])['genre'].count() 

This produces the following grouping.

release_year genre   
1960  Action    8 
      Adventure   5 
      Comedy    8 
      Crime    2 
      Drama    13 
      Family    3 
      Fantasy    2 
      Foreign    1 
      History    5 
      Horror    7 
      Music    1 
      Romance    6 
      Science Fiction  3 
      Thriller    6 
      War     2 
      Western    6 
1961  Action    7 
      Adventure   6 
      Animation   1 
      Comedy    10 
      Crime    2 
      Drama    16 
      Family    5 
      Fantasy    2 
      Foreign    1 
      History    3 
      Horror    3 
      Music    2 
      Mystery    1 
      Romance    7 
          ... 

I need to draw a line plot of how the genre counts change over the years, i.e. I need a loop that plots each genre across the years. For example,

df_action = df.query('genre == "Action"') 
result_plot = df_action.groupby(['release_year','genre'])['genre'].count() 
result_plot.plot(figsize=(10,10)); 

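shows the plot for the genre "Action". Instead of plotting each genre separately like this, I need a loop that does the same for all of them.

How can I do this? Can anyone help me?

I tried the following, but it does not work.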

genres = ["Action", "Adventure", "Western", "Science Fiction", "Drama", 
    "Family", "Comedy", "Crime", "Romance", "War", "Mystery", 
    "Thriller", "Fantasy", "History", "Animation", "Horror", "Music", 
    "Documentary", "TV Movie", "Foreign"] 

for g in genres: 
    #df_new = df.query('genre == "g"') 
    result_plot = df.groupby(['release_year','genre'])['genre'].count() 
    result_plot.plot(figsize=(10,10)); 

Answers

2

How about unstacking your Series and plotting everything with one command:

In [36]: s 
Out[36]: 
release_year genre 
1960.0  Action  8 
       Adventure  5 
       Comedy  8 
       Crime   2 
       Drama  13 
       Family  3 
       Fantasy  2 
       Foreign  1 
       History  5 
       Horror  7 
          .. 
1961.0  Crime   2 
       Drama  16 
       Family  5 
       Fantasy  2 
       Foreign  1 
       History  3 
       Horror  3 
       Music   2 
       Mystery  1 
       Romance  7 
Name: count, Length: 30, dtype: int64 

In [37]: s.unstack() 
Out[37]: 
genre   Action Adventure Animation Comedy Crime Drama Family Fantasy Foreign History Horror Music Mystery Romance \ 
release_year 
1960.0   8.0  5.0  NaN  8.0 2.0 13.0  3.0  2.0  1.0  5.0  7.0 1.0  NaN  6.0 
1961.0   7.0  6.0  1.0 10.0 2.0 16.0  5.0  2.0  1.0  3.0  3.0 2.0  1.0  7.0 

genre   Science Fiction Thriller War Western 
release_year 
1960.0     3.0  6.0 2.0  6.0 
1961.0     NaN  NaN NaN  NaN 

Plotting:

s.unstack().plot() 
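The unstack approach above can be sketched end to end as follows. This is a minimal sketch, not necessarily how `s` was built in this answer: the helper names `genre_counts_by_year` and `plot_genre_lines` and the tiny `sample` frame are made up for illustration, and `fill_value=0` is an added assumption so that a genre missing in a given year is drawn as 0 instead of leaving a NaN gap in the line.

```python
import pandas as pd


def genre_counts_by_year(df: pd.DataFrame) -> pd.DataFrame:
    """Return a year-by-genre table of row counts (missing pairs become 0)."""
    counts = df.groupby(["release_year", "genre"]).size()
    # unstack() moves the inner index level (genre) into the columns;
    # fill_value=0 is an assumption: absent (year, genre) pairs count as 0.
    return counts.unstack(fill_value=0)


def plot_genre_lines(df: pd.DataFrame):
    """Draw one line per genre with release_year on the x-axis (needs matplotlib)."""
    wide = genre_counts_by_year(df)
    ax = wide.plot(figsize=(10, 10))
    ax.set_xlabel("release_year")
    ax.set_ylabel("number of rows")
    return ax


# Tiny made-up sample, for illustration only (not the real genres.csv).
sample = pd.DataFrame({
    "genre": ["Action", "Action", "Drama", "Action", "Drama", "Comedy"],
    "release_year": [1960, 1960, 1960, 1961, 1961, 1961],
})
wide = genre_counts_by_year(sample)
```

With the real data, `plot_genre_lines(df)` would draw all genres on one axes, which avoids the explicit loop from the question.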
2
df_new.unstack().T.plot(kind='bar') 

I chose a bar chart; you can switch to whatever kind you need.
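If an explicit loop over genres is still wanted, as in the question, one possible sketch is below. The commented-out query in the question compares against the literal string "g"; `DataFrame.query` can reference a Python variable with the `@` prefix instead. The function names and the `sample` data are hypothetical, chosen only for this illustration.

```python
import pandas as pd


def yearly_counts_per_genre(df: pd.DataFrame, genres=None) -> dict:
    """Map each genre to a Series of row counts indexed by release_year."""
    if genres is None:
        genres = sorted(df["genre"].unique())
    result = {}
    for g in genres:
        # @g refers to the Python variable g; 'genre == "g"' would compare
        # against the literal string "g" and match nothing.
        subset = df.query("genre == @g")
        result[g] = subset.groupby("release_year").size()
    return result


def plot_each_genre(df: pd.DataFrame, genres=None):
    """Draw every genre on one shared axes, one labelled line per genre."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots(figsize=(10, 10))
    for g, per_year in yearly_counts_per_genre(df, genres).items():
        if not per_year.empty:  # skip genres that never occur in df
            per_year.plot(ax=ax, label=g)
    ax.set_xlabel("release_year")
    ax.set_ylabel("count")
    ax.legend()
    return ax


# Tiny made-up sample, for illustration only.
sample = pd.DataFrame({
    "genre": ["Action", "Action", "Drama", "Action", "Drama", "Comedy"],
    "release_year": [1960, 1960, 1960, 1961, 1961, 1961],
})
counts = yearly_counts_per_genre(sample)
```

Passing `ax=ax` to each `plot` call keeps all lines on the same figure rather than opening a new one per genre.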

PS: you could consider crosstab instead of groupby

pd.crosstab(df.genre,df.release_year).plot(kind='bar') 
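For a trend over the years rather than bars grouped by genre, the crosstab arguments can be swapped so that release_year becomes the row index. The sketch below assumes a small made-up `sample` frame; `plot_trends` is a hypothetical helper name, and the groupby version is included only to show that both routes give the same counts.

```python
import pandas as pd

# Tiny made-up sample, for illustration only.
sample = pd.DataFrame({
    "genre": ["Action", "Action", "Drama", "Action", "Drama", "Comedy"],
    "release_year": [1960, 1960, 1960, 1961, 1961, 1961],
})

# Rows = release_year, columns = genre, cells = number of rows per pair.
table = pd.crosstab(sample["release_year"], sample["genre"])

# The same counts via groupby + unstack, for comparison.
via_groupby = (
    sample.groupby(["release_year", "genre"]).size().unstack(fill_value=0)
)


def plot_trends(counts_table: pd.DataFrame):
    """Line plot with one line per column (genre); requires matplotlib."""
    return counts_table.plot(figsize=(10, 10))
```

Calling `plot_trends(table)` puts the years on the x-axis, since pandas plots the index along x and one line per column.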


0

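I recommend seaborn, which helps you avoid reshaping the DataFrame before plotting. You can install it by running pip install seaborn. It has a simple API for the various standard plots: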

release_year vs genre

import seaborn as sns 
sns.countplot(x='release_year', hue='genre', data=df) 


genre vs release_year

import seaborn as sns 
sns.countplot(x='genre', hue='release_year', data=df) 
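Since the question asks for line plots, a line-based variant with seaborn might look like the sketch below. Unlike countplot, `sns.lineplot` does not count rows itself, so this assumes the counts are first aggregated into a long-form table. The helper names `long_counts` and `plot_genre_trends`, the column name `count`, and the `sample` data are assumptions made for this illustration.

```python
import pandas as pd


def long_counts(df: pd.DataFrame) -> pd.DataFrame:
    """One row per (release_year, genre) pair with its row count."""
    return (
        df.groupby(["release_year", "genre"])
        .size()
        .reset_index(name="count")  # "count" is an arbitrary column name
    )


def plot_genre_trends(df: pd.DataFrame):
    """One line per genre over the years (requires seaborn)."""
    import seaborn as sns  # imported lazily; only needed for plotting

    return sns.lineplot(
        data=long_counts(df), x="release_year", y="count", hue="genre"
    )


# Tiny made-up sample, for illustration only.
sample = pd.DataFrame({
    "genre": ["Action", "Action", "Drama", "Action", "Drama", "Comedy"],
    "release_year": [1960, 1960, 1960, 1961, 1961, 1961],
})
long = long_counts(sample)
```

Pairs that never occur are simply absent from the long table, so a genre's line only spans the years in which it appears.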
