2017-03-31 72 views
1

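Given this pandas DataFrame with two columns, 'Value' and 'Interval', how can I get a third column, 'MinMax', indicating whether a value is the maximum or the minimum within its interval? The challenge is that neither the length of the intervals nor the distance between them is fixed, which is why I am posting this question. How do I find the positions of the highest and lowest values within the intervals of a column?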

import pandas as pd 
import numpy as np 


data = pd.DataFrame([ 
     [1879.289,np.nan],[1879.281,np.nan],[1879.292,1],[1879.295,1],[1879.481,1],[1879.294,1],[1879.268,1], 
     [1879.293,1],[1879.277,1],[1879.285,1],[1879.464,1],[1879.475,1],[1879.971,1],[1879.779,1], 
     [1879.986,1],[1880.791,1],[1880.29,1],[1879.253,np.nan],[1878.268,np.nan],[1875.73,1],[1876.792,1], 
     [1875.977,1],[1876.408,1],[1877.159,1],[1877.187,1],[1883.164,1],[1883.171,1],[1883.495,1], 
     [1883.962,1],[1885.158,1],[1885.974,1],[1886.479,np.nan],[1885.969,np.nan],[1884.693,1],[1884.977,1], 
     [1884.967,1],[1884.691,1],[1886.171,1],[1886.166,np.nan],[1884.476,np.nan],[1884.66,1],[1882.962,1], 
     [1881.496,1],[1871.163,1],[1874.985,1],[1874.979,1],[1871.173,np.nan],[1871.973,np.nan],[1871.682,np.nan], 
     [1872.476,np.nan],[1882.361,1],[1880.869,1],[1882.165,1],[1881.857,1],[1880.375,1],[1880.66,1], 
     [1880.891,1],[1880.377,1],[1881.663,1],[1881.66,1],[1877.888,1],[1875.69,1],[1875.161,1], 
     [1876.697,np.nan],[1876.671,np.nan],[1879.666,np.nan],[1877.182,np.nan],[1878.898,1],[1878.668,1],[1878.871,1], 
     [1878.882,1],[1879.173,1],[1878.887,1],[1878.68,1],[1878.872,1],[1878.677,1],[1877.877,1], 
     [1877.669,1],[1877.69,1],[1877.684,1],[1877.68,1],[1877.885,1],[1877.863,1],[1877.674,1], 
     [1877.676,1],[1877.687,1],[1878.367,1],[1878.179,1],[1877.696,1],[1877.665,1],[1877.667,np.nan], 
     [1878.678,np.nan],[1878.661,1],[1878.171,1],[1877.371,1],[1877.359,1],[1878.381,1],[1875.185,1], 
     [1875.367,np.nan],[1865.492,np.nan],[1865.495,1],[1866.995,1],[1866.672,1],[1867.465,1],[1867.663,1], 
     [1867.186,1],[1867.687,1],[1867.459,1],[1867.168,1],[1869.689,1],[1869.693,1],[1871.676,1], 
     [1873.174,1],[1873.691,np.nan],[1873.685,np.nan] 
    ]) 

The third column shows which value is the maximum and which is the minimum in each interval.

+-------+----------+-----------+---------+ 
| index | Value | Intervals | Min/Max | 
+-------+----------+-----------+---------+ 
|  0 | 1879.289 | np.nan |   | 
|  1 | 1879.281 | np.nan |   | 
|  2 | 1879.292 | 1   |   | 
|  3 | 1879.295 | 1   |   | 
|  4 | 1879.481 | 1   |   | 
|  5 | 1879.294 | 1   |   | 
|  6 | 1879.268 | 1   | min  | 
|  7 | 1879.293 | 1   |   | 
|  8 | 1879.277 | 1   |   | 
|  9 | 1879.285 | 1   |   | 
| 10 | 1879.464 | 1   |   | 
| 11 | 1879.475 | 1   |   | 
| 12 | 1879.971 | 1   |   | 
| 13 | 1879.779 | 1   |   | 
| 17 | 1879.986 | 1   |   | 
| 18 | 1880.791 | 1   | max  | 
| 19 | 1880.29 | 1   |   | 
| 55 | 1879.253 | np.nan |   | 
| 56 | 1878.268 | np.nan |   | 
| 57 | 1875.73 | 1   |   | 
| 58 | 1876.792 | 1   |   | 
| 59 | 1875.977 | 1   | min  | 
| 60 | 1876.408 | 1   |   | 
| 61 | 1877.159 | 1   |   | 
| 62 | 1877.187 | 1   |   | 
| 63 | 1883.164 | 1   |   | 
| 64 | 1883.171 | 1   |   | 
| 65 | 1883.495 | 1   |   | 
| 66 | 1883.962 | 1   |   | 
| 67 | 1885.158 | 1   |   | 
| 68 | 1885.974 | 1   | max  | 
| 69 | 1886.479 | np.nan |   | 
| 70 | 1885.969 | np.nan |   | 
| 71 | 1884.693 | 1   |   | 
| 72 | 1884.977 | 1   |   | 
| 73 | 1884.967 | 1   |   | 
| 74 | 1884.691 | 1   | min  | 
| 75 | 1886.171 | 1   | max  | 
| 76 | 1886.166 | np.nan |   | 
| 77 | 1884.476 | np.nan |   | 
| 78 | 1884.66 | 1   | max  | 
| 79 | 1882.962 | 1   |   | 
| 80 | 1881.496 | 1   |   | 
| 81 | 1871.163 | 1   | min  | 
| 82 | 1874.985 | 1   |   | 
| 83 | 1874.979 | 1   |   | 
| 84 | 1871.173 | np.nan |   | 
| 85 | 1871.973 | np.nan |   | 
| 86 | 1871.682 | np.nan |   | 
| 87 | 1872.476 | np.nan |   | 
| 88 | 1882.361 | 1   | max  | 
| 89 | 1880.869 | 1   |   | 
| 90 | 1882.165 | 1   |   | 
| 91 | 1881.857 | 1   |   | 
| 92 | 1880.375 | 1   |   | 
| 93 | 1880.66 | 1   |   | 
| 94 | 1880.891 | 1   |   | 
| 95 | 1880.377 | 1   |   | 
| 96 | 1881.663 | 1   |   | 
| 97 | 1881.66 | 1   |   | 
| 98 | 1877.888 | 1   |   | 
| 99 | 1875.69 | 1   |   | 
| 100 | 1875.161 | 1   | min  | 
| 101 | 1876.697 | np.nan |   | 
| 102 | 1876.671 | np.nan |   | 
| 103 | 1879.666 | np.nan |   | 
| 111 | 1877.182 | np.nan |   | 
| 112 | 1878.898 | 1   |   | 
| 113 | 1878.668 | 1   |   | 
| 114 | 1878.871 | 1   |   | 
| 115 | 1878.882 | 1   |   | 
| 116 | 1879.173 | 1   | max  | 
| 117 | 1878.887 | 1   |   | 
| 118 | 1878.68 | 1   |   | 
| 119 | 1878.872 | 1   |   | 
| 120 | 1878.677 | 1   |   | 
| 121 | 1877.877 | 1   |   | 
| 122 | 1877.669 | 1   |   | 
| 123 | 1877.69 | 1   |   | 
| 124 | 1877.684 | 1   |   | 
| 125 | 1877.68 | 1   |   | 
| 126 | 1877.885 | 1   |   | 
| 127 | 1877.863 | 1   |   | 
| 128 | 1877.674 | 1   |   | 
| 129 | 1877.676 | 1   |   | 
| 130 | 1877.687 | 1   |   | 
| 131 | 1878.367 | 1   |   | 
| 132 | 1878.179 | 1   |   | 
| 133 | 1877.696 | 1   |   | 
| 134 | 1877.665 | 1   | min  | 
| 135 | 1877.667 | np.nan |   | 
| 136 | 1878.678 | np.nan |   | 
| 137 | 1878.661 | 1   | max  | 
| 138 | 1878.171 | 1   |   | 
| 139 | 1877.371 | 1   |   | 
| 140 | 1877.359 | 1   |   | 
| 141 | 1878.381 | 1   |   | 
| 142 | 1875.185 | 1   | min  | 
| 143 | 1875.367 | np.nan |   | 
| 144 | 1865.492 | np.nan |   | 
| 145 | 1865.495 | 1   | max  | 
| 146 | 1866.995 | 1   |   | 
| 147 | 1866.672 | 1   |   | 
| 148 | 1867.465 | 1   |   | 
| 149 | 1867.663 | 1   |   | 
| 150 | 1867.186 | 1   |   | 
| 151 | 1867.687 | 1   |   | 
| 152 | 1867.459 | 1   |   | 
| 153 | 1867.168 | 1   |   | 
| 154 | 1869.689 | 1   |   | 
| 155 | 1869.693 | 1   |   | 
| 156 | 1871.676 | 1   |   | 
| 157 | 1873.174 | 1   | min  | 
| 158 | 1873.691 | np.nan |   | 
| 159 | 1873.685 | np.nan |   | 
+-------+----------+-----------+---------+ 
+0

Please make your data reproducible; randomly seeded numbers are best. Your 'data = ...' line is very long (1024 chars?), and it blows up copy-and-paste in my shell. – smci

+0

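I said that copying and pasting that line blows up my shell; the line is too long: 1024 characters or more. That line crashes my shell. That is why I suggested you use randomly seeded numbers. – smci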

+0

Sorry, I figured that out after I got this comment :) – RaduS

Answers

2
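# Rows where column 1 is NaN lie between intervals.
isnull = data.iloc[:, 1].isnull()
# The running count of NaN rows stays constant inside an interval, so it serves as the group key.
minmax = data.groupby(isnull.cumsum()[~isnull])[0].agg(['idxmax', 'idxmin'])
# Label the row holding the maximum and the minimum of each interval.
data.loc[minmax['idxmax'], 'MinMax'] = 'max'
data.loc[minmax['idxmin'], 'MinMax'] = 'min'
data['MinMax'] = data['MinMax'].fillna('')
print(data)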

      0 1 MinMax 
0 1879.289 NaN  
1 1879.281 NaN  
2 1879.292 1.0  
3 1879.295 1.0  
4 1879.481 1.0  
5 1879.294 1.0  
6 1879.268 1.0 min 
7 1879.293 1.0  
8 1879.277 1.0  
9 1879.285 1.0  
10 1879.464 1.0  
11 1879.475 1.0  
12 1879.971 1.0  
13 1879.779 1.0  
14 1879.986 1.0  
15 1880.791 1.0 max 
16 1880.290 1.0  
17 1879.253 NaN  
18 1878.268 NaN  
19 1875.730 1.0 min 
20 1876.792 1.0  
21 1875.977 1.0  
22 1876.408 1.0  
23 1877.159 1.0  
24 1877.187 1.0  
25 1883.164 1.0  
26 1883.171 1.0  
27 1883.495 1.0  
28 1883.962 1.0  
29 1885.158 1.0  
..  ... ... ... 
85 1877.687 1.0  
86 1878.367 1.0  
87 1878.179 1.0  
88 1877.696 1.0  
89 1877.665 1.0 min 
90 1877.667 NaN  
91 1878.678 NaN  
92 1878.661 1.0 max 
93 1878.171 1.0  
94 1877.371 1.0  
95 1877.359 1.0  
96 1878.381 1.0  
97 1875.185 1.0 min 
98 1875.367 NaN  
99 1865.492 NaN  
100 1865.495 1.0 min 
101 1866.995 1.0  
102 1866.672 1.0  
103 1867.465 1.0  
104 1867.663 1.0  
105 1867.186 1.0  
106 1867.687 1.0  
107 1867.459 1.0  
108 1867.168 1.0  
109 1869.689 1.0  
110 1869.693 1.0  
111 1871.676 1.0  
112 1873.174 1.0 max 
113 1873.691 NaN  
114 1873.685 NaN  

[115 rows x 3 columns] 
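
To see why the group key in the answer above separates the intervals, here is a minimal sketch on a small made-up frame (the `demo` data is hypothetical, not taken from the question). It assumes the same layout as the question: column 0 holds the values and column 1 is 1 inside an interval and NaN between intervals.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column 0 = value, column 1 = interval marker.
demo = pd.DataFrame([
    [5.0, np.nan],
    [3.0, 1], [9.0, 1], [4.0, 1],
    [7.0, np.nan], [8.0, np.nan],
    [2.0, 1], [6.0, 1],
])

isnull = demo.iloc[:, 1].isnull()
# cumsum() only increases on NaN rows, so it is constant within an interval.
# Dropping the NaN rows leaves one distinct key per interval.
key = isnull.cumsum()[~isnull]
print(key.tolist())  # [1, 1, 1, 3, 3]

# Rows that are missing from `key` (the NaN rows) are left out of the groups.
minmax = demo.groupby(key)[0].agg(['idxmax', 'idxmin'])
print(minmax)
```

The key values themselves (1 and 3 here) carry no meaning; they only need to differ from one interval to the next.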
1
data.columns=['Value','Interval'] 

data['Ingroup'] = (data['Interval'].notnull() + 0) 

Use data['Interval'].notnull() to separate the groups... 
Use cumsum() to number them with `groupno`... 
Use groupby(groupno).. 

Finally you want something using apply/idxmax/idxmin to label the max/min 

But of course a for-loop as you suggested is the non-Pythonic but possibly simpler hack. 
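
The steps outlined above could be put together roughly as follows. This is only a sketch of one way to do it, not necessarily what the answerer had in mind: the helper name `label_min_max`, the way `groupno` is built from interval starts, and the toy `demo` frame are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def label_min_max(df, value_col='Value', interval_col='Interval'):
    """Return a Series marking the 'min' and 'max' row of each interval."""
    # Step 1: notnull() separates rows inside an interval from the gaps.
    ingroup = df[interval_col].notnull()
    # Step 2: an interval starts where an in-group row follows a gap row
    # (or the beginning of the frame); cumsum() over the starts numbers them.
    starts = ingroup & ~ingroup.shift(1, fill_value=False)
    groupno = starts.cumsum()
    # Step 3: group only the in-interval values by their interval number.
    grouped = df.loc[ingroup, value_col].groupby(groupno[ingroup])
    # Step 4: idxmax/idxmin give the index labels of the extremes.
    labels = pd.Series('', index=df.index, name='MinMax')
    labels.loc[grouped.idxmax().to_numpy()] = 'max'
    labels.loc[grouped.idxmin().to_numpy()] = 'min'
    return labels

# Hypothetical toy data with two intervals separated by NaN rows.
demo = pd.DataFrame({
    'Value':    [5.0, 3.0, 9.0, 4.0, 7.0, 8.0, 2.0, 6.0],
    'Interval': [np.nan, 1, 1, 1, np.nan, np.nan, 1, 1],
})
demo['MinMax'] = label_min_max(demo)
print(demo)
```

Note that if an interval contains a single row, that row ends up labelled 'min', because the 'min' assignment runs last.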
+0

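So what I want is, for each interval in the 'Interval' column (which can be between 1 and 3 or between -1 and 3), to get the maximum and minimum of the 'Value' column in a separate column. Does that help to understand it better? – RaduS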

+0

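I have now changed the data and replaced all the values with 1. This is simpler. You can take a look at it. Maybe it sparks some ideas for getting the min and max. – RaduS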

+1

Thank you for the suggestions and guidance :) Much appreciated. If I could, I would accept both answers as correct ;) – RaduS
