2016-07-05

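Sampling from a pandas DataFrame in a loop without repeating

I have simplified the code as much as I can. It is still long, but it should illustrate the problem.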

I am sampling weather data from a DataFrame:

import numpy as np
import pandas as pd

# dataframe of random daily values
dates = pd.date_range('19510101', periods=16000)
data = pd.DataFrame(data=np.random.randint(0, 100, (16000, 1)), columns=list('A'))
data['date'] = dates
data = data[['date', 'A']]

# create Season and Year columns
def get_season(row):
    month = row['date'].month
    if 3 <= month <= 5:
        return '2'   # spring
    elif 6 <= month <= 8:
        return '3'   # summer
    elif 9 <= month <= 11:
        return '4'   # autumn
    else:
        return '1'   # winter (Dec, Jan, Feb)

data['Season'] = data.apply(get_season, axis=1)
data['Year'] = data['date'].dt.year
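As an aside, the same Season column can be computed without a row-wise `apply`, which is slow on 16,000 rows. This sketch assumes the month-to-season mapping used by `get_season` above (Dec-Feb is '1', Mar-May '2', Jun-Aug '3', Sep-Nov '4'). The expression `(month % 12) // 3 + 1` reproduces that mapping arithmetically, because `% 12` turns December into 0 so that the months fall into consecutive groups of three.

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the question
dates = pd.date_range('19510101', periods=16000)
data = pd.DataFrame({'date': dates, 'A': np.random.randint(0, 100, 16000)})

# Vectorised season: Dec/Jan/Feb -> 1, Mar-May -> 2, Jun-Aug -> 3, Sep-Nov -> 4.
month = data['date'].dt.month
data['Season'] = ((month % 12) // 3 + 1).astype(str)  # stored as strings, like get_season
data['Year'] = data['date'].dt.year

print(data.head())
```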

I want to select a random year using a predefined list of (year, season) tuples:

# index of (year, season) tuples (plain ints; the original 1951L-style long literals are Python 2 only)
index = [(1951, '1'),
         (1951, '2'),
         (1952, '4'),
         (1954, '3'),
         (1955, '1'),
         (1955, '2'),
         (1956, '3'),
         (1960, '4'),
         (1961, '3'),
         (1962, '2'),
         (1962, '3'),
         (1979, '2'),
         (1979, '3'),
         (1980, '4'),
         (1983, '2'),
         (1984, '2'),
         (1984, '4'),
         (1985, '3'),
         (1986, '1'),
         (1986, '2'),
         (1986, '3'),
         (1987, '4'),
         (1991, '1'),
         (1992, '4')]

and sample from it as follows.

Build four lists, one per season, each holding the years listed for that season (winter, spring, summer, autumn):

coldsample = [[], [], [], []]  # one empty list per season
for (yr, se) in index:
    coldsample[int(se) - 1] += [yr]  # years with an extreme season: [[winter], [spring], [summer], [autumn]]
coldsample
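The same four lists can also be built with pandas by putting the tuples in a DataFrame and grouping by season. This is a sketch assuming Python 3 integer years and, for brevity, only the first eight tuples from the question; `groupby` sorts the keys, so the lists come out in the order '1' (winter) to '4' (autumn).

```python
import pandas as pd

# The first eight (year, season) tuples from the question
index = [(1951, '1'), (1951, '2'), (1952, '4'), (1954, '3'),
         (1955, '1'), (1955, '2'), (1956, '3'), (1960, '4')]

pairs = pd.DataFrame(index, columns=['Year', 'Season'])

# One list of years per season, in sorted season order
by_season = pairs.groupby('Season')['Year'].apply(list)
coldsample = by_season.tolist()
print(coldsample)
```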

Randomly choose one year from each of these lists:

cold_ctr = 0  # season counter (1 winter, 2 spring, 3 summer, 4 autumn)
coldseq = []  # empty list
for yrlist in coldsample:
    ran_yr = np.random.choice(yrlist, 1)  # pick one random year from this season's list
    cold_ctr += 1  # advance to the next season
    coldseq += [(ran_yr[0], cold_ctr)]  # append (random year, season number), in season order

Then build a new DataFrame from several randomly chosen years:

df = []
for i in range(5):  # change this number to change the number of output years
    for item in coldseq:  # item is a (year, season) tuple; coldseq holds the cold year/season pairs
        df.append(data.query("Year == %d and Season == '%d'" % item))

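The problem is that this always selects from the same coldseq (so every output year has the same year/season combinations) and never generates a new coldseq. I need to reset coldseq to an empty list and generate a new one on each iteration of the final for loop, but I can't see a way to do this. I have tried embedding the code in the loop in several ways, but none of them seem to work.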
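A minimal sketch of one way around this (not the poster's own code): move the per-season draw into a small helper and call it inside the outer loop, so every output year gets its own freshly drawn coldseq. The helper name `draw_coldseq`, the seeded `numpy.random.default_rng` generator, the boolean-mask selection in place of `query`, and the shortened year lists are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded generator (assumption, for reproducibility)

# Synthetic daily data with the question's Year and Season columns
data = pd.DataFrame({'date': pd.date_range('19510101', periods=16000)})
data['A'] = rng.integers(0, 100, len(data))
data['Season'] = ((data['date'].dt.month % 12) // 3 + 1).astype(str)  # '1' = winter
data['Year'] = data['date'].dt.year

# Hypothetical, shortened year lists: [winter, spring, summer, autumn]
coldsample = [[1951, 1955, 1986], [1951, 1955, 1962],
              [1954, 1956, 1961], [1952, 1960, 1980]]

def draw_coldseq(coldsample, rng):
    """Return a fresh [(year, season), ...] list: one random year per season, in order."""
    return [(int(rng.choice(years)), season)
            for season, years in enumerate(coldsample, start=1)]

frames = []
for i in range(5):                            # number of synthetic years
    coldseq = draw_coldseq(coldsample, rng)   # new draw on every pass
    for year, season in coldseq:
        mask = (data['Year'] == year) & (data['Season'] == str(season))
        frames.append(data[mask])

result = pd.concat(frames, ignore_index=True)
```

Because `coldseq` is rebuilt at the top of each pass, there is no counter or list that needs resetting by hand; `enumerate(..., start=1)` supplies the season numbers 1 to 4.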

Answers

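Figured it out: put the sampling loop inside the outer loop and reset the counter to 0 (and coldseq to an empty list) on each pass: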

df = []
# number of cold years
for i in range(5):  # change this number for the number of cold years
    cold_ctr = 0  # reset counter so seasons stay 1, 2, 3, 4 (1 winter, 2 spring, 3 summer, 4 autumn)
    coldseq = []  # reset coldseq so each pass samples new random years
    for yrlist in coldsample:
        ran_yr = np.random.choice(yrlist, 1)  # pick one random year from this season's list
        cold_ctr += 1  # advance to the next season
        coldseq += [(ran_yr[0], cold_ctr)]
    for item in coldseq:  # item is a (year, season) tuple from this pass's fresh draw
        df.append(data.query("Year == %d and Season == '%d'" % item))
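Either loop leaves `df` as a list of small DataFrames, four per synthetic year. A sketch of combining that list into a single DataFrame with `pd.concat`, adding a hypothetical `sim_year` column so each block of four seasons can still be told apart (the toy frames below stand in for the real query results):

```python
import pandas as pd

# Stand-in for the list built by the loop: 2 synthetic years x 4 seasons (toy data)
df = [pd.DataFrame({'Year': [1951 + k], 'Season': [str(s)], 'A': [10 * k + s]})
      for k in range(2) for s in range(1, 5)]

# Label each consecutive block of four seasons with its synthetic year (hypothetical column)
n_seasons = 4
combined = pd.concat(
    [frame.assign(sim_year=i // n_seasons) for i, frame in enumerate(df)],
    ignore_index=True,
)
print(combined)
```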

You can create a second DataFrame from the index and then sample from it.

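df_index = pd.DataFrame(index, columns=['Year', 'Season'])
coldseq = df_index.sample(5)  # 5 random (year, season) rows

# query the weather data for each sampled pair (Year is an int, so it is not quoted)
df = [data.query("Year == {0} and Season == '{1}'".format(row.Year, row.Season))
      for row in coldseq.itertuples()]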
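A possible variation on this answer that keeps the seasonal order: `DataFrameGroupBy.sample` (available from pandas 1.1; the installed version is an assumption) draws one random year per season, and sorting by Season puts the draw in winter, spring, summer, autumn order. The shortened `index` list and the `random_state` seeds are assumptions for illustration.

```python
import pandas as pd

# The first eight (year, season) tuples from the question
index = [(1951, '1'), (1951, '2'), (1952, '4'), (1954, '3'),
         (1955, '1'), (1955, '2'), (1956, '3'), (1960, '4')]
df_index = pd.DataFrame(index, columns=['Year', 'Season'])

# One random (year, season) row per season, repeated for 5 synthetic years
draws = []
for i in range(5):
    one_year = (df_index.groupby('Season')
                        .sample(n=1, random_state=i)  # one row from each season group
                        .sort_values('Season'))       # winter, spring, summer, autumn
    draws.append(list(one_year.itertuples(index=False, name=None)))
print(draws[0])
```

Each entry of `draws` can then be fed to the same query loop as a `coldseq`.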

Sorry, but this doesn't do what I'm looking for. I want to generate a new DataFrame that follows the seasonal order (spring, summer, autumn, winter). – Pad

What's wrong with ordering the new DataFrame after you have created it from the random sample? –

Sorry, but this doesn't pick from a sequence the way I need. I have answered the post with a solution. – Pad