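Sampling from a pandas DataFrame in a loop without repeating
I have simplified the code as much as I can, but it is still fairly long; it should illustrate the problem.
I am sampling weather data from a DataFrame: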
import numpy as np
import pandas as pd
#dataframe
dates = pd.date_range('19510101',periods=16000)
data = pd.DataFrame(data=np.random.randint(0,100,(16000,1)), columns =list('A'))
data['date'] = dates
data = data[['date','A']]
#create year and season column
def get_season(row):
    # seasons are coded as strings: '1' winter, '2' spring, '3' summer, '4' autumn
    if row['date'].month >= 3 and row['date'].month <= 5:
        return '2'
    elif row['date'].month >= 6 and row['date'].month <= 8:
        return '3'
    elif row['date'].month >= 9 and row['date'].month <= 11:
        return '4'
    else:
        return '1'
data['Season'] = data.apply(get_season, axis=1)
data['Year'] = data['date'].dt.year
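As a side note, the season column above can probably be built without a row-wise apply. The following is only a sketch, assuming the same column names as in the question; the `month_to_season` dictionary is a helper name introduced here for illustration, not something from the original code.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question: one random value per day.
dates = pd.date_range('19510101', periods=16000)
data = pd.DataFrame({'date': dates, 'A': np.random.randint(0, 100, 16000)})

# Hypothetical lookup table using the question's coding:
# '1' winter (Dec-Feb), '2' spring, '3' summer, '4' autumn.
month_to_season = {12: '1', 1: '1', 2: '1',
                   3: '2', 4: '2', 5: '2',
                   6: '3', 7: '3', 8: '3',
                   9: '4', 10: '4', 11: '4'}

# Map each row's month to its season label in one vectorized step.
data['Season'] = data['date'].dt.month.map(month_to_season)
data['Year'] = data['date'].dt.year
```

This should give the same labels as get_season, and it avoids calling a Python function once per row.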
I want to pick a random year using a predetermined set of year/season tuples:
#generate an index of year and season tuples
index = [(1951, '1'),
         (1951, '2'),
         (1952, '4'),
         (1954, '3'),
         (1955, '1'),
         (1955, '2'),
         (1956, '3'),
         (1960, '4'),
         (1961, '3'),
         (1962, '2'),
         (1962, '3'),
         (1979, '2'),
         (1979, '3'),
         (1980, '4'),
         (1983, '2'),
         (1984, '2'),
         (1984, '4'),
         (1985, '3'),
         (1986, '1'),
         (1986, '2'),
         (1986, '3'),
         (1987, '4'),
         (1991, '1'),
         (1992, '4')]
I then sample from this index as follows.
Generate four lists, one per season (the years for spring, summer, etc.):
coldsample = [[],[],[],[]] #empty list of lists
for (yr,se) in index:
    coldsample[int(se)-1] += [yr] #function which gives the years which have extreme seasons [[1],[2],[3],[4]]
coldsample
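For what it's worth, the same grouping can be sketched with a dictionary keyed by the season label, which avoids the int(se)-1 index arithmetic. This is just an illustration: the `index` below is a shortened example rather than the full list, and `years_by_season` is a name made up for this sketch.

```python
from collections import defaultdict

# Shortened example of the (year, season) tuples; the full list is in the question.
index = [(1951, '1'), (1951, '2'), (1952, '4'), (1954, '3'), (1955, '1')]

# Collect the years under their season label.
years_by_season = defaultdict(list)
for yr, se in index:
    years_by_season[se].append(yr)

# Rebuild the question's list-of-lists layout, ordered by season label '1' to '4'.
coldsample = [years_by_season[se] for se in ('1', '2', '3', '4')]
print(coldsample)  # [[1951, 1955], [1951], [1954], [1952]]
```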
Randomly choose one year from each of these lists:
cold_ctr = 0 #variable to count from (1 is winter, 2 spring, 3 summer, 4 autumn)
coldseq = [] #blank list
for yrlist in coldsample:
    ran_yr = np.random.choice(yrlist, 1) #choose a randomly sampled year from previous cell
    cold_ctr += 1 # increment cold_ctr variable by 1
    coldseq += [(ran_yr[0], cold_ctr)] #populate coldseq with a random year and its season number (in order)
Then generate a new DataFrame in which several random years are selected:
df = []
for i in range(5): #change the number here to change the number of output years
    for item in coldseq: #item is a tuple with year and season, coldseq is cold year and season pairs
        df.append(data.query("Year == %d and Season == '%d'" % item))
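The problem is that this selects from the same coldseq every time (so the year/season combinations are identical) and does not generate a new coldseq. I need to reset coldseq to empty and generate a new one on each iteration of the final for loop, but I can't see how to do this. I have tried embedding the code in the loop in several ways, but none of them seem to work.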
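One possible way to get a fresh coldseq per iteration is to wrap the random draw in a function and call it inside the outer loop. The following is a minimal sketch rather than a tested solution for the real data: `draw_coldseq`, `frames` and `result` are names introduced here, the `coldsample` lists are shortened examples, and the season column is rebuilt with a compact formula that is assumed to match get_season.

```python
import numpy as np
import pandas as pd

# Rebuild data with the question's layout (random values, so output varies per run).
dates = pd.date_range('19510101', periods=16000)
data = pd.DataFrame({'date': dates, 'A': np.random.randint(0, 100, 16000)})
# Dec-Feb -> '1', Mar-May -> '2', Jun-Aug -> '3', Sep-Nov -> '4'
data['Season'] = ((data['date'].dt.month % 12) // 3 + 1).astype(str)
data['Year'] = data['date'].dt.year

# Shortened example of the per-season year lists (winter, spring, summer, autumn).
coldsample = [[1951, 1955, 1986], [1951, 1955, 1962],
              [1954, 1956, 1961], [1952, 1960, 1980]]

def draw_coldseq(sample):
    """Draw one random year per season; returns [(year, season_number), ...]."""
    return [(int(np.random.choice(years)), season)
            for season, years in enumerate(sample, start=1)]

frames = []
for i in range(5):  # number of output years
    coldseq = draw_coldseq(coldsample)  # new draw on every iteration
    for year, season in coldseq:
        frames.append(data.query("Year == %d and Season == '%d'" % (year, season)))

# Stack the pieces in the order they were drawn (seasons 1 to 4, repeated 5 times).
result = pd.concat(frames, ignore_index=True)
```

Each pass through the outer loop draws independently, so the same year can still come up in more than one iteration; if that is not wanted, the chosen years would have to be removed from the lists after each draw.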
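Sorry, but this doesn't do what I'm looking for: I want to produce a new DataFrame that follows the seasonal order (spring, summer, autumn, winter). – Pad
What's wrong with ordering the new DataFrame after creating it from the random sample? –
Sorry, but this doesn't select from a sequence in the way I need - I have answered the post with a solution – Pad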