Is a sample size of 1 still considered reservoir sampling?

-1

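I just want to know whether my code counts as reservoir sampling. I have a stream of pageviews that I want to process, and I process one pageview at a time. However, since most of the pageviews are identical, I just want to randomly pick any one pageview (one at a time) to process. For example, my pageviews look like this: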

[www.example.com, www.example.com, www.example1.com, www.example3.com, ...]

I process one element at a time. Here is my code.

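import random

# Fragment of a class; the enclosing class definition is not shown.
def __init__(self):
    self.counter = 0  # number of pageviews seen so far

def processable(self):
    self.counter += 1
    # True with probability 1/counter
    return random.random() < 1.0 / self.counter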

2017-02-04 toy

This code doesn't make any sense. Do you have a class defined somewhere? You don't seem to be interacting with a series of items. – Blckknght

This code is only part of the code base. I will post the part that interacts with the stream. – toy

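Following the reservoir sampling algorithm (described here: https://en.wikipedia.org/wiki/Reservoir_sampling), we store only one pageview (reservoir size = 1). The implementation below shows how this strategy of probabilistically selecting a pageview from the stream leads to a uniform selection probability: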

import numpy as np
import matplotlib.pyplot as plt

max_num = 10  # maximum number of pageviews we want to consider
# replicate the experiment ntrials times and find the probability for selection of any pageview
pageview_indices = []
ntrials = 10000
for _ in range(ntrials):
    pageview_index = None  # index of the single pageview to be kept
    for i in range(1, max_num + 1):  # streaming pageviews
        # keep the first pageview; from the next pageview onwards, keep the old one
        # with probability 1 - 1/i and replace it with the new one with probability 1/i
        if i == 1:
            pageview_index = 1
        else:
            pageview_index = np.random.choice([pageview_index, i], p=[1 - 1. / i, 1. / i])
        # print('pageview chosen:', pageview_index)
    print('Final pageview chosen:', pageview_index)
    pageview_indices.append(pageview_index)

# one bin per index, centered on the integers, so each bar height is a probability
bins = np.arange(0.5, max_num + 1.5)
plt.hist(pageview_indices, bins=bins, density=True, facecolor='green', alpha=0.75)
plt.xlabel('Pageview Index')
plt.ylabel('Probability Chosen')
plt.title('Reservoir Sampling')
plt.axis([0, max_num + 1, 0, 0.15])
plt.xticks(range(1, max_num + 1))
plt.grid(True)
plt.show()

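As can be seen from the above, the probability of each pageview index being chosen is nearly uniform (1/10 for each of the 10 pageviews), and it can also be proven mathematically that the selection is uniform.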
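
The uniformity can be checked with a short derivation. As a sketch, assume a stream of n pageviews in which the i-th pageview replaces the kept one with probability 1/i (the symbols n, k and j are introduced here for illustration). Pageview k is the final choice only if it is selected at step k and then survives every later step:

```latex
P(\text{pageview } k \text{ is kept at the end})
  = \frac{1}{k} \prod_{j=k+1}^{n} \left(1 - \frac{1}{j}\right)
  = \frac{1}{k} \prod_{j=k+1}^{n} \frac{j-1}{j}
  = \frac{1}{k} \cdot \frac{k}{n}
  = \frac{1}{n}
```

The product telescopes, so the result does not depend on k: every pageview ends up with the same probability 1/n, which is 1/10 in the experiment above.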

2017-02-05 14:30:28

Just a quick question. Does this mean the sampling can be repeated? – toy

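By repeated, do you mean replicating the sampling experiment? We don't need to replicate the sampling process. I replicated it only to demonstrate empirically that one number is selected from a stream of 'n' numbers with uniform probability, so if you repeat the experiment with 'n' numbers, you would expect to see each number selected nearly the same number of times. –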
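
That expectation can also be checked without plotting. The following is only a minimal sketch using the standard library: the helper names `reservoir_sample_one` and `selection_counts`, and the fixed seed, are assumptions made for illustration and do not come from the posts above. The selection step uses the same `random() < 1.0 / i` comparison as `processable` in the question.

```python
import random
from collections import Counter


def reservoir_sample_one(stream, rng=random):
    """Return one item chosen uniformly at random from an iterable of unknown length.

    Hypothetical helper for illustration; returns None for an empty stream.
    """
    chosen = None
    for i, item in enumerate(stream, start=1):
        # Replace the kept item with the i-th item with probability 1/i.
        # For i == 1 this is always true, since rng.random() is in [0, 1).
        if rng.random() < 1.0 / i:
            chosen = item
    return chosen


def selection_counts(n, ntrials, seed=0):
    """Repeat the size-1 reservoir experiment and count how often each index is chosen."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are reproducible
    return Counter(
        reservoir_sample_one(range(1, n + 1), rng) for _ in range(ntrials)
    )


if __name__ == "__main__":
    ntrials = 10000
    counts = selection_counts(n=10, ntrials=ntrials)
    for index in sorted(counts):
        # Each empirical frequency should be close to 1/10.
        print(index, counts[index] / float(ntrials))
```

A single run over a real stream only needs one call to `reservoir_sample_one`; the repetition here serves only to estimate the selection frequencies.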
