Sampling data into two groups

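I am looking for help making the code below more efficient. Although it works, I am not satisfied with it. There were bugs to fix (not relevant at the moment). This is my first time using the <random> header and my first time using stable_partition.

Problem definition/specification:
I have a population (a vector) of numeric data (float values). I want to create two RANDOM samples (2 vectors) based on a user-specified percentage, i.e. popu_data = 30% Sample1 + 70% Sample2, where the 30% is given by the user. I have not implemented the percentage yet, but that part is trivial.

Problem in programming: I am able to create the 30% sample from the population. Creating the second part as another vector (Sample2, 70%) is my problem. The reason is that the 30% has to be picked at random, so I have to keep track of the chosen indices in order to remove them. Somehow I cannot come up with logic more efficient than what I have implemented.

My logic (which I am not happy with): in the population data, the values at the randomly chosen indices are overwritten with a unique sentinel value (here 0.5555). Later I learned about the stable_partition function, with which each value of the population is compared against 0.5555. The values for which the comparison is false, i.e. those that are not the sentinel, are collected into a new Sample2, the complement of Sample1.

Also: how can I make this generic, i.e. split one population into N sub-samples with user-defined percentages of the population?

Thanks for your help. I tried vector erase, remove, copy, etc., but none of them made it into the current code. I am looking for better, more efficient logic and STL usage.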

#include <random> 
#include <iostream> 
#include <vector> 
#include <algorithm> 

using namespace std; 

bool Is05555 (float i){ 
    if (i > 0.5560) return true; 
    return false; 
} 

int main() 
{ 
    random_device rd; 
    mt19937 gen(rd()); 
    uniform_real_distribution<> dis(1, 2); 
    vector<float>randVals; 

    cout<<"All the Random Values between 1 and 2"<<endl; 
    for (int n = 0; n < 20; ++n) { 
     float rnv = dis(gen); 
     cout<<rnv<<endl; 
     randVals.push_back(rnv); 
    } 
    cout << '\n'; 

    random_device rd2; 
    mt19937 gen2(rd2()); 
    uniform_int_distribution<int> dist(0,19); 

    vector<float>sample; 
    vector<float>sample2; 
    for (int n = 0; n < 6; ++n) { 
     float rnv = dist(gen2); 
     sample.push_back(randVals.at(rnv)); 
     randVals.at(rnv) = 0.5555; 
    } 

    cout<<"Random Values between 1 and 2 with 0.5555 a Unique VAlue"<<endl; 
    for (int n = 0; n < 20; ++n) { 
     cout<<randVals.at(n)<<" "; 
    } 
    cout << '\n'; 

    std::vector<float>::iterator bound; 
    bound = std::stable_partition (randVals.begin(), randVals.end(), Is05555); 

    for (std::vector<float>::iterator it=randVals.begin(); it!=bound; ++it) 
     sample2.push_back(*it); 

    cout<<sample.size()<<","<<sample2.size()<<endl; 

    cout<<"Random Values between 1 and 2 Subset of 6 only: "<<endl; 

    for (int n = 0; n < sample.size(); ++n) { 
     cout<<sample.at(n)<<" "; 
    } 
    cout << '\n'; 

    cout<<"Random Values between 1 and 2 - Remaining: "<<endl; 
    for (int n = 0; n < sample2.size(); ++n) { 
     cout<<sample2.at(n)<<" "; 
    } 
    cout << '\n'; 

    return 0; 
}

Source

2013-07-20 Prasad

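The algorithm function set_difference might come to my rescue; I just saw it pop up in the right sidebar. However, it seems I have to sort before using it, which I would rather not do. – Prasad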

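For your 30% sample, do you need each item to be selected with 30% probability (which can give a sample size *slightly different* from 30%), or exactly 30% of the items to be selected? Do you need the results in their original order, or is the order within the samples irrelevant? –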
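To make the first reading in this comment concrete, here is a minimal sketch, assuming C++11. The helper name `bernoulli_split` is made up for illustration and appears nowhere in the thread. Each item is placed in the first group independently with probability `p`, so the group size is only about `p` times the population size and changes from run to run; the exact-count reading needs a fixed number of picks without replacement instead.

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: every item independently lands in the first group
// with probability p, so the size of that group varies between runs.
// The relative order of the items is preserved inside each group.
std::pair<std::vector<float>, std::vector<float>>
bernoulli_split(const std::vector<float>& data, double p, std::mt19937& gen)
{
    assert(p >= 0.0 && p <= 1.0);
    std::bernoulli_distribution pick(p);

    std::vector<float> chosen;
    std::vector<float> rest;
    for (float x : data) {
        // One coin flip per item decides which group receives it.
        (pick(gen) ? chosen : rest).push_back(x);
    }
    return std::make_pair(std::move(chosen), std::move(rest));
}
```

With `p = 0.3` and 20 values, the first group will often hold 5 to 7 items rather than exactly 6, which is the size difference the comment is pointing at.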

'vector<float> sample; for (int n = 0; n < 6; ++n) { float rnv = dist(gen2); sample.push_back(randVals.at(rnv)); } sort(randVals.begin(), randVals.end()); sort(sample.begin(), sample.end()); vector<float> sample2; ...' Code using set_difference; it works. – Prasad

Given the requirement of an exact N% sample where order does not matter, it is probably simplest to just do this:

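// note: std::random_shuffle was removed in C++17; std::shuffle replaces it
std::random_shuffle(randVals.begin(), randVals.end());
int num = randVals.size() * percent / 100.0;

// the last num shuffled items form the sample
auto pos = randVals.begin() + randVals.size() - num;

// get our sample (the element type must be spelled out; auto cannot deduce it here)
std::vector<float> sample1(pos, randVals.end());

// remove sample from original collection
randVals.erase(pos, randVals.end());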

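For some item types, you could improve on this by moving the items from the original array into the sample array instead of copying them, but for simple types such as float or double that gains nothing.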
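For newer compilers, the same shuffle-then-cut idea can be sketched with std::shuffle and an explicit generator. This is a minimal sketch assuming C++11 or later; the function name `split_sample` and its signature are made up for illustration and are not part of the answer above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: returns {sample, remainder}, where the sample holds
// `percent` % of the items (rounded down) chosen at random.
// `data` is taken by value, so the caller's vector is left untouched.
std::pair<std::vector<float>, std::vector<float>>
split_sample(std::vector<float> data, double percent, std::mt19937& gen)
{
    assert(percent >= 0.0 && percent <= 100.0);

    // std::shuffle takes the generator explicitly, unlike std::random_shuffle.
    std::shuffle(data.begin(), data.end(), gen);

    // Number of items in the sample, truncated toward zero.
    const std::size_t num =
        static_cast<std::size_t>(data.size() * percent / 100.0);

    // The last `num` shuffled items become the sample.
    const auto pos = data.end() - static_cast<std::ptrdiff_t>(num);
    std::vector<float> sample(pos, data.end());
    data.erase(pos, data.end());

    return std::make_pair(std::move(sample), std::move(data));
}
```

A call could look like `std::mt19937 gen(std::random_device{}()); auto parts = split_sample(randVals, 30.0, gen);`, after which `parts.first` is the 30% sample and `parts.second` is the remaining 70%.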

Source

2013-07-20 22:47:45

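Thanks. I was looking for clear and efficient code. So the choice is between picking at random (the approach I implemented, long and dirty) and a random shuffle followed by taking one contiguous slice of the pie (crisp and efficient). Thanks. – Prasad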

According to this post: http://stackoverflow.com/questions/13459953/random-shuffle-not-really-random?rq=1, I think I should call srand before the 5 lines you gave above? Thanks! std::srand(std::time(0)); – Prasad

@Prasad: Yes... –
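On the question's follow-up about N sub-samples: one shuffle followed by cutting consecutive slices extends naturally to any number of groups. The following is a minimal sketch under stated assumptions, not something proposed in the thread. The name `split_into_groups` is made up, the percentages are assumed to be non-negative and to sum to 100, and the last group is given whatever is left after rounding down.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: returns one sub-sample per entry of `percents`.
// Assumes the percentages are non-negative and sum to 100; the last
// group also absorbs the items left over from rounding down.
std::vector<std::vector<float>>
split_into_groups(std::vector<float> data,
                  const std::vector<double>& percents,
                  std::mt19937& gen)
{
    assert(!percents.empty());

    // One shuffle randomises the membership of every group at once.
    std::shuffle(data.begin(), data.end(), gen);

    const std::size_t total = data.size();
    std::size_t start = 0;
    std::vector<std::vector<float>> groups;

    for (std::size_t i = 0; i < percents.size(); ++i) {
        assert(percents[i] >= 0.0);
        std::size_t count =
            static_cast<std::size_t>(total * percents[i] / 100.0);
        count = std::min(count, total - start);   // never run past the end
        if (i + 1 == percents.size()) {
            count = total - start;                // last group takes the rest
        }
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(start);
        groups.emplace_back(first, first + static_cast<std::ptrdiff_t>(count));
        start += count;
    }
    return groups;
}
```

For example, `split_into_groups(randVals, {30.0, 50.0, 20.0}, gen)` on 20 values yields groups of 6, 10 and 4 items. Because every group is a slice of the same shuffled vector, no index bookkeeping or sentinel value is needed.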
