2014-01-09

Brute-forcing a problem with python-boto and EC2

As an exercise, I'd like to use EC2/python-boto for some scientific computing and Monte Carlo simulations.

For example, suppose I have several subtotals, such as:

a = [53067, 
    45412, 
    35238, 
    34972, 
    31551, 
    29258, 
    28550, 
    28044, 
    25485, 
    21905, 
    21597, 
    21403, 
    20536, 
    20013, 
    18338, 
    17832, 
    17186, 
    16416, 
    14682, 
    14595] 

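and then a nested list of values representing monthly data (12 months with 20 values each; each inner list below holds one value per month):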

b = [[7043,4567,4386,4247,4426,4562,4107,2986,4022,4733,4738,3295], 
    [6090,4396,4382,4201,4409,3960,3315,2342,3034,3762,3858,2597], 
    [4445,3525,3432,3338,3396,3134,2774,2205,2909,3682,3415,2457], 
    [3998,3037,3203,2952,3122,2743,2564,2165,2904,3217,3324,2141], 
    [3438,2762,2975,2817,2401,2489,2479,1975,2811,3107,2862,2130], 
    [3027,2588,2865,2376,2392,2326,2327,1911,2383,2918,2646,2078], 
    [2878,2323,2861,2294,2289,2206,2179,1863,2340,2829,2312,1560], 
    [2862,2289,2853,2258,2256,2142,2021,1653,2164,2705,2308,1470], 
    [2727,2046,2452,1972,2214,2117,1868,1569,2098,2436,2284,1462], 
    [2664,2007,2005,1970,2145,1799,1825,1482,1971,2285,2053,1417], 
    [2575,1987,1972,1865,1808,1780,1822,1391,1792,2161,1962,1411], 
    [2417,1979,1957,1675,1783,1778,1795,1334,1767,2057,1928,1396], 
    [2225,1860,1774,1631,1743,1762,1713,1315,1762,1921,1732,1391], 
    [2152,1700,1760,1624,1722,1489,1694,1228,1722,1790,1648,1315], 
    [2053,1621,1740,1533,1618,1445,1440,1119,1377,1598,1585,1299], 
    [2033,1485,1607,1422,1469,1273,1415,1036,1314,1547,1534,1286], 
    [1887,1452,1478,1361,1434,1265,1410,994,1194,1437,1482,1248], 
    [1865,1357,1475,1274,1297,1210,1285,977,1060,1432,1470,1119], 
    [1686,1276,1421,1224,1218,993,1128,877,1020,1419,1323,1013], 
    [1536,1184,1405,1169,1211,938,1089,785,960,1299,1224,979]] 

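I want to use boto to brute-force the following problem: for each value in a, find a path through the months (one value taken from each month) whose values add up to that subtotal.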

For example, the first value, 53067, is made up as follows:

[7043, 4567, 4382, 4202, 4426, 4563, 4108, 2987, 4023, 4733, 4739, 3295] 
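A quick way to check a candidate path is to confirm that it has one entry per month, that each entry actually occurs in that month's column of b, and that the entries add up to the target. The sketch below assumes "path" means one value per month (the reading the example suggests); the name `check_path` is hypothetical. Applied to the example above as transcribed, such a check reports a total of 53068 rather than 53067, and several entries (for instance 4202 in the fourth month) that do not occur in b.

```python
from typing import Dict, List, Sequence, Tuple


def check_path(path: Sequence[int], target: int, rows: Sequence[Sequence[int]]) -> Dict[str, object]:
    """Check a candidate path of one value per month against the monthly data.

    Assumes `rows` is laid out like `b`: each inner list holds one value per month,
    so month j's candidate values form column j.
    """
    n_months = len(rows[0])
    columns = [{row[j] for row in rows} for j in range(n_months)]
    # (month index, value) pairs whose value does not occur in that month's column
    missing: List[Tuple[int, int]] = [
        (month, value)
        for month, value in enumerate(path[:n_months])
        if value not in columns[month]
    ]
    total = sum(path)
    return {
        "length_ok": len(path) == n_months,
        "total": total,
        "matches_target": total == target,
        "missing": missing,
    }


if __name__ == "__main__":
    tiny_rows = [[1, 2, 3], [4, 5, 6]]  # hypothetical 2-row, 3-month example
    print(check_path([1, 5, 3], 9, tiny_rows))
```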

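Actually solving for all of the subtotals is a fairly hard problem, which is why I want to use it as an opportunity to learn how to brute-force it on EC2; I could then reuse the same setup for Monte Carlo simulations.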
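For scale: with 20 candidate values in each of 12 months, naive enumeration has 20^12 (about 4.1 × 10^15) combinations per subtotal. A minimal single-machine sketch of a smarter brute force is a depth-first search that picks one value per month and abandons a branch as soon as the remaining target falls outside the smallest or largest sum the remaining months can still produce. It assumes values may be reused across different subtotals (the question does not say otherwise), and `find_path` is a hypothetical name. The worst case is still exponential, which is where spreading the work over several machines comes in.

```python
from typing import List, Optional, Sequence


def find_path(target: int, rows: Sequence[Sequence[int]]) -> Optional[List[int]]:
    """Depth-first search for one value per month summing exactly to `target`.

    Prunes any branch whose remaining target lies outside the smallest/largest
    sum still achievable with the months that are left.
    """
    n_months = len(rows[0])
    columns = [sorted({row[j] for row in rows}) for j in range(n_months)]
    # suffix_min[m] / suffix_max[m]: smallest / largest sum obtainable from months m..end
    suffix_min = [0] * (n_months + 1)
    suffix_max = [0] * (n_months + 1)
    for m in range(n_months - 1, -1, -1):
        suffix_min[m] = suffix_min[m + 1] + columns[m][0]
        suffix_max[m] = suffix_max[m + 1] + columns[m][-1]

    path: List[int] = []

    def dfs(month: int, remaining: int) -> bool:
        if month == n_months:
            return remaining == 0
        if not suffix_min[month] <= remaining <= suffix_max[month]:
            return False  # no choice for the remaining months can reach the target
        for value in columns[month]:
            path.append(value)
            if dfs(month + 1, remaining - value):
                return True
            path.pop()
        return False

    return list(path) if dfs(0, target) else None


if __name__ == "__main__":
    print(find_path(9, [[1, 2, 3], [4, 5, 6]]))  # prints [1, 2, 6]
```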

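I've found plenty of tutorials on how to connect and how to store data, but I haven't found anything on distributed computing that a newcomer can follow.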
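The usual first step toward distributing a search like this is to cut it into independent work units that share nothing, so any machine can process any unit and the results only need to be merged at the end. A minimal sketch, assuming the split is by subtotal and by the value chosen for the first month (with b's 20 distinct first-month values and 20 subtotals, that gives 400 units). `make_work_units`, `solve_unit` and `merge_results` are hypothetical names, and `solve_unit` uses plain `itertools.product` enumeration purely for illustration; on the real data each unit would still need a pruned search. Here the units run in a simple loop; on EC2, each unit (or batch of units) would go to a different worker.

```python
from itertools import product
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

WorkUnit = Tuple[int, int]  # (target subtotal, value fixed for the first month)


def make_work_units(targets: Iterable[int], rows: Sequence[Sequence[int]]) -> List[WorkUnit]:
    """Split the search into independent units: one per (target, first-month value)."""
    first_month_values = sorted({row[0] for row in rows})
    return [(target, first) for target in targets for first in first_month_values]


def solve_unit(unit: WorkUnit, rows: Sequence[Sequence[int]]) -> Tuple[int, Optional[List[int]]]:
    """Exhaustively enumerate the remaining months for one unit (illustration only)."""
    target, first = unit
    n_months = len(rows[0])
    rest_columns = [sorted({row[j] for row in rows}) for j in range(1, n_months)]
    for rest in product(*rest_columns):
        if first + sum(rest) == target:
            return target, [first, *rest]
    return target, None


def merge_results(results: Iterable[Tuple[int, Optional[List[int]]]]) -> Dict[int, List[int]]:
    """Keep the first path found for each target (the 'reduce' side of the job)."""
    found: Dict[int, List[int]] = {}
    for target, path in results:
        if path is not None and target not in found:
            found[target] = path
    return found


if __name__ == "__main__":
    tiny_rows = [[1, 2, 3], [4, 5, 6]]  # hypothetical stand-in for b
    units = make_work_units([9, 100], tiny_rows)
    # Locally the units run one after another; on a cluster each would be a separate task.
    print(merge_results(solve_unit(u, tiny_rows) for u in units))
```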

Answer


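You may want to run your computation on EMR (Elastic MapReduce), which is essentially Hadoop. If the job needs a lot of processing power, you can use boto to create a fairly large cluster.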
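A minimal sketch of creating such a cluster, written against boto3 (the current successor to the boto library discussed here) rather than the original boto. The helper names, the S3 log location `s3://my-bucket/emr-logs/`, the instance type and count, and the release label are all placeholder assumptions to adjust. The role names are the defaults that `aws emr create-default-roles` sets up. `KeepJobFlowAliveWhenNoSteps` keeps the cluster running after creation so jobs can be submitted to it later, which also means it keeps billing until it is terminated (for example with the EMR client's `terminate_job_flows`).

```python
from typing import Any, Dict


def build_cluster_request(
    name: str,
    log_uri: str,
    instance_type: str = "m5.xlarge",  # placeholder; size to the workload
    instance_count: int = 5,  # 1 master + 4 core nodes
    release_label: str = "emr-6.10.0",  # placeholder; use a current EMR release
) -> Dict[str, Any]:
    """Keyword arguments for boto3's EMR `run_job_flow` call."""
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Stay up after creation so jobs can be submitted later.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default role names created by `aws emr create-default-roles`.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(request: Dict[str, Any], region: str = "us-east-1") -> str:
    """Start the cluster and return its id (needs boto3 and AWS credentials)."""
    import boto3  # imported here so build_cluster_request works without boto3 installed

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    # Hypothetical bucket; printing the request does not contact AWS.
    print(build_cluster_request("bruteforce-subtotals", "s3://my-bucket/emr-logs/"))
```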

You can then write your job in Python (assuming that's what you're using). There are many tutorials on writing MapReduce jobs in Python, for example: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
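With Hadoop Streaming (the approach the linked tutorial uses), the mapper and reducer are ordinary scripts that read lines on stdin and write tab-separated key/value lines on stdout; Hadoop sorts the mapper output by key before it reaches the reducer. A minimal sketch of one script playing both roles, assuming each input line is a work unit `target first_value` (for example `53067 7043`), that the rows of b are shipped to the workers as a JSON file named `b.json`, and that the script is invoked with `map` or `reduce`; all of these names and formats are hypothetical. The mapper runs a pruned depth-first search over the remaining months, and the reducer keeps the first path found per subtotal.

```python
import json
import sys
from itertools import groupby
from typing import Iterable, Iterator, List, Optional, Sequence


def search(remaining: int, columns: Sequence[Sequence[int]]) -> Optional[List[int]]:
    """Pruned DFS: one value per (sorted) column summing to `remaining`, or None."""
    n = len(columns)
    lo = [0] * (n + 1)  # lo[i]: smallest sum reachable from column i onward
    hi = [0] * (n + 1)  # hi[i]: largest sum reachable from column i onward
    for i in range(n - 1, -1, -1):
        lo[i] = lo[i + 1] + columns[i][0]
        hi[i] = hi[i + 1] + columns[i][-1]
    path: List[int] = []

    def dfs(i: int, rem: int) -> bool:
        if i == n:
            return rem == 0
        if not lo[i] <= rem <= hi[i]:
            return False
        for value in columns[i]:
            path.append(value)
            if dfs(i + 1, rem - value):
                return True
            path.pop()
        return False

    return path if dfs(0, remaining) else None


def map_lines(lines: Iterable[str], rows: Sequence[Sequence[int]]) -> Iterator[str]:
    """Mapper: one 'target first_value' unit per line -> 'target<TAB>comma-separated path'."""
    n_months = len(rows[0])
    columns = [sorted({row[j] for row in rows}) for j in range(1, n_months)]
    for line in lines:
        if not line.strip():
            continue
        target, first = (int(field) for field in line.split())
        rest = search(target - first, columns)
        if rest is not None:
            yield f"{target}\t{','.join(str(v) for v in [first, *rest])}"


def reduce_lines(lines: Iterable[str]) -> Iterator[str]:
    """Reducer: input arrives sorted by key; keep the first path for each target."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for target, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{target}\t{next(group)[1]}"


if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        with open("b.json") as fh:  # hypothetical data file shipped alongside the script
            records = map_lines(sys.stdin, json.load(fh))
    else:
        records = reduce_lines(sys.stdin)
    for record in records:
        print(record)
```

Before going to the cluster, the same pipeline can be tried locally by piping a file of work units through the mapper, `sort`, and the reducer, e.g. `cat units.txt | python3 bruteforce.py map | sort | python3 bruteforce.py reduce` (file and script names hypothetical).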

After that, you can submit your Python job to the EMR cluster you just created and wait for the results.
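A minimal sketch of that last step with boto3 (swapped in for the original boto): add a Hadoop Streaming step to the running cluster through `command-runner.jar`, then block on boto3's `step_complete` waiter. It assumes a streaming script that acts as the mapper when called with `map` and as the reducer when called with `reduce`, plus any data file it needs, has already been uploaded to S3; every S3 path and the step name are hypothetical. The output prefix must not exist before the job runs, and the results land there as `part-*` files.

```python
from typing import Any, Dict


def build_streaming_step(script_s3: str, data_s3: str, input_s3: str, output_s3: str) -> Dict[str, Any]:
    """A Hadoop Streaming step definition for boto3's `add_job_flow_steps`."""
    # -files puts each file in the task's working directory under its base name.
    script_name = script_s3.rsplit("/", 1)[-1]
    return {
        "Name": "bruteforce-subtotals",
        "ActionOnFailure": "CONTINUE",  # keep the cluster alive if this step fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                "-files", f"{script_s3},{data_s3}",  # generic option, must precede the others
                "-mapper", f"python3 {script_name} map",
                "-reducer", f"python3 {script_name} reduce",
                "-input", input_s3,
                "-output", output_s3,  # must not exist yet
            ],
        },
    }


def submit_and_wait(cluster_id: str, step: Dict[str, Any], region: str = "us-east-1") -> str:
    """Submit the step and block until it finishes (needs boto3 and AWS credentials)."""
    import boto3

    emr = boto3.client("emr", region_name=region)
    step_id = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])["StepIds"][0]
    # Polls the step state; raises botocore's WaiterError if the step fails,
    # is cancelled, or the waiter gives up.
    emr.get_waiter("step_complete").wait(ClusterId=cluster_id, StepId=step_id)
    return step_id


if __name__ == "__main__":
    step = build_streaming_step(
        "s3://my-bucket/code/bruteforce.py",  # hypothetical paths
        "s3://my-bucket/code/b.json",
        "s3://my-bucket/input/",
        "s3://my-bucket/output/run-1/",
    )
    print(step["HadoopJarStep"]["Args"])
```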

Hope this helps.