2017-02-09 78 views
3

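Computing aggregates over groups using combineByKey on a Spark RDD in Python (PySpark)

I am new to Spark RDDs, and I want to compute aggregates by grouping records by key using Spark's shuffle operations. At first my approach was rdd.groupBy(), but it takes a long time to finish and is not memory-efficient, and I know this operation is expensive in terms of shuffling. I came across another operation, rdd.combineByKey(), but I have trouble understanding and using it.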

Here is my data, stored in an RDD; call it customerrdd:

[(u'1', u'Customer#000000001', u'IVhzIApeRb ot,c,E', u'15', u'25-989-741-2988', u'711.56', u'BUILDING', u'to the even, regular platelets. regular, ironic epitaphs nag e', u''), (u'2', u'Customer#000000002', u'XSTf4,NCwDVaWNe6tEgvwfmRchLXak', u'13', u'23-768-687-3665', u'121.65', u'AUTOMOBILE', u'l accounts. blithely ironic theodolites integrate boldly: caref', u''), (u'3', u'Customer#000000003', u'MG9kdTD2WBHm', u'1', u'11-719-748-3364', u'7498.12', u'AUTOMOBILE', u' deposits eat slyly ironic, even instructions. express foxes detect slyly. blithely even accounts abov', u''), (u'4', u'Customer#000000004', u'XxVSJsLAGtn', u'4', u'14-128-190-5944', u'2866.83', u'MACHINERY', u' requests. final, regular ideas sleep final accou', u''), (u'5', u'Customer#000000005', u'KvpyuHCplrB84WgAiGV6sYpZq7Tj', u'3', u'13-750-942-6364', u'794.47', u'HOUSEHOLD', u'n accounts will have to unwind. foxes cajole accor', u''), (u'6', u'Customer#000000006', u'sKZz0CsnMD7mp4Xd0YrBvx,LREYKUWAh yVn', u'20', u'30-114-968-4951', u'7638.57', u'AUTOMOBILE', u'tions. even deposits boost according to the slyly bold packages. final accounts cajole requests. furious', u''), (u'7', u'Customer#000000007', u'TcGe5gaZNgVePxU5kRrvXBfkasDTea', u'18', u'28-190-982-9759', u'9561.95', u'AUTOMOBILE', u'ainst the ironic, express theodolites. express, even pinto beans among the exp', u''), (u'8', u'Customer#000000008', u'I0B10bB0AymmC, 0PrRYBCP1yGJ8xcBPmWhl5', u'17', u'27-147-574-9335', u'6819.74', u'BUILDING', u'among the slyly regular theodolites kindle blithely courts. carefully even theodolites haggle slyly along the ide', u''), (u'9', u'Customer#000000009', u'xKiAFTjUsCuxfeleNqefumTrjS', u'8', u'18-338-906-3675', u'8324.07', u'FURNITURE', u'r theodolites according to the requests wake thinly excuses: pending requests haggle furiousl', u''), (u'10', u'Customer#000000010', u'6LrEaV6KR6PLVcgl2ArL Q3rqzLzcT1 v2', u'5', u'15-741-346-9870', u'2753.54', u'HOUSEHOLD', u'es regular deposits haggle. 
fur', u''), (u'11', u'Customer#000000011', u'PkWS 3HlXqwTuzrKg633BEi', u'23', u'33-464-151-3439', u'-272.60', u'BUILDING', u'ckages. requests sleep slyly. quickly even pinto beans promise above the slyly regular pinto beans. ', u''), (u'12', u'Customer#000000012', u'9PWKuhzT4Zr1Q', u'13', u'23-791-276-1263', u'3396.49', u'HOUSEHOLD', u' to the carefully final braids. blithely regular requests nag. ironic theodolites boost quickly along', u''), (u'13', u'Customer#000000013', u'nsXQu0oVjD7PM659uC3SRSp', u'3', u'13-761-547-5974', u'3857.34', u'BUILDING', u'ounts sleep carefully after the close frays. carefully bold notornis use ironic requests. blithely', u''), (u'14', u'Customer#000000014', u'KXkletMlL2JQEA ', u'1', u'11-845-129-3851', u'5266.30', u'FURNITURE', u', ironic packages across the unus', u''), (u'15', u'Customer#000000015', u'YtWggXoOLdwdo7b0y,BZaGUQMLJMX1Y,EC,6Dn', u'23', u'33-687-542-7601', u'2788.52', u'HOUSEHOLD', u' platelets. regular deposits detect asymptotes. blithely unusual packages nag slyly at the fluf', u''), (u'16', u'Customer#000000016', u'cYiaeMLZSMAOQ2 d0W,', u'10', u'20-781-609-3107', u'4681.03', u'FURNITURE', u'kly silent courts. thinly regular theodolites sleep fluffily after ', u''), (u'17', u'Customer#000000017', u'izrh 6jdqtp2eqdtbkswDD8SG4SzXruMfIXyR7', u'2', u'12-970-682-3487', u'6.34', u'AUTOMOBILE', u'packages wake! blithely even pint', u''), (u'18', u'Customer#000000018', u'3txGO AiuFux3zT0Z9NYaFRnZt', u'6', u'16-155-215-1315', u'5494.43', u'BUILDING', u's sleep. carefully even instructions nag furiously alongside of t', u''), (u'19', u'Customer#000000019', u'uc,3bHIx84H,wdrmLOjVsiqXCq2tr', u'18', u'28-396-526-5053', u'8914.71', u'HOUSEHOLD', u' nag. furiously careful packages are slyly at the accounts. 
furiously regular in', u''), (u'20', u'Customer#000000020', u'JrPk8Pqplj4Ne', u'22', u'32-957-234-8742', u'7603.40', u'FURNITURE', u'g alongside of the special excuses-- fluffily enticing packages wake ', u''), (u'21', u'Customer#000000021', u'XYmVpr9yAHDEn', u'8', u'18-902-614-8344', u'1428.25', u'MACHINERY', u' quickly final accounts integrate blithely furiously u', u''), (u'22', u'Customer#000000022', u'QI6p41,FNs5k7RZoCCVPUTkUdYpB', u'3', u'13-806-545-9701', u'591.98', u'MACHINERY', u's nod furiously above the furiously ironic ideas. ', u''), (u'23', u'Customer#000000023', u'OdY W13N7Be3OC5MpgfmcYss0Wn6TKT', u'3', u'13-312-472-8245', u'3332.02', u'HOUSEHOLD', u'deposits. special deposits cajole slyly. fluffily special deposits about the furiously ', u''), (u'24', u'Customer#000000024', u'HXAFgIAyjxtdqwimt13Y3OZO 4xeLe7U8PqG', u'13', u'23-127-851-8031', u'9255.67', u'MACHINERY', u'into beans. fluffily final ideas haggle fluffily', u''), (u'25', u'Customer#000000025', u'Hp8GyFQgGHFYSilH5tBfe', u'12', u'22-603-468-3533', u'7133.70', u'FURNITURE', u'y. accounts sleep ruthlessly according to the regular theodolites. unusual instructions sleep. ironic, final', u''), (u'26', u'Customer#000000026', u'8ljrc5ZeMl7UciP', u'22', u'32-363-455-4837', u'5182.05', u'AUTOMOBILE', u'c requests use furiously ironic requests. slyly ironic dependencies us', u''), (u'27', u'Customer#000000027', u'IS8GIyxpBrLpMT0u7', u'3', u'13-137-193-2709', u'5679.84', u'BUILDING', u' about the carefully ironic pinto beans. accoun', u''), (u'28', u'Customer#000000028', u'iVyg0daQ,Tha8x2WPWA9m2529m', u'8', u'18-774-241-1462', u'1007.18', u'FURNITURE', u' along the regular deposits. 
furiously final pac', u''), (u'29', u'Customer#000000029', u'sJ5adtfyAkCK63df2,vF25zyQMVYE34uh', u'0', u'10-773-203-7342', u'7618.27', u'FURNITURE', u'its after the carefully final platelets x-ray against ', u''), (u'30', u'Customer#000000030', u'nJDsELGAavU63Jl0c5NKsKfL8rIJQQkQnYL2QJY', u'1', u'11-764-165-5076', u'9321.01', u'BUILDING', u'lithely final requests. furiously unusual account', u''), (u'31', u'Customer#000000031', u'LUACbO0viaAv6eXOAebryDB xjVst', u'23', u'33-197-837-7094', u'5236.89', u'HOUSEHOLD', u's use among the blithely pending depo', u''), (u'32', u'Customer#000000032', u'jD2xZzi UmId,DCtNBLXKj9q0Tlp2iQ6ZcO3J', u'15', u'25-430-914-2194', u'3471.53', u'BUILDING', u'cial ideas. final, furious requests across the e', u''), (u'33', u'Customer#000000033', u'qFSlMuLucBmx9xnn5ib2csWUweg D', u'17', u'27-375-391-1280', u'-78.56', u'AUTOMOBILE', u's. slyly regular accounts are furiously. carefully pending requests', u''), (u'34', u'Customer#000000034', u'Q6G9wZ6dnczmtOx509xgE,M2KV', u'15', u'25-344-968-5422', u'8589.70', u'HOUSEHOLD', u'nder against the even, pending accounts. even', u''), (u'35', u'Customer#000000035', u'TEjWGE4nBzJL2', u'17', u'27-566-888-7431', u'1228.24', u'HOUSEHOLD', u'requests. special, express requests nag slyly furiousl', u''), (u'36', u'Customer#000000036', u'3TvCzjuPzpJ0,DdJ8kW5U', u'21', u'31-704-669-5769', u'4987.27', u'BUILDING', u'haggle. enticing, quiet platelets grow quickly bold sheaves. carefully regular acc', u''), (u'37', u'Customer#000000037', u'7EV4Pwh,3SboctTWt', u'8', u'18-385-235-7162', u'-917.75', u'FURNITURE', u'ilent packages are carefully among the deposits. furiousl', u''), (u'38', u'Customer#000000038', u'a5Ee5e9568R8RLP 2ap7', u'12', u'22-306-880-7212', u'6345.11', u'HOUSEHOLD', u'lar excuses. closely even asymptotes cajole blithely excuses. 
carefully silent pinto beans sleep carefully fin', u''), (u'39', u'Customer#000000039', u'nnbRg,Pvy33dfkorYE FdeZ60', u'2', u'12-387-467-6509', u'6264.31', u'AUTOMOBILE', u'tions. slyly silent excuses slee', u''), (u'40', u'Customer#000000040', u'gOnGWAyhSV1ofv', u'3', u'13-652-915-8939', u'1335.30', u'BUILDING', u'rges impress after the slyly ironic courts. foxes are. blithely ', u''), (u'41', u'Customer#000000041', u'IM9mzmyoxeBmvNw8lA7G3Ydska2nkZF', u'10', u'20-917-711-4011', u'270.95', u'HOUSEHOLD', u'ly regular accounts hang bold, silent packages. unusual foxes haggle slyly above the special, final depo', u''), (u'42', u'Customer#000000042', u'ziSrvyyBke', u'5', u'15-416-330-4175', u'8727.01', u'BUILDING', u'ssly according to the pinto beans: carefully special requests across the even, pending accounts wake special', u''), (u'43', u'Customer#000000043', u'ouSbjHk8lh5fKX3zGso3ZSIj9Aa3PoaFd', u'19', u'29-316-665-2897', u'9904.28', u'MACHINERY', u'ial requests: carefully pending foxes detect quickly. carefully final courts cajole quickly. carefully', u''), (u'44', u'Customer#000000044', u'Oi,dOSPwDu4jo4x,,P85E0dmhZGvNtBwi', u'16', u'26-190-260-5375', u'7315.94', u'AUTOMOBILE', u'r requests around the unusual, bold a', u''), (u'45', u'Customer#000000045', u'4v3OcpFgoOmMG,CbnF,4mdC', u'9', u'19-715-298-9917', u'9983.38', u'AUTOMOBILE', u'nto beans haggle slyly alongside of t', u''), (u'46', u'Customer#000000046', u'eaTXWWm10L9', u'6', u'16-357-681-2007', u'5744.59', u'AUTOMOBILE', u'ctions. accounts sleep furiously even requests. regular, regular accounts cajole blithely around the final pa', u''), (u'47', u'Customer#000000047', u'b0UgocSqEW5 gdVbhNT', u'2', u'12-427-271-9466', u'274.58', u'BUILDING', u'ions. express, ironic instructions sleep furiously ironic ideas. furi', u''), (u'48', u'Customer#000000048', u'0UU iPhBupFvemNB', u'0', u'10-508-348-5882', u'3792.50', u'BUILDING', u're fluffily pending foxes. pending, bold platelets sleep slyly. 
even platelets cajo', u''), (u'49', u'Customer#000000049', u'cNgAeX7Fqrdf7HQN9EwjUa4nxT,68L FKAxzl', u'10', u'20-908-631-4424', u'4573.94', u'FURNITURE', u'nusual foxes! fluffily pending packages maintain to the regular ', u''), (u'50', u'Customer#000000050', u'9SzDYlkzxByyJ1QeTI o', u'6', u'16-658-112-3221', u'4266.13', u'MACHINERY', u'ts. furiously ironic accounts cajole furiously slyly ironic dinos.', u''), (u'51', u'Customer#000000051', u'uR,wEaiTvo4', u'12', u'22-344-885-4251', u'855.87', u'FURNITURE', u'eposits. furiously regular requests integrate carefully packages. furious', u''), (u'52', u'Customer#000000052', u'7 QOqGqqSy9jfV51BC71jcHJSD0', u'11', u'21-186-284-5998', u'5630.28', u'HOUSEHOLD', u'ic platelets use evenly even accounts. stealthy theodolites cajole furiou', u''), (u'53', u'Customer#000000053', u'HnaxHzTfFTZs8MuCpJyTbZ47Cm4wFOOgib', u'15', u'25-168-852-5363', u'4113.64', u'HOUSEHOLD', u'ar accounts are. even foxes are blithely. fluffily pending deposits boost', u''), (u'54', u'Customer#000000054', u',k4vf 5vECGWFy,hosTE,', u'4', u'14-776-370-4745', u'868.90', u'AUTOMOBILE', u'sual, silent accounts. furiously express accounts cajole special deposits. final, final accounts use furi', u''), (u'55', u'Customer#000000055', u'zIRBR4KNEl HzaiV3a i9n6elrxzDEh8r8pDom', u'10', u'20-180-440-8525', u'4572.11', u'MACHINERY', u'ully unusual packages wake bravely bold packages. unusual requests boost deposits! blithely ironic packages ab', u''), (u'56', u'Customer#000000056', u'BJYZYJQk4yD5B', u'10', u'20-895-685-6920', u'6530.86', u'FURNITURE', u'. notornis wake carefully. carefully fluffy requests are furiously even accounts. slyly expre', u''), (u'57', u'Customer#000000057', u'97XYbsuOPRXPWU', u'21', u'31-835-306-1650', u'4151.93', u'AUTOMOBILE', u'ove the carefully special packages. 
even, unusual deposits sleep slyly pend', u''), (u'58', u'Customer#000000058', u'g9ap7Dk1Sv9fcXEWjpMYpBZIRUohi T', u'13', u'23-244-493-2508', u'6478.46', u'HOUSEHOLD', u'ideas. ironic ideas affix furiously express, final instructions. regular excuses use quickly e', u''), (u'59', u'Customer#000000059', u'zLOCP0wh92OtBihgspOGl4', u'1', u'11-355-584-3112', u'3458.60', u'MACHINERY', u'ously final packages haggle blithely after the express deposits. furiou', u''), (u'60', u'Customer#000000060', u'FyodhjwMChsZmUz7Jz0H', u'12', u'22-480-575-5866', u'2741.87', u'MACHINERY', u'latelets. blithely unusual courts boost furiously about the packages. blithely final instruct', u''), (u'61', u'Customer#000000061', u'9kndve4EAJxhg3veF BfXr7AqOsT39o gtqjaYE', u'17', u'27-626-559-8599', u'1536.24', u'FURNITURE', u'egular packages shall have to impress along the ', u''), (u'62', u'Customer#000000062', u'upJK2Dnw13,', u'7', u'17-361-978-7059', u'595.61', u'MACHINERY', u'kly special dolphins. pinto beans are slyly. quickly regular accounts are furiously a', u''), (u'63', u'Customer#000000063', u'IXRSpVWWZraKII', u'21', u'31-952-552-9584', u'9331.13', u'AUTOMOBILE', u'ithely even accounts detect slyly above the fluffily ir', u''), (u'64', u'Customer#000000064', u'MbCeGY20kaKK3oalJD,OT', u'3', u'13-558-731-7204', u'-646.64', u'BUILDING', u'structions after the quietly ironic theodolites cajole be', u''), (u'65', u'Customer#000000065', u'RGT yzQ0y4l0H90P783LG4U95bXQFDRXbWa1sl,X', u'23', u'33-733-623-5267', u'8795.16', u'AUTOMOBILE', u'y final foxes serve carefully. theodolites are carefully. pending i', u''), (u'66', u'Customer#000000066', u'XbsEqXH1ETbJYYtA1A', u'22', u'32-213-373-5094', u'242.77', u'HOUSEHOLD', u'le slyly accounts. 
carefully silent packages benea', u''), (u'67', u'Customer#000000067', u'rfG0cOgtr5W8 xILkwp9fpCS8', u'9', u'19-403-114-4356', u'8166.59', u'MACHINERY', u'indle furiously final, even theodo', u''), (u'68', u'Customer#000000068', u'o8AibcCRkXvQFh8hF,7o', u'12', u'22-918-832-2411', u'6853.37', u'HOUSEHOLD', u' pending pinto beans impress realms. final dependencies ', u''), (u'69', u'Customer#000000069', u'Ltx17nO9Wwhtdbe9QZVxNgP98V7xW97uvSH1prEw', u'9', u'19-225-978-5670', u'1709.28', u'HOUSEHOLD', u'thely final ideas around the quickly final dependencies affix carefully quickly final theodolites. final accounts c', u''), (u'70', u'Customer#000000070', u'mFowIuhnHjp2GjCiYYavkW kUwOjIaTCQ', u'22', u'32-828-107-2832', u'4867.52', u'FURNITURE', u'fter the special asymptotes. ideas after the unusual frets cajole quickly regular pinto be', u''), (u'71', u'Customer#000000071', u'TlGalgdXWBmMV,6agLyWYDyIz9MKzcY8gl,w6t1B', u'7', u'17-710-812-5403', u'-611.19', u'HOUSEHOLD', u'g courts across the regular, final pinto beans are blithely pending ac', u''), (u'72', u'Customer#000000072', u'putjlmskxE,zs,HqeIA9Wqu7dhgH5BVCwDwHHcf', u'2', u'12-759-144-9689', u'-362.86', u'FURNITURE', u'ithely final foxes sleep always quickly bold accounts. final wat', u''), (u'73', u'Customer#000000073', u'8IhIxreu4Ug6tt5mog4', u'0', u'10-473-439-3214', u'4288.50', u'BUILDING', u'usual, unusual packages sleep busily along the furiou', u''), (u'74', u'Customer#000000074', u'IkJHCA3ZThF7qL7VKcrU nRLl,kylf ', u'4', u'14-199-862-7209', u'2764.43', u'MACHINERY', u'onic accounts. blithely slow packages would haggle carefully. qui', u''), (u'75', u'Customer#000000075', u'Dh 6jZ,cwxWLKQfRKkiGrzv6pm', u'18', u'28-247-803-9025', u'6684.10', u'AUTOMOBILE', u' instructions cajole even, even deposits. finally bold deposits use above the even pains. slyl', u''), (u'76', u'Customer#000000076', u'm3sbCvjMOHyaOofH,e UkGPtqc4', u'0', u'10-349-718-3044', u'5745.33', u'FURNITURE', u'pecial deposits. 
ironic ideas boost blithely according to the closely ironic theodolites! furiously final deposits n', u''), (u'77', u'Customer#000000077', u'4tAE5KdMFGD4byHtXF92vx', u'17', u'27-269-357-4674', u'1738.87', u'BUILDING', u'uffily silent requests. carefully ironic asymptotes among the ironic hockey players are carefully bli', u''), (u'78', u'Customer#000000078', u'HBOta,ZNqpg3U2cSL0kbrftkPwzX', u'9', u'19-960-700-9191', u'7136.97', u'FURNITURE', u'ests. blithely bold pinto beans h', u''), (u'79', u'Customer#000000079', u'n5hH2ftkVRwW8idtD,BmM2', u'15', u'25-147-850-4166', u'5121.28', u'MACHINERY', u'es. packages haggle furiously. regular, special requests poach after the quickly express ideas. blithely pending re', u''), (u'80', u'Customer#000000080', u'K,vtXp8qYB ', u'0', u'10-267-172-7101', u'7383.53', u'FURNITURE', u'tect among the dependencies. bold accounts engage closely even pinto beans. ca', u''), (u'81', u'Customer#000000081', u'SH6lPA7JiiNC6dNTrR', u'20', u'30-165-277-3269', u'2023.71', u'BUILDING', u'r packages. fluffily ironic requests cajole fluffily. ironically regular theodolit', u''), (u'82', u'Customer#000000082', u'zhG3EZbap4c992Gj3bK,3Ne,Xn', u'18', u'28-159-442-5305', u'9468.34', u'AUTOMOBILE', u's wake. bravely regular accounts are furiously. regula', u''), (u'83', u'Customer#000000083', u'HnhTNB5xpnSF20JBH4Ycs6psVnkC3RDf', u'22', u'32-817-154-4122', u'6463.51', u'BUILDING', u'ccording to the quickly bold warhorses. final, regular foxes integrate carefully. bold packages nag blithely ev', u''), (u'84', u'Customer#000000084', u'lpXz6Fwr9945rnbtMc8PlueilS1WmASr CB', u'11', u'21-546-818-3802', u'5174.71', u'FURNITURE', u'ly blithe foxes. 
special asymptotes haggle blithely against the furiously regular depo', u''), (u'85', u'Customer#000000085', u'siRerlDwiolhYR 8FgksoezycLj', u'5', u'15-745-585-8219', u'3386.64', u'FURNITURE', u'ronic ideas use above the slowly pendin', u''), (u'86', u'Customer#000000086', u'US6EGGHXbTTXPL9SBsxQJsuvy', u'0', u'10-677-951-2353', u'3306.32', u'HOUSEHOLD', u'quests. pending dugouts are carefully aroun', u''), (u'87', u'Customer#000000087', u'hgGhHVSWQl 6jZ6Ev', u'23', u'33-869-884-7053', u'6327.54', u'FURNITURE', u'hely ironic requests integrate according to the ironic accounts. slyly regular pla', u''), (u'88', u'Customer#000000088', u'wtkjBN9eyrFuENSMmMFlJ3e7jE5KXcg', u'16', u'26-516-273-2566', u'8031.44', u'AUTOMOBILE', u's are quickly above the quickly ironic instructions; even requests about the carefully final deposi', u''), (u'89', u'Customer#000000089', u'dtR, y9JQWUO6FoJExyp8whOU', u'14', u'24-394-451-5404', u'1530.76', u'FURNITURE', u'counts are slyly beyond the slyly final accounts. quickly final ideas wake. r', u''), (u'90', u'Customer#000000090', u'QxCzH7VxxYUWwfL7', u'16', u'26-603-491-1238', u'7354.23', u'BUILDING', u'sly across the furiously even ', u''), (u'91', u'Customer#000000091', u'S8OMYFrpHwoNHaGBeuS6E 6zhHGZiprw1b7 q', u'8', u'18-239-400-3677', u'4643.14', u'AUTOMOBILE', u'onic accounts. fluffily silent pinto beans boost blithely according to the fluffily exp', u''), (u'92', u'Customer#000000092', u'obP PULk2LH LqNF,K9hcbNqnLAkJVsl5xqSrY,', u'2', u'12-446-416-8471', u'1182.91', u'MACHINERY', u'. pinto beans hang slyly final deposits. ac', u''), (u'93', u'Customer#000000093', u'EHXBr2QGdh', u'7', u'17-359-388-5266', u'2182.52', u'MACHINERY', u'press deposits. carefully regular platelets r', u''), (u'94', u'Customer#000000094', u'IfVNIN9KtkScJ9dUjK3Pg5gY1aFeaXewwf', u'9', u'19-953-499-8833', u'5500.11', u'HOUSEHOLD', u'latelets across the bold, final requests sleep according to the fluffily bold accounts. 
unusual deposits amon', u''), (u'95', u'Customer#000000095', u'EU0xvmWvOmUUn5J,2z85DQyG7QCJ9Xq7', u'15', u'25-923-255-2929', u'5327.38', u'MACHINERY', u'ithely. ruthlessly final requests wake slyly alongside of the furiously silent pinto beans. even the', u''), (u'96', u'Customer#000000096', u'vWLOrmXhRR', u'8', u'18-422-845-1202', u'6323.92', u'AUTOMOBILE', u'press requests believe furiously. carefully final instructions snooze carefully. ', u''), (u'97', u'Customer#000000097', u'OApyejbhJG,0Iw3j rd1M', u'17', u'27-588-919-5638', u'2164.48', u'AUTOMOBILE', u'haggle slyly. bold, special ideas are blithely above the thinly bold theo', u''), (u'98', u'Customer#000000098', u'7yiheXNSpuEAwbswDW', u'12', u'22-885-845-6889', u'-551.37', u'BUILDING', u'ages. furiously pending accounts are quickly carefully final foxes: busily pe', u''), (u'99', u'Customer#000000099', u'szsrOiPtCHVS97Lt', u'15', u'25-515-237-9232', u'4088.65', u'HOUSEHOLD', u'cajole slyly about the regular theodolites! furiously bold requests nag along the pending, regular packages. somas', u''), (u'100', u'Customer#000000100', u'fptUABXcmkC5Wx', u'20', u'30-749-445-4907', u'9889.89', u'FURNITURE', u'was furiously fluffily quiet deposits. silent, pending requests boost against ', u'')] 

I applied groupBy to customerrdd on the attribute at index 6. To then apply an aggregate operation, say addition, on the attribute at index 3, I chained flatMapValues, several mapValues calls, and reduceByKey. Here is the code:

def func(x): 
    return x 


def stringconverfunc(z): 
    return str(z) 


def floatconverfunc(l): 
    return float(l) 

def aggonvalfunc(y): 
    return y[3] 


grouprdd=customerrdd.groupBy(lambda w:(w[6])) 


result=grouprdd.flatMapValues(lambda q: func(q)).mapValues(lambda p: aggonvalfunc(p)) \
     .mapValues(lambda line: stringconverfunc(line)).mapValues(lambda line: line.strip()) \
     .mapValues(lambda line: floatconverfunc(line)).reduceByKey(lambda x, y: x + y).collect()
print(result)

OUTPUT:

[(u'BUILDING', 20), (u'AUTOMOBILE', 21), (u'HOUSEHOLD', 21), (u'MACHINERY', 16), (u'FURNITURE', 22)] 

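However, the approach above is quite expensive in terms of shuffling and does not work for larger datasets. So I want to implement the same idea with rdd.combineByKey(), so that it computes faster and can handle larger datasets. I tried to implement it by reading about combineByKey, but I am confused about how to supply the key and the value on which the aggregation should be performed. Can anyone help? Suggestions are welcome.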

Answer

0

OK, it is hard for a beginner to know all of this, so I will try to give you some hints.

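You can assign keys without grouping. This is done with keyBy and does not involve a shuffle. In the end, a key-value RDD is just an RDD of tuples of size 2, where the first element is the key and the second is the value.
Any performance gain you could get from reduceByKey or combineByKey is wasted if you group beforehand, and that grouping step can be avoided.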

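Also, you can call float on a string with leading and trailing whitespace; it strips the string automatically. You also don't need to create lambdas of the form lambda x: f(x); just pass f without any parentheses and it has the same effect. For the same reason, you don't need to wrap str and float in other functions.
The operator module provides functions for adding and for retrieving items, so you don't need to define those yourself. Have a look at the Python documentation for more information.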
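The hints above can be checked in plain Python, without Spark. This is only a small sketch: the `row` tuple is a made-up record shaped like the rows in the question (with whitespace added around the value at index 3), and the names `get_segment` and `get_amount` are chosen here for illustration.

```python
from operator import itemgetter, add

# One made-up record shaped like the rows in the question (values shortened).
row = (u'1', u'Customer#000000001', u'addr', u' 15 ', u'25-989-741-2988',
       u'711.56', u'BUILDING', u'comment', u'')

# float() ignores leading and trailing whitespace, so no explicit strip() is needed.
assert float(u' 15 ') == 15.0

# itemgetter(n) behaves like `lambda x: x[n]`.
get_segment = itemgetter(6)
get_amount = itemgetter(3)
assert get_segment(row) == row[6]

# A key-value pair is just a 2-tuple: (key, value).
pair = (get_segment(row), float(get_amount(row)))

# operator.add behaves like `lambda x, y: x + y`.
total = add(pair[1], 5.0)
```

Because `float`, `itemgetter(3)` and `add` are already callables, they can be passed directly to `mapValues` and `reduceByKey` instead of being wrapped in a lambda.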

My solution would be:

from operator import itemgetter, add 

# `itemgetter(6)` is equivalent to `lambda x: x[6]`. Therefore we'll use element at 
# index 6 to key the rdd's entries. 
# This operation is equivalent to `customerrdd.map(lambda x: (x[6], x))` 
rdd = customerrdd.keyBy(itemgetter(6)) 

# Now extract element at index 3 from the values so we no longer have a tuple 
rdd = rdd.mapValues(itemgetter(3)) 

# Convert those elements to floats 
rdd = rdd.mapValues(float) 

# We could've done the previous steps in one by doing 
# rdd = customerrdd.map(lambda x: (x[6], float(x[3])))

# Sum them up and collect the result 
result = rdd.reduceByKey(add).collect() 

Without the comments:

from operator import itemgetter, add 

result = customerrdd.keyBy(itemgetter(6))\
    .mapValues(itemgetter(3))\
    .mapValues(float)\
    .reduceByKey(add).collect()

Admittedly, this returns

[(u'BUILDING', 204.0), 
(u'AUTOMOBILE', 280.0), 
(u'MACHINERY', 135.0), 
(u'HOUSEHOLD', 255.0), 
(u'FURNITURE', 224.0)] 

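which is a different result from yours, but I ran your code and got the same output as with mine. So I guess you produced your result from a different RDD.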
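Since the question asks how to supply the key and value to combineByKey, here is a hedged sketch of the three functions it expects, using a per-key mean as the example aggregate. The (sum, count) accumulator, the function names, and the sample partition values are assumptions made for this illustration. The Spark call itself is shown only in a comment, and the rest of the block simulates two partitions locally so it runs without a cluster.

```python
# The three functions combineByKey expects. The accumulator here is a
# (sum, count) pair, which is an assumption made for this sketch.
def create_combiner(value):
    # Called for the first value of a key seen within a partition.
    return (value, 1)

def merge_value(acc, value):
    # Called for each further value of that key within the same partition.
    return (acc[0] + value, acc[1] + 1)

def merge_combiners(a, b):
    # Called to merge accumulators of the same key from different partitions.
    return (a[0] + b[0], a[1] + b[1])

# With Spark available, the call would look roughly like this (not executed here):
#   pairs = customerrdd.map(lambda x: (x[6], float(x[3])))
#   sums_counts = pairs.combineByKey(create_combiner, merge_value, merge_combiners)
#   means = sums_counts.mapValues(lambda acc: acc[0] / acc[1])

def combine_partition(values):
    # Local stand-in for what happens to one key inside one partition.
    acc = create_combiner(values[0])
    for v in values[1:]:
        acc = merge_value(acc, v)
    return acc

partition_a = [15.0, 4.0]  # hypothetical values for one key on partition A
partition_b = [17.0]       # hypothetical values for the same key on partition B

total, count = merge_combiners(combine_partition(partition_a),
                               combine_partition(partition_b))
mean = total / count
```

The key is whatever sits in the first slot of each 2-tuple, so it never appears in these functions; they only ever see values and accumulators. For a plain sum, reduceByKey(add) already combines values within each partition before shuffling, so combineByKey is mainly useful when the accumulator has a different type from the values, as with the (sum, count) pair here.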

+0

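Hi @swenzel, first of all thanks for your reply, but I want to implement this with the 'combineByKey' operation or something similar, to reduce the shuffle overhead. 'reduceByKey', however, involves high shuffle overhead when computing. The reason for avoiding these operations is that groupByKey() and reduceByKey() suffer in performance at execution time, and the memory pressure they create sometimes causes jobs to fail on larger datasets. Hence I am looking for a solution based on an operation like 'combineByKey'. –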

+0

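@ShafaatHussain I am pretty sure 'reduceByKey' is not slower than 'combineByKey', which is complete overkill for a simple sum. Your bottleneck is the 'groupBy' operation. Have you tried the code I provided? – swenzel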

+0

Thanks again, but I don't only need a sum; I will extend this to other aggregate operations. Yes, I will try your solution as soon as I can :) –