2016-03-04 37 views
1

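How to build a list of lists based on common strings in the list, Python

I have a list that needs to be merged, based on strings in the list, to fit a structure. In this case the common strings are the 'date' and 'id' values, and the result should fit the 'fields' structure.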

Fields: ['date', 'id', 'impressions', 'clicks']

Before:

[('2015-11-01', 'id123', 'impressions', '8'), ('2015-11-01', 'id123', 
'clicks', '4'), ('2015-11-01', 'id456', 'impressions', '14'), 
('2015-11-01', 'id456', 'clicks', '9')] 

After:

[('2015-11-01', 'id123', '8', '4'), ('2015-11-01', 'id456', '14', '9')] 
+0

I can't understand the result; could you put it in other words? – RafaelC

+0

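The resulting list needs to follow the 'fields' structure. Rows are matched on 'date' and 'id'. The words 'impressions' and 'clicks' appear in order, so it can be assumed that '8' is the impressions value and '4' is the clicks value. – PieCharmed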

Answers

1
>>> L = [('2015-11-01', 'id123', 'impressions', '8'), ('2015-11-01', 'id123', 
... 'clicks', '4'), ('2015-11-01', 'id456', 'impressions', '14'), 
... ('2015-11-01', 'id456', 'clicks', '9')] 
>>> from collections import defaultdict 
>>> D = defaultdict(list) 
>>> for a, b, c, d in L: 
...  D[a, b].append(d) 
... 
>>> [k + tuple(D[k]) for k in D] 
[('2015-11-01', 'id456', '14', '9'), ('2015-11-01', 'id123', '8', '4')] 

In case the impressions and clicks are not in a consistent order:

>>> L = [('2015-11-01', 'id123', 'impressions', '8'), ('2015-11-01', 'id123', 'clicks', '4'), ('2015-11-01', 'id456', 'clicks', '9'), ('2015-11-01', 'id456', 'impressions', '14')] 
>>> from collections import defaultdict 
>>> D = defaultdict(lambda: [None, None]) 
>>> for a, b, c, d in L: 
...  D[a, b][c == 'clicks'] = d 
... 
>>> [k + tuple(D[k]) for k in D] 
[('2015-11-01', 'id456', '14', '9'), ('2015-11-01', 'id123', '8', '4')] 
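
As a rough sketch of how the indexing trick above might be generalized: rather than relying on `c == 'clicks'` evaluating to 0 or 1, the slot for each value can be looked up from the `fields` list given in the question. The helper name `merge_rows` is hypothetical, and the sketch assumes each metric appears at most once per date/id pair.

```python
from collections import defaultdict

fields = ['date', 'id', 'impressions', 'clicks']
metrics = fields[2:]  # value columns, in the order they should appear in the output

def merge_rows(rows):
    """Hypothetical helper: merge (date, id, metric, value) rows into one tuple per (date, id)."""
    merged = defaultdict(lambda: [None] * len(metrics))
    for date, id_, metric, value in rows:
        # Place the value in the slot that matches the metric's position in `fields`
        merged[date, id_][metrics.index(metric)] = value
    return [key + tuple(values) for key, values in merged.items()]

L = [('2015-11-01', 'id123', 'impressions', '8'),
     ('2015-11-01', 'id123', 'clicks', '4'),
     ('2015-11-01', 'id456', 'clicks', '9'),
     ('2015-11-01', 'id456', 'impressions', '14')]

result = merge_rows(L)
```

A metric that is missing for some date/id pair is left as None in that slot, and adding a third metric would only require extending `fields`.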
+0

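This works when 'L' always has 'impressions' followed by 'clicks'. What if 'L' looks like [('2015-11-01', 'id123', 'impressions', '8'), ('2015-11-01', 'id123', 'clicks', '4'), ('2015-11-01', 'id456', 'clicks', '9'), ('2015-11-01', 'id456', 'impressions', '14')] but the result still has to follow the 'fields' order mentioned above? – PieCharmed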

+0

@PieCharmed, I've added a way to handle that case to my answer –

0

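itertools.groupby can work nicely here, particularly if the real data matches the sample data (already sorted, so that all rows for a given date/ID pair are adjacent):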

import itertools 
from operator import itemgetter 

outlist = [] 
for (date, ID), grp in itertools.groupby(inlist, key=itemgetter(0, 1)): 
    grp = list(grp) # Iterating twice, so convert to sequence 
    impressioncnt = sum(int(cnt) for _, _, typ, cnt in grp if typ == 'impressions') 
    clickcnt = sum(int(cnt) for _, _, typ, cnt in grp if typ == 'clicks') 
    outlist.append((date, ID, str(impressioncnt), str(clickcnt))) 

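If the data isn't already sorted by date and ID, you need to sort inlist first, with inlist.sort(key=itemgetter(0, 1)). That can be expensive if the list is huge, in which case you might consider using collections.defaultdict instead: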

import collections 

dateID_cnts = collections.defaultdict({'impressions': 0, 'clicks': 0}.copy) 
for date, ID, typ, cnt in inlist: 
    dateID_cnts[date, ID][typ] += int(cnt) 

# Convert from defaultdict to desired list of tuples 
outlist = [(date, ID, str(v['impressions']), str(v['clicks'])) for (date, ID), v in dateID_cnts.items()]
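
For comparison, here is a minimal sketch of the sort-first groupby variant mentioned above, run on deliberately unsorted sample data. It assumes each date/ID pair has exactly one 'impressions' row and one 'clicks' row, so the values are kept as strings rather than summed; the shuffled `inlist` is made up for illustration.

```python
import itertools
from operator import itemgetter

# Sample rows deliberately out of order (illustrative data, not from a real source)
inlist = [
    ('2015-11-01', 'id456', 'clicks', '9'),
    ('2015-11-01', 'id123', 'impressions', '8'),
    ('2015-11-01', 'id456', 'impressions', '14'),
    ('2015-11-01', 'id123', 'clicks', '4'),
]

key = itemgetter(0, 1)
outlist = []
# groupby only merges adjacent rows, so sort by (date, ID) first
for (date, ID), grp in itertools.groupby(sorted(inlist, key=key), key=key):
    counts = {typ: cnt for _, _, typ, cnt in grp}  # maps metric name to value
    outlist.append((date, ID, counts['impressions'], counts['clicks']))
```

Using sorted() leaves the original list untouched; inlist.sort(key=key) would do the same work in place.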
+0

It looks like there may be only one impressions row and one clicks row per date/ID combination. If that's the case, you can simplify this a lot –

+0

@JohnLaRooy: Yeah, I missed that there's only one of each, and that they're strings, not integers. So this is my "general case" solution? :-) – ShadowRanger

+1

outlist = [k + (next(grp)[-1], next(grp)[-1]) for k, grp in itertools.groupby(L, key=itemgetter(0, 1))] –

0

Another way:

data=[('2015-11-01', 'id123', 'impressions', '8'), 
     ('2015-11-01', 'id123','clicks', '4'), 
     ('2015-11-01', 'id456', 'impressions', '14'), 
     ('2015-11-01', 'id456', 'clicks', '9')] 

ddict={} 
for t in data: 
    key=(t[0], t[1]) 
    ddict.setdefault(key, []).append(t[2:]) 

LoT=[]  
for d, id in ddict: 
    impressions, clicks=max(ddict[(d, id)])[1], min(ddict[(d, id)])[1] 
    LoT.append(tuple([d, id, impressions, clicks])) 

>>> LoT 
[('2015-11-01', 'id123', '8', '4'), ('2015-11-01', 'id456', '14', '9')] 

If you can assume impressions and clicks are already in order, you can eliminate max and min and replace that line with:

impressions, clicks=ddict[(d, id)][0][1], ddict[(d, id)][1][1] 
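
The max/min version above works because the tuple ('impressions', ...) compares greater than ('clicks', ...) as strings, which is easy to overlook. One possible alternative, sketched here under the assumption that each date/id pair has one row per metric, is to store the values in a small dict keyed by metric name so that neither row order nor string comparison matters. The shuffled `data` and the name `id_` are illustrative choices.

```python
data = [('2015-11-01', 'id123', 'clicks', '4'),
        ('2015-11-01', 'id123', 'impressions', '8'),
        ('2015-11-01', 'id456', 'impressions', '14'),
        ('2015-11-01', 'id456', 'clicks', '9')]

ddict = {}
for d, id_, typ, cnt in data:
    # One inner dict per (date, id) pair, mapping metric name to its value
    ddict.setdefault((d, id_), {})[typ] = cnt

# Pull the metrics out by name, in the order required by 'fields'
LoT = [(d, id_, vals['impressions'], vals['clicks'])
       for (d, id_), vals in ddict.items()]
```

A missing metric raises KeyError here; vals.get('clicks') could be used instead if gaps are expected.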