2015-10-17

Find duplicate lines in Python: repetition count, index of the first occurrence, and the text

Please help me. My file looks like this:

This is a cat 
we are working at BusinessBrio 
Gitu is my beloved cat 
Jery is also a cat 
Boni is a nice dog 
Gitu is my beloved cat 
we are working at BusinessBrio 
This is a cat 
we are working at BusinessBrio 
Gitu is my beloved cat 
Jery is also a cat 
Boni is a nice dog 
Gitu is my beloved cat 
we are working at BusinessBrio 

I need output like this:

[[1,'we are working at BusinessBrio',4],[2,'Gitu is my beloved cat',4],[0,'This is a cat',2],[3,'Jery is also a cat',2],[4,'Boni is a nice dog',2]] 

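Each entry is [index of the first occurrence, line text, number of occurrences], and the output must be sorted by the number of occurrences in descending order.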


Do you care about the size of the file? –


Can you show us what you have done so far? – 2015-10-17 10:45:04

Answers

It is not clear how to separate the sentences, since there is no punctuation. Assuming each sentence is on its own line, split the text into lines and count them with Counter from the collections module.

from collections import Counter
from pprint import pprint as pp

data = '''
This is a cat
we are working at BusinessBrio
Gitu is my beloved cat
Jery is also a cat
Boni is a nice dog
Gitu is my beloved cat
we are working at BusinessBrio
This is a cat
we are working at BusinessBrio
Gitu is my beloved cat
Jery is also a cat
Boni is a nice dog
Gitu is my beloved cat
we are working at BusinessBrio
'''
# Split into lines, dropping the blank line created by the newline
# right after the opening quotes (otherwise '' gets counted too)
li = [line for line in data.splitlines() if line.strip()]

pp(Counter(li))

Counter({'we are working at BusinessBrio': 4,
         'Gitu is my beloved cat': 4,
         'This is a cat': 2,
         'Jery is also a cat': 2,
         'Boni is a nice dog': 2})
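The Counter above gives the counts, but not the first-occurrence index or the list format the question asks for. A minimal sketch that builds that format, assuming Python 3.7 or later (where Counter.most_common() lists elements with equal counts in the order they were first encountered); the function name duplicates_report is hypothetical:

```python
from collections import Counter


def duplicates_report(lines):
    """Return [first_index, text, count] rows, most frequent first.

    Hypothetical helper; assumes Python 3.7+ ordering of most_common().
    """
    counts = Counter(lines)
    first_index = {}
    for i, line in enumerate(lines):
        # setdefault stores the index only the first time a line is seen
        first_index.setdefault(line, i)
    # most_common() sorts by count, descending; equal counts keep first-seen order
    return [[first_index[text], text, n] for text, n in counts.most_common()]


sample = """This is a cat
we are working at BusinessBrio
Gitu is my beloved cat
Jery is also a cat
Boni is a nice dog
Gitu is my beloved cat
we are working at BusinessBrio
This is a cat
we are working at BusinessBrio
Gitu is my beloved cat
Jery is also a cat
Boni is a nice dog
Gitu is my beloved cat
we are working at BusinessBrio""".splitlines()

print(duplicates_report(sample))
```

Applied to the fourteen sample lines, this returns the list shown in the question, including the tie order (first index 0, 3, 4 for the lines that appear twice).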

Use Counter to count the lines and sorted to order the result.

from collections import Counter

with open("hel.txt", "r") as f:
    b = f.read().splitlines()

counter = Counter(b)

output = []

for key, value in counter.items():
    # b.index(key) is the position of the first occurrence of the line
    output.append([b.index(key), key, value])

# Sort by count (descending), breaking ties by first index (ascending)
out = sorted(output, key=lambda x: (-x[2], x[0]))
print(out)

Output:

[[1, 'we are working at BusinessBrio', 4], [2, 'Gitu is my beloved cat', 4], [0, 'This is a cat', 2], [3, 'Jery is also a cat', 2], [4, 'Boni is a nice dog', 2]]

It works fine. Thanks for your help. – Chandan
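The answer reads the whole file into memory, and b.index(key) rescans the list from the start for every distinct line, which can matter if the file is large (the concern raised in the comments on the question). A minimal single-pass sketch, assuming the input is any iterable of lines such as an open text file; the function name duplicates_report_streaming is hypothetical, and io.StringIO stands in for a real file:

```python
import io


def duplicates_report_streaming(lines):
    """One pass over `lines`; memory grows with the number of distinct lines.

    Hypothetical helper; `lines` can be any iterable, e.g. an open file.
    """
    stats = {}  # text -> [first_index, count]
    for i, raw in enumerate(lines):
        text = raw.rstrip("\n")  # iterating a file keeps the trailing newline
        entry = stats.get(text)
        if entry is None:
            stats[text] = [i, 1]
        else:
            entry[1] += 1
    rows = [[first, text, count] for text, (first, count) in stats.items()]
    # Count descending, then first index ascending to break ties
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows


fake_file = io.StringIO(
    "This is a cat\n"
    "we are working at BusinessBrio\n"
    "Gitu is my beloved cat\n"
    "Jery is also a cat\n"
    "Boni is a nice dog\n"
    "Gitu is my beloved cat\n"
    "we are working at BusinessBrio\n"
    "This is a cat\n"
    "we are working at BusinessBrio\n"
    "Gitu is my beloved cat\n"
    "Jery is also a cat\n"
    "Boni is a nice dog\n"
    "Gitu is my beloved cat\n"
    "we are working at BusinessBrio\n"
)
print(duplicates_report_streaming(fake_file))
```

With a real file, pass the file object opened with open("hel.txt") directly. The explicit (-count, index) sort key makes the tie order independent of dictionary iteration order.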