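There are two things I need to do at this point, and I need your help with them: how do I print organized messages from my email?
- The best practice for cleaning the data: programmatically removing redundant tags and the ">>>>>>>" markers, plus other non-meaningful communication flotsam and jetsam
- Once it is cleaned: how do I pack it up so that it runs well in Django & SQLite?
- Can I turn it into a CSV organized by date, person, subject, and words, and then feed that into data classes in my database?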
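One possible shape for the CSV-then-database step in the list above, sketched with only the standard library's `csv` and `sqlite3` modules. The column names (`date`, `sender`, `subject`, `word`, `count`), the `word_counts` table, the sample rows, and both helper functions are assumptions made for illustration; none of them come from the script below. Django supports SQLite as a database backend, but no Django code is shown here.

```python
import csv
import io
import sqlite3

# Hypothetical flat rows: one row per (message, word) pair.
FIELDS = ["date", "sender", "subject", "word", "count"]
ROWS = [
    ("2012-01-31", "alice@example.com", "budget", "meeting", 2),
    ("2012-01-31", "alice@example.com", "budget", "report", 1),
    ("2012-02-01", "bob@example.com", "lunch", "meeting", 1),
]


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buf.getvalue()


def csv_to_sqlite(csv_text, conn):
    """Load CSV text into a word_counts table (created if missing)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts "
        "(date TEXT, sender TEXT, subject TEXT, word TEXT, count INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO word_counts "
        "VALUES (:date, :sender, :subject, :word, :count)",
        reader,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
csv_to_sqlite(rows_to_csv(ROWS), conn)

# Total occurrences of each word across all messages, most frequent first.
totals = conn.execute(
    "SELECT word, SUM(count) FROM word_counts "
    "GROUP BY word ORDER BY 2 DESC, word"
).fetchall()
print(totals)
```

Keeping one row per (message, word) pair means the same table can be grouped by date, sender, or subject later without reshaping the data.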
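Well, before I get into the database, I would like to be able to sort and display the data cleanly. I have very little experience putting things into a database; the closest I have come is working from XML, CSV, and JSON.
I need to get ngrams by rank, e.g. how many times a given word appears from a given person across a series of emails. I am trying to get a closer understanding of how people talk to me about subjects, etc. A very basic version of Jon Kleinberg's work analyzing his own emails.
Be gentle or be rough, but please help :)!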
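Sorting and displaying can be done in memory before any database is involved. The sketch below assumes each message has already been reduced to a dictionary of header strings; the `records` list and the three helper functions are hypothetical. `email.utils.parsedate_to_datetime` is a standard-library call that turns a `Date:` header into a `datetime`.

```python
from email.utils import parsedate_to_datetime
from operator import itemgetter

# Hypothetical records, shaped like the headers pulled from each message.
records = [
    {"date": "Tue, 31 Jan 2012 13:45:00 +0000",
     "sender": "bob@example.com", "subject": "Re: lunch"},
    {"date": "Mon, 30 Jan 2012 09:00:00 +0000",
     "sender": "alice@example.com", "subject": "budget"},
    {"date": "Wed, 01 Feb 2012 08:30:00 +0000",
     "sender": "alice@example.com", "subject": "Re: budget"},
]


def with_parsed_dates(rows):
    """Return copies of the rows with a real datetime under 'when'."""
    return [dict(row, when=parsedate_to_datetime(row["date"])) for row in rows]


def sort_records(rows, *keys):
    """Sort rows by the given keys, in priority order."""
    return sorted(rows, key=itemgetter(*keys))


def format_table(rows):
    """Render one aligned line per row: date, sender, subject."""
    return "\n".join(
        "{:%Y-%m-%d %H:%M}  {:<20}  {}".format(
            row["when"], row["sender"], row["subject"])
        for row in rows
    )


parsed = with_parsed_dates(records)
by_sender_then_date = sort_records(parsed, "sender", "when")
print(format_table(by_sender_then_date))
```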
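Ranking ngrams per person could look roughly like this, using `collections.Counter`. The `(sender, body)` pairs, the `ngrams` helper, and `count_by_sender` are assumptions for illustration, and the whitespace tokenization is deliberately naive.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return an iterator over every run of n consecutive tokens, as tuples."""
    return zip(*(tokens[i:] for i in range(n)))


def count_by_sender(messages, n=1):
    """Map each sender to a Counter of the n-grams in their messages."""
    counts = defaultdict(Counter)
    for sender, body in messages:
        tokens = body.lower().split()
        counts[sender].update(ngrams(tokens, n))
    return counts


# Hypothetical (sender, body) pairs.
messages = [
    ("alice@example.com", "the budget report is late"),
    ("alice@example.com", "send the budget report today"),
    ("bob@example.com", "lunch today"),
]

bigrams = count_by_sender(messages, n=2)
# The two highest-ranked bigrams in Alice's messages.
print(bigrams["alice@example.com"].most_common(2))
```

`Counter.most_common(k)` gives the ranking directly, so no separate sort step is needed.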
My output currently looks like this:
:1, 'each': 1, 'I': 1, 'IN\r\n\r\n2012/1/31!': 1, 'calculator.\r\n>>>>>>\r\n>>>>>>': 1, 'people': 1, '=97MB\r\n>\r\n>': 1, 'we': 2, 'wrote:\r\n>>>>>>\r\n>>>>>>': 1, '=\r\nwrote:\r\n>>>>>\r\n>>>>>>': 1, '2012/1/31': 2, 'are': 1, '31,': 5, '=97MB\r\n>>>>\r\n>>>>': 1, '1:45': 1, 'is\r\n>>>>>': 1, 'sent':
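Keys like these appear because splitting on a single space leaves line breaks, quote markers, and soft line breaks attached to the words. A minimal cleaning sketch follows. The regular expressions encode assumptions about what counts as noise (quoted lines starting with ">", attribution lines ending in "wrote:", anything that looks like an HTML tag), and `clean_body` and `tokenize` are hypothetical helpers, not part of any library.

```python
import re

SOFT_BREAK = re.compile(r"=\r?\n")       # "=" at end of line, then a line break
QUOTE_LINE = re.compile(r"^\s*>")        # quoted reply lines
ATTRIBUTION = re.compile(r"wrote:\s*$")  # "On <date>, <person> wrote:" lines
TAG = re.compile(r"<[^>]+>")             # crude HTML tag matcher
WORD = re.compile(r"[a-z0-9']+")         # runs of letters, digits, apostrophes


def clean_body(raw):
    """Drop quoted replies and attribution lines, and strip tags."""
    text = SOFT_BREAK.sub("", raw)
    kept = []
    for line in text.splitlines():
        if QUOTE_LINE.match(line) or ATTRIBUTION.search(line):
            continue
        kept.append(TAG.sub(" ", line))
    return "\n".join(kept)


def tokenize(text):
    """Lowercase the text and return only word-like tokens."""
    return WORD.findall(text.lower())


raw = (
    "We are meeting <b>today</b>=\r\n at noon.\r\n"
    "On 2012/1/31 Bob wrote:\r\n"
    ">>>>>> old text\r\n"
    ">>>>>>\r\n"
)
tokens = tokenize(clean_body(raw))
print(tokens)
```

Dropping quoted lines entirely also keeps other people's words from being counted as the sender's own.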
import getpass, imaplib, email

# NGramCounter builds a dictionary relating each token (as a string) to the
# number of times that token occurs in a text (as integers)
class NGramCounter(object):

    # parameter text is the string whose tokens will be counted
    def __init__(self, text):
        self.text = text
        self.ngrams = dict()

    # tokenize method breaks the stored text up into units
    def tokenize(self):
        return self.text.split(" ")

    # parse method tokenizes the text and visits every token in turn,
    # adding it to self.ngrams or incrementing its count there
    def parse(self):
        tokens = self.tokenize()
        # Moves through every individual word in the text, increments counter if already found
        # else sets count to 1
        for word in tokens:
            if word in self.ngrams:
                self.ngrams[word] += 1
            else:
                self.ngrams[word] = 1

    def get_ngrams(self):
        return self.ngrams

# loading profile for login
M = imaplib.IMAP4_SSL('imap.gmail.com')
M.login("EMAIL", "PASS")
M.select()
new = open('liamartinez.txt', 'w')
typ, data = M.search(None, 'FROM', 'SEARCHGOES_HERE')  # Gets all messages from the given sender

def get_first_text_part(msg):  # where should this be nested?
    maintype = msg.get_content_maintype()
    if maintype == 'multipart':
        for part in msg.get_payload():
            if part.get_content_maintype() == 'text':
                return part.get_payload()
    elif maintype == 'text':
        return msg.get_payload()

for num in data[0].split():  # Loops through all messages
    typ, msg_data = M.fetch(num, '(RFC822)')  # Pulls message
    # msg_data[0] is an (envelope, raw message) pair, so the message is at index 1
    msg = email.message_from_bytes(msg_data[0][1])  # Puts message into easy to use python objects
    _from = msg['from']  # pull from
    _to = msg['to']  # pull to
    _subject = msg['subject']  # pull subject
    _body = get_first_text_part(msg)  # pull body
    if _body:
        ngrams = NGramCounter(_body)
        ngrams.parse()
        _feed = ngrams.get_ngrams()
        # print("\n".join("\t".join(str(_feed) for col in row) for row in tab))
        print(_feed)
        # print('Content-Type:', msg.get_content_type())
        # print(_from)
        # print(_to)
        # print(_subject)
        # print(_body)
    new.write((_from or '') + '\n')  # one sender per line
    print('---------------------------------')

new.close()
M.close()
M.logout()
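The "=\r\n" fragments in the output may be quoted-printable soft line breaks, which would mean the body is being counted while still transfer-encoded. A hedged sketch of pulling the first text/plain part in decoded form, using `Message.walk()` and `get_payload(decode=True)` from the standard `email` package. The `first_plain_text` helper and the test message built here are assumptions for illustration.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def first_plain_text(raw_bytes):
    """Return the first text/plain part of a raw message as a str, or None."""
    msg = email.message_from_bytes(raw_bytes)
    for part in msg.walk():  # visits the message and every nested part
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return None


# Build a small multipart message to exercise the helper.
outer = MIMEMultipart("alternative")
outer["From"] = "alice@example.com"
outer["Subject"] = "lunch"
outer.attach(MIMEText("Café at noon", "plain", "utf-8"))
outer.attach(MIMEText("<p>Café at noon</p>", "html", "utf-8"))

body = first_plain_text(outer.as_bytes())
print(body)
```

Because `walk()` also descends into nested multipart containers, this finds a plain-text part even when it is not a direct child of the top-level message.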
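No, I'm not, thanks for asking. Oh, wait... – 2012-04-12 07:10:15
What Ignacio means is that your title should describe your actual problem (rather than burying it so deep in the post), not ask us whether we are trying to write a program. – agf 2012-04-12 07:11:43
Thanks! Edited to be clearer. Any suggestions? – 2012-04-12 19:49:54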