Matching words across multiple files

I have a corpus of words like the one below, with more than 3000 words. It is split across 2 files:
File #1:
#fabulous 7.526 2301 2
#excellent 7.247 2612 3
#superb 7.199 1660 2
#perfection 7.099 3004 4
#terrific 6.922 629 1
#magnificent 6.672 490 1
File #2:
) #perfect 6.021 511 2
? #great 5.995 249 1
! #magnificent 5.979 245 1
) #ideal 5.925 232 1
day #great 5.867 219 1
bed #perfect 5.858 217 1
) #heavenly 5.73 191 1
night #perfect 5.671 180 1
night #great 5.654 177 1
. #partytime 5.427 141 1
I also have many sentences, more than 3000 lines, like these:
superb, All I know is the road for that Lomardi start at TONIGHT!!!! We will set a record for a pre-season MNF I can guarantee it, perfection.
All Blue and White fam, we r meeting at Golden Corral for dinner to night at 6pm....great
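I have to go through every line and perform the following tasks:
1) Find whether any of the corpus words match anywhere in the sentence.
2) Find whether any of the corpus words match the leading or trailing word of the sentence.
I am able to do part 2) but not part 1). I could do it, but I am looking for an efficient way. I have the following code: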

    import re
    import sys

    # f1, f2, found and wordanalysis are assumed to be defined earlier (not shown)
    for line in sys.stdin:
        (id, num, senti, words) = re.split(r"\t+", line.strip())
        sentence = re.split(r"\s+", words.strip().lower())

        for line1 in f1:  # f1 is the file containing all corpus of words like File #1
            (term2, sentimentScore, numPos, numNeg) = re.split(r"\t", line1.strip())
            wordanalysis["trail"] = found if re.match(sentence[-1], term2.lower()) else not found
            wordanalysis["lead"] = found if re.match(sentence[0], term2.lower()) else not found

        for line1 in f2:  # f2 is the file containing all corpus of words like File #2
            (term2, sentimentScore, numPos, numNeg) = re.split(r"\t", line1.strip())
            wordanalysis["trail_2"] = found if re.match(sentence[-1], term2.lower()) else not found
            wordanalysis["lead_2"] = found if re.match(sentence[0], term2.lower()) else not found

Am I on the right track? Is there a better way to do this?
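
One possible approach, sketched under assumptions rather than offered as a definitive answer: load the corpus terms into a Python `set` once, so that each check is an average constant-time hash lookup instead of a rescan of the corpus file for every sentence. The helper names `tokenize` and `analyze` are hypothetical, and the sketch assumes that the leading `#` on corpus terms can be dropped and that punctuation around sentence words can be ignored.

```python
import re

# Sample terms taken from File #1, with the leading '#' removed (an assumption).
CORPUS = {"fabulous", "excellent", "superb", "perfection", "terrific", "magnificent"}

def tokenize(sentence):
    """Lowercase the sentence and keep only runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

def analyze(sentence, corpus):
    """Report which corpus terms occur anywhere, and whether the first/last word is one."""
    words = tokenize(sentence)
    return {
        "anywhere": sorted(set(words) & corpus),        # task 1: match anywhere
        "lead": bool(words) and words[0] in corpus,     # task 2: first word
        "trail": bool(words) and words[-1] in corpus,   # task 2: last word
    }

result = analyze(
    "superb, All I know is the road for that Lomardi start at TONIGHT!!!! "
    "We will set a record for a pre-season MNF I can guarantee it, perfection.",
    CORPUS,
)
print(result)
```

Because the set intersection handles task 1 and two membership tests handle task 2, the cost per sentence depends on the sentence length rather than on the size of the corpus.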
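How about using the *Hashes* data structure in *Redis*? First, read the data from both files into Redis, stored in *Hashes*. Then, whenever you read a word from a sentence, do a hash lookup in Redis, which should be very fast. This may help: [hash commands in Redis](http://redis.io/commands#hash) – flyer
@flyer Like a Hashtable in Java? – fscore
Sorry, I know very little about Java. Here is a simple explanation: [The Little Redis Book](https://github.com/karlseguin/the-little-redis-book/blob/master/en/redis.md#hashes) – flyer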
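
The hash idea from the comments can be tried without a Redis server first. A plain Python `dict` behaves much like a single Redis hash (field to value), so the sketch below uses one as an in-process stand-in; it is not Redis itself. `load_terms` and `score_of` are hypothetical names, and the line format is assumed to be a term followed by three whitespace-separated numeric columns, as in the samples above. With Redis, the corresponding commands would be `HSET` to store a term and `HEXISTS` or `HGET` to look it up.

```python
def load_terms(lines):
    """Build a term -> score mapping; the term is everything before the last 3 columns."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Splitting from the right keeps two-token terms such as "night #great" intact.
        term, score, _num_pos, _num_neg = line.rsplit(None, 3)
        table[term.lower()] = float(score)
    return table

def score_of(table, term):
    """Look up one term; returns None when it is not in the table."""
    return table.get(term.lower())

file1 = load_terms(["#fabulous 7.526 2301 2", "#superb 7.199 1660 2"])
file2 = load_terms([") #perfect 6.021 511 2", "night #great 5.654 177 1"])

print(score_of(file1, "#superb"))
print(score_of(file2, "night #great"))
print(score_of(file1, "#great"))
```

Keeping one table per file mirrors the suggestion of one hash per file, and each table is built once instead of being reread for every sentence.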