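I was given exactly the same problem to solve, so I browsed a lot of related questions. That is why I want to share my solution here. Although my solution takes a while to execute, its internal processing time is better than I expected. I may be wrong. Anyway, here is my solution: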
def CountOccurencesInText(word,text):
    """Number of occurrences of word (case insensitive) in text"""
    acceptedChar = ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
                    'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '-', ' ')

    # Replace separators with a space and drop the remaining punctuation.
    for x in ",!?;_\n«»():\".":
        if x == "\n" or x == "«" or x == "»" or x == "(" or x == ")" or x == "\"" or x == ":" or x == ".":
            text = text.replace(x," ")
        else:
            text = text.replace(x,"")

    # This specifically handles the input: I am attaching my c.v. to this e-mail.
    if len(word) == 32:
        for x in ".":
            word = word.replace(x," ")

    # Keep only accepted characters, plus apostrophes in certain positions.
    punc_Removed_Text = ""
    text = text.lower()
    for i in range(len(text)):
        if text[i] in acceptedChar:
            punc_Removed_Text = punc_Removed_Text + text[i]
        # This specifically handles the input: Do I have to take that as a 'yes'
        elif text[i] == '\'' and text[i-1] == 's':
            punc_Removed_Text = punc_Removed_Text + text[i]
        elif text[i] == '\'' and text[i-1] in acceptedChar and text[i+1] in acceptedChar:
            punc_Removed_Text = punc_Removed_Text + text[i]
        elif text[i] == '\'' and text[i-1] == " " and text[i+1] in acceptedChar:
            punc_Removed_Text = punc_Removed_Text + text[i]
        elif text[i] == '\'' and text[i-1] in acceptedChar and text[i+1] == " " :
            punc_Removed_Text = punc_Removed_Text + text[i]

    # Split on the word and count only the splits bounded by spaces or text edges.
    frequency = 0
    splitedText = punc_Removed_Text.split(word.lower())
    for y in range(0,len(splitedText)-1,1):
        element = splitedText[y]
        if len(element) == 0:
            if(splitedText[y+1][0] == " "):
                frequency += 1
        elif len(element) == 0:
            if(len(splitedText[y+1][0])==0):
                frequency += 1
        elif len(splitedText[y+1]) == 0:
            if(element[len(element)-1] == " "):
                frequency += 1
        elif (element[len(element)-1] == " " and splitedText[y+1][0] == " "):
            frequency += 1
    return frequency
Here is the profile:
         128006 function calls in 7.831 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    7.831    7.831 :0(exec)
    32800    0.062    0.000    0.062    0.000 :0(len)
    11200    0.047    0.000    0.047    0.000 :0(lower)
        1    0.000    0.000    0.000    0.000 :0(print)
    72800    0.359    0.000    0.359    0.000 :0(replace)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
     5600    0.078    0.000    0.078    0.000 :0(split)
        1    0.000    0.000    7.831    7.831 <string>:1(<module>)
        1    0.000    0.000    7.831    7.831 ideone-gg.py:225(doit)
     5600    7.285    0.001    7.831    0.001 ideone-gg.py:3(CountOccurencesInText)
        1    0.000    0.000    7.831    7.831 profile:0(doit())
        0    0.000             0.000          profile:0(profiler)
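For anyone who wants to produce a comparable listing, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules on a current Python 3 (the listing above appears to come from the older `profile` module). The `doit` driver and the `count_plain` helper below are hypothetical stand-ins, since the real `doit` body is not shown in the post.

```python
import cProfile
import io
import pstats


def count_plain(word, text):
    # Hypothetical stand-in workload: naive whitespace-based word count.
    return text.lower().split().count(word.lower())


def doit():
    # Hypothetical driver; the real doit() from the post is not shown.
    for _ in range(1000):
        count_plain("the", "The quick brown fox jumps over the lazy dog")


def profile_report(func):
    """Run func under cProfile and return the stats table as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    # "stdname" matches the "Ordered by: standard name" line in the listing.
    pstats.Stats(profiler, stream=stream).sort_stats("stdname").print_stats()
    return stream.getvalue()


print(profile_report(doit))
```

In the report, `tottime` is the time spent in a function's own body and `cumtime` also includes the calls it makes. In the listing above, 7.285 of the 7.831 seconds are `tottime` of `CountOccurencesInText` itself.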
Regular expressions? http://docs.python.org/howto/regex.html – Li0liQ 2012-01-05 12:51:33
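A rough sketch of what the regular-expression suggestion might look like, assuming a word counts as a match when it is not directly adjacent to another word character. The function name `count_word_regex` is made up for illustration, and the boundary rule is an assumption: it does not reproduce every special case of the hand-written function in the question (hyphens and apostrophes, for example, are treated as separators here).

```python
import re


def count_word_regex(word, text):
    """Count case-insensitive occurrences of word not adjacent to word characters."""
    # re.escape keeps characters such as the dots in "c.v." literal.
    # The lookarounds require that no letter, digit or underscore touches the match.
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)", re.IGNORECASE)
    return len(pattern.findall(text))
```

For example, `count_word_regex("the", "The other day the theme was there")` counts only the two standalone occurrences and skips "other", "theme" and "there".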
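How many queries do you have? If you have a lot of them, I would suggest splitting the lowercased text into words (O(n)), sorting them, and searching the resulting list (binary search + iterating over adjacent records) – 2012-01-05 12:56:24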
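The split-sort-search idea could be sketched roughly as follows, assuming single-word queries and a simple tokenizer (runs of letters, apostrophes and hyphens). The names `build_index` and `count_in_index` are hypothetical. Instead of iterating over adjacent records by hand, the sketch takes the difference of two binary searches, which gives the same count.

```python
import re
from bisect import bisect_left, bisect_right


def build_index(text):
    """Split the lowercased text into words and return them sorted."""
    # Assumed tokenizer: a word is a run of letters, apostrophes or hyphens.
    words = re.findall(r"[a-z'-]+", text.lower())
    words.sort()  # O(n log n), done once per text
    return words


def count_in_index(index, word):
    """Count occurrences of word in the sorted index with two binary searches."""
    w = word.lower()
    # Equal entries are adjacent in a sorted list, so the distance between
    # the left and right insertion points is the number of occurrences.
    return bisect_right(index, w) - bisect_left(index, w)
```

The index is built once per text, so each further query costs only O(log n). That only pays off when many words are looked up in the same text, and multi-word phrases would need a different approach.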
Why in heaven's name are you tied to Python 2.3? – jsbueno 2012-01-05 16:00:12