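Comparing every item in one list with every item in another list (or in the same list) is known in mathematics as the Cartesian product. Python's standard library has a function that does this: itertools.product, which is equivalent to nested for loops.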
Suppose A and B are lists:
for x in A:
    for y in B:
        print((x, y))
This can be written with a generator expression as:
for pair in ((x, y) for x in A for y in B):
    print(pair)
or, more concisely:
from itertools import product
for pair in product(A, B):
    print(pair)
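As a quick check that the three forms agree, the sketch below collects the pairs from each one into a list and compares them. The sample lists A and B are made up for illustration and are not from the question.

```python
from itertools import product

# Hypothetical sample data, only for illustration.
A = [1, 2]
B = ["a", "b"]

# 1. Nested for loops.
nested = []
for x in A:
    for y in B:
        nested.append((x, y))

# 2. Generator expression.
from_genexp = list((x, y) for x in A for y in B)

# 3. itertools.product.
from_product = list(product(A, B))

# All three yield the same pairs in the same order.
print(nested == from_genexp == from_product)  # prints True
```

In every form the rightmost list varies fastest, so the order of the pairs matches as well.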
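In your case you are comparing every item in the list with every item in the same list, so you could write product(texts, texts). For this situation, however, product has an optional keyword argument, repeat: product(A, repeat=4) means the same as product(A, A, A, A).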
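A minimal sketch of the repeat argument, using a made-up three-item list (the name A and its contents are only for illustration):

```python
from itertools import product

# Hypothetical sample list, only for illustration.
A = ["x", "y", "z"]

# repeat=2 pairs the list with itself, including each item with itself.
with_repeat = list(product(A, repeat=2))
explicit = list(product(A, A))

print(with_repeat == explicit)  # prints True
print(len(with_repeat))         # prints 9 (3 items x 3 items)
```

Note that the result includes pairs such as ("x", "x") and both ("x", "y") and ("y", "x"), so every text is also compared with itself and every pair is compared in both directions.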
You can now rewrite your code like this:
from itertools import product
caesar = """BOOK I
I.--All Gaul is divided into three parts, one of which the Belgae
inhabit, the Aquitani another, those who in their own language are
called Celts, in ours Gauls, the third. All these differ from each other
in language, customs and laws."""
hamlet = """Who's there?"
"Nay, answer me. Stand and unfold yourself."
"Long live the King!"
"Barnardo!"
"He." (I.i.1-5)"""
macbeth = """ACT I SCENE I A desert place. Thunder and lightning.
[Thunder and lightning. Enter three Witches]
First Witch When shall we three meet again
In thunder, lightning, or in rain?
Second Witch When the hurlyburly's done,
When the battle's lost and won."""
texts = [caesar, hamlet, macbeth]
def similarity(x, y):
    """similarity based on length of the text,
    substitute with similarity function from Natural Language Toolkit"""
    return float(len(x))/len(y)

# Compare every text with every text, including itself.
for pair in product(texts, repeat=2):
    print("{}".format(similarity(*pair)))
Thank you very much. I really appreciate this! I am using round() because my similarity() function returns a float. – 2012-02-28 14:48:06
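@Adam_G: I know why you are using 'round()', but as mentioned above, 'round()' is not meant for output formatting. For more on output formatting, see the section [Fancier Output Formatting](http://docs.python.org/tutorial/inputoutput.html#fancier-output-formatting) in the Python tutorial, and see [Floating Point Arithmetic: Issues and Limitations](http://docs.python.org/tutorial/floatingpoint.html) for why using 'round()' for this purpose is a bad idea. – 2012-02-28 14:55:46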
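To illustrate the difference, here is a small sketch with a made-up similarity value (0.5 is an assumption chosen for the example, not a result from the texts above):

```python
value = 0.5  # hypothetical similarity score, only for illustration

# round() returns another float; it does not control how the number is displayed.
rounded = round(value, 2)

# A format specification controls only the displayed text.
formatted = "{:.2f}".format(value)

print(rounded)    # prints 0.5
print(formatted)  # prints 0.50
```

The format specification always produces two decimal places, while the rounded float is displayed with however many digits its representation needs.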