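Conditional slot filling when a tuple key has length greater than one
I have a sentence together with tuples that mark the token positions where a country or a number occurs: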
sample = "In the first 11 months of 2004 Hong Kong 's international airport at Chek Lap Kok handled daily an average of 592 flights , 92,630 passengers , and more than 7,734 tons of cargo."
Then:
tokenIDs2number = {(22,): 592.00, (25,): 92630.00, (34,): 7734.00}
tokenIDs2location = {(8, 9): "Hong Kong"}
I need to create a sentence for each combination of these tuples. I call these slot sentences:
In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights , 92,630 passengers , and more than 7,734 tons of cargo.
In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of 592 flights , NUMBER_SLOT passengers , and more than 7,734 tons of cargo.
In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of 592 flights , 92,630 passengers , and more than NUMBER_SLOT tons of cargo.
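However, my current code essentially takes combinations of the individual elements inside the tuples, so I end up with two sentences like these: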
In the first 11 months of 2004 LOCATION_SLOT Kong 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights , 92,630 passengers , and more than 7,734 tons of cargo.
In the first 11 months of 2004 Hong LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights , 92,630 passengers , and more than 7,734 tons of cargo.
These are just two examples.
How can I fix this so that, when a tuple key has len > 1, all the tokens covered by that key are filled with a single LOCATION or NUMBER slot?
Current code:
for locationTokenIDs, location in tokenIDs2location.items():
    for numberTokenIDs, number in tokenIDs2number.items():
        sentenceDict = {}
        sentenceDict["sentence"] = sample
        sentenceDict["location-value-pair"] = {location: number}
        for locationTokenID in locationTokenIDs:
            for numberTokenID in numberTokenIDs:
                finalTokens = cleanSample.split()
                finalTokens[numberTokenID] = "NUMBER_SLOT"
                finalTokens[locationTokenID] = "LOCATION_SLOT"
                slotSentence = (" ").join(finalTokens)
                sentenceDict["parsedSentence"] = slotSentence
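Note that I want to create a dictionary that also keeps track of the location-value pair and the original sentence for each slot-sentence combination. The key part is generating the correct slotSentence.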
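Please note that this is just an example. The number could even be 24000000 where the value in the sentence is 24 million, and the same applies to trillion, million, billion and thousand.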
If this is not possible, another option is to fill every token position in the combination with a slot:
In the first 11 months of 2004 LOCATION_SLOT LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights , 92,630 passengers , and more than 7,734 tons of cargo.
and then perhaps adapt the sentence to remove the consecutive slots, but I would prefer to do it all in one go.
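One possible approach, sketched here under stated assumptions rather than as a definitive fix: treat each tuple key as a contiguous token span and replace the whole span with a single slot by slice assignment. The function name make_slot_sentences, the shortened example sentence, and its 0-based token indices are hypothetical and chosen only for illustration; they are not the indices from the question.

```python
def make_slot_sentences(sentence, token_ids_to_location, token_ids_to_number):
    """Return one dict per (location, number) pair, each with its own slot sentence.

    Assumes every tuple key covers a contiguous run of 0-based token indices.
    """
    tokens = sentence.split()
    results = []
    for loc_ids, location in token_ids_to_location.items():
        for num_ids, number in token_ids_to_number.items():
            final_tokens = list(tokens)  # fresh copy for every pair
            spans = [(loc_ids, "LOCATION_SLOT"), (num_ids, "NUMBER_SLOT")]
            # Replace the right-most span first so the indices of the
            # remaining span are not shifted by the shrinking list.
            for ids, slot in sorted(spans, key=lambda s: min(s[0]), reverse=True):
                final_tokens[min(ids):max(ids) + 1] = [slot]
            results.append({
                "sentence": sentence,
                "location-value-pair": {location: number},
                "parsedSentence": " ".join(final_tokens),
            })
    return results


# Hypothetical shortened example; indices are 0-based positions in example.split().
example = "Hong Kong handled 592 flights and 92,630 passengers"
locations = {(0, 1): "Hong Kong"}
numbers = {(3,): 592.0, (6,): 92630.0}

for entry in make_slot_sentences(example, locations, numbers):
    print(entry["parsedSentence"])
```

Because each pair gets its own dictionary appended to a list, no slot sentence overwrites another, and a multi-token location collapses into one LOCATION_SLOT instead of one slot per token.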
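This is a nice answer and logically sound. Can you explain why the location slots are separated by whitespace? Also, how can I make this generic? Sometimes a slot spans more than two tokens, for example a country like "Democratic Republic of the Congo", and numbers, not just locations, may also span several tokens. I am using 'len(locationTokenIDs)' but I am not masking the necessary countries.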
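This works for countries with any number of spaces, because the values in locationTokenIDs represent slice endpoints and are treated as such in the code. I have updated my answer; the code now works for locations and numbers with any number of spaces.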
I just adapted your code, but unfortunately it does not let me add multiple slot-sentence examples in separate 'sentenceDicts'. I also had to include an if statement, such as 'if len(numberTokenIDs) > 1: finalTokens[numberTokenIDs[0]:(numberTokenIDs[1] + 1)] = "NUMBER_SLOT" else: finalTokens[numberTokenID] = "NUMBER_SLOT"'.
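The if statement may not be needed: taking the first and last element of the tuple as the span bounds handles a one-element tuple and a longer one in the same way. The sketch below uses a hypothetical helper, replace_span, and a made-up sentence with 0-based indices. It also wraps the slot in a list, because assigning a bare string to a list slice splices in the string's individual characters.

```python
def replace_span(tokens, token_ids, slot):
    """Return a copy of tokens with the span covered by token_ids collapsed into slot.

    token_ids may hold a single index, the two endpoints, or every index of the
    span; it is assumed to be sorted in ascending order.
    """
    start, end = token_ids[0], token_ids[-1] + 1
    return tokens[:start] + [slot] + tokens[end:]


# Hypothetical example sentence; indices are 0-based positions in the token list.
tokens = "the Democratic Republic of the Congo exported 24 million tons".split()

print(" ".join(replace_span(tokens, (1, 5), "LOCATION_SLOT")))
print(" ".join(replace_span(tokens, (7, 8), "NUMBER_SLOT")))
print(" ".join(replace_span(tokens, (7,), "NUMBER_SLOT")))
```

Since the helper returns a new list instead of modifying its input, each slot sentence can be stored in its own dictionary without affecting the others.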