Conditional list comprehension when a tuple is longer than one element


I have a sentence together with tuples that mark the token positions where a country or a number appears:

sample = "In the first 11 months of 2004 Hong Kong 's international airport at Chek Lap Kok handled daily an average of 592 flights , 92,630 passengers , and more than 7,734 tons of cargo."

Then:

tokenIDs2number = {(22,): 592.00, (25,): 92630.00, (34,): 7734.00}
tokenIDs2location = {(8,9): 'Hong Kong'}

I need to create one sentence for each combination of these tuples, which I call slot sentences:

In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights , 92,630 passengers , and more than 7,734 tons of cargo. 

In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of 592 flights , NUMBER_SLOT passengers , and more than 7,734 tons of cargo. 

In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of 592 flights , 92,630 passengers , and more than NUMBER_SLOT tons of cargo.

However, my current code takes combinations of the individual elements inside each tuple, so I end up with two sentences like these:

In the first 11 months of 2004 LOCATION_SLOT Kong 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights , 92,630 passengers , and more than 7,734 tons of cargo. 

In the first 11 months of 2004 Hong LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights , 92,630 passengers , and more than 7,734 tons of cargo.

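(These are just one example.)

How can I fix this so that, when a tuple key has len > 1, all the positions in that key are filled as a single LOCATION or NUMBER slot, as I want?

Current code: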

# Note: cleanSample is not defined in this snippet.
for locationTokenIDs, location in tokenIDs2location.items():
    for numberTokenIDs, number in tokenIDs2number.items():
        sentenceDict = {}
        sentenceDict["sentence"] = sample
        sentenceDict["location-value-pair"] = {location: number}
        # Problem: every ID inside a tuple is treated as its own slot.
        for locationTokenID in locationTokenIDs:
            for numberTokenID in numberTokenIDs:
                finalTokens = cleanSample.split()
                finalTokens[numberTokenID] = "NUMBER_SLOT"
                finalTokens[locationTokenID] = "LOCATION_SLOT"
                slotSentence = " ".join(finalTokens)
                sentenceDict["parsedSentence"] = slotSentence

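Note that I want to create a dictionary that also keeps track of the location-value pair and the original sentence for each slot-sentence combination. The key part is generating the correct slotSentence.

Note also that this is only an example: the number might even be 24000000 where the value in the sentence reads "24 million", and the same goes for trillion, million, billion, and thousand.

If this is not possible, another option is to fill every position in the combination with a slot: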

In the first 11 months of 2004 LOCATION_SLOT LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights , 92,630 passengers , and more than 7,734 tons of cargo.

and then perhaps post-process the sentence to remove the consecutive slots, but I would prefer to do it all in one go.

Source

2016-08-14 Dhruv Ghulati

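I have solved this for my use case, but in a roundabout way (with a list of sentences).

First, I allow slot sentences that contain several LOCATION_SLOT or NUMBER_SLOT tokens: if a tuple in the combination covers two or more positions, I fill all of them: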

sentences2location2values = []

for locationTokenIDs, location in tokenIDs2location.items():
    for numberTokenIDs, number in tokenIDs2number.items():
        sentenceDict = {}
        sentenceDict["sentence"] = sample
        sentenceDict["location-value-pair"] = {location: number}
        # Start from fresh tokens for every combination (assumed; the original
        # snippet did not show where sampleTokens was built).
        sampleTokens = cleanSample.split()
        for locationTokenID in locationTokenIDs:
            sampleTokens[locationTokenID] = "LOCATION_SLOT"

        for numberTokenID in numberTokenIDs:
            sampleTokens[numberTokenID] = "NUMBER_SLOT"

        # Store one dictionary per (location, number) combination.
        slotSentence = " ".join(sampleTokens)
        sentenceDict["parsedSentence"] = slotSentence
        sentences2location2values.append(sentenceDict)

Then I post-process the parsed sentences to remove consecutive location and number slots:

for sentence in sentences2location2values:
    sampleTokens = sentence['parsedSentence'].split()
    newTokens = []
    for i, token in enumerate(sampleTokens):
        # Skip a slot token that directly repeats the previous slot token.
        if i > 0 and token in ("LOCATION_SLOT", "NUMBER_SLOT") and token == sampleTokens[i - 1]:
            continue
        newTokens.append(token)

    sentence['parsedSentence'] = ' '.join(newTokens)
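
The same clean-up step can be sketched more compactly with itertools.groupby from the standard library. This is only an illustration of the idea, not the code used above; the helper name collapse_slots and the SLOTS set are made up for the example.

```python
from itertools import groupby

# Hypothetical names for illustration; only the two slot labels come from the post.
SLOTS = {"LOCATION_SLOT", "NUMBER_SLOT"}


def collapse_slots(sentence):
    """Collapse each run of identical slot tokens into a single token."""
    collapsed = []
    for token, run in groupby(sentence.split()):
        if token in SLOTS:
            collapsed.append(token)  # one token per run of identical slots
        else:
            collapsed.extend(run)    # ordinary repeated words are left alone
    return " ".join(collapsed)


print(collapse_slots("of 2004 LOCATION_SLOT LOCATION_SLOT 's airport handled NUMBER_SLOT flights"))
```

Only runs of the same slot label are merged, so a LOCATION_SLOT directly followed by a NUMBER_SLOT stays as two tokens.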

Source

2016-08-15 07:06:16

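The code treats each locationTokenID as a slot of its own, when the locationTokenIDs really represent the endpoints of a slice of tokens that should be treated as one slot. So we need to remove the "for locationTokenID in locationTokenIDs:" loop (which iterates over each locationTokenID as if it were a slot) and instead replace the slice defined by the pair of locationTokenIDs with a single slot.

The code below addresses the problem raised in the OP, but other problems remain (for example, only the last slotSentence generated is kept; I'll let you fix that, since I don't know what data structure you want to store the slot sentences in):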

sample = "In the first 11 months of 2004 Hong Kong 's international airport at Chek Lap Kok handled daily an average of 592 flights , 92,630 passengers , and more than 7,734 tons of cargo."

# 0-based indices into sample.split()
tokenIDs2number = {(21,): 592, (24,): 92630, (30,): 7734}
tokenIDs2location = {(7, 8): 'Hong Kong'}

for locationTokenIDs, location in tokenIDs2location.items():
    for numberTokenIDs, number in tokenIDs2number.items():
        sentenceDict = {}
        sentenceDict["sentence"] = sample
        sentenceDict["location-value-pair"] = {location: number}
        for numberTokenID in numberTokenIDs:
            finalTokens = sample.split()
            finalTokens[numberTokenID] = "NUMBER_SLOT"
            # Caution: assigning a bare string to a slice inserts its characters
            # one by one; use ["LOCATION_SLOT"] to insert a single token.
            finalTokens[locationTokenIDs[0]:(locationTokenIDs[1] + 1)] = "LOCATION_SLOT"
            slotSentence = " ".join(finalTokens)
            sentenceDict["parsedSentence"] = slotSentence
            print(slotSentence)

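Output:

In the first 11 months of 2004 L O C A T I O N _ S L O T 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights , 92,630 passengers , and more than 7,734 tons of cargo.

In the first 11 months of 2004 L O C A T I O N _ S L O T 's international airport at Chek Lap Kok handled daily an average of 592 flights , NUMBER_SLOT passengers , and more than 7,734 tons of cargo.

In the first 11 months of 2004 L O C A T I O N _ S L O T 's international airport at Chek Lap Kok handled daily an average of 592 flights , 92,630 passengers , and more than NUMBER_SLOT tons of cargo.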

This can be extended to work for locations and numbers that contain any number of spaces.

sample = "In the first 11 months of 2004 Hong Kong Central 's international airport at Chek Lap Kok handled daily an average of 592 flights , 92 630 passengers , and more than 7 734 tons of cargo."

# Each key is a (first, last) pair of 0-based token indices, both inclusive.
tokenIDs2number = {(22, 22): '592', (25, 26): '92 630', (32, 33): '7 734'}
tokenIDs2location = {(7, 9): 'Hong Kong Central'}

for locationTokenIDs, location in tokenIDs2location.items():
    for numberTokenIDs, number in tokenIDs2number.items():
        finalTokens = sample.split()
        # Caution: assigning a bare string to a slice inserts its characters
        # one by one; wrap the label in a list to insert a single token.
        finalTokens[numberTokenIDs[0]:(numberTokenIDs[1] + 1)] = "NUMBER_SLOT"
        finalTokens[locationTokenIDs[0]:(locationTokenIDs[1] + 1)] = "LOCATION_SLOT"
        slotSentence = " ".join(finalTokens)
        print(slotSentence)

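We accomplish this by making both numberTokenIDs and locationTokenIDs length-2 tuples that specify a range of tokens for each location/number.

Output:

In the first 11 months of 2004 L O C A T I O N _ S L O T 's international airport at Chek Lap Kok handled daily an average of 592 flights , N U M B E R _ S L O T passengers , and more than 7 734 tons of cargo.

In the first 11 months of 2004 L O C A T I O N _ S L O T 's international airport at Chek Lap Kok handled daily an average of 592 flights , 92 630 passengers , and more than N U M B E R _ S L O T tons of cargo.

In the first 11 months of 2004 L O C A T I O N _ S L O T 's international airport at Chek Lap Kok handled daily an average of N U M B E R _ S L O T flights , 92 630 passengers , and more than 7 734 tons of cargo.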

Source

2016-08-14 16:08:40

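This is a good answer and logically sound. Could you explain why the location slot is separated by whitespace? Also, how can I make this generic? Sometimes a slot spans more than two tokens (for example a country like "Democratic Republic of the Congo"), and numbers can span several tokens too, not just locations. I am using 'len(locationTokenIDs)', but I am not masking out the whole country.

This works for countries with any number of spaces, because the values in locationTokenIDs represent slice endpoints and are treated as such in the code. I have updated my answer with code that works for locations and numbers with any number of spaces.

I just adapted your code, but unfortunately it doesn't let me add several slot-sentence examples in separate 'sentenceDicts'. I also had to include an if statement, such as 'if len(numberTokenIDs) > 1: finalTokens[numberTokenIDs[0]:(numberTokenIDs[1]+1)] = "NUMBER_SLOT" else: finalTokens[numberTokenID] = "NUMBER_SLOT"'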
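
On the whitespace question in the comments: slice assignment on a list accepts any iterable, and a string iterates character by character, so assigning "LOCATION_SLOT" to a slice splices in thirteen one-character tokens. Wrapping the label in a list inserts it as one token. The sketch below combines that with the two other requests in the comments: it treats the first and last element of each key as the slot's endpoints (so one-element and multi-element tuples need no if statement) and it stores one dictionary per combination. The helper fill_slots and the list name sentences are hypothetical, and the token indices are the 0-based ones from the answer, not the ones in the question.

```python
sample = ("In the first 11 months of 2004 Hong Kong 's international airport "
          "at Chek Lap Kok handled daily an average of 592 flights , "
          "92,630 passengers , and more than 7,734 tons of cargo.")

# 0-based token indices; each key lists the token(s) that make up one slot.
tokenIDs2number = {(21,): 592, (24,): 92630, (30,): 7734}
tokenIDs2location = {(7, 8): "Hong Kong"}


def fill_slots(tokens, spans):
    """Return a copy of tokens with each (token_ids, label) span replaced by one label."""
    filled = list(tokens)
    # Work right to left so a replacement never shifts indices still to be used.
    for token_ids, label in sorted(spans, key=lambda span: span[0][0], reverse=True):
        start, end = token_ids[0], token_ids[-1]
        filled[start:end + 1] = [label]  # a list, so the label stays one token
    return filled


sentences = []
for location_ids, location in tokenIDs2location.items():
    for number_ids, number in tokenIDs2number.items():
        tokens = fill_slots(sample.split(),
                            [(location_ids, "LOCATION_SLOT"),
                             (number_ids, "NUMBER_SLOT")])
        sentences.append({
            "sentence": sample,
            "location-value-pair": {location: number},
            "parsedSentence": " ".join(tokens),
        })

for entry in sentences:
    print(entry["parsedSentence"])
```

This assumes the spans do not overlap and that every key is sorted in ascending order; a key such as (7, 8, 9) for a three-token country is handled the same way as (7, 9).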

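Consider using str.replace() instead of splitting and slicing the sentence string. For this, you need to convert the values in tokenIDs2number to strings with thousands separators, which, as @JonClements comments, can be done with format(int, ',') in Python 2.7+: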

sample = ("In the first 11 months of 2004 Hong Kong 's international airport "
          "at Chek Lap Kok handled daily an average of 592 flights "
          "92,630 passengers , and more than 7,734 tons of cargo.")
tokenIDs2number = {(22,): 592, (25,): 92630, (34,): 7734}
tokenIDs2location = {(8, 9): 'Hong Kong'}

sentenceList = []
# Iterate over a list comprehension of all possible combinations
for item in [[s, i, j] for s in [sample]
             for i in tokenIDs2location.items()
             for j in tokenIDs2number.items()]:
    sentenceDict = {}
    sentenceDict["sentence"] = item[0]
    sentenceDict["location-value-pair"] = {item[1][1]: item[2][1]}
    # format(n, ',') adds thousands separators, e.g. 92630 -> '92,630'
    sentenceDict["parsedSentence"] = (sample.replace(item[1][1], 'LOCATION_SLOT')
                                            .replace(format(item[2][1], ','), 'NUMBER_SLOT'))
    sentenceList.append(sentenceDict)

Output:

[{'sentence': "In the first 11 months of 2004 Hong Kong 's international airport at Chek Lap Kok handled daily an average of 592 flights 92,630 passengers , and more than 7,734 tons of cargo.", 'parsedSentence': "In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of 592 flights 92,630 passengers , and more than NUMBER_SLOT tons of cargo.", 'location-value-pair': {'Hong Kong': 7734}} 
{'sentence': "In the first 11 months of 2004 Hong Kong 's international airport at Chek Lap Kok handled daily an average of 592 flights 92,630 passengers , and more than 7,734 tons of cargo.", 'parsedSentence': "In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights 92,630 passengers , and more than 7,734 tons of cargo.", 'location-value-pair': {'Hong Kong': 592}} 
{'sentence': "In the first 11 months of 2004 Hong Kong 's international airport at Chek Lap Kok handled daily an average of 592 flights 92,630 passengers , and more than 7,734 tons of cargo.", 'parsedSentence': "In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of 592 flights NUMBER_SLOT passengers , and more than 7,734 tons of cargo.", 'location-value-pair': {'Hong Kong': 92630}}]

Source

2016-08-14 16:16:17 Parfait

It's nice that you credit Mike DeSimone's recipe, but for 2.7+ you can now write that as format(int_value, ',')...

@JonClements Does that mean I can replace 'replace(intWithCommas(item[2][1]), 'NUMBER_SLOT')' with 'replace(format(item[2][1], ','), 'NUMBER_SLOT')'?

@DhruvGhulati yes...
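
As a small illustration of the format(int, ',') call discussed in these comments, the sketch below shows the strings it produces and how they plug into str.replace(). The helper number_to_slot is a hypothetical name used only for this example.

```python
def number_to_slot(sentence, value):
    """Replace the comma-formatted form of value with NUMBER_SLOT."""
    return sentence.replace(format(value, ","), "NUMBER_SLOT")


for value in (592, 92630, 7734, 24000000):
    print(format(value, ","))  # 592, then 92,630, then 7,734, then 24,000,000

print(number_to_slot("handled 592 flights , 92,630 passengers", 92630))

# Caveat: str.replace() matches substrings and replaces every occurrence,
# so 592 is also replaced inside 1,592 here.
print(number_to_slot("592 flights and 1,592 bags", 592))
```

Because of that caveat, the str.replace() approach is safest when each formatted number occurs exactly once in the sentence and is not part of a longer number; it also does not cover spelled-out forms such as "24 million".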
