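Need to remove items from both a list and a dictionary of tuple value pairs at the same time

This is very related to a previous question, but I realised that my objective is much more complicated: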

I have a sentence: "Forbes Asia 200 Best Under 500 Billion 2011"

I have tokens like:

oldTokens = [u'Forbes', u'Asia', u'200', u'Best', u'Under', u'500', u'Billion', u'2011']

and a previous parser has figured out where there should be location or number slots:

numberTokenIDs = {(7,): 2011.0, (2,): 200.0, (5,6): 500000000000.00} 
locationTokenIDs = {(0, 1): u'Forbes Asia'}

The token IDs correspond to the indices of the tokens where a location or number occurs. The aim is to get a new set of tokens like:

newTokens = [u'Asia', u'200', u'Best', u'Under', u'500', u'2011']

with new number and location token IDs perhaps like (to avoid index-out-of-bounds exceptions):

numberTokenIDs = {(5,): 2011.0, (1,): 200.0, (4,): 500000000000.00} 
locationTokenIDs = {(0,): u'Forbes Asia'}

Essentially, I would like to go through the new, reduced set of tokens and ultimately be able to create a new sentence:

"LOCATION_SLOT NUMBER_SLOT Best Under NUMBER_SLOT NUMBER_SLOT"

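by going through the new set of tokens and replacing the token at each token ID with either "LOCATION_SLOT" or "NUMBER_SLOT". If I did this with the current set of number and location token IDs, I would get: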

"LOCATION_SLOT LOCATION_SLOT NUMBER_SLOT Best Under NUMBER_SLOT NUMBER_SLOT NUMBER_SLOT".

How would I do this?
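To make the problem concrete, here is a minimal sketch of the direct replacement described above, using the original token IDs without removing any tokens first. The name `naive` is only illustrative.

```python
oldTokens = ['Forbes', 'Asia', '200', 'Best', 'Under', '500', 'Billion', '2011']
numberTokenIDs = {(7,): 2011.0, (2,): 200.0, (5, 6): 500000000000.00}
locationTokenIDs = {(0, 1): 'Forbes Asia'}

# Overwrite every index named in the ID dictionaries, one token at a time.
naive = list(oldTokens)
for span in locationTokenIDs:
    for i in span:
        naive[i] = 'LOCATION_SLOT'
for span in numberTokenIDs:
    for i in span:
        naive[i] = 'NUMBER_SLOT'

naive_sentence = ' '.join(naive)
print(naive_sentence)
# LOCATION_SLOT LOCATION_SLOT NUMBER_SLOT Best Under NUMBER_SLOT NUMBER_SLOT NUMBER_SLOT
```

Each two-token span produces two slots, which is why the tokens have to be collapsed and the IDs renumbered before the replacement.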

Another example is:

Location token IDs are: (0, 1) 
Number token IDs are: (3, 4) 
Old sampleTokens [u'United', u'Kingdom', u'USD', u'1.240', u'billion']

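For this I would want to both remove tokens and change the location and number token IDs, so that I can replace tokens in the sentence like: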

sampleTokens[numberTokenID] = "NUMBER_SLOT" 
sampleTokens[locationTokenID] = "LOCATION_SLOT"

so that the replaced tokens are [u'LOCATION_SLOT', u'USD', u'NUMBER_SLOT']

Source

2016-07-21 Dhruv Ghulati

Not very elegant, but a working solution:

oldTokens = [u'Forbes', u'Asia', u'200', u'Best', u'Under', u'500', u'Billion', u'2011']

numberTokenIDs = {(7,): 2011.0, (2,): 200.0, (5, 6): 500000000000.00}
locationTokenIDs = {(0, 1): u'Forbes Asia'}

newTokens = []
newnumberTokenIDs = {}
newlocationTokenIDs = {}

new_ind = 0
skip = False  # True when the next old token belongs to a span already handled

for ind in range(len(oldTokens)):
    if skip:
        skip = False
        continue

    for loc_ind in locationTokenIDs.keys():
        if ind in loc_ind:
            # Keep the last token of the location span (u'Asia' for (0, 1));
            # loc_ind[-1] stays in range for one-element tuples as well
            newTokens.append(oldTokens[loc_ind[-1]])
            newlocationTokenIDs[(new_ind,)] = locationTokenIDs[loc_ind]
            new_ind += 1
            if len(loc_ind) > 1:  # Skip next position if there are 2 elements in a tuple
                skip = True
            break
    else:
        # No location span at this index: check the number spans
        for num_ind in numberTokenIDs.keys():
            if ind in num_ind:
                newTokens.append(oldTokens[ind])
                newnumberTokenIDs[(new_ind,)] = numberTokenIDs[num_ind]
                new_ind += 1
                if len(num_ind) > 1:
                    skip = True
                break
        else:
            # Ordinary token: copy it over unchanged
            newTokens.append(oldTokens[ind])
            new_ind += 1

newTokens 
Out[37]: [u'Asia', u'200', u'Best', u'Under', u'500', u'2011'] 

newnumberTokenIDs 
Out[38]: {(1,): 200.0, (4,): 500000000000.0, (5,): 2011.0} 

newlocationTokenIDs 
Out[39]: {(0,): u'Forbes Asia'}
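As a possible generalisation, the same idea can be sketched as a function that handles spans of any length and also builds the slot sentence from the question. This is only a sketch under stated assumptions: `collapse_spans` and `slotTokens` are hypothetical names, the index tuples are assumed to be non-empty and non-overlapping, and the first token of each span is kept as its representative (so the location span yields 'Forbes' rather than 'Asia', which does not matter once it is replaced by a slot).

```python
def collapse_spans(tokens, location_ids, number_ids):
    """Collapse each span to a single token and renumber the span IDs.

    Assumes the index tuples are non-empty and do not overlap.
    Returns (new_tokens, new_location_ids, new_number_ids, slot_tokens).
    """
    # Map the first index of every span to (span, slot label).
    starts = {}
    for span in location_ids:
        starts[min(span)] = (span, 'LOCATION_SLOT')
    for span in number_ids:
        starts[min(span)] = (span, 'NUMBER_SLOT')
    covered = {i for span, _ in starts.values() for i in span}

    new_tokens, slot_tokens = [], []
    new_location_ids, new_number_ids = {}, {}
    for i, tok in enumerate(tokens):
        if i in starts:
            span, label = starts[i]
            new_key = (len(new_tokens),)  # position in the reduced list
            if label == 'LOCATION_SLOT':
                new_location_ids[new_key] = location_ids[span]
            else:
                new_number_ids[new_key] = number_ids[span]
            new_tokens.append(tok)        # first token represents the span
            slot_tokens.append(label)
        elif i in covered:
            continue                      # later token of an emitted span
        else:
            new_tokens.append(tok)
            slot_tokens.append(tok)
    return new_tokens, new_location_ids, new_number_ids, slot_tokens


oldTokens = ['Forbes', 'Asia', '200', 'Best', 'Under', '500', 'Billion', '2011']
numberTokenIDs = {(7,): 2011.0, (2,): 200.0, (5, 6): 500000000000.00}
locationTokenIDs = {(0, 1): 'Forbes Asia'}

newTokens, newLocationIDs, newNumberIDs, slotTokens = collapse_spans(
    oldTokens, locationTokenIDs, numberTokenIDs)
print(' '.join(slotTokens))
# LOCATION_SLOT NUMBER_SLOT Best Under NUMBER_SLOT NUMBER_SLOT
```

Building the slot list in the same pass avoids a second loop over the renumbered IDs, at the cost of returning four values.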

Source

2016-07-21 18:55:32

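Hi Vadim, I was unable to adapt this to create a sentence like 'Forbes Asia 200 Best Under 500 Billion 2011', so that the location and value tokens are concatenated, but I also don't want to do that where it isn't needed, e.g. if the location token ID and number token ID tuples are no longer than 1.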

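A simpler way is to give the program a list of indices so that it knows which IDs should be concatenated. Then just add an extra check: if the index is in the list, do the concatenation. For example, change '(2,): 200.0, (5,6): 500000000000.00' to '[...] (2,): (200.0, 0), (5,6): (500000000000.00, 1)'.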

Understood. Should I ask a separate question about this, or could you add it as a second version of your answer?
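The suggestion in the comments above could be sketched roughly as follows, assuming each dictionary value becomes a (value, flag) pair and a flag of 1 means "join the tokens of this span". The function name `merge_flagged_spans`, the combined dictionary `flaggedIDs`, and the flags given to the (7,) and (0, 1) entries are assumptions made for illustration; spans are assumed to be non-overlapping.

```python
def merge_flagged_spans(tokens, flagged_ids):
    """Join the tokens of every span whose flag is truthy into one token.

    flagged_ids maps index tuples to (value, flag) pairs.
    """
    # First index of each span that should be concatenated.
    merge_starts = {min(span): span
                    for span, (_, flag) in flagged_ids.items() if flag}
    # Remaining indices of those spans are absorbed into the joined token.
    merged_away = ({i for span in merge_starts.values() for i in span}
                   - set(merge_starts))

    result = []
    for i, tok in enumerate(tokens):
        if i in merge_starts:
            span = sorted(merge_starts[i])
            result.append(' '.join(tokens[j] for j in span))
        elif i not in merged_away:
            result.append(tok)
    return result


oldTokens = ['Forbes', 'Asia', '200', 'Best', 'Under', '500', 'Billion', '2011']
# Flags: 1 = concatenate the span, 0 = leave the token alone (assumed values).
flaggedIDs = {
    (7,): (2011.0, 0),
    (2,): (200.0, 0),
    (5, 6): (500000000000.00, 1),
    (0, 1): ('Forbes Asia', 1),
}

merged = merge_flagged_spans(oldTokens, flaggedIDs)
print(merged)
# ['Forbes Asia', '200', 'Best', 'Under', '500 Billion', '2011']
```

If the flag is always 1 exactly when the tuple has more than one index, it could be derived from `len(span) > 1` instead of being stored.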
