2017-10-13 151 views
-1

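In Python 3.6 I have a list like the one below, and I can't figure out how to search its values correctly. Given the search string below, I need to search the values of title and tags and return the id of the image with the most matches. If several different images (ids) have the same number of matches, the one whose title comes first alphabetically should be returned. The search should also be case-insensitive. In the code, search holds the terms I am searching for; it should return the first id value, but it returns different values instead. How do I search a nested list of dictionaries in Python?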

image_info = [ 
{ 
    "id" : "34694102243_3370955cf9_z", 
    "title" : "Eastern", 
    "flickr_user" : "Sean Davis", 
    "tags" : ["Los Angeles", "California", "building"] 
}, 
{ 
    "id" : "37198655640_b64940bd52_z", 
    "title" : "Spreetunnel", 
    "flickr_user" : "Jens-Olaf Walter", 
    "tags" : ["Berlin", "Germany", "tunnel", "ceiling"] 
}, 
{ 
    "id" : "34944112220_de5c2684e7_z", 
    "title" : "View from our rental", 
    "flickr_user" : "Doug Finney", 
    "tags" : ["Mexico", "ocean", "beach", "palm"] 
}, 
{ 
    "id" : "36140096743_df8ef41874_z", 
    "title" : "Someday", 
    "flickr_user" : "Thomas Hawk", 
    "tags" : ["Los Angeles", "Hollywood", "California", "Volkswagen", "Beatle", "car"] 
} 

]

my_counter = 0 
search = "CAT IN BUILding" 
search = search.lower().split() 
matches = {} 

for image in image_info:
    for word in search:
        word = word.lower()
        if word in image["title"].lower().split(" "):
            my_counter += 1
            print(my_counter)
        if word in image["tags"]:
            my_counter += 1
            print(my_counter)
    if my_counter > 0:
        matches[image["id"]] = my_counter
        my_counter = 0
+0

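What do you mean by "return"? You aren't returning anything. What is your expected output, and how does it differ from what you get? Can you be more specific? –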

+0

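I ran your code and it gave me the first id in the matches dictionary. There is a bug with the tags, though. You lowercase the words in the search string but not the words in the tags, and the tags contain some capitalized words. For example, you will not be able to match Los Angeles. – bouma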
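A small sketch of the point made in this comment, using the tags of the first image from the question. The name `tag_words` is introduced here only for illustration. It also splits multi-word tags, on the assumption that a single search word such as "los" should be able to hit "Los Angeles".

```python
tags = ["Los Angeles", "California", "building"]

# The search words are lowercased but the tags are not, so this lookup fails.
print("california" in tags)  # False

# Lowercase every tag and split multi-word tags into single words.
tag_words = {word for tag in tags for word in tag.lower().split()}

print("california" in tag_words)  # True
print("los" in tag_words)         # True
```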

+0

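@juanpa.arrivillaga So, I am using the search term "CAT IN BUILding" to search the values of title and tags in the list of dictionaries, and I want the function to return the match it finds. For "CAT IN BUILding" it should find 1 match and return the matching id, 34694102243_3370955cf9_z. If the search term were something like "building in Mexico beach", it should return 34944112220_de5c2684e7_z, because that image has 2 matches in its tags. – Gray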
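The behaviour described in this comment could be sketched roughly as follows. This is not code from the thread: `best_match` is a hypothetical helper, the data is trimmed to the fields that are searched, and it assumes that each distinct search word counts at most once per field (title, tags) and that multi-word tags are split into single words.

```python
image_info = [
    {"id": "34694102243_3370955cf9_z", "title": "Eastern",
     "tags": ["Los Angeles", "California", "building"]},
    {"id": "37198655640_b64940bd52_z", "title": "Spreetunnel",
     "tags": ["Berlin", "Germany", "tunnel", "ceiling"]},
    {"id": "34944112220_de5c2684e7_z", "title": "View from our rental",
     "tags": ["Mexico", "ocean", "beach", "palm"]},
    {"id": "36140096743_df8ef41874_z", "title": "Someday",
     "tags": ["Los Angeles", "Hollywood", "California", "Volkswagen", "Beatle", "car"]},
]


def best_match(images, query):
    """Return (image_id, match_count) for the best image, or None if nothing matches."""
    words = set(query.lower().split())
    scored = []
    for image in images:
        title_words = set(image["title"].lower().split())
        tag_words = {w for tag in image["tags"] for w in tag.lower().split()}
        count = len(words & title_words) + len(words & tag_words)
        if count > 0:
            # Negated count sorts the highest count first; the lowercased
            # title breaks ties alphabetically.
            scored.append((-count, image["title"].lower(), image["id"]))
    if not scored:
        return None
    neg_count, _, image_id = min(scored)
    return image_id, -neg_count


print(best_match(image_info, "CAT IN BUILding"))
# ('34694102243_3370955cf9_z', 1)
print(best_match(image_info, "building in Mexico beach"))
# ('34944112220_de5c2684e7_z', 2)
```

With the query "california", two images tie at one match each, and the sketch returns the one titled "Eastern" because it sorts before "Someday".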

Answers

0

Here is a variant of your code in which I index the data before searching. It is a very basic version of how CloudSearch/ElasticSearch would index and search.

import itertools 
from collections import Counter 
image_info = [ 
{ 
    "id" : "34694102243_3370955cf9_z", 
    "title" : "Eastern", 
    "flickr_user" : "Sean Davis", 
    "tags" : ["Los Angeles", "California", "building"] 
}, 
{ 
    "id" : "37198655640_b64940bd52_z", 
    "title" : "Spreetunnel", 
    "flickr_user" : "Jens-Olaf Walter", 
    "tags" : ["Berlin", "Germany", "tunnel", "ceiling"] 
}, 
{ 
    "id" : "34944112220_de5c2684e7_z", 
    "title" : "View from our rental", 
    "flickr_user" : "Doug Finney", 
    "tags" : ["Mexico", "ocean", "beach", "palm"] 
}, 
{ 
    "id" : "36140096743_df8ef41874_z", 
    "title" : "Someday", 
    "flickr_user" : "Thomas Hawk", 
    "tags" : ["Los Angeles", "Hollywood", "California", "Volkswagen", "Beatle", "car"] 
} 
] 

search = "CAT IN BUILding california"
search = set(search.lower().split())

index = {}

# Build a rudimentary search index: word -> list of image ids
for info in image_info:
    bag = info["title"].lower().split(" ")
    # Split multi-word tags so that "Los Angeles" can be hit by "los" or "angeles"
    tags = [t.lower().split(" ") for t in info["tags"]]
    tags = list(itertools.chain.from_iterable(tags))
    for k in (bag + tags):
        index.setdefault(k, []).append(info["id"])

# print(index)

hits = []

for s in search:
    if s in index:
        hits += index[s]

# Print the id with the most hits (nothing is printed when there are no hits)
if hits:
    print(Counter(hits).most_common(1)[0][0])
+0

If I try to run the code you provided, I get the error: TypeError: append() takes exactly one argument (3 given). – Gray

+0

Thanks @Mahi. I have changed the code to fix the problem. – djinn

+0

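Thanks, this works. I have one question, though. Right now it outputs every image id together with its number of hits. How can I print only the image id with the most hits instead of all of them? – Gray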
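One way to get only the top id is `Counter.most_common(1)`. The sketch below uses a made-up hit list shaped like `hits` in the answer above (one image id per matched search word); the values are illustrative and not output from the thread.

```python
from collections import Counter

# Hypothetical hit list: one image id per matched search word.
hits = [
    "34694102243_3370955cf9_z",
    "34694102243_3370955cf9_z",
    "36140096743_df8ef41874_z",
]

counts = Counter(hits)

# Every id with its hit count, most hits first.
print(counts.most_common())

# Only the top entry; the check guards against an empty hit list.
if counts:
    best_id, best_count = counts.most_common(1)[0]
    print(best_id, best_count)  # 34694102243_3370955cf9_z 2
```

Note that `most_common` knows nothing about titles, so the alphabetical tie-break asked for in the question would still need a separate step.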

0

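You are creating a new entry in the matches dictionary every time with matches[image["id"]] = my_counter. If you want to keep only one entry in that dictionary, holding the image id and its count, I have modified your dictionary and the condition accordingly. Hope it helps.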

my_counter = 0
search_term = "CAT IN BUILding"
search = search_term.lower().split()
matches = {}
matches[search_term] = {}

for image in image_info:
    # Lowercase the tags so they compare equal to the lowercased search words
    tags = [tag.lower() for tag in image["tags"]]
    for word in search:
        if word in image["title"].lower().split(" "):
            my_counter += 1
        if word in tags:
            my_counter += 1
    if my_counter > 0:
        # dict.values() is a view in Python 3, so convert it to a list before indexing
        best = list(matches[search_term].values())
        if not best or my_counter > best[0]:
            # Replace the stored entry so that only the best match is kept
            matches[search_term] = {image["id"]: my_counter}
    my_counter = 0
+0

I tried running your modified code and now get the error: TypeError: 'dict_values' object does not support indexing – Gray

+0

In Python 3 (3.4 included), dict.values() returns a dict_values view rather than a list. Just wrap matches[search_term].values() in list(), so it becomes list(matches[search_term].values())[0]. –
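A minimal illustration of this comment, using a one-entry dictionary made up for the example. The exact wording of the TypeError message differs between Python versions, so it is printed rather than assumed.

```python
matches = {"34694102243_3370955cf9_z": 1}
values = matches.values()

# A dict_values view cannot be indexed directly in Python 3.
try:
    values[0]
except TypeError as exc:
    print("TypeError:", exc)

# Converting to a list makes indexing work.
first = list(values)[0]

# next(iter(...)) fetches the first value without building a list.
also_first = next(iter(values))

print(first, also_first)  # 1 1
```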

+0

You could also lowercase the list of tags, as one of the users above pointed out. –