2017-07-07 100 views
-4

Extract integers from a list

I have a string like this:

fmt_string = "I am a smoker male of 25 years who wants a policy for 30 yrs with a sum assured amount of 1000000 rupees"

After removing the stop words from the string above, I have the following list:

['smoker', 'male', '25', 'years', 'wants', 'policy', '30', 'yrs', 
'sum', 'assured', 'amount', '1000000', 'rupees'] 

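From this list I want to extract 25, 30 and 1000000, but the code should cope with 25 appearing either before or after 'years'. Likewise, 30 can come before or after 'policy', and 1000000 can be in any position.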

The final output should look like this:

'1000000 30 25 male smoker' 

I just want robust code that, wherever these values turn up, gives them back to me in a list like this.

+1

Please post your code. – CunivL

+1

Have you made any attempt at this problem? – citizen2077

+0

To filter the string and keep only the integer values, you can use this line of code: 'integer_values = [e for e in fmt_string.split() if e.isdigit()]' – CunivL
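
As a minimal sketch of that filtering idea, run on the sentence from the question: testing each token with str.isdigit() and converting the matches to int are assumptions made here, since str.split() only ever yields strings.

```python
fmt_string = ("I am a smoker male of 25 years who wants a policy for 30 "
              "yrs with a sum assured amount of 1000000 rupees")

# str.split() yields strings, so test each token with str.isdigit()
# and convert the matching tokens to int.
integer_values = [int(e) for e in fmt_string.split() if e.isdigit()]
print(integer_values)  # [25, 30, 1000000]
```

This picks up every all-digit token regardless of position, but it does not say which number belongs to 'years', 'policy' or the amount.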

Answers

-1

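Use re.findall to pick the matching items out of the list, join them with commas and split the result back into a list, apply reverse() to that list, then join it again with ' '.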

Data:

import re

li = ['smoker', 'male', '25', 'years', 'wants', 'policy', '30', 'yrs', 'sum', 'assured', 'amount', '1000000', 'rupees']

# Keep only the items that contain one of the wanted values.
temp = ",".join([l for l in li if re.findall('1000000|30|25|male|smoker', l)]).split(",")

temp.reverse()
temp = " ".join(temp)

Output:

'1000000 30 25 male smoker' 

Hope this answer helps.

+0

This is about as general as print '1000000 30 25 male smoker'. – lenz
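
One way to avoid hard-coding the numbers is sketched below. It assumes that every all-digit token is wanted and that the descriptive words of interest are known in advance; the set name keywords is made up for this example.

```python
li = ['smoker', 'male', '25', 'years', 'wants', 'policy', '30', 'yrs',
      'sum', 'assured', 'amount', '1000000', 'rupees']

# Hypothetical set of descriptive words to keep alongside the numbers.
keywords = {'smoker', 'male'}

# Keep any all-digit token plus the chosen keywords, then reverse the order.
kept = [tok for tok in li if tok.isdigit() or tok in keywords]
result = " ".join(reversed(kept))
print(result)  # 1000000 30 25 male smoker
```

Because the numbers are detected rather than listed, the same code still works when the age, term or amount changes.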

0

This should help:

import re 
# Variation in places of the numbers in strings: 
str1 = "I am a smoker male of 25 years who wants a policy for 30 yrs with a sum assured amount of 1000000 rupees" 
str2 = "I am a smoker male of 25 years who wants a for 30 policy yrs with a sum assured amount of 1000000 rupees" 
str3 = "I am a smoker male of years 25 who wants a for 30 policy yrs with a sum assured amount of 1000000 rupees" 
str4 = "I am a smoker male of 25 years who wants a for 30 policy yrs with a sum assured amount of 1000000 rupees" 

regex = r".*?(((\d{2})\s?years)|(years\s?(\d{2}))).*(policy.*?(\d{2})|(\d{2}).*?policy).*(\d{7}).*$" 
replacements = r"\9 \7 \8 \3 \5" 

res_str1 = re.sub(regex, replacements, str1) 
res_str2 = re.sub(regex, replacements, str2) 
res_str3 = re.sub(regex, replacements, str3) 
res_str4 = re.sub(regex, replacements, str4) 


def clean_spaces(string): 
    return re.sub(r"\s{1,2}", ' ', string) 


print(clean_spaces(res_str1)) 
print(clean_spaces(res_str2)) 
print(clean_spaces(res_str3)) 
print(clean_spaces(res_str4)) 

Output:

1000000 30 25 
1000000 30 25 
1000000 30 25 
1000000 30 25 

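Update

The regex above has some bugs. While trying to improve it, I noticed that it is inefficient and ugly, because it parses every single character each time. If we stick to the original approach of parsing words, we can do better. So my new solution is: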

# Algorithm 
# for each_word in the_list: 
#  maintain a pre_list of terms that come before a number 
#  if each_word is number: 
#   if there is any element of desired_terms_list exists in pre_list: 
#    pair the number & the desired_term and insert into the_dictionary 
#    remove this desired_term from desired_terms_list 
#    reset the pre_list 
#   else: 
#    put the number in number_at_hand 
#  else: 
#   if no number_at_hand: 
#    add the current word into pre_list 
#   else: 
#    if the current_word an element of desired_terms_list: 
#     pair the number & the desired_term and insert into the_dictionary 
#     remove this desired_term from desired_terms_list 
#     reset number_at_hand 

Code:

from pprint import pprint


class Extractor:
    def __init__(self, search_terms, string):
        self.pre_list = []
        self.list = string.split()
        # Copy the terms so the caller's list is not mutated by remove().
        self.terms_to_look_for = list(search_terms)
        self.dictionary = {}

    @staticmethod
    def is_number(string):
        try:
            int(string)
            return True
        except ValueError:
            return False

    def check_pre_list(self):
        # Return the first wanted term seen before the current number, if any.
        for term in self.terms_to_look_for:
            if term in self.pre_list:
                return term
        return None

    def extract(self):
        number_at_hand = ''
        for word in self.list:
            if Extractor.is_number(word):
                check_result = self.check_pre_list()
                if check_result is not None:
                    # A wanted term came before this number: pair them.
                    self.dictionary[check_result] = word
                    self.terms_to_look_for.remove(check_result)
                    self.pre_list = []
                else:
                    number_at_hand = word
            else:
                if number_at_hand == '':
                    self.pre_list.append(word)
                else:
                    if word in self.terms_to_look_for:
                        # A wanted term came after the pending number: pair them.
                        self.dictionary[word] = number_at_hand
                        self.terms_to_look_for.remove(word)
                        number_at_hand = ''
        return self.dictionary

Usage:

ex1 = Extractor(['years', 'policy', 'amount'], 
       'I am a smoker male of 25 years who wants a policy for 30 yrs with a sum assured amount of 1000000 rupees') 
ex2 = Extractor(['years', 'policy', 'amount'], 
       'I am a smoker male of 25 years who wants a for 30 yrs policy with a sum assured amount of 1000000 rupees') 
ex3 = Extractor(['years', 'policy', 'amount'], 
       'I am a smoker male of years 25 who wants a policy for 30 yrs with a sum assured amount of 1000000 rupees') 
ex4 = Extractor(['years', 'policy', 'amount'], 
       'I am a smoker male of years 25 who wants a for 30 yrs policy with a sum assured amount of 1000000 rupees') 
pprint(ex1.extract()) 
pprint(ex2.extract()) 
pprint(ex3.extract()) 
pprint(ex4.extract()) 

Output:

{'amount': '1000000', 'policy': '30', 'years': '25'} 
{'amount': '1000000', 'policy': '30', 'years': '25'} 
{'amount': '1000000', 'policy': '30', 'years': '25'} 
{'amount': '1000000', 'policy': '30', 'years': '25'} 

I expect this to perform better.
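
To get from that dictionary to the space-separated string the question asks for, a small sketch follows. The order list is an assumption about which value should come first, and the dictionary is copied from the output above.

```python
extracted = {'amount': '1000000', 'policy': '30', 'years': '25'}

# Assumed output order: sum assured, policy term, age.
order = ['amount', 'policy', 'years']

# Skip any key that was not found instead of raising KeyError.
summary = " ".join(extracted[key] for key in order if key in extracted)
print(summary)  # 1000000 30 25
```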

+0

It throws an error: TypeError: 'str' object is not callable –
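
No traceback was posted, so this is only a guess: that message typically appears when the name str has been rebound to a string earlier in the session (for example by an assignment such as str = "..."), after which any call to str() fails. A minimal reproduction, with a made-up function name:

```python
def call_shadowed_str():
    # Rebinding the built-in name inside this function (hypothetical user code).
    str = "I am a smoker male of 25 years"
    try:
        return str()  # this now calls the string, not the built-in type
    except TypeError as exc:
        return exc.args[0]


message = call_shadowed_str()
print(message)
```

If that is the cause, renaming the offending variable or restarting the interpreter should clear the error.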

+0

Is it the second solution (not the regex one) that shows the error for the same input? I get an error! – arif

+0

It shows the following error –