2016-06-09 158 views

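The original problem is the HackerRank NLP-POS challenge.

My solution is incomplete, but could someone please help me understand where I am going wrong? (In the second function, the problem asks for 3-letter tags, but the tagger returns 2-letter tags.) Thanks!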

import re 
import nltk 
import string 
final_tagged = "" 
strs = raw_input()  # read the sentence to tag (Python 2)
def tokenize_two(i):
    "remove /?? and pos tag"
    temp = i
    global strs
    for ch in ['/??']:
        if ch in i:
            i = i.replace(ch, "")
    # pos tagging
    tag = nltk.pos_tag([i])
    for item in tag:
        for ch in ['??']:
            if ch in temp:
                temp = temp.replace(ch, item[1])
    replace = i + "/??"
    strs = string.replace(strs, replace, temp)
    return temp

def tokenize_three(i):
    "remove /??? and pos tag"
    temp = i
    global strs
    for ch in ['/???']:
        if ch in i:
            i = i.replace(ch, "")
    # pos tagging
    tag = nltk.pos_tag([i])
    for item in tag:
        for ch in ['???']:
            if ch in temp:
                temp = temp.replace(ch, item[1])
    replace = i + "/???"
    strs = string.replace(strs, replace, temp)
    return temp

a = re.split(r'\s+', strs)
for i in a:
    if i.endswith("/??"):
        tagged = tokenize_two(i)
    if i.endswith("/???"):
        final_tagged = tokenize_three(i)
print strs 

Answer

tag = nltk.pos_tag([i]) 

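Part-of-speech tagging depends on context. You need to pass the whole tokenized sentence as the argument to pos_tag, rather than calling pos_tag once for each unknown word.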
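
To make that concrete, here is a minimal sketch in Python 3. It assumes the input is a single line of whitespace-separated word/TAG tokens, with /?? or /??? marking the unknown tags, as in the question's code. The names fill_unknown_tags and nltk_tagger are made up for this illustration; only nltk.pos_tag is a real NLTK call. The tagger is passed in as a parameter so the logic can be tried without NLTK installed.

```python
import re

# Matches a token whose tag is an unknown placeholder: word/?? or word/???
UNKNOWN = re.compile(r"/\?{2,3}$")


def fill_unknown_tags(sentence, tagger):
    """Fill in /?? and /??? placeholders using ONE tagger call per sentence.

    `tagger` is any callable that maps a list of words to a list of
    (word, tag) pairs, for example nltk.pos_tag.
    """
    tokens = sentence.split()
    # Drop the "/TAG" suffix so the tagger sees plain words in context.
    words = [tok.rsplit("/", 1)[0] for tok in tokens]
    predicted = tagger(words)  # whole sentence at once, not word by word

    filled = []
    for tok, word, (_, tag) in zip(tokens, words, predicted):
        if UNKNOWN.search(tok):
            filled.append(word + "/" + tag)  # replace only the unknown tags
        else:
            filled.append(tok)  # keep the tags that were already given
    return " ".join(filled)


def nltk_tagger(words):
    """Hypothetical wrapper around nltk.pos_tag (needs NLTK and its tagger data)."""
    import nltk  # imported lazily so the sketch loads without NLTK
    return nltk.pos_tag(words)
```

With NLTK and its tagger data installed, something like fill_unknown_tags(line, nltk_tagger) should tag each unknown word using its neighbours as context. Note that the predicted tag is not guaranteed to have as many letters as the placeholder has question marks, so the 2-letter versus 3-letter mismatch may still show up for some words.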


I don't know how I missed that, thank you. – CyberDuck