Python - Group sequential array members

I want to edit my text like this:

arr = [] 
# arr is full of tokenized words from my text

For example:

"Abraham Lincoln Hotel is very beautiful place and i want to go there with 
Barbara Palvin. Also there are stores like Adidas ,Nike , Reebok."

EDIT: Basically, I want to detect proper names with istitle() and isalpha() and group them, using a for statement like:

for word in arr:
    if word.istitle() and word.isalpha():
        pass  # this word should be grouped with the capitalized words that follow it

In the example arr, words should be grouped together until the next word does not start with a capital letter.

arr[0] = arr[0] + " " + arr[1] + " " + arr[2]
# Abraham Lincoln Hotel

This is what I want my new arr to look like:

['Abraham Lincoln Hotel'] is very beautiful place and i want to go there with['Barbara Palvin']. ['Also'] there are stores like ['Adidas'], ['Nike'],['Reebok'].

"Also" is not a problem for me; it will be useful when I try to match against my dataset.


2016-04-18 Arda Nalbant

Possible duplicate of [Finding Proper Nouns using NLTK WordNet](http://stackoverflow.com/questions/17669952/finding-proper-nouns-using-nltk-wordnet) – Selcuk

I want basic Python code that always returns the proper names without grouping them, but thanks anyway. –

You can't write *basic Python code* that returns proper names. It isn't that easy; you need to use `NLTK` to do it. –

You could do something like this:

sentence = "Abraham Lincoln Hotel is very beautiful place and i want to go there with Barbara Palvin. Also there are stores like Adidas, Nike, Reebok."
all_words = sentence.split()
last_word_index = -100  # sentinel: no capitalized word seen yet
proper_nouns = []
for idx, word in enumerate(all_words):
    if word.istitle() and word.isalpha():
        if last_word_index == idx - 1:
            # The previous word was capitalized too: extend the last entry
            proper_nouns[-1] = proper_nouns[-1] + " " + word
        else:
            proper_nouns.append(word)
        last_word_index = idx
print(proper_nouns)

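This code:

Splits all the words into a list
Iterates over all the words and
- if the previous word was also capitalized, appends the current word to the last entry in the list
- otherwise stores the word as a new entry in the list
- keeps track of the last index at which a capitalized word was found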
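Because isalpha() rejects tokens with attached punctuation, the steps above skip words such as "Palvin." and "Adidas,". The following is a minimal sketch of one way to handle that, assuming it is acceptable to strip punctuation from each token and to treat punctuation as a boundary between groups. The helper name group_title_words is hypothetical and not part of the answer above.

```python
import string


def group_title_words(sentence):
    """Join runs of adjacent title-case words into single entries."""
    groups = []
    run_open = False  # True while the last entry may still be extended
    for token in sentence.split():
        word = token.strip(string.punctuation)
        if word.istitle() and word.isalpha():
            # Leading punctuation (e.g. ",Nike") starts a new group
            if run_open and token[0] not in string.punctuation:
                groups[-1] += " " + word
            else:
                groups.append(word)
            # Trailing punctuation (e.g. "Palvin.") closes the current group
            run_open = token[-1] not in string.punctuation
        else:
            run_open = False
    return groups


sentence = ("Abraham Lincoln Hotel is very beautiful place and i want to go "
            "there with Barbara Palvin. Also there are stores like "
            "Adidas ,Nike , Reebok.")
print(group_title_words(sentence))
```

Sentence-initial words such as "Also" are still returned, which the asker has said is acceptable for matching against a dataset of names.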


2016-04-18 09:15:42 arbylee

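This outputs `['Abraham Lincoln Hotel', 'Barbara', 'Also']`, not `['Abraham', 'Lincoln', 'Hotel', 'Barbara', 'Palvin.', 'Adidas', 'Nike', 'Reebok.']` –

Words like "Also" or "Because" won't be a problem for me, because they won't match my dataset, which will later be filled with organization, location and person names. So any solution like ['Abraham Lincoln Hotel'], ['Barbara Palvin'], ['Adidas'], ['Nike'], ['Reebok'] would be useful, because later I will send the grouped words as input to my function. –

The code you wrote does what I want, but only for the first one. The output is: ['Abraham Lincoln Hotel', 'Barbara', 'Also'] –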

Is this what you are asking for?

sentence = "Abraham Lincoln Hotel is very beautiful place and i want to go there with Barbara Palvin. Also there are stores like Adidas ,Nike , Reebok."

chars = ".!?,"  # Characters you want to remove from the words in the array

table = str.maketrans(chars, " " * len(chars))  # Create a table for replacing characters
sentence = sentence.translate(table)  # Replace characters with spaces

arr = sentence.split()  # Split the string into an array wherever whitespace occurs

print(arr)

The output is:

['Abraham', 
'Lincoln', 
'Hotel', 
'is', 
'very', 
'beautiful', 
'place', 
'and', 
'i', 
'want', 
'to', 
'go', 
'there', 
'with', 
'Barbara', 
'Palvin', 
'Also', 
'there', 
'are', 
'stores', 
'like', 
'Adidas', 
'Nike', 
'Reebok']

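A note on this code: any character in the chars variable will be removed from the strings in the array. The explanation is included in the code comments.

To remove the non-names, just do this: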

import string
new_arr = []

for i in arr:
    if i[0] in string.ascii_uppercase:
        new_arr.append(i)

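This code will include every word that starts with a capital letter, including words that are capitalized only because they begin a sentence, such as "Also".

To fix this, you need to change chars to: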

chars = ","

and change the code above to:

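import string
new_arr = []
end = ".!?"  # Sentence-ending punctuation

for idx, word in enumerate(arr):
    # A word starts a sentence if the previous word ends with punctuation from `end`
    starts_sentence = idx > 0 and arr[idx - 1][-1] in end
    if word[0] in string.ascii_uppercase and not starts_sentence:
        new_arr.append(word)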

which will output:

['Abraham', 
'Lincoln', 
'Hotel', 
'Barbara', 
'Palvin.', 
'Adidas', 
'Nike', 
'Reebok.']
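The entries above still carry their sentence-ending punctuation ('Palvin.', 'Reebok.'), which could prevent exact matches against a dataset of plain names. As a small sketch, and as an assumption rather than part of this answer, the characters in end can be stripped from the right of each word:

```python
end = ".!?"
new_arr = ['Abraham', 'Lincoln', 'Hotel', 'Barbara',
           'Palvin.', 'Adidas', 'Nike', 'Reebok.']

# rstrip removes any trailing run of the characters listed in `end`
cleaned = [word.rstrip(end) for word in new_arr]
print(cleaned)
```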


2016-04-18 08:13:52 Janekmuric

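This is not the right approach. I mean, it is not feasible for the OP to list all the words that are *not proper nouns*. –

Edited. @ArdaNalbant you should find more criteria that do or do not fit the names you need to recognize, so that the program is more precise. – Janekmuric

The output is what I need, let me try it. Nice work here. –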
