2017-01-24 42 views
-1

AttributeError: 'list' object has no attribute 'lower' (gensim)

I have a text file with a list of 10K words, like this:

G15 KDN C30A Action Standard Air Brush Air Dilution

I want to convert them to lowercase tokens for subsequent processing with GenSim, using this code:

data = [line.strip() for line in open("C:\corpus\TermList.txt", 'r')] 
texts = [[word for word in data.lower().split()] for word in data] 

and I get the following traceback:

AttributeError                            Traceback (most recent call last)
<ipython-input-84-33bbe380449e> in <module>() 
     1 data = [line.strip() for line in open("C:\corpus\TermList.txt", 'r')] 
----> 2 texts = [[word for word in data.lower().split()] for word in data] 
     3 
AttributeError: 'list' object has no attribute 'lower' 

Any suggestions on what I am doing wrong and how to correct it would be greatly appreciated. Thanks!!

Answers

4

Try:

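# Raw string so the backslashes in the Windows path are taken literally.
data = [line.strip() for line in open(r"C:\corpus\TermList.txt", 'r')]
# Call .lower() on each word (a str), not on the list.
texts = [[word.lower() for word in text.split()] for text in data]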

You are trying to apply .lower() to data, which is a list.
.lower() can only be applied to strings.
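
To make the distinction concrete, here is a minimal sketch. The three sample strings are hypothetical stand-ins for the lines read from the file; they are not the asker's real data.

```python
# Hypothetical sample data standing in for the lines read from the file.
data = ["G15 KDN C30A", "Action Standard", "Air Brush"]

# The list itself has no .lower() method, but each str element does.
list_has_lower = hasattr(data, "lower")
item_has_lower = hasattr(data[0], "lower")

# Lowercase each line (a str) first, then split it into tokens.
texts = [line.lower().split() for line in data]
print(list_has_lower, item_has_lower)
print(texts)
```

Lowercasing the whole line before splitting gives the same tokens as lowercasing each word after splitting, since str.lower() does not change whitespace.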

+0

Thanks!!! It works perfectly. Now I understand what I was doing wrong. I am new to Python. – tom

+0

np mate, don't forget to upvote/mark the answer :) – epattaro

1

You need

texts = [[word.lower() for word in line.split()] for line in data] 

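The outer comprehension ([... for line in data]) produces, for each line, the list of lowercase words given by [word.lower() for word in line.split()]. Each line is a str containing a sequence of whitespace-separated words. line.split() turns that sequence into a list, and word.lower() converts each word to lowercase.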
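
If the nested comprehension is hard to read, it may help to see it unrolled into explicit loops. This is only an illustrative sketch; the two sample lines are hypothetical.

```python
data = ["G15 KDN C30A", "Air Brush"]  # hypothetical sample lines

# Explicit-loop version of the nested list comprehension.
texts = []
for line in data:                     # outer loop: one str per line
    tokens = []
    for word in line.split():         # inner loop: whitespace-separated words
        tokens.append(word.lower())   # .lower() is called on a str
    texts.append(tokens)

# The same result written as a nested comprehension.
comprehension = [[word.lower() for word in line.split()] for line in data]
print(texts == comprehension)
```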

0

What you are doing wrong is calling a string method (lower()) on a list (in your case, data).

data = [line.strip() for line in open('corpus.txt', 'r')] 

Once you have the lines as list entries, what you should do is

texts = [[words for words in sentences.lower().split()] for sentences in data] 
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*********^^^^^^^^^^^^^^^^^^^^^^*********^^^^ 
#you should call lower on iter. value - in our case it is "sentences" 

This will give you a list of lists. Each inner list contains the lowercase words from one line.
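
As a variation, the reading and tokenizing steps could be wrapped in a small helper that closes the file when it is done and skips blank lines. The function name load_tokens, the UTF-8 encoding, and the temporary demo file are assumptions made for this sketch, not part of the original code.

```python
import io
import os
import tempfile


def load_tokens(path, encoding="utf-8"):
    """Return one list of lowercase tokens per non-blank line of the file."""
    # The with-block closes the file even if an error occurs while reading.
    with io.open(path, "r", encoding=encoding) as handle:
        return [line.lower().split() for line in handle if line.strip()]


# Demo on a throwaway file (hypothetical contents, including a blank line).
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with io.open(fd, "w", encoding="utf-8") as handle:
    handle.write(u"G15 KDN C30A\n\nAction Standard Air Brush\n")

texts = load_tokens(demo_path)
os.remove(demo_path)
print(texts)
```

str.split() with no arguments also discards the trailing newline, so no separate strip() call is needed on the non-blank lines.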

$ tail -n 10 corpus.txt 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 


$ python 
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2 
Type "help", "copyright", "credits" or "license" for more information. 
>>> data = [line.strip() for line in open('corpus.txt', 'r')] 
>>> texts = [[words for words in sentences.lower().split()] for sentences in data] 
>>> texts[:5] 
[['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution']] 
>>> 

You can then flatten it or keep it as is.

>>> flattened = reduce(lambda x,y: x+y, texts) 
>>> flattened[:30] 
['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution', 'g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution', 'g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution', 'g15', 'kdn', 'c30a'] 
>>> 
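
Note that reduce is a builtin only in Python 2; in Python 3 it has to be imported from functools. An alternative that works in both, and avoids building a new list at every concatenation step, is itertools.chain.from_iterable. A minimal sketch with hypothetical sample data:

```python
from itertools import chain

# Hypothetical list of token lists, shaped like `texts` above.
texts = [["g15", "kdn"], ["c30a"], ["air", "brush"]]

# chain.from_iterable walks each inner list in turn without repeated copying.
flattened = list(chain.from_iterable(texts))
print(flattened)
```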