你在做什麼錯的是,呼籲列表的字符串方法(lower()
)(在你的情況下,數據)
data = [line.strip() for line in open('corpus.txt', 'r')]
讓行作爲列表條目後,你應該做的是
texts = [[words for words in sentences.lower().split()] for sentences in data]
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*********^^^^^^^^^^^^^^^^^^^^^^*********^^^^
#you should call lower on iter. value - in our case it is "sentences"
這將給你列表的列表。每個列表都包含小寫單詞表單行。
$ tail -n 10 corpus.txt
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> data = [line.strip() for line in open('corpus.txt', 'r')]
>>> texts = [[words for words in sentences.lower().split()] for sentences in data]
>>> texts[:5]
[['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution']]
>>>
確定您可以平放或保持原樣。
>>> flattened = reduce(lambda x,y: x+y, texts)
>>> flattened[:30]
['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution', 'g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution', 'g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution', 'g15', 'kdn', 'c30a']
>>>
謝謝!!!它工作完美。現在我明白我做錯了什麼。我是python的新手。 – tom
np隊友,別忘了upvote/mark回答:) – epattaro