Python: How to tokenize from a file?

I am new to Python. I would like to know how to tokenize Twitter data that is stored in a file.

My code is:

with codecs.open('example.csv', 'r',"utf-8") as f: 
    for line in f: 
     tweet = f.readlines() 
     tokens = word_tokenize(tweet["text"]) 
     print(tokens)

But it fails with this error:

--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-15-98b6d77c2fcf> in <module>() 
     2  for line in f: 
     3   tweet = f.readlines() 
----> 4   tokens = word_tokenize(tweet["text"]) 
     5   print(tokens) 

TypeError: list indices must be integers or slices, not str

How can I improve my code?


2017-11-11 Zayajung C

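What is 'word_tokenize'? Also, 'tweet' is a 'list'; to access list items you have to reference them by index (as the error says) – Arman

I assume nltk.word_tokenize? – coffeemakr

The word_tokenize in my code comes from: from pythainlp.tokenize import word_tokenize. I want to collect the text from example.csv into tweet –

UPDATE: first

OK, first things first... I used this file (sample.csv with tweets) to do my testing. Here is a simple piece of code that follows your example: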

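import codecs
import nltk

nltk.download('punkt')  # Punkt tokenizer models used by nltk.word_tokenize

with codecs.open('example.csv', 'r') as f:
    # readlines() returns a list with one string per line (the header row included)
    tweet = f.readlines()

# tokenize every line separately
tokenized_sents = [nltk.word_tokenize(i) for i in tweet]
for i in tokenized_sents:
    print(i)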

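This is tested and works as shown in the screenshot:

Hmm... you are indexing tweet with a string (tweet["text"]), but tweet is a list, and list indices must be integers or slices.

It should be like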

with codecs.open('example.csv', 'r',"utf-8") as f: 
    for line in f: 
     tweet = f.readlines() 

     tokenized_sents = [word_tokenize(i) for i in tweet] 
     for i in tokenized_sents: 
      print i
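
The TypeError from the question can be reproduced without any file, which may make the cause easier to see. The following is a minimal sketch: the sample lines are made up for illustration, and `str.split` only stands in for `word_tokenize` so that the snippet runs without NLTK or pythainlp installed.

```python
# Hypothetical stand-in for what f.readlines() returns: a list of strings.
tweet = ["id,text\n", "1,hello world\n", "2,good morning\n"]

# Indexing a list with a string raises the TypeError from the question.
try:
    tweet["text"]
except TypeError as err:
    error_message = str(err)
    print(error_message)

# A list is indexed by position, or processed item by item.
first_line = tweet[0]
tokens_per_line = [line.split() for line in tweet]  # split() is only a placeholder tokenizer
print(tokens_per_line)
```

Replacing `line.split()` with a call to the tokenizer you actually use gives the list comprehension shown in the answer above.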


2017-11-11 17:05:11 oetoni

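Thanks oetoni, it shows: File "", line 7 print i ^ SyntaxError: Missing parentheses in call to 'print' –

Here... the complete working code with a screenshot :) and the file I used – oetoni

Thank you so much Oetoni, the parentheses were missing. It works!!!!!! And sorry, I didn't know I had to put () around print i hahahahahaaaaaaaa. –

If you iterate over the lines, you don't have to call readlines: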

import codecs
with codecs.open('example.csv', 'r', "utf-8") as f:
    for line in f:
        # line is each line of the file (with its trailing newline)
        print(line)
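
As a rough illustration of that point, the sketch below tokenizes each line inside the loop without ever calling readlines. It writes a small made-up file first so that it is self-contained, uses the built-in `open` with an `encoding` argument instead of `codecs.open`, and uses `str.split` as a placeholder for `word_tokenize`; all of these are assumptions for the example, not part of the original code.

```python
import os
import tempfile

# Write a small made-up file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.csv")
with open(path, "w", encoding="utf-8") as f:
    f.write("first tweet here\nsecond tweet\n")

all_tokens = []
with open(path, "r", encoding="utf-8") as f:
    for line in f:                 # the file object yields one line at a time
        tokens = line.split()      # placeholder for word_tokenize(line)
        all_tokens.append(tokens)

print(all_tokens)
```

Iterating this way also avoids loading the whole file into memory, which can matter for large tweet dumps.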

If you want to read the column "tweet", use the csv module like this:

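import csv
from nltk import word_tokenize

# newline='' is what the csv module documentation recommends for files given to a reader
with open('example.csv', 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)  # maps each row to a dict keyed by the header row
    for row in reader:
        tweet = row["tweet"]  # value of the column named "tweet"
        print("Tweet: %s" % tweet)
        tokens = word_tokenize(tweet)
        print(tokens)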

See the Python 3 csv module and its documentation.


2017-11-11 17:10:31 coffeemakr

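Thanks coffeemakr, my csv file has a column named "text", so I changed your code to row["text"] and got an error like this: --------------------------------------------------------------------------- KeyError Traceback (most recent call last) in () 2 reader = csv.DictReader(f) 3 for row in reader: ----> 4 tweet = row["text"] 5 print("Tweet: %s" % tweet) 6 tokens = word_tokenize(tweet) KeyError: 'text' –

This means DictReader did not find a field named "text". Printing 'row' can help you debug the problem. Also check whether the CSV really has a "text" column. – coffeemakr

Thanks coffeemakr. –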
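
One way to follow that debugging advice is to look at `reader.fieldnames`, which lists the column names exactly as DictReader parsed them. The sketch below uses made-up CSV content and shows just one possible cause of such a KeyError, a UTF-8 byte order mark (BOM) glued to the first header name; whether that applies to the file in the question is not known.

```python
import csv
import io

# Made-up CSV content whose header starts with a UTF-8 byte order mark (BOM),
# one possible reason why row["text"] raises KeyError.
raw_bytes = "\ufefftext,lang\nhello world,en\n".encode("utf-8")

# Decoding as plain utf-8 keeps the BOM as part of the first column name.
reader = csv.DictReader(io.StringIO(raw_bytes.decode("utf-8"), newline=""))
names_plain = reader.fieldnames
print(names_plain)

# Decoding as utf-8-sig strips the BOM, so the "text" key is found.
reader = csv.DictReader(io.StringIO(raw_bytes.decode("utf-8-sig"), newline=""))
names_sig = reader.fieldnames
rows = list(reader)
print(names_sig, rows[0]["text"])
```

If the printed names show extra whitespace or a single long name containing semicolons or tabs, the header spelling or the `delimiter` argument is a more likely culprit than the encoding.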
