2017-06-04 71 views
1
from tweetpy import * 
import re 
import json 
from pprint import pprint 
import csv 

# Import the necessary methods from "twitter" library 
from twitter import Twitter, OAuth, TwitterHTTPError, TwitterStream 

# Variables that contains the user credentials to access Twitter API 
ACCESS_TOKEN = '' 
ACCESS_SECRET = '' 
CONSUMER_KEY = '' 
CONSUMER_SECRET = '' 

oauth = OAuth(ACCESS_TOKEN, ACCESS_SECRET, CONSUMER_KEY, CONSUMER_SECRET) 

# Initiate the connection to Twitter Streaming API 
twitter_stream = TwitterStream(auth=oauth) 

# Get a sample of the public data following through Twitter 
iterator = twitter_stream.statuses.filter(track="#kindle",language="en",replies="all") 
# Print each tweet in the stream to the screen 

# Here we set it to stop after getting 10000000 tweets. 
# You don't have to set it to stop, but can continue running 
# the Twitter API to collect data for days or even longer. 

tweet_count = 10000000 

file = "C:\\Users\\WELCOME\\Desktop\\twitterfeeds.csv" 
with open(file,"w") as csvfile: 
    fieldnames=['Username','Tweet','Timezone','Timestamp','Location'] 
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames) 
    writer.writeheader() 
    for tweet in iterator: 
        #pprint(tweet)
        username = str(tweet['user']['screen_name'])
        tweet_text = str(tweet['text'])
        user_timezone = str(tweet['user']['time_zone'])
        tweet_timestamp = str(tweet['created_at'])
        user_location = str(tweet['user']['location'])
        print tweet
        tweet_count -= 1
        writer.writerow({'Username':username,'Tweet':tweet_text,'Timezone':user_timezone,'Location':user_location,'Timestamp':tweet_timestamp})

        if tweet_count <= 0:
            break

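I want to write tweets to a CSV file with the columns 'Username', 'Tweet', 'Timezone', 'Location', and 'Timestamp'. I am getting an encoding error while writing the data to the CSV file.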

This is the error I get:

tweet_text = str(tweet['text']) 
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 139: ordinal not in range(128). 

I know it is an encoding problem, but I don't know exactly where the variable should be encoded.
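The failure can be reproduced without the Twitter API. The following is a minimal sketch in Python 3 syntax, assuming a made-up tweet text that ends in the same u'\u2026' character (the sample string and the helper name try_ascii are illustrative, not from the original post). In Python 2, calling str() on a unicode object performs this ASCII encode implicitly, which is why the traceback points at the str(tweet['text']) line.

```python
# Hypothetical tweet text ending in U+2026 (horizontal ellipsis).
sample_text = u"Reading on my #kindle\u2026"


def try_ascii(text):
    """Return the ASCII-encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)


result = try_ascii(sample_text)
print(result)
```

Plain ASCII text passes through unchanged; only text containing characters outside the ASCII range triggers the error.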


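What do you want to do with the offending characters? Ignore them? Convert them to the nearest ASCII equivalent? Convert them to a fixed character such as a question mark? –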


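The answer may differ between Python 2 and Python 3. Either way, you are not opening the csv file correctly. I suggest reading the documentation, which shows how to do it properly (in both versions). – martineau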

Answers

1
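  1. Use Python 3, because the Python 2 csv module does not handle encodings well.
  2. Use the encoding and newline options of open().
  3. Remove the str conversions (in Python 3, str is already a Unicode string).

Result: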

with open(file,"w",encoding='utf8',newline='') as csvfile: 
    fieldnames=['Username','Tweet','Timezone','Timestamp','Location'] 
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames) 
    writer.writeheader() 
    for tweet in iterator: 
        username = tweet['user']['screen_name']
        tweet_text = tweet['text']
        user_timezone = tweet['user']['time_zone']
        tweet_timestamp = tweet['created_at']
        user_location = tweet['user']['location']
        .
        .
        .

If you are using Python 2, get the third-party unicodecsv module to work around the shortcomings of csv.
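For the Python 3 route, here is a self-contained sketch that can be run without Twitter credentials. It assumes a hand-written list of tweet-like dicts in place of the live stream, and writes to a temporary directory rather than the Desktop path from the question; the names sample_tweets and write_tweets are illustrative only.

```python
import csv
import os
import tempfile

# Hypothetical stand-ins for items yielded by the stream; only the keys
# used in the question are included.
sample_tweets = [
    {
        "text": "Reading on my #kindle\u2026",
        "created_at": "Sun Jun 04 10:00:00 +0000 2017",
        "user": {
            "screen_name": "reader_one",
            "time_zone": None,
            "location": "Z\u00fcrich",
        },
    },
]

FIELDNAMES = ["Username", "Tweet", "Timezone", "Timestamp", "Location"]


def write_tweets(tweets, csvfile):
    """Write one CSV row per tweet, with no str() conversion of the values."""
    writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
    writer.writeheader()
    for tweet in tweets:
        user = tweet["user"]
        writer.writerow({
            "Username": user["screen_name"],
            "Tweet": tweet["text"],
            "Timezone": user["time_zone"],  # None is written as an empty field
            "Timestamp": tweet["created_at"],
            "Location": user["location"],
        })


# Assumed output location: a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "twitterfeeds.csv")

# encoding='utf8' lets non-ASCII text through; newline='' is what the csv
# module expects so that it controls line endings itself.
with open(path, "w", encoding="utf8", newline="") as csvfile:
    write_tweets(sample_tweets, csvfile)

# Read the file back to confirm the text round-trips intact.
with open(path, encoding="utf8", newline="") as csvfile:
    rows = list(csv.DictReader(csvfile))

print(rows[0]["Tweet"])
```

Passing the open file into a function keeps the CSV logic separate from the data source, so the same function should work with the real stream iterator as long as each item has these keys.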

0

If you really want to convert all of your Unicode data to ASCII:

tweet['text'].encode("ascii", "replace")  # replace each non-ASCII character with '?'
# or
tweet['text'].encode("ascii", "ignore")   # drop non-ASCII characters entirely
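To compare these options, including the "nearest ASCII equivalent" raised in the comments on the question, here is a small sketch using a made-up sample string. The NFKD normalization step is an addition, not part of this answer, and it only helps for characters that have a compatibility decomposition. Note that in Python 3 each call returns bytes, not str.

```python
import unicodedata

# Hypothetical text containing an ellipsis (U+2026) and an accented letter.
sample_text = "Kindle deal\u2026 caf\u00e9"

# Each unencodable character becomes a question mark.
replaced = sample_text.encode("ascii", "replace")

# Each unencodable character is silently dropped.
ignored = sample_text.encode("ascii", "ignore")

# NFKD splits U+2026 into three periods and the accented e into 'e' plus a
# combining accent; the leftover combining accent is then dropped by "ignore".
approximated = unicodedata.normalize("NFKD", sample_text).encode("ascii", "ignore")

print(replaced)      # b'Kindle deal? caf?'
print(ignored)       # b'Kindle deal caf'
print(approximated)  # b'Kindle deal... cafe'
```

All three discard information, so they are only worth considering if the output genuinely has to be ASCII.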