Writing Arabic to a CSV file using Python 2.7

Consider the following code, which writes Arabic text to a CSV file with Python 2.7:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import json

import unicodecsv as csv
import pandas as pd

# Read the tweets saved by tweepy, one JSON object per line
tweets_data = []
tweets_file = open('tweets.txt', "r")

for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except ValueError:
        # skip lines that are not valid JSON (e.g. blank lines)
        continue
tweets_file.close()

# Write selected fields of every tweet to a CSV file
tweets_file1 = open('tweets.csv', "wb")
tweets_file_writer = csv.writer(tweets_file1, encoding='utf-8')
tweets_file_writer.writerow(['location', 'time', 'user_id', 'text', 'hashtags', 'user_mentions'])
for i in tweets_data:
    location = unicode(i[u'user'][u'location']).encode('utf-8')
    time = unicode(i[u'created_at']).encode('utf-8')
    user_id = unicode(i[u'user'][u'id']).encode('utf-8')
    text = unicode(i[u'text']).encode('utf-8')

    # collect the text of every hashtag in the tweet
    hashtag = i[u'entities'][u'hashtags']
    hashtags = []
    for j in hashtag:
        print j[u'text']
        hashtags.append(j[u'text'].encode('utf-8'))

    # collect the screen name of every mentioned user
    mention = i[u'entities'][u'user_mentions']
    mentions = []
    for j in mention:
        mentions.append(unicode(j[u'screen_name']).encode('utf-8'))

    tweets_file_writer.writerow([location, time, user_id, text, hashtags, mentions])
tweets_file1.close()

I wrote this code to scrape some Arabic data using tweepy.

My problem is with this line: tweets_file_writer.writerow([location, time, user_id, text, hashtags, mentions]). When the hashtags list is written, it does not appear in Arabic, although all the other data appears normally.

Example:

In the CSV file I need to write a list of hashtags like this:

['مجلة_النجوم2', 'سهيله_بن_لشهب', 'souhilabenlachhab']

But it appears like this:

['\xd9\x85\xd8\xac\xd9\x84\xd8\xa9_\xd8\xa7\xd9\x84\xd9\x86\xd8\xac\xd9\x88\xd9\x852', '\xd8\xb3\xd9\x87\xd9\x8a\xd9\x84\xd9\x87_\xd8\xa8\xd9\x86_\xd9\x84\xd8\xb4\xd9\x87\xd8\xa8', 'souhilabenlachhab']
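The escapes come from how a list ends up in a cell: the CSV writer converts a non-string value such as a list with str(), and str() of a list uses repr() of each element, which shows UTF-8 byte strings as \xNN escapes. Below is a minimal Python 3 sketch reproducing that behaviour (Python 3 adds a b prefix that Python 2.7 omits) next to the alternative of joining the hashtag text into one string; the ', ' separator is an arbitrary choice for illustration.

```python
# Hashtags as they would come out of the tweet JSON (text, not bytes)
hashtags = ['مجلة_النجوم2', 'سهيله_بن_لشهب', 'souhilabenlachhab']

# The question's code encodes each hashtag to UTF-8 bytes first
encoded = [h.encode('utf-8') for h in hashtags]

# str() of a list calls repr() on each element, so byte strings
# are rendered as \xNN escapes instead of Arabic characters
cell_from_list = str(encoded)

# Joining the text values yields a single readable string for the cell
cell_joined = ', '.join(hashtags)

print(cell_from_list)
print(cell_joined)
```

The same idea carries over to the Python 2.7 code: build one string from the hashtags before passing it to writerow, rather than passing the list itself.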


2017-04-17 Farag Mohammad

Also, Python 3's Unicode support is far more robust than 2.x's. Don't write 2.x unless you absolutely have to. – tdelaney

Unfortunately, it's a homework assignment. –

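Too bad. You'd think teachers would keep up with this sort of thing. That's UTF-8 data in a string, so the question is how it got there. I believe it actually happens when the JSON document is read. Try tweets_file = codecs.open('tweets.txt', "r", encoding="utf-8") ... and let me know if that works. – tdelaney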
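The suggestion here is to decode the input file explicitly as UTF-8. A minimal sketch of that reading step follows, using io.open, which accepts encoding= on both Python 2.7 and Python 3. It assumes, as in the question, one JSON tweet per line; the helper names parse_tweet_lines and load_tweets are hypothetical.

```python
import io
import json


def parse_tweet_lines(lines):
    """Decode one JSON tweet per line, skipping lines that are not valid JSON."""
    tweets = []
    for line in lines:
        try:
            tweets.append(json.loads(line))
        except ValueError:  # json.JSONDecodeError subclasses ValueError on Python 3
            continue
    return tweets


def load_tweets(path):
    # io.open decodes the file as UTF-8, so each line arrives as text, not bytes
    with io.open(path, 'r', encoding='utf-8') as f:
        return parse_tweet_lines(f)
```

Keeping the parsing separate from the file handling makes it easy to check the skip-invalid-lines behaviour without touching the disk.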

You need to open the file you intend to write to as a UTF-8-encoded file before trying to write Arabic to it, so:

tweets_file1 = open("tweets.csv", "wb")

should be:

import codecs 
tweets_file1 = codecs.open("tweets.csv", "wb", "utf-8")

And, as others have mentioned, once you're no longer stuck with Python 2, using Python 3 makes working with Arabic so much easier!
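Following the advice to move to Python 3, here is a minimal sketch of the CSV-writing step there. It assumes tweet dicts with the same keys the question reads (user.location, created_at, user.id, text, entities) and joins hashtags and mentions into single strings so each cell holds readable text rather than a list's repr. The helper names tweet_to_row and write_tweets_csv and the ', ' separator are illustrative, not taken from the posts above.

```python
import csv


def tweet_to_row(tweet):
    """Flatten one tweet dict (as returned by json.loads) into a CSV row."""
    entities = tweet['entities']
    # one string per cell instead of a Python list
    hashtags = ', '.join(h['text'] for h in entities['hashtags'])
    mentions = ', '.join(m['screen_name'] for m in entities['user_mentions'])
    return [
        tweet['user']['location'],
        tweet['created_at'],
        tweet['user']['id'],
        tweet['text'],
        hashtags,
        mentions,
    ]


def write_tweets_csv(tweets, path):
    # Text-mode file encoded as UTF-8; newline='' as the csv docs recommend
    with open(path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['location', 'time', 'user_id', 'text',
                         'hashtags', 'user_mentions'])
        for tweet in tweets:
            writer.writerow(tweet_to_row(tweet))
```

If the CSV will be opened in Excel, encoding='utf-8-sig' (which writes a byte-order mark) is often needed for the Arabic to display correctly.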


2017-04-24 01:32:06 larapsodia
