2016-11-13 26 views
2

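Convert the rows of a CSV file into lists of tuples?

I have a CSV file with two columns, one for the tweet and the other for its sentiment value, formatted like this (but with several thousand tweets):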

I like stackoverflow,Positive 
Thanks for your answers,Positive 
I hate sugar,Negative 
I do not like that movie,Negative 
stackoverflow is a question and answer site,Neutral 
Python is oop high-level programming language,Neutral 

The output I want looks like this:

negfeats = [('I do not like that movie','Negative'),('I hate sugar','Negative')] 
posfeats = [('I like stackoverflow','Positive'),('Thanks for your answers','Positive')] 
neufeats = [('stackoverflow is a question and answer site','Neutral'),('Python is oop high-level programming language','Neutral')] 

I tried the code below, but some characters are missing from the tuples I get. Also, how can I keep x, y and z as integers rather than floats?

import csv 
neg = ['Negative'] 
pos = ['Positive'] 
neu = ['Neutral'] 
neg_counter=0 
pos_counter=0 
neu_counter=0 
negfeats = [] 
posfeats = [] 
neufeats = [] 
with open('ff_tweets.csv', 'Ur') as f:
    for k in f:
        if any(word in k for word in neg):
            negfeats = list(tuple(rec) for rec in csv.reader(f, delimiter=','))
            neg_counter+=1
        elif any(word in k for word in pos):
            posfeats = list(tuple(rec) for rec in csv.reader(f, delimiter=','))
            pos_counter+=1
        else:
            neufeats = list(tuple(rec) for rec in csv.reader(f, delimiter=','))
            neu_counter+=1
x = neg_counter * 3/4 
y = pos_counter * 3/4 
z = neu_counter * 3/4
print negfeats 
print posfeats 
print neufeats 
print x 
print y 
print z 

Answers

0

This should work:

import csv 

neg = 'Negative' 
pos = 'Positive' 
neu = 'Neutral' 
negfeats = [] 
posfeats = [] 
neufeats = [] 

with open('ff_tweets.csv', 'Ur') as f:
    for r in csv.reader(f):
        if r[1] == neg:
            negfeats.append((r[0], r[1]))
        if r[1] == pos:
            posfeats.append((r[0], r[1]))
        if r[1] == neu:
            neufeats.append((r[0], r[1]))

x = len(negfeats) * float(3)/4 
y = len(posfeats) * float(3)/4 
z = len(neufeats) * float(3)/4 

print negfeats 
print posfeats 
print neufeats 
print x 
print y 
print z 
+1

You should be careful: in Python 2.7 (as in the question) `3/4` returns `0`. – will

+0

@will True, let's fix that. –
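
On the integer part of the question, one option is floor division with `//`, which gives the same integer result in Python 2.7 and Python 3. A minimal sketch, written with Python 3 `print` calls; `three_quarters` is a hypothetical helper name and not part of the answer above:

```python
def three_quarters(count):
    """Return three quarters of count, rounded down, as an int."""
    # Multiply first, then floor-divide: 3 // 4 on its own would be 0,
    # and a plain / in Python 3 would turn the result into a float.
    return count * 3 // 4

print(three_quarters(2))     # 1
print(three_quarters(1000))  # 750
```

Calling it as `x = three_quarters(len(negfeats))` would keep `x` an integer, at the cost of always rounding down.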

+0

@ÉbeIsaac This solution should work in theory, but when I run it I get empty lists for negfeats, posfeats and neufeats. – Alsphere
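
One possible cause of the empty lists, and this is only a guess, is stray whitespace around the labels: a field read as `'Positive '` does not compare equal to `'Positive'`. The sketch below strips each field before grouping. It is written for Python 3, and `group_by_label` and the in-memory sample are hypothetical stand-ins for the real file:

```python
import csv
import io
from collections import defaultdict

def group_by_label(lines):
    """Group (tweet, label) tuples by label, ignoring surrounding whitespace."""
    groups = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        tweet = row[0].strip()
        label = row[1].strip()
        groups[label].append((tweet, label))
    return groups

# Hypothetical sample with a trailing space after each label.
sample = io.StringIO(
    "I like stackoverflow,Positive \n"
    "I hate sugar,Negative \n"
    "Python is oop high-level programming language,Neutral \n"
)
feats = group_by_label(sample)
negfeats = feats['Negative']
posfeats = feats['Positive']
neufeats = feats['Neutral']
print(negfeats)
print(posfeats)
print(neufeats)
```

With a real file, passing `open('ff_tweets.csv', newline='')` to `group_by_label` should behave the same way, since `csv.reader` accepts any iterable of lines.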

0

Try this, using pandas. `Sentiment` is the name of the sentiment column in the CSV file:

import pandas as pd 

df = pd.read_csv('ff_tweets.csv') 

pos = tuple(df.loc[df['Sentiment'] == 'Positive'].apply(tuple, axis = 1)) 
neu = tuple(df.loc[df['Sentiment'] == 'Neutral'].apply(tuple, axis = 1)) 
neg = tuple(df.loc[df['Sentiment'] == 'Negative'].apply(tuple, axis = 1)) 

print pos, neg, neu 

Output:

(('I like stackoverflow', 'Positive'), ('Thanks for your answers', 'Positive')) (('I hate sugar', 'Negative'), ('I do not like that movie', 'Negative')) (('stackoverflow is a question and answer site', 'Neutral'), ('Python is oop high-level programming language', 'Neutral')) 
+0

Can you explain what the `pos` line of code does? – Alsphere

+0

'pos' = positive, 'neg' = negative, 'neu' = neutral –

+0

Each one is a separate tuple. –
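
To unpack the `pos` line further, here is a step-by-step sketch in Python 3. It assumes the CSV starts with a header row; the column name `Tweet` is hypothetical, and only `Sentiment` comes from the answer. The sample in the question shows no header, in which case `pd.read_csv` would need `header=None` and a `names` list.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV with a header row.
csv_text = (
    "Tweet,Sentiment\n"
    "I like stackoverflow,Positive\n"
    "I hate sugar,Negative\n"
    "Thanks for your answers,Positive\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Step 1: a boolean Series, True where the row is labelled Positive.
mask = df['Sentiment'] == 'Positive'

# Step 2: keep only the rows where the mask is True.
positive_rows = df.loc[mask]

# Step 3: turn each remaining row into a tuple (axis=1 means row by row).
row_tuples = positive_rows.apply(tuple, axis=1)

# Step 4: collect the per-row tuples into one outer tuple.
pos = tuple(row_tuples)
print(pos)
```

If a list of tuples is wanted, as in the question, `list(positive_rows.itertuples(index=False, name=None))` is another way to get plain tuples from the filtered rows.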