2017-10-09

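How to split data into different variables in pandas

Hi, I have data in train.dat that looks like this. I am trying to create one variable that holds the [ith] value of the column containing (-1 or 1), and another variable that holds the values of the column containing the strings.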

So far I have tried:

df=pd.read_csv("train.dat",delimiter="\t", sep=',') 
# print(df.head()) 


# separate names from classes 
vals = df.ix[:,:].values 
names = [n[0][3:] for n in vals] 
cls = [n[0][0:] for n in vals] 
print(cls) 

But the output looks all jumbled. Any help would be appreciated. I am a beginner in Python.

Please post a sample of your data as text, not as an image.

Answers

If the character following the numeric value is a tab, you're fine. All you need is:

import io  # io.StringIO stands in for a file, for demonstration
import pandas as pd

# Adjacent string literals are concatenated into one tab-separated sample.
ratings = ("-1\tThis movie really sucks.\n"
           "-1\tRun colored water through a reflux condenser "
           "and call it a science movie?\n"
           "+1\tJust another zombie flick? You'll be surprised!")

df = pd.read_csv(io.StringIO(ratings), sep='\t',
                 header=None, names=['change', 'rating'])
  • Passing header=None ensures that the first line is interpreted as data.
  • Passing names=['change', 'rating'] provides some (sensible) column headers.
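
To end up with the two separate variables the question asks for, each column can be pulled out of the DataFrame once it is loaded. This is a minimal sketch assuming tab-separated input like the sample above; the names `labels` and `texts` are illustrative and do not come from the original post.

```python
import io
import pandas as pd

# Tab-separated sample in the same shape as the demonstration data.
ratings = (
    "-1\tThis movie really sucks.\n"
    "+1\tJust another zombie flick? You'll be surprised!"
)

df = pd.read_csv(io.StringIO(ratings), sep='\t',
                 header=None, names=['change', 'rating'])

# One variable per column (hypothetical names).
labels = df['change'].tolist()  # the -1 / +1 values, parsed as integers
texts = df['rating'].tolist()   # the review strings

print(labels)
print(texts)
```

With a real file, `io.StringIO(ratings)` would be replaced by the path, for example "train.dat".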

Of course, the character might not be a tab :D.

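import io  # io.StringIO stands in for a file, for demonstration
import pandas as pd

# Same sample, but with a space instead of a tab after the number.
ratings = ("-1 This movie really sucks.\n"
           "-1 Run colored water through a reflux condenser "
           "and call it a science movie?\n"
           "+1 Just another zombie flick? You'll be surprised!")

# No tabs in the data, so every line is read into a single column.
df = pd.read_csv(io.StringIO(ratings), sep='\t',
                 header=None, names=['stuff'])

# Characters 0-1 are the sign and digit; the text starts after the space.
df['change'], df['rating'] = df.stuff.str[:2], df.stuff.str[3:]
df = df.drop('stuff', axis=1)  # drop returns a new frame, so reassign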

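The code above takes one workable approach: read each whole line into a single temporary column, split the string, assign the pieces to two columns, and finally drop the temporary column.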
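
Fixed-width slicing only works while every label is exactly two characters wide. As an alternative (not the answer's own method), the temporary column can be split at the first space with `Series.str.split`. This is a sketch under the assumption that a single space always follows the label; the name `parts` is illustrative.

```python
import io
import pandas as pd

# Space-separated sample in the same shape as the demonstration data.
ratings = (
    "-1 This movie really sucks.\n"
    "+1 Just another zombie flick? You'll be surprised!"
)

# No tabs in the data, so each line lands in one temporary column.
df = pd.read_csv(io.StringIO(ratings), sep='\t',
                 header=None, names=['stuff'])

# Split at the first space only; expand=True yields a two-column DataFrame.
parts = df['stuff'].str.split(' ', n=1, expand=True)
df['change'] = parts[0].astype(int)  # "-1" / "+1" become -1 / 1
df['rating'] = parts[1]
df = df.drop('stuff', axis=1)

print(df)
```

Converting the label with `astype(int)` is optional; it is included here so the column matches what the tab-separated read would have produced.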