2016-12-13 62 views
3

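.csv data manipulation in R instead of Python

I have some simple data in .csv format that needs to be processed before I can create plots from it. I know how to manipulate .csv data in Python, and I want to apply the same logic in R, but I don't know how to do it.

Below is sample data that comes from a .csv file but has been loaded into R. I created it in code so that we can discuss the problem.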

df <- data.frame(Name = c("AC", "AC", "PT", "PT", "OR", "OR"), 
    useless_column = c("","","A",3,4," "), 
    measurement = c("H", "", "K", "M", "", "H"), 
    amount = c(12, 54, 20, 87, 75, 22), 
    useless_column = c("","","A",3,4," ")) 

In Python, I would usually do this:

import csv
import glob

# The path is elided in the original; the raw string keeps the backslash literal
fileList = glob.glob(r"R:xxxxxxxxxxxxxxxxxxxxx\*.csv")
for inputFile in fileList:
    outputFilename = inputFile + "output.csv"
    # "with" closes both files at the end of each iteration
    with open(inputFile, "r", newline="") as inFile, \
         open(outputFilename, "w") as outputFile:
        csvInput = csv.reader(inFile, delimiter=",")
        outputFile.write("Name,measurement,amount\n")
        next(csvInput)  # skip the header row
        for line in csvInput:
            if line[2] == "H":
                meas = "100"
            elif line[2] == "K":
                meas = "1000"
            elif line[2] == "M":
                meas = "1000000"
            else:
                meas = "1"
            # csv fields are strings, so convert the amount before multiplying
            amount = int(meas) * int(line[3])
            outputFile.write(",".join([line[0], line[2], str(amount)]) + "\n")

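In Python I can load the CSV and use a for loop to go through each line of the file, then customise my output file before continuing with my analysis. From the above, I would like my output to look like the following, written here as R code: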

df <- data.frame(Name = c("AC", "AC", "PT", "PT", "OR", "OR"), 
    measurement = c("H", "", "K", "M", "", "H"), 
    amount = c(1200, 54, 20000, 87000000, 75, 2200)) 

How can I do this in R? I have a small piece of R code; could someone please point me in the right direction?

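x <- read.csv("xxxx.csv", header = TRUE, sep = ",")
xC <- ncol(x)
xR <- nrow(x)
# empty 3-column frame intended to hold the output
op <- data.frame(matrix(NA, nrow = xR, ncol = 3))
for (col in 1:xC)  # loop variable renamed: reusing x would overwrite the data
{
    for (r in 1:xR)
    {
    # xxxxxxxx
    }
}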

Answers

6

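Adapting the Python code to R means dropping loops in favour of vectorised operations. Here we can create a column called meas from a named vector and then compute the amount:

# dictionary of measurement values:
m <- c(H = 100, K = 1000, M = 1000000)

# create meas based on measurement
# as.character() guards against factor columns, which would index m by position
df$meas <- unname(m[as.character(df$measurement)])
df$meas[is.na(df$meas)] <- 1  # blank or unknown codes leave the amount unchanged
# compute amount
df$amount <- df$meas * df$amount

Data

df <- data.frame(Name = c("AC", "AC", "PT", "PT", "OR", "OR"),
       measurement = c("H", "", "K", "M", "", "H"),
       amount = c(12, 54, 20, 87, 75, 22))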
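
For comparison with the question's Python script, the same lookup-table idea can be sketched in Python, with a dict taking the place of the chain of if/elif tests. This is only an illustration: the names MULTIPLIERS and scale_rows are made up here, and the rows are the question's sample data typed in by hand rather than read from a file.

```python
# Multiplier for each measurement code; anything else counts as 1
MULTIPLIERS = {"H": 100, "K": 1000, "M": 1000000}


def scale_rows(rows):
    """Return (name, measurement, scaled amount) for each input row."""
    scaled = []
    for name, measurement, amount in rows:
        # dict.get supplies the default of 1 for blank or unknown codes
        factor = MULTIPLIERS.get(measurement, 1)
        scaled.append((name, measurement, amount * factor))
    return scaled


# Sample rows from the question (hypothetical in-memory stand-in for the csv)
rows = [("AC", "H", 12), ("AC", "", 54), ("PT", "K", 20),
        ("PT", "M", 87), ("OR", "", 75), ("OR", "H", 22)]

for row in scale_rows(rows):
    print(row)
```

The default passed to get plays the same role as replacing the NA values with 1 in the R version.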
0

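Have you tried using pandas.read_csv? Or are the csv files so irregular that you cannot read them with pandas' read_csv method?

You can run a for loop to process the data from each file and then append it to a master DataFrame.

Example: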

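import glob
import pandas as pd

PATH = '/home/data/'  # Example path

frames = []
for inputFile in glob.glob(PATH + '*.csv'):
    csv_file = pd.read_csv(inputFile, sep=',')
    # rows whose measurement column is 'H' (column names as in the question)
    H_index = csv_file[csv_file['measurement'] == 'H'].index
    csv_file.loc[H_index, 'amount'] = csv_file.loc[H_index, 'amount'] * 100
    frames.append(csv_file)
# DataFrame.append was removed in pandas 2.0; concat builds the master frame
master_df = pd.concat(frames, ignore_index=True)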

I have skipped the K and M part of the operation.

You can plot directly from master_df by running something like

master_df.plot() 
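
The K and M cases skipped above follow the same pattern as H. As a rough sketch of the complete per-file loop, here is a version that uses only the standard-library csv module rather than pandas. The function name combine_files is made up for illustration, and the column names Name, measurement and amount are assumptions taken from the question's output header.

```python
import csv

# Multiplier for each measurement code; anything else counts as 1
MULTIPLIERS = {"H": 100, "K": 1000, "M": 1000000}


def combine_files(paths):
    """Read each csv file and return one list of scaled rows (dicts)."""
    combined = []
    for path in paths:
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                # strip() tolerates stray blanks such as " " in the unit column
                factor = MULTIPLIERS.get(row["measurement"].strip(), 1)
                combined.append({
                    "Name": row["Name"],
                    "measurement": row["measurement"],
                    "amount": float(row["amount"]) * factor,
                })
    return combined
```

Calling combine_files with a list of file paths returns one list covering every file, which plays the role of master_df here.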
0

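You already have the code to read in the data (read.csv), so am I right in thinking that your main struggle is with the manipulation?

If so, you could carry on using lots of ifs and for loops, but I think there are simpler ways. For example: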

df <- read.csv("xxxx.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
df$meas <- 1                           # Create a numeric column 'meas'; 1 is the default multiplier
df$meas[df$measurement == "H"] <- 100  # Rows marked H get 100
df$meas[df$measurement == "K"] <- 1000
df$meas[df$measurement == "M"] <- 1000000
df$value <- df$meas * df$amount