Subtracting the rows of a column from the preceding rows in a Python pandas DataFrame

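I have a .dat file with thousands of rows in one column (say the column is time, t). I want to find the interval between consecutive rows of that column, that is, subtract the value in the first row from the value in the second row, and so on (to find dt). Then I want to create a new column from these interval values and compare it with the original column. If a language other than Python would help in this situation, I would appreciate those suggestions too.
I wrote the following pseudo-Python code: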

import csv
import sys

import pandas as pd
import matplotlib.pyplot as plt

# usage: python script.py flash.dat
script, filename = sys.argv

# read the whitespace-delimited .dat file into a list of lists
with open(filename) as f:
    datContent = [line.strip().split() for line in f]

# write it as a new CSV file (not back over the .dat file)
with open("./flash.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)

columns_to_keep = ['#time']
dataframe = pd.read_csv("./flash.csv", usecols=columns_to_keep)

# previous row's time, then the interval dt between consecutive rows
dataframe["prev_time"] = dataframe["#time"].shift(1)
dataframe["time_delta"] = dataframe["#time"] - dataframe["prev_time"]

pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

print(dataframe)

# plot the interval against the original time column
dataframe.plot(x='#time', y='time_delta', style='r')
plt.show()

Update: I have updated my code, and I am also sharing the .dat file I am working with: https://www.dropbox.com/s/w4jbxmln9e83355/flash.dat?dl=0


2016-09-24 bhjghjh

The pandas shift function should help here. –

A simple approach is to copy the value you need onto the same row, and then apply a simple row-wise operation.

For example, in your case we would have a DataFrame with a time column and some other data, like this:

import pandas as pd
import numpy as np

# hourly timestamps over five days
df = pd.DataFrame({"time": pd.date_range("24 sept 2016", periods=5*24, freq="1h")})
# add a random number of minutes so the spacing is irregular
df["time"] = df["time"] + pd.to_timedelta(np.random.choice(60, size=df.shape[0]), unit="m")
df["value"] = np.random.normal(size=df.shape[0])

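If you want to compute the time difference from the previous (or next, or any other) row, you can simply copy the value from that row and then perform the subtraction: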

# copy the previous row's time onto the current row (the first row gets NaT)
df["prev_time"] = df["time"].shift(1)
df["time_delta"] = df["time"] - df["prev_time"]
df
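
The copy-and-subtract steps above can also be written with the built-in Series.shift and Series.diff methods. This is a minimal sketch, assuming a small hand-made time column; the four timestamps and the column name time_delta_diff are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: four irregularly spaced timestamps.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-09-24 00:10",
        "2016-09-24 01:05",
        "2016-09-24 02:30",
        "2016-09-24 03:00",
    ])
})

# shift(1) moves every value down one row, so each row sees its predecessor.
df["prev_time"] = df["time"].shift(1)
df["time_delta"] = df["time"] - df["prev_time"]

# diff() computes the same row-to-row difference in a single call.
df["time_delta_diff"] = df["time"].diff()

print(df)
```

The first row has no predecessor, so its difference is NaT in both columns.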


2016-09-24 07:41:27 Svend

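I updated my code with your suggestion, but I get some errors because my file does not store its data in minutes or similar units. I have shared the raw data file I am working with; could you take some time to look at it and adapt your code to my file? – bhjghjh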

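Hi. I had a look at your file; as far as I can guess, it already contains time deltas rather than dates, possibly expressed as a number of milliseconds or nanoseconds? The same logic applies: once you have loaded the file into a DataFrame, say in a variable 'df', you can convert the column with 'df["time"] = df.time.apply(lambda ms: pd.Timedelta(milliseconds=ms))' (adapt the milliseconds to whatever the column actually means). After that, the code I posted should work as is: a difference of timestamps and a difference of time deltas both yield a time delta. – Svend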

Thanks, problem solved. – bhjghjh
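
For a file whose time column holds plain numbers rather than dates, the conversion suggested in the comments can be sketched roughly as follows. The inline sample text, the '#time' and 'value' headers, and the assumption that the numbers are milliseconds are all hypothetical stand-ins for the real flash.dat, whose layout is not shown here; adjust the unit to whatever the column actually means.

```python
import io

import pandas as pd

# Hypothetical stand-in for a whitespace-delimited .dat file.
sample = io.StringIO(
    "#time value\n"
    "0.0 1.5\n"
    "12.5 1.7\n"
    "30.0 1.2\n"
    "31.0 0.9\n"
)

# Split columns on any run of whitespace; the first line is the header.
df = pd.read_csv(sample, sep=r"\s+")

# Interval between consecutive rows, in the column's own numeric units.
df["dt"] = df["#time"].diff()

# Assumption: the numbers are milliseconds; convert them to Timedelta values.
df["time"] = pd.to_timedelta(df["#time"], unit="ms")
df["time_delta"] = df["time"].diff()

print(df)
```

If the intervals only need to be compared numerically or plotted, the dt column is enough and the Timedelta conversion can be skipped.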
