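I am trying to do two things:
1. Take the value of c_med from Input 1 as a threshold and split the rows of Input 2 into two outputs, above.csv and below.csv, depending on whether column c_total is above or below that threshold.
2. Read above.csv back in as input and classify its rows into percentage bands, the way the pure-Python code further down does.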
Input 1:
date_count,all_hours,c_min,c_max,c_med,c_med_med,u_min,u_max,u_med,u_med_med
2,12,2309,19072,12515,13131,254,785,686,751
Input 2, with columns ['date','startTime','endTime','day','c_total','u_total']:
2004-01-05,22:00:00,23:00:00,Mon,18944,790
2004-01-05,23:00:00,00:00:00,Mon,17534,750
2004-01-06,00:00:00,01:00:00,Tue,17262,747
2004-01-06,01:00:00,02:00:00,Tue,19072,777
2004-01-06,02:00:00,03:00:00,Tue,18275,785
2004-01-06,03:00:00,04:00:00,Tue,13589,757
2004-01-06,04:00:00,05:00:00,Tue,16053,735
2004-01-06,05:00:00,06:00:00,Tue,11440,636
2004-01-06,06:00:00,07:00:00,Tue,5972,513
2004-01-06,07:00:00,08:00:00,Tue,3424,382
2004-01-06,08:00:00,09:00:00,Tue,2696,303
2004-01-06,09:00:00,10:00:00,Tue,2350,262
2004-01-06,10:00:00,11:00:00,Tue,2309,254
- I am trying to read the threshold c_med from another input CSV, and I get the following error:
Traceback (most recent call last):
  File "class_med.py", line 10, in <module>
    above_median = df_data['c_total'] > df_med['c_med']
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/ops.py", line 735, in wrapper
    raise ValueError('Series lengths must match to compare')
ValueError: Series lengths must match to compare
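The comparison fails because df_med['c_med'] is a one-row Series while df_data['c_total'] has one entry per row of Input 2, and pandas refuses to compare two Series of different lengths element by element. A minimal sketch of one way around it, assuming the statistics file always holds exactly one row: take the threshold out as a scalar and use the resulting boolean mask to select whole rows. The inline CSV text is a stand-in for stat_result.csv and mini_out.csv (a subset of the rows above), and the sketch assumes Input 2 has no header row.

```python
import io
import pandas as pd

# Inline stand-ins for stat_result.csv and mini_out.csv.
stat_csv = io.StringIO(
    "date_count,all_hours,c_min,c_max,c_med,c_med_med,u_min,u_max,u_med,u_med_med\n"
    "2,12,2309,19072,12515,13131,254,785,686,751\n"
)
data_csv = io.StringIO(
    "2004-01-05,22:00:00,23:00:00,Mon,18944,790\n"
    "2004-01-06,03:00:00,04:00:00,Tue,13589,757\n"
    "2004-01-06,05:00:00,06:00:00,Tue,11440,636\n"
    "2004-01-06,10:00:00,11:00:00,Tue,2309,254\n"
)

df_med = pd.read_csv(stat_csv)
cols = ['date', 'startTime', 'endTime', 'day', 'c_total', 'u_total']
# Assumption: the data file has no header row, so the names are supplied here.
df_data = pd.read_csv(data_csv, header=None, names=cols)

# Take the threshold as a plain scalar rather than a one-row Series.
threshold = df_med['c_med'].iloc[0]

# The mask is a boolean Series; indexing with it keeps the full rows.
mask = df_data['c_total'] > threshold
df_above = df_data[mask]
df_below = df_data[~mask]
```

From here, df_above.to_csv('above.csv', index=False) and df_below.to_csv('below.csv', index=False) would write the two files. Note that writing the mask itself to CSV, as the code in the question does, stores only True/False values and not the rows.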
- Classify the data in column c_total into percentage bands. A pure-Python solution is given below, but I am looking for a pandas solution, like in Reference one:
for row in csv.reader(inp):
    if int(row[1]) < (0.20 * max_value):
        val = 'viewers'
    elif int(row[1]) >= (0.20 * max_value) and int(row[1]) < (0.40 * max_value):
        val = 'event based'
    elif int(row[1]) >= (0.40 * max_value) and int(row[1]) < (0.60 * max_value):
        val = 'situational'
    elif int(row[1]) >= (0.60 * max_value) and int(row[1]) < (0.80 * max_value):
        val = 'active'
    else:
        val = 'highly active'
    writer.writerow([row[0], row[1], val])
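One possible pandas equivalent of the loop above is pd.cut with bin edges placed at 20%, 40%, 60% and 80% of the column maximum. This is only a sketch: the sample values are taken from the c_total column of Input 2, and right=False is used so that each band includes its lower edge, matching the >= / < comparisons in the loop.

```python
import numpy as np
import pandas as pd

# A few c_total values from Input 2, used purely for illustration.
df = pd.DataFrame({'c_total': [18944, 13589, 11440, 5972, 3424, 19072]})

max_value = df['c_total'].max()

# Band edges at fixed fractions of the maximum; open-ended at both extremes.
edges = [-np.inf, 0.20 * max_value, 0.40 * max_value,
         0.60 * max_value, 0.80 * max_value, np.inf]
labels = ['viewers', 'event based', 'situational', 'active', 'highly active']

# right=False makes each bin closed on the left: [lower, upper).
df['cat'] = pd.cut(df['c_total'], bins=edges, labels=labels, right=False)
```

Unlike qcut, which splits by quantiles of the data, cut splits at the explicit edges supplied, which is what the percentage-of-maximum rule calls for.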
Code:
import pandas as pd
import numpy as np
df_med = pd.read_csv('stat_result.csv')
df_med.columns = ['date_count', 'all_hours', 'c_min', 'c_max', 'c_med', 'c_med_med', 'u_min', 'u_max', 'u_med', 'u_med_med']
df_data = pd.read_csv('mini_out.csv')
df_data.columns = ['date', 'startTime', 'endTime', 'day', 'c_total', 'u_total']
above = df_data['c_total'] > df_med['c_med']
#print above_median
above.to_csv('above.csv', index=None, header=None)
df_above = pd.read_csv('above.csv')
df_above.columns = ['date', 'startTime', 'endTime', 'day', 'c_total', 'u_total']
#Percentage block should come here
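Edit: when the classification depends on the values of a single column, qcut is the simplest solution. But when it depends on two values from two different columns, how can this be done in pandas?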
for row in csv.reader(inp):
    if int(row[1]) > (0.80 * max_user) and int(row[2]) > (0.80 * max_key):
        val = 'highly active'
    elif int(row[1]) >= (0.60 * max_user) and int(row[2]) <= (0.60 * max_key):
        val = 'active'
    elif int(row[1]) <= (0.40 * max_user) and int(row[2]) >= (0.40 * max_key):
        val = 'event based'
    elif int(row[1]) < (0.20 * max_user) and int(row[2]) < (0.20 * max_key):
        val = 'situational'
    else:
        val = 'viewers'
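For conditions that span two columns, one option is numpy.select, which takes a list of boolean conditions and a matching list of results and, like an if/elif chain, uses the first condition that holds for each row. The sketch below mirrors the loop above; the column names user and key and the sample values are hypothetical, since the question does not show the data behind row[1] and row[2].

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the two columns read as row[1] and row[2].
df = pd.DataFrame({
    'user': [90, 70, 30, 10, 50, 100],
    'key':  [100, 50, 60, 10, 90, 5],
})

max_user = df['user'].max()
max_key = df['key'].max()

# Conditions are checked in order; the first True one decides the label.
conditions = [
    (df['user'] > 0.80 * max_user) & (df['key'] > 0.80 * max_key),
    (df['user'] >= 0.60 * max_user) & (df['key'] <= 0.60 * max_key),
    (df['user'] <= 0.40 * max_user) & (df['key'] >= 0.40 * max_key),
    (df['user'] < 0.20 * max_user) & (df['key'] < 0.20 * max_key),
]
choices = ['highly active', 'active', 'event based', 'situational']

# Rows matching none of the conditions fall through to the default.
df['cat'] = np.select(conditions, choices, default='viewers')
```

Each condition must be wrapped in parentheses and combined with &, because Python's and does not work element-wise on Series.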
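Thank you so much. Let me put the whole code together and try it.
Thank you for the best and simplest solution. Could I also ask how to use 'qcut' with two values? Please check the edit section of the question. Thanks again!
Error: "Traceback (most recent call last): File "class_med.py", line 13, in <module> df2['cat'] = pd.qcut(df2.c_total, q=[0, .4, .7, 1.], labels=['not populated', 'fairly populated', 'populated', 'highly populated']) File "/usr/local/lib/python2.7/dist-packages/pandas/tools/tile.py", line 173, in qcut precision=precision, include_lowest=True) File "/usr/local/lib/python2.7/dist-packages/pandas/tools/tile.py", line 217, in _bins_to_cuts raise ValueError('Bin labels must be one fewer than ' ValueError: Bin labels must be one fewer than the number of bin edges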
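The last error comes from a count mismatch: q=[0, .4, .7, 1.] has four edges, which define three bins, so qcut expects three labels, while four were passed. A small sketch of the corrected call, using c_total values from Input 2 and three placeholder labels (the label names here are an assumption, not the ones intended in the comment):

```python
import pandas as pd

# c_total values from Input 2 that lie above the c_med threshold.
df2 = pd.DataFrame({'c_total': [18944, 17534, 17262, 19072, 18275, 13589, 16053]})

q = [0, .4, .7, 1.]                 # four quantile edges -> three bins
labels = ['low', 'medium', 'high']  # placeholder names, one per bin

# The number of labels must be exactly one fewer than the number of edges.
df2['cat'] = pd.qcut(df2['c_total'], q=q, labels=labels)
```

Alternatively, keeping four labels would need a fifth edge in q, for example an extra quantile between the existing ones.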