我一直在根據這個答案編寫代碼(Reading csv to array, performing linear regression on array and writing to csv in Python depending on gradient),以便了解哪些日子在早上顯示出增加的風速。使用大熊貓執行迴歸,錯誤:無法連接'str'和'float'對象
這是我的數據
hd,Station Number,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local standard time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Universal coordinated time,Precipitation since last (AWS) observation in mm,Quality of precipitation since last (AWS) observation value,Air Temperature in degrees Celsius,Quality of air temperature,Air temperature (1-minute maximum) in degrees Celsius,Quality of air temperature (1-minute maximum),Air temperature (1-minute minimum) in degrees Celsius,Quality of air temperature (1-minute minimum),Wet bulb temperature in degrees Celsius,Quality of Wet bulb temperature,Wet bulb temperature (1 minute maximum) in degrees Celsius,Quality of wet bulb temperature (1 minute maximum),Wet bulb temperature (1 minute minimum) in degrees Celsius,Quality of wet bulb temperature (1 minute minimum),Dew point temperature in degrees Celsius,Quality of dew point temperature,Dew point temperature (1-minute maximum) in degrees Celsius,Quality of Dew point Temperature (1-minute maximum),Dew point temperature (1 minute minimum) in degrees Celsius,Quality of Dew point Temperature (1 minute minimum),Relative humidity in percentage %,Quality of relative humidity,Relative humidity (1 minute maximum) in percentage %,Quality of relative humidity (1 minute maximum),Relative humidity (1 minute minimum) in percentage %,Quality of Relative humidity (1 minute minimum),Wind (1 minute) speed in km/h,Wind (1 minute) speed quality,Minimum wind speed (over 1 minute) in km/h,Minimum wind speed (over 1 minute) quality,Wind (1 minute) direction in degrees true,Wind (1 minute) direction quality,Standard deviation of wind (1 minute),Standard deviation of wind (1 minute) direction quality,Maximum wind gust (over 1 minute) in km/h,Maximum wind gust (over 1 minute) quality,Visibility (automatic - one minute data) in km,Quality of visibility (automatic - one minute data),Mean sea level pressure in hPa,Quality of mean sea level pressure,Station level pressure in hPa,Quality of station level pressure,QNH pressure in hPa,Quality of QNH pressure,#
hd, 40842,2000,03,20,10,50,2000,03,20,10,50,2000,03,20,00,50, ,N, 25.7,N, 25.7,N, 25.6,N, 21.5,N, 21.5,N, 21.4,N, 19.2,N, 19.2,N, 19.0,N, 67,N, 68,N, 66,N, 13,N, 9,N,100,N, 4,N, 15,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,51,2000,03,20,10,51,2000,03,20,00,51, 0.0,N, 25.6,N, 25.8,N, 25.6,N, 21.5,N, 21.6,N, 21.5,N, 19.2,N, 19.4,N, 19.2,N, 68,N, 68,N, 66,N, 11,N, 9,N,107,N, 11,N, 13,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,52,2000,03,20,10,52,2000,03,20,00,52, 0.0,N, 25.8,N, 25.8,N, 25.6,N, 21.7,N, 21.7,N, 21.5,N, 19.5,N, 19.5,N, 19.2,N, 68,N, 69,N, 66,N, 11,N, 9,N, 83,N, 13,N, 13,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,53,2000,03,20,10,53,2000,03,20,00,53, 0.0,N, 25.8,N, 25.9,N, 25.8,N, 21.6,N, 21.8,N, 21.6,N, 19.3,N, 19.6,N, 19.3,N, 67,N, 68,N, 66,N, 9,N, 8,N, 87,N, 14,N, 11,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,54,2000,03,20,10,54,2000,03,20,00,54, 0.0,N, 25.8,N, 25.8,N, 25.8,N, 21.6,N, 21.6,N, 21.6,N, 19.3,N, 19.3,N, 19.2,N, 67,N, 67,N, 67,N, 8,N, 4,N, 98,N, 23,N, 9,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,55,2000,03,20,10,55,2000,03,20,00,55, 0.0,N, 25.7,N, 25.8,N, 25.7,N, 21.5,N, 21.6,N, 21.5,N, 19.2,N, 19.3,N, 19.2,N, 67,N, 68,N, 66,N, 8,N, 4,N, 68,N, 15,N, 9,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,56,2000,03,20,10,56,2000,03,20,00,56, 0.0,N, 25.9,N, 25.9,N, 25.7,N, 21.7,N, 21.7,N, 21.5,N, 19.4,N, 19.4,N, 19.2,N, 67,N, 68,N, 66,N, 8,N, 5,N, 69,N, 16,N, 9,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,57,2000,03,20,10,57,2000,03,20,00,57, 0.0,N, 26.0,N, 26.0,N, 25.9,N, 21.8,N, 21.8,N, 21.7,N, 19.5,N, 19.5,N, 19.4,N, 67,N, 68,N, 66,N, 9,N, 5,N, 72,N, 10,N, 11,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,58,2000,03,20,10,58,2000,03,20,00,58, 0.0,N, 26.0,N, 26.1,N, 26.0,N, 21.7,N, 21.8,N, 21.7,N, 19.4,N, 19.5,N, 19.3,N, 66,N, 67,N, 66,N, 8,N, 5,N, 69,N, 13,N, 11,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
的樣本,這是我嘗試代碼:
import glob
import pandas as pd
import numpy as np
from datetime import datetime
for file in glob.glob('X:/brisbaneweatherdata/*.txt'):
df = pd.read_csv(file)
col = 'Wind (1 minute) speed in km/h'
mask = pd.notnull(df[col])
df = df.loc[mask]
for date, group in df.groupby(['Year Month Day Hours Minutes in YYYY', 'MM', 'DD']):
morning_data = group[group.HH24.between(9, 12)]
gradient, intercept = np.polyfit(morning_data.HH24, morning_data['Wind (1 minute) speed in km/h'], 1)
wind_direction = np.average(morning_data['Wind (1 minute) direction in degrees true'])
if gradient > 0:
print("{0:%d, %b %Y} , {1:.2f}, {2:.2f}".format(datetime(*date), gradient, wind_direction))
然而,這是生產
runfile('X:/python/linearregression.py', wdir='X:/python')
X:/python/linearregression.py:1: DtypeWarning: Columns (17,25,27,29,31,33,35,37,55,57,59) have mixed types. Specify dtype option on import or set low_memory=False.
import glob
Traceback (most recent call last):
File "<ipython-input-19-ace8af14da2c>", line 1, in <module>
runfile('X:/python/linearregression.py', wdir='X:/python')
File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "X:/python/linearregression.py", line 10, in <module>
gradient, intercept = np.polyfit(morning_data.HH24, morning_data['Wind (1 minute) speed in km/h'], 1)
File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\polynomial.py", line 550, in polyfit
y = NX.asarray(y) + 0.0
TypeError: cannot concatenate 'str' and 'float' objects
如果我嘗試轉換我一年的價值整數的浮游物,例如int('Year Month Day Hours Minutes in YYYY')
或int('MM')
它會產生錯誤ValueError: invalid literal for int() with base 10: 'Year Month Day Hours Minutes in YYYY'
但是,在Unutbu的幫助下,TypeError問題已得到解決。這會產生下面的錯誤。
runfile('X:/python/linearregression.py', wdir='X:/python')
X:/python/linearregression.py:1: DtypeWarning: Columns (17,25,27,29,31,33,35,37,55,57,59) have mixed types. Specify dtype option on import or set low_memory=False.
import glob
C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
Traceback (most recent call last):
File "<ipython-input-24-ace8af14da2c>", line 1, in <module>
runfile('X:/python/linearregression.py', wdir='X:/python')
File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "X:/python/linearregression.py", line 17, in <module>
wind_direction = np.average(morning_data['Wind (1 minute) direction in degrees true'])
File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 570, in average
avg = a.mean(axis)
File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\core\_methods.py", line 72, in _mean
ret = ret/rcount
TypeError: unsupported operand type(s) for /: 'str' and 'int'
你能否包含完整的'TypeError'消息,包括顯示問題發生地點的回溯? – Marius
我已經在問題中包含了完整的錯誤信息。 –
我懷疑這與列標題被解析的方式有關。你從哪裏獲得數據?您可能想嘗試解析數據本身,而不使用列名。 'df = pd.read_csv(file,header = None)' – Alexander