2016-04-19 56 views
0

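Performing regression with pandas, error: cannot concatenate 'str' and 'float' objects

I have been writing code based on this answer (Reading csv to array, performing linear regression on array and writing to csv in Python depending on gradient) in order to find out which days show increasing wind speed in the morning.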

Here is a sample of my data:

hd,Station Number,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local standard time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Universal coordinated time,Precipitation since last (AWS) observation in mm,Quality of precipitation since last (AWS) observation value,Air Temperature in degrees Celsius,Quality of air temperature,Air temperature (1-minute maximum) in degrees Celsius,Quality of air temperature (1-minute maximum),Air temperature (1-minute minimum) in degrees Celsius,Quality of air temperature (1-minute minimum),Wet bulb temperature in degrees Celsius,Quality of Wet bulb temperature,Wet bulb temperature (1 minute maximum) in degrees Celsius,Quality of wet bulb temperature (1 minute maximum),Wet bulb temperature (1 minute minimum) in degrees Celsius,Quality of wet bulb temperature (1 minute minimum),Dew point temperature in degrees Celsius,Quality of dew point temperature,Dew point temperature (1-minute maximum) in degrees Celsius,Quality of Dew point Temperature (1-minute maximum),Dew point temperature (1 minute minimum) in degrees Celsius,Quality of Dew point Temperature (1 minute minimum),Relative humidity in percentage %,Quality of relative humidity,Relative humidity (1 minute maximum) in percentage %,Quality of relative humidity (1 minute maximum),Relative humidity (1 minute minimum) in percentage %,Quality of Relative humidity (1 minute minimum),Wind (1 minute) speed in km/h,Wind (1 minute) speed quality,Minimum wind speed (over 1 minute) in km/h,Minimum wind speed (over 1 minute) quality,Wind (1 minute) direction in degrees true,Wind (1 minute) direction quality,Standard deviation of wind (1 minute),Standard deviation of wind (1 minute) direction quality,Maximum wind gust (over 1 minute) in km/h,Maximum wind gust (over 1 minute) quality,Visibility (automatic - one minute data) in km,Quality of visibility (automatic - one minute data),Mean sea level pressure in hPa,Quality of mean sea level pressure,Station level pressure in hPa,Quality of station level pressure,QNH pressure in hPa,Quality of QNH pressure,#
hd, 40842,2000,03,20,10,50,2000,03,20,10,50,2000,03,20,00,50,  ,N, 25.7,N, 25.7,N, 25.6,N, 21.5,N, 21.5,N, 21.4,N, 19.2,N, 19.2,N, 19.0,N, 67,N, 68,N, 66,N, 13,N, 9,N,100,N, 4,N, 15,N,  ,N,1018.6,N,1017.5,N,1018.6,N,# 
hd, 40842,2000,03,20,10,51,2000,03,20,10,51,2000,03,20,00,51, 0.0,N, 25.6,N, 25.8,N, 25.6,N, 21.5,N, 21.6,N, 21.5,N, 19.2,N, 19.4,N, 19.2,N, 68,N, 68,N, 66,N, 11,N, 9,N,107,N, 11,N, 13,N,  ,N,1018.6,N,1017.5,N,1018.6,N,# 
hd, 40842,2000,03,20,10,52,2000,03,20,10,52,2000,03,20,00,52, 0.0,N, 25.8,N, 25.8,N, 25.6,N, 21.7,N, 21.7,N, 21.5,N, 19.5,N, 19.5,N, 19.2,N, 68,N, 69,N, 66,N, 11,N, 9,N, 83,N, 13,N, 13,N,  ,N,1018.6,N,1017.5,N,1018.6,N,# 
hd, 40842,2000,03,20,10,53,2000,03,20,10,53,2000,03,20,00,53, 0.0,N, 25.8,N, 25.9,N, 25.8,N, 21.6,N, 21.8,N, 21.6,N, 19.3,N, 19.6,N, 19.3,N, 67,N, 68,N, 66,N, 9,N, 8,N, 87,N, 14,N, 11,N,  ,N,1018.6,N,1017.5,N,1018.6,N,# 
hd, 40842,2000,03,20,10,54,2000,03,20,10,54,2000,03,20,00,54, 0.0,N, 25.8,N, 25.8,N, 25.8,N, 21.6,N, 21.6,N, 21.6,N, 19.3,N, 19.3,N, 19.2,N, 67,N, 67,N, 67,N, 8,N, 4,N, 98,N, 23,N, 9,N,  ,N,1018.6,N,1017.5,N,1018.6,N,# 
hd, 40842,2000,03,20,10,55,2000,03,20,10,55,2000,03,20,00,55, 0.0,N, 25.7,N, 25.8,N, 25.7,N, 21.5,N, 21.6,N, 21.5,N, 19.2,N, 19.3,N, 19.2,N, 67,N, 68,N, 66,N, 8,N, 4,N, 68,N, 15,N, 9,N,  ,N,1018.6,N,1017.5,N,1018.6,N,# 
hd, 40842,2000,03,20,10,56,2000,03,20,10,56,2000,03,20,00,56, 0.0,N, 25.9,N, 25.9,N, 25.7,N, 21.7,N, 21.7,N, 21.5,N, 19.4,N, 19.4,N, 19.2,N, 67,N, 68,N, 66,N, 8,N, 5,N, 69,N, 16,N, 9,N,  ,N,1018.6,N,1017.5,N,1018.6,N,# 
hd, 40842,2000,03,20,10,57,2000,03,20,10,57,2000,03,20,00,57, 0.0,N, 26.0,N, 26.0,N, 25.9,N, 21.8,N, 21.8,N, 21.7,N, 19.5,N, 19.5,N, 19.4,N, 67,N, 68,N, 66,N, 9,N, 5,N, 72,N, 10,N, 11,N,  ,N,1018.6,N,1017.5,N,1018.6,N,# 
hd, 40842,2000,03,20,10,58,2000,03,20,10,58,2000,03,20,00,58, 0.0,N, 26.0,N, 26.1,N, 26.0,N, 21.7,N, 21.8,N, 21.7,N, 19.4,N, 19.5,N, 19.3,N, 66,N, 67,N, 66,N, 8,N, 5,N, 69,N, 13,N, 11,N,  ,N,1018.6,N,1017.5,N,1018.6,N,# 

And this is the code I tried:

import glob 
import pandas as pd 
import numpy as np 
from datetime import datetime 

for file in glob.glob('X:/brisbaneweatherdata/*.txt'): 
    df = pd.read_csv(file) 

    col = 'Wind (1 minute) speed in km/h' 
    mask = pd.notnull(df[col]) 
    df = df.loc[mask] 

    for date, group in df.groupby(['Year Month Day Hours Minutes in YYYY', 'MM', 'DD']):
        morning_data = group[group.HH24.between(9, 12)]
        gradient, intercept = np.polyfit(morning_data.HH24, morning_data['Wind (1 minute) speed in km/h'], 1)
        wind_direction = np.average(morning_data['Wind (1 minute) direction in degrees true'])
        if gradient > 0:
            print("{0:%d, %b %Y} , {1:.2f}, {2:.2f}".format(datetime(*date), gradient, wind_direction))

However, this produces:

runfile('X:/python/linearregression.py', wdir='X:/python') 
X:/python/linearregression.py:1: DtypeWarning: Columns (17,25,27,29,31,33,35,37,55,57,59) have mixed types. Specify dtype option on import or set low_memory=False. 
    import glob 
Traceback (most recent call last): 

    File "<ipython-input-19-ace8af14da2c>", line 1, in <module> 
    runfile('X:/python/linearregression.py', wdir='X:/python') 

    File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile 
    execfile(filename, namespace) 

    File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile 
    exec(compile(scripttext, filename, 'exec'), glob, loc) 

    File "X:/python/linearregression.py", line 10, in <module> 
    gradient, intercept = np.polyfit(morning_data.HH24, morning_data['Wind (1 minute) speed in km/h'], 1) 

    File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\polynomial.py", line 550, in polyfit 
    y = NX.asarray(y) + 0.0 

TypeError: cannot concatenate 'str' and 'float' objects 

If I try to convert my year values from floats to integers, for example int('Year Month Day Hours Minutes in YYYY') or int('MM'), it produces the error ValueError: invalid literal for int() with base 10: 'Year Month Day Hours Minutes in YYYY'

However, with unutbu's help, the TypeError in polyfit has been resolved. The code now produces the error below.

runfile('X:/python/linearregression.py', wdir='X:/python') 
X:/python/linearregression.py:1: DtypeWarning: Columns (17,25,27,29,31,33,35,37,55,57,59) have mixed types. Specify dtype option on import or set low_memory=False. 
    import glob 
C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned 
    warnings.warn(msg, RankWarning) 
Traceback (most recent call last): 

    File "<ipython-input-24-ace8af14da2c>", line 1, in <module> 
    runfile('X:/python/linearregression.py', wdir='X:/python') 

    File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile 
    execfile(filename, namespace) 

    File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile 
    exec(compile(scripttext, filename, 'exec'), glob, loc) 

    File "X:/python/linearregression.py", line 17, in <module> 
    wind_direction = np.average(morning_data['Wind (1 minute) direction in degrees true']) 

    File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 570, in average 
    avg = a.mean(axis) 

    File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\core\_methods.py", line 72, in _mean 
    ret = ret/rcount 

TypeError: unsupported operand type(s) for /: 'str' and 'int' 
+0

Could you include the full 'TypeError' message, including the traceback that shows where the problem occurred? – Marius

+0

I have included the full error message in the question. –

+0

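I suspect this has to do with the way the column headers are parsed. Where did you get the data? You may want to try parsing the data itself, without using the column names: 'df = pd.read_csv(file, header=None)' – Alexander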

Answers

2

The error message

File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\polynomial.py", line 550, in polyfit 
    y = NX.asarray(y) + 0.0 

TypeError: cannot concatenate 'str' and 'float' objects 

can be reproduced if y is a Series that contains strings:

In [14]: np.asarray(pd.Series(['',1.0])) + 0.0 
TypeError: cannot concatenate 'str' and 'float' objects 

Now, if you peek at line 550 inside polynomial.py, you will see that y is the second argument passed to np.polyfit. So this strongly suggests that morning_data['Wind (1 minute) speed in km/h'] is a Series containing strings.

The sample data you posted does not show any strings in that column, but somewhere in the CSV you will probably find one.
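One quick way to check this suspicion is to look at the column's dtype right after loading. The sketch below uses a made-up three-row CSV (the `speed` column name and its values are illustrative, not taken from the real files) in which one field contains only spaces, much like the blank fields in the posted sample. In that situation pandas cannot parse the column as numbers and keeps every value as a string.

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical miniature of the weather file: the second row has a
# whitespace-only wind speed, so the column cannot be parsed as numbers.
csv_text = "HH24,speed\n9, 13\n10,  \n11, 11\n"
df = pd.read_csv(io.StringIO(csv_text))

hours_numeric = is_numeric_dtype(df["HH24"])
speed_numeric = is_numeric_dtype(df["speed"])

print(df.dtypes)
print("HH24 numeric:", hours_numeric)
print("speed numeric:", speed_numeric)
```

A non-numeric dtype on a column that should hold measurements is a hint that at least one field in that column is not a plain number.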

Now how do we find that string? One way is to convert the Series to numeric values (coercing invalid strings to NaN):

col = 'Wind (1 minute) speed in km/h' 
tmp = pd.to_numeric(morning_data[col], errors='coerce') 

and then look for the NaNs:

mask = pd.isnull(tmp) 
print(morning_data.loc[mask, col]) 

This will show all the values in the 'Wind (1 minute) speed in km/h' column that could not be converted to numbers.
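As a self-contained illustration of the two steps above, here is a small sketch with invented values. The strings `"  "` and `"calm"` are assumptions chosen for the demonstration; the real files may contain different offending values.

```python
import pandas as pd

col = "Wind (1 minute) speed in km/h"

# Hypothetical data: two of the four values cannot be read as numbers.
morning_data = pd.DataFrame({col: ["13", "  ", "11", "calm"]})

# Invalid strings become NaN instead of raising an error.
tmp = pd.to_numeric(morning_data[col], errors="coerce")

# The NaN positions point back at the original offending strings.
mask = tmp.isnull()
bad_values = morning_data.loc[mask, col].tolist()

print(bad_values)
print(int(mask.sum()), "of", len(morning_data), "values are non-numeric")
```

Printing the original strings, rather than the coerced NaNs, makes it easier to decide whether the rows are blank fields, sentinel codes, or genuine typos.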

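Then you can consider how to handle these problematic rows. If there are only a few of them, you could edit them by hand. Or look into how the CSV was generated and fix the error at the source. Or, if you want to discard these rows, you could use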

import glob
import numpy as np
import pandas as pd
from datetime import datetime

for file in glob.glob('X:/brisbaneweatherdata/*.txt'):
    df = pd.read_csv(file)

    # Convert both columns to numbers and drop rows that fail to convert.
    for col in ['Wind (1 minute) speed in km/h',
                'Wind (1 minute) direction in degrees true']:
        df[col] = pd.to_numeric(df[col], errors='coerce')
        mask = pd.notnull(df[col])
        df = df.loc[mask]

    for date, group in df.groupby(['Year Month Day Hours Minutes in YYYY', 'MM', 'DD']):
        morning_data = group[group.HH24.between(9, 12)]
        # Skip days with no morning observations.
        if len(morning_data) == 0:
            continue
        gradient, intercept = np.polyfit(morning_data['HH24'], morning_data['Wind (1 minute) speed in km/h'], 1)
        wind_direction = np.average(morning_data['Wind (1 minute) direction in degrees true'])
        if gradient > 0:
            print("{0:%d, %b %Y} , {1:.2f}, {2:.2f}".format(datetime(*date), gradient, wind_direction))

and then the rest of the code should have a chance of working.
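The second traceback in the question also shows a RankWarning ("Polyfit may be poorly conditioned"). One plausible cause, offered here as an assumption rather than a diagnosis, is a day whose morning rows all share the same HH24 value, so there is only one distinct x to fit a line through. A minimal sketch of a guard follows; the helper name `morning_gradient` and the test frames are hypothetical.

```python
import numpy as np
import pandas as pd

SPEED = "Wind (1 minute) speed in km/h"


def morning_gradient(group, hour_col="HH24", speed_col=SPEED):
    """Return the fitted slope, or None when a line cannot be fitted."""
    morning = group[group[hour_col].between(9, 12)]
    # A straight line needs at least two distinct x values.
    if morning[hour_col].nunique() < 2:
        return None
    gradient, _intercept = np.polyfit(morning[hour_col], morning[speed_col], 1)
    return gradient


# Hypothetical groups: one spans four hours, the other only one hour.
full_day = pd.DataFrame({"HH24": [9, 10, 11, 12], SPEED: [8.0, 10.0, 12.0, 14.0]})
one_hour = pd.DataFrame({"HH24": [10, 10, 10], SPEED: [8.0, 9.0, 10.0]})

print(morning_gradient(full_day))
print(morning_gradient(one_hour))
```

Because an empty selection has zero distinct hours, the same check also covers days with no morning rows at all.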

+0

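Thanks for your answer. Using both the code in the question and your code (which I have edited into the question), I get the same TypeError message, which suggests that polyfit is still struggling. –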

+1

@JossKirk: I forgot to add 'df[col] = pd.to_numeric(df[col], errors='coerce')'. If you refresh the page, you will see it. Without it, 'df[col]' still contains numeric strings, hence the 'TypeError'. – unutbu

+0

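Thank you! That seems to have solved the problem. I have updated the question to include the new TypeError that is occurring. This time the wind direction is the troublemaker. Can this also be solved by coercing the data? –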

1

I changed .between('9', '12') to .between(9, 12), made the np.average calculation use only morning_data['Wind (1 minute) direction in degrees true'], and added string formatting to the final print statement:

from datetime import datetime 
for date, group in df.groupby(['Year Month Day Hours Minutes in YYYY', 'MM', 'DD']): 
    morning_data = group[group.HH24.between(9, 12)] 
    gradient, intercept = np.polyfit(morning_data.HH24, morning_data['Wind (1 minute) speed in km/h'], 1) 
    wind_direction = np.average(morning_data['Wind (1 minute) direction in degrees true']) 
    if gradient > 0: 
        print("{0:%d, %b %Y} , {1:.2f}, {2:.2f}".format(datetime(*date), gradient, wind_direction))

This ended up working fine (at least without errors), producing:

20, Mar 2000 , 0.47, 83.67 
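For readers unsure what the integer bounds select, here is a small sketch of `Series.between` on made-up hour values. By default both endpoints are included, so with minute-level data the filter keeps every row whose hour is 9, 10, 11, or 12.

```python
import pandas as pd

# Hypothetical hour values standing in for the HH24 column.
hours = pd.Series([8, 9, 10, 11, 12, 13], name="HH24")

# Both bounds are inclusive by default.
in_morning = hours.between(9, 12)
selected = hours[in_morning].tolist()

print(selected)
```

If rows from 12:00 to 12:59 should not count as morning, the upper bound would need to be 11 instead.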

This is the DataFrame I get after copying your sample:

<class 'pandas.core.frame.DataFrame'> 
RangeIndex: 9 entries, 0 to 8 
Data columns (total 62 columns): 
hd                9 non-null object 
Station Number             9 non-null int64 
Year Month Day Hours Minutes in YYYY       9 non-null int64 
MM                9 non-null int64 
DD                9 non-null int64 
HH24               9 non-null int64 
MI format in Local time          9 non-null int64 
Year Month Day Hours Minutes in YYYY.1       9 non-null int64 
MM.1               9 non-null int64 
DD.1               9 non-null int64 
HH24.1               9 non-null int64 
MI format in Local standard time        9 non-null int64 
Year Month Day Hours Minutes in YYYY.2       9 non-null int64 
MM.2               9 non-null int64 
DD.2               9 non-null int64 
HH24.2               9 non-null int64 
MI format in Universal coordinated time      9 non-null int64 
Precipitation since last (AWS) observation in mm    9 non-null object 
Quality of precipitation since last (AWS) observation value 9 non-null object 
Air Temperature in degrees Celsius        9 non-null float64 
Quality of air temperature          9 non-null object 
Air temperature (1-minute maximum) in degrees Celsius   9 non-null float64 
Quality of air temperature (1-minute maximum)     9 non-null object 
Air temperature (1-minute minimum) in degrees Celsius   9 non-null float64 
Quality of air temperature (1-minute minimum)     9 non-null object 
Wet bulb temperature in degrees Celsius      9 non-null float64 
Quality of Wet bulb temperature        9 non-null object 
Wet bulb temperature (1 minute maximum) in degrees Celsius  9 non-null float64 
Quality of wet bulb temperature (1 minute maximum)    9 non-null object 
Wet bulb temperature (1 minute minimum) in degrees Celsius  9 non-null float64 
Quality of wet bulb temperature (1 minute minimum)    9 non-null object 
Dew point temperature in degrees Celsius      9 non-null float64 
Quality of dew point temperature        9 non-null object 
Dew point temperature (1-minute maximum) in degrees Celsius 9 non-null float64 
Quality of Dew point Temperature (1-minute maximum)   9 non-null object 
Dew point temperature (1 minute minimum) in degrees Celsius 9 non-null float64 
Quality of Dew point Temperature (1 minute minimum)   9 non-null object 
Relative humidity in percentage %        9 non-null int64 
Quality of relative humidity         9 non-null object 
Relative humidity (1 minute maximum) in percentage %   9 non-null int64 
Quality of relative humidity (1 minute maximum)    9 non-null object 
Relative humidity (1 minute minimum) in percentage %   9 non-null int64 
Quality of Relative humidity (1 minute minimum)    9 non-null object 
Wind (1 minute) speed in km/h         9 non-null int64 
Wind (1 minute) speed quality         9 non-null object 
Minimum wind speed (over 1 minute) in km/h      9 non-null int64 
Minimum wind speed (over 1 minute) quality      9 non-null object 
Wind (1 minute) direction in degrees true      9 non-null int64 
Wind (1 minute) direction quality        9 non-null object 
Standard deviation of wind (1 minute)       9 non-null int64 
Standard deviation of wind (1 minute) direction quality  9 non-null object 
Maximum wind gust (over 1 minute) in km/h      9 non-null int64 
Maximum wind gust (over 1 minute) quality      9 non-null object 
Visibility (automatic - one minute data) in km     9 non-null object 
Quality of visibility (automatic - one minute data)   9 non-null object 
Mean sea level pressure in hPa         9 non-null float64 
Quality of mean sea level pressure        9 non-null object 
Station level pressure in hPa         9 non-null float64 
Quality of station level pressure        9 non-null object 
QNH pressure in hPa           9 non-null float64 
Quality of QNH pressure          9 non-null object 
#                9 non-null object 
dtypes: float64(12), int64(24), object(26) 
memory usage: 4.4+ KB 
+0

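I have used the code you provided here and have imported datetime, but it still produces the same TypeError. I have updated the question to include the new code and the new error message. If I cast the wind direction column to float, it also produces a TypeError. –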

+1

You need 'from datetime import datetime'; see the update, which also shows what the data looks like. – Stefan

+0

Thanks for your help. The TypeError is still occurring, so I will have to keep working on it. –
