Finding the maximum value in a dataset within given limits

Sample of Station1.txt (there are different sets of data for different numbers of stations):

Date  Temperature 
19600101 46.1 
19600102 46.7 
19600103 99999.9 #99999 = not recorded 
19600104 43.3 
19600105 38.3 
19600106 40.0 
19600107 42.8

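I am trying to create a function display_maxs(stations, dates, data, start_date, end_date) that displays a table of the maximum temperatures for the given station(s) and the given date range. For example: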

stations = load_stations('stations2.txt') 
5 
data = load_all_stations_data(stations) 
dates = load_dates(stations) 
display_maxs(stations, dates, data, '20021224', '20021228')  # dates are in yyyymmdd format

I have created the function for the data:

def load_all_stations_data(stations):
    data = {}
    file_list = ("Brisbane.txt", "Rockhampton.txt", "Cairns.txt",  "Melbourne.txt", "Birdsville.txt", "Charleville.txt"))
    for file_name in file_list:
        file = open(stations(), 'r')

        station = file_name.split()[0]

        data[station] = []
        for line in file:
            values = line.strip().strip(' ')
            if len(values) == 2:
                data[station] = values[1]
        file.close()

    return data

the function for the stations:

def load_all_stations_data(stations):
    stations = []
    f = open(stations[0] + '.txt', 'r')
    stations = []
    for line in f:
        x = (line.split()[1])
        x = x.strip()
        temp.append(x)
    f.close()

    return stations

and the function for the dates:

def load_dates(stations):

    f = open(stations[0] + '.txt', 'r')

    dates = []
    for line in f:
        dates.append(line.split()[0])
    f.close()
    return dates

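Now I just need help creating the table, which should display the maximum temperature for any given date limits and call the functions above for the data, dates and stations.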

2015-04-15 bberm3

Is '#99999 = not recorded' actually present in the file, or did you add it just for our benefit? – TheBlackCat

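I'm not sure what these functions are supposed to do, especially since two of them seem to have the same name. There are also quite a few errors in your code.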

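file = open(stations(), 'r'): here you try to call stations as a function, but it seems to be a list.
station = file_name.split()[0]: the file names contain no spaces, so this has no effect. Did you mean split('.')?
values = line.strip().strip(' '): presumably one of those strip calls should be split?
data[station] = values[1]: this is overwritten on every iteration. You probably want to append the value?
temp.append(x): the variable temp is not defined; did you mean stations?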
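
Taken together, the points above could lead to a loader along the following lines. This is only a sketch under assumptions: each station lives in its own file named after it, the lines look like the Station1.txt sample, and a value may be followed by a trailing # comment. The name load_station_temperatures is made up for this example and is not part of the original code.

```python
import os


def load_station_temperatures(file_names):
    """Return {station name: [temperature, ...]} for the given files.

    Hypothetical helper: assumes 'date value' lines, optionally followed
    by a '#' comment, as in the Station1.txt sample.
    """
    data = {}
    for file_name in file_names:
        # Station name = base file name without its extension.
        station = os.path.splitext(os.path.basename(file_name))[0]
        data[station] = []
        with open(file_name) as f:  # open the file itself, not stations()
            for line in f:
                # Drop any trailing comment, then split on whitespace.
                values = line.split('#')[0].split()
                if len(values) != 2:
                    continue  # blank or malformed line
                try:
                    # Append, so earlier readings are not overwritten.
                    data[station].append(float(values[1]))
                except ValueError:
                    continue  # header line such as 'Date Temperature'
    return data
```

Missing readings (99999.9) are still in the returned lists at this point, so they would have to be filtered out before taking a maximum.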

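Also, instead of reading the dates and values into two separate lists, I suggest you create a list of tuples. That way, you need only a single function: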

def get_data(filename):
    with open(filename) as f:
        data = []
        for line in f:
            try:
                date, value = line.split()
                data.append((int(date), float(value)))
            except ValueError:
                pass  # skip the header, empty lines, etc.
        return data

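If that is not an option, you can create the list of tuples by zipping the lists of dates and values, i.e. data = zip(dates, values). You can then use the built-in max function together with a list comprehension or generator expression to filter the values between the dates, and a special key function to compare by value.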

def display_maxs(data, start_date, end_date):
    return max(((d, v) for (d, v) in data
                if start_date <= d <= end_date and v < 99999),
               key=lambda x: x[1])

print(display_maxs(get_data("Station1.txt"), 19600103, 19600106))
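
To get from a single maximum to the table the question asks for, the same zip-and-max idea can be applied once per station. The sketch below is one possible shape, not the only one: the names max_in_range and display_maxs_table are invented here, the per-station data is assumed to be a dict of (date, value) lists, and a station with no valid reading in the range is shown with dashes instead of raising an error.

```python
MISSING = 99999  # readings at or above this mean "not recorded" (as in the sample)


def max_in_range(data, start_date, end_date):
    """Return the (date, value) pair with the highest value, or None."""
    candidates = [(d, v) for d, v in data
                  if start_date <= d <= end_date and v < MISSING]
    return max(candidates, key=lambda pair: pair[1]) if candidates else None


def display_maxs_table(station_data, start_date, end_date):
    """Print one row per station; station_data maps name -> [(date, value), ...]."""
    print("{:<15}{:>10}{:>10}".format("Station", "Date", "Max"))
    for name, data in station_data.items():
        best = max_in_range(data, start_date, end_date)
        if best is None:
            print("{:<15}{:>10}{:>10}".format(name, "-", "-"))
        else:
            print("{:<15}{:>10}{:>10.1f}".format(name, best[0], best[1]))


# Separate lists of dates and values can be paired up with zip first.
dates = [19600101, 19600102, 19600103, 19600104]
values = [46.1, 46.7, 99999.9, 43.3]
station_data = {"Station1": list(zip(dates, values))}
display_maxs_table(station_data, 19600103, 19600106)
```

Wrapping zip in list matters on Python 3, where zip returns an iterator that can only be consumed once.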

2015-04-15 14:46:52

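Use pandas. Reading each text file takes just one function call, which covers comment handling, missing data (99999.9) handling and date handling. The code below reads the files from a sequence of file names, handles the comments and converts 99999.9 to "missing" values. The second function then selects the dates from start to stop for a sequence of station names (the file names minus the extension) and gets the maximum for each station (in maxdf).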

import pandas as pd 
import os 

def load_all_stations_data(stations):
    """Load the stations defined in the sequence of file names."""
    sers = []
    for fname in stations:
        # The squeeze=True keyword was removed from read_csv in pandas 2.0;
        # squeeze('columns') turns the one-column frame into a Series.
        ser = pd.read_csv(fname, sep=r'\s+', header=0, index_col=0,
                          comment='#', engine='python', parse_dates=True,
                          na_values=['99999.9']).squeeze('columns')
        ser.name = os.path.splitext(fname)[0]
        sers.append(ser)
    return pd.concat(sers, axis=1)

def get_maxs(df, start, stop, sites):
    """Get the stations and date range given, then get the max for each."""
    return df.loc[start:stop, list(sites)].max(skipna=True)

Usage would look like this:

maxdf = get_maxs(df, '20021224','20021228', ("Rockhampton", "Cairns"))

If the #99999 = not recorded comment is not actually in your file, you can get rid of the engine='python' and comment='#' arguments:

def load_all_stations_data(stations):
    """Load the stations defined in the sequence of file names."""
    sers = []
    for fname in stations:
        # squeeze('columns') replaces the squeeze=True keyword removed in pandas 2.0.
        ser = pd.read_csv(fname, sep=r'\s+', header=0, index_col=0,
                          parse_dates=True,
                          na_values=['99999.9']).squeeze('columns')
        ser.name = os.path.splitext(fname)[0]
        sers.append(ser)
    return pd.concat(sers, axis=1)
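
To see what the date-range selection and the missing-value handling do without needing the station files, here is a small sketch on made-up data. The readings and dates are invented for illustration, and NaN stands in for a 99999.9 reading that has already been converted by na_values.

```python
import pandas as pd

# Hypothetical readings for two stations; NaN marks a missing value.
index = pd.to_datetime(["2002-12-24", "2002-12-25", "2002-12-26", "2002-12-29"])
df = pd.DataFrame(
    {
        "Rockhampton": [31.0, float("nan"), 33.5, 40.0],
        "Cairns": [30.2, 32.1, float("nan"), 29.0],
    },
    index=index,
)

# Label-based slicing on a sorted DatetimeIndex includes both end points,
# so the reading from 2002-12-29 is left out here.
maxs = df.loc["2002-12-24":"2002-12-28", ["Rockhampton", "Cairns"]].max(skipna=True)
print(maxs)
```

The result is a Series indexed by station name, so maxs["Cairns"] gives the maximum for that one station.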

2015-04-15 14:52:05 TheBlackCat
