2015-04-04 22 views
3

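How do I split comma-separated data and create a list from the data in Python?

I am trying to create a function that takes two dates in YYYY/MM/DD format, reads the data, and returns a list of lists containing the latitude, longitude, magnitude, and depth of each earthquake between the two dates. The data is formatted as follows: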

Date,TimeUTC,Latitude,Longitude,Magnitude,Depth 
2012/02/23,08:09:13.0,-20.984,-178.654,4.6,526 

Here is my attempt:

from tempBetweenDates import dateLessThan 
import urllib.request 

def betweenDates(date1, date2, date3): 
    """Determines if the first date is on the second or between the second and third date.""" 
    date_1 = date1.split('/') 
    date_2 = date2.split('/') 
    date_3 = date3.split('/') 
    if int(date_1[0]) >= int(date_2[0]) and int(date_1[1]) >= int(date_2[1]) and int(date_1[2]) >= int(date_2[2]) and dateLessThan(int(date_1[1]), int(date_1[2]), int(date_1[0]), int(date_3[1]), int(date_3[2]), int(date_3[0])) == True: 
        return True 
    else: 
        return False 

def parseEarthquakeData(date1, date2): 
    page = urllib.request.urlopen("http://www.choongsoo.info/teach/mcs177-sp12/projects/earthquake/earthquakeData-02-23-2012.txt") 
    eqdata = page.readlines() 
    dataList = [] 
    for line in eqdata: 
     lineSplit = line.split(',') 
     date = lineSplit[0] 
     data = lineSplit[2:6] 
     dataList = [[data] for line in eqdata if betweenDates(date, date1, date2) == True] 
    return(dataList) 

Whenever I try to run the code I get this error:

Traceback (most recent call last): 
    File "<pyshell#2>", line 1, in <module> 
    parseEarthquakeData("2012/02/22", "2012/02/19") 
    File "C:\Users\lcooper2\Desktop\Python\PROJECTS\plotEarthquakes.py", line 20, in parseEarthquakeData 
    lineSplit = line.split(',') 
TypeError: Type str doesn't support the buffer API 

Any tips on how to avoid this error?

+0

Holy crud, you need to go discover the 'datetime' module! :) – 2015-04-04 03:12:48

+0

if(betweenDates(date,date1,date2)) – CY5 2015-04-04 03:24:13

Answers

0

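In Python 3.x, calling readlines() on the response returned by urllib.request.urlopen gives you bytes objects, not str. Python 3 keeps the two types strictly apart, so it will not implicitly mix a bytes object with a str argument.

So your split method is actually being called on a bytes object, which needs a bytes separator rather than a str.

So either decode the line back into a string before splitting it

lineSplit = line.decode().split(',') 

or pass a bytes separator, in which case the resulting fields are still bytes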

lineSplit = line.split(b',')
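
To make the two options concrete, here is a minimal sketch that runs on a hard-coded bytes line rather than the live URL. The sample line is the one from the question; the helper name `split_fields` is made up for illustration, and UTF-8 is an assumption, since the thread does not say how the file is encoded.

```python
def split_fields(raw_line, encoding="utf-8"):
    """Decode one bytes line and split it on commas.

    Assumption: the file is UTF-8 (or plain ASCII); the thread does not say.
    """
    text = raw_line.decode(encoding).strip()  # bytes -> str, drop trailing whitespace
    return text.split(",")


sample = b"2012/02/23,08:09:13.0,-20.984,-178.654,4.6,526 \r\n"
fields = split_fields(sample)
print(fields[0])    # the date field, as a str
print(fields[2:6])  # latitude, longitude, magnitude, depth, as str

# Splitting without decoding also works, but every field stays a bytes object:
raw_fields = sample.split(b",")
print(raw_fields[0])
```

Decoding first is usually the more convenient choice here, because the date comparison and any later float() conversion read more naturally on str values.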

1

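You can actually do something pretty cool with this! If you pass the response from your urllib.request.urlopen call through csv.DictReader, you can eliminate a lot of the splitting and assigning.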

import csv 
import datetime 
import urllib.request 

page = urllib.request.urlopen("http://www.choongsoo.info/teach/mcs177-sp12/projects/earthquake/earthquakeData-02-23-2012.txt") 
reader = csv.DictReader((line.decode() for line in page), delimiter=',') 

for line in reader: 
    # each line looks like: 
    # {'Longitude': '-178.654', 'Date': '2012/02/23', 
    # 'Depth': '526', 'Magnitude': '4.6', 'Latitude': '-20.984', 
    # 'TimeUTC': '08:09:13.0'} 
    # so you can use it like a dictionary! 
    date = datetime.datetime.strptime(line['Date'], "%Y/%m/%d") 
    # datetime objects like this aren't naive like numbers, so you can do: 
    # datetime.datetime(year=2012, month=2, day=23) < datetime.datetime(year=2012, month=2, day=24) 
    # and expect it to return True every time. This will massively simplify your 
    # betweenDates function. 

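The error in your traceback happens because urllib.request.urlopen gives you an HTTPResponse object. Iterating over it yields bytes objects, not str objects. Calling bytes.decode() turns them into strings, so you can do stringy things to them, like splitting.

If you switch to using these datetime objects, your betweenDates function becomes: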

def between_dates(date1, date2, date3): 
    return date2 <= date1 < date3 
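
Putting the pieces of this answer together, the whole task might look roughly like the sketch below. It is not the answerer's code: `parse_earthquake_data` and `parse_date` are illustrative names, the second data row is invented for the demo, and the function takes any iterable of bytes lines so it can be tried without network access. The end date is exclusive, matching `date2 <= date1 < date3` above.

```python
import csv
import datetime

DATE_FORMAT = "%Y/%m/%d"


def parse_date(text):
    """Turn a 'YYYY/MM/DD' string into a datetime object."""
    return datetime.datetime.strptime(text, DATE_FORMAT)


def parse_earthquake_data(raw_lines, start, end):
    """Return [latitude, longitude, magnitude, depth] for rows with start <= date < end.

    raw_lines is any iterable of bytes lines, such as the object
    returned by urllib.request.urlopen().
    """
    start_date, end_date = parse_date(start), parse_date(end)
    reader = csv.DictReader(line.decode() for line in raw_lines)
    results = []
    for row in reader:
        if start_date <= parse_date(row["Date"]) < end_date:
            results.append([
                float(row["Latitude"]),
                float(row["Longitude"]),
                float(row["Magnitude"]),
                float(row["Depth"]),
            ])
    return results


# Stand-in for the HTTP response; the second data row is made up.
sample = [
    b"Date,TimeUTC,Latitude,Longitude,Magnitude,Depth\n",
    b"2012/02/23,08:09:13.0,-20.984,-178.654,4.6,526\n",
    b"2012/02/18,01:02:03.0,10.5,20.25,5.1,33\n",
]
quakes = parse_earthquake_data(sample, "2012/02/19", "2012/02/24")
print(quakes)
```

With the real data you would pass the `page` object from urlopen in place of `sample`. Note that the earlier date goes first; the call in the question's traceback passes the later date first, which would select nothing here.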

+1

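Avoid using if(betweenDates(date,date1,date2)== True) and use the plain condition instead. Is there a problem with my answer? If that is the reason for the downvote, I'd love to correct it. – 2015-04-04 03:46:24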

+0

+1 for justice – wim 2015-04-04 05:15:01

+0

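NMDV, but I'm not sure about decoding line by line. Conceptually, I'm not sure lines are even defined until the file has been decoded; the splitting working is more coincidence than anything else. Why not simply decode and then iterate over the result of splitlines? – DSM 2015-04-04 15:17:12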
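
The alternative DSM describes, decoding the whole payload once and then splitting it into lines, could be sketched as follows. The helper name `rows_from_bytes` is hypothetical, the payload is a hard-coded stand-in for what `page.read()` would return, and UTF-8 is again an assumption.

```python
import csv


def rows_from_bytes(payload, encoding="utf-8"):
    """Decode the full payload first, then let csv parse the resulting lines."""
    text = payload.decode(encoding)
    # splitlines() handles \n, \r\n and \r line endings alike.
    return list(csv.DictReader(text.splitlines()))


# Stand-in for page.read(); with the live URL you would pass that instead.
payload = (
    b"Date,TimeUTC,Latitude,Longitude,Magnitude,Depth\r\n"
    b"2012/02/23,08:09:13.0,-20.984,-178.654,4.6,526\r\n"
)
rows = rows_from_bytes(payload)
print(rows[0]["Date"], rows[0]["Magnitude"])
```

The trade-off is that the whole response is held in memory at once, which is unlikely to matter for a file of this size.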
