pandas read_csv dtype leading zeros
So I'm reading in a station codes CSV file from NOAA which looks like this:
"USAF","WBAN","STATION NAME","CTRY","FIPS","STATE","CALL","LAT","LON","ELEV(.1M)","BEGIN","END"
"006852","99999","SENT","SW","SZ","","","+46817","+010350","+14200","",""
"007005","99999","CWOS 07005","","","","","-99999","-999999","-99999","20120127","20120127"
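The first two columns contain weather station codes, which sometimes have leading zeros. When pandas imports them without a dtype specified, they turn into integers. It's not really a big deal, because I can loop through the dataframe index and replace them with something like "%06d" % i, since they're always six digits, but you know... that's the lazy way.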
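For reference, the re-padding workaround can be sketched without an explicit loop. This is a minimal sketch, assuming a recent pandas on Python 3; the in-memory `sample` buffer and the `usaf_padded` name are made up for illustration and just reuse two rows from the excerpt above.

```python
import io
import pandas as pd

# Two sample rows taken from the CSV excerpt above (columns trimmed for brevity).
sample = io.StringIO(
    '"USAF","WBAN","STATION NAME"\n'
    '"006852","99999","SENT"\n'
    '"007005","99999","CWOS 07005"\n'
)

# No dtype given, so the quoted codes are still inferred as integers.
df = pd.read_csv(sample)

# Convert back to text and left-pad with zeros to six characters,
# the same result as applying "%06d" % i to each value.
usaf_padded = df["USAF"].astype(str).str.zfill(6)
print(usaf_padded.tolist())
```

This only works because the width is known in advance, which is why it feels like a workaround rather than a fix.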
I got the CSV using this code:
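import urllib

# Download the station list from NOAA and save a local copy (Python 2 urllib).
response = urllib.urlopen(r"ftp://ftp.ncdc.noaa.gov/pub/data/inventories/ISH-HISTORY.CSV")
output = open('Station Codes.csv','wb')
output.write(response.read())
output.close()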
This is all well and good, but when I go and try to read it using:
import numpy as np
import pandas as pd
df = pd.io.parsers.read_csv("Station Codes.csv",dtype={'USAF': np.str, 'WBAN': np.str})
or
import pandas as pd
df = pd.io.parsers.read_csv("Station Codes.csv",dtype={'USAF': str, 'WBAN': str})
I get a nasty error message:
File "C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\io\parsers.py", line 401, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\io\parsers.py", line 216, in _read
return parser.read()
File "C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\io\parsers.py", line 633, in read
ret = self._engine.read(nrows)
File "C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\io\parsers.py", line 957, in read
data = self._reader.read(nrows)
File "parser.pyx", line 654, in pandas._parser.TextReader.read (pandas\src\parser.c:5931)
File "parser.pyx", line 676, in pandas._parser.TextReader._read_low_memory (pandas\src\parser.c:6148)
File "parser.pyx", line 752, in pandas._parser.TextReader._read_rows (pandas\src\parser.c:6962)
File "parser.pyx", line 837, in pandas._parser.TextReader._convert_column_data (pandas\src\parser.c:7898)
File "parser.pyx", line 887, in pandas._parser.TextReader._convert_tokens (pandas\src\parser.c:8483)
File "parser.pyx", line 953, in pandas._parser.TextReader._convert_with_dtype (pandas\src\parser.c:9535)
File "parser.pyx", line 1283, in pandas._parser._to_fw_string (pandas\src\parser.c:14616)
TypeError: data type not understood
It's a pretty big CSV (31k rows), so maybe that has something to do with it?
I found that using object works for keeping the leading zeros: dtype={'USAF': object, 'WBAN': object}, from this post: http://stackoverflow.com/questions/13293810/import-pandas-dataframe-column-as-string-not-int
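A minimal sketch of the object-dtype approach from the comment above, assuming a recent pandas on Python 3. The in-memory `sample` buffer is made up for illustration and stands in for "Station Codes.csv"; it reuses two rows from the question's excerpt.

```python
import io
import pandas as pd

# Two sample rows from the question's CSV excerpt (columns trimmed for brevity).
sample = io.StringIO(
    '"USAF","WBAN","STATION NAME"\n'
    '"006852","99999","SENT"\n'
    '"007005","99999","CWOS 07005"\n'
)

# Ask for object dtype on the code columns so they are kept as text
# instead of being inferred as integers.
df = pd.read_csv(sample, dtype={"USAF": object, "WBAN": object})

print(df["USAF"].tolist())
print(df["WBAN"].tolist())
```

With the real file, the same `dtype` mapping would be passed together with the file path in place of `sample`.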
A bit weird that str/np.str doesn't work... :S I don't know whether it's a bug, but it might be worth posting [an issue on GitHub](https://github.com/pydata/pandas/issues).
Yeah, I thought it was weird, since I can use other numeric dtypes there.