2013-09-29 56 views
3

CSV reader and DictReader convert numeric fields to strings

The first line of my CSV contains the headers. Here is a sample row from it:

2013-07-31 00:00:00,,1.0,2013.0,7.0,Q3,21160742,32HHBS1307170203,KL0602130731,AIRFRANCE 
KLM,KLM,KLM,KLM,KL,KLM ROYAL DUTCH AIRLINES,,0602,,KL0602,KL,KLM ROYAL DUTCH 
AIRLINES,,,,KL,0602,,,LAX,AMS,,31-7-2013 0:00:00,2013-07-31,2013-07-31,2013-07-31,2013-07-31, 
13:55:00,14:39:00,20:55:00,21:39:00,2013-08-01,2013-08-01,2013-08-01,2013-08-01, 
09:05:00,09:45:00,07:05:00,07:45:00,2.0,,2,,,LAX,LOS ANGELES INTERNATIONAL AIRPORT, 
LAX,LAX,5.0,LAX,LOS ANGELES,US,UNITED STATES OF AMERICA,US,USA,NA8,NORTHERN AMERICA, 
AMERICAS,,,,AMS,SCHIPHOL I,F,OFFLINE,I,INDIRECT OFFLINE,14.0,3.0,FRONT,Business,2.0,nan, 
PLANNED,3.0,,2.0,2.0,34.0,4.0,400254887nan,1.0,2.0,2.0,2.0,1.0,2.0,6.0,3.0,1.0,3.0,1.0,1.0, 
nan,nan,nan,nan,nan,nan,nan,3.0,3.0,3.0,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan, 
nan,2.0,2.0,2.0,2.0,2.0,7.0,nan,2.0,3.0,3.0,3.0,3.0,nan,nan,nan,nan,nan,nan,nan,nan,nan, 
nan,nan,nan,nan,6.0,1.0,nan,nan,nan,nan,nan,2.0,nan,nan,nan,nan,nan,nan,nan,nan,nan,2.0,2.0, 
nan,2.0,nan,3.0,nan,,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,13.7885862654653, 
0.2, 34273499844164,nan,37.0,Booked,35.0,10.0,2.0,2.0,6.0,35.0,10.0,42.0,nan,nan,LAX,LAX,N 

If I use input_file = csv.DictReader(open("file.csv")) or input_file = csv.reader(open('file.csv')), all of my values become strings.

Part of a row as printed in Python:

'2013-08-31 00:00:00', '', '1.0', '2013.0', '8.0', 'Q3','C', '03J', '', '', 
'', '', 'nan', 'nan', '', 'NON-AIRPORT', 'SELF-SERVICE', 'ICI', '', '19.0', '20130819', 
'1.0', '19.0', '9.0', '20130901', '2.0', '1.0', '1.0', '1.0', '10.0', '5.0', '5.0', '3.0', 
'4.0', '4.0', '2.0', '2.0', '', 'nan', '2.0', '', '24854524', 'nan', 'nan', 'nan', 'nan', 
'1.0', 'nan', '5.0', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 
'nan', '4.0', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 
'nan', 'nan', 'nan', '2.0', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 
'nan', '3.0', '5.0', '5.0' 

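As you can see, all the dates, strings, floats, and integers have become strings. How do I import them with the correct types? The data has about 400 columns, so I can't define each column's type by hand.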

Answers

6

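You're looking at this backwards. It's not that they're being turned into strings; they are strings, because CSV is not a format that preserves type information. You're not doing anything to turn them into anything else, and Python isn't going to guess. Is nan a float, or the name of someone's beloved grandmother? Is 3.0 a float, or the name of an avant-garde nerdcore blues band?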
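A minimal sketch of that point, using an in-memory sample (the `sample` text and its column names are made up for illustration): both `csv.reader` and `csv.DictReader` hand back every field as a `str`, whatever the field looks like.

```python
import csv
import io

# Hypothetical two-line CSV; the header names are invented for this example.
sample = "date,score,code\n2013-07-31 00:00:00,3.0,nan\n"

# csv.reader yields lists of strings, including for the numeric-looking fields.
rows = list(csv.reader(io.StringIO(sample)))
field_types = {type(value).__name__ for row in rows for value in row}
print(field_types)

# csv.DictReader keys each row by the header, but the values are still strings.
records = list(csv.DictReader(io.StringIO(sample)))
print(records[0]["score"], type(records[0]["score"]).__name__)
```

Any conversion to numbers or dates therefore has to happen after the row has been read.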

If you can come up with an algorithm for guessing the types, then of course you can apply it:

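# Python 2 code: note the print statements and the "rb" mode that the
# Python 2 csv module expects.
import ast
import csv
import datetime

def guess_type(x):
    # Try each converter in turn; the first one that succeeds wins.
    attempt_fns = [
        ast.literal_eval,  # ints, floats, and other Python literals
        float,             # catches values such as "nan"
        lambda x: datetime.datetime.strptime(x, "%Y-%m-%d %H:%M:%S"),
    ]
    for fn in attempt_fns:
        try:
            return fn(x)
        except (ValueError, SyntaxError):
            pass
    # Nothing matched, so leave the value as a string.
    return x

with open("untyped.csv", "rb") as fp:
    reader = csv.reader(fp)
    for row in reader:
        row = [guess_type(x) for x in row]
        print row
        print map(type, row)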

With the file

2013-07-31 00:00:00,,1.0,2013.0,7.0,Q3,21160742,32HHBS1307170203,nan 

the code above produces

[datetime.datetime(2013, 7, 31, 0, 0), '', 1.0, 2013.0, 7.0, 'Q3', 21160742, '32HHBS1307170203', nan] 
[<type 'datetime.datetime'>, <type 'str'>, <type 'float'>, <type 'float'>, <type 'float'>, <type 'str'>, <type 'int'>, <type 'str'>, <type 'float'>] 

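That's not bad. PS: if you're going to do serious work with CSV files in Python, I strongly recommend checking out pandas; otherwise you'll waste time reimplementing parts of its functionality.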
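As a rough illustration of what pandas does here (a sketch that assumes pandas is installed; the sample text and its column names are invented), `pandas.read_csv` infers a type for each column, and the `parse_dates` argument asks it to parse the named column as datetimes:

```python
import io

import pandas as pd

# Hypothetical sample; the real file has about 400 columns.
sample = (
    "flight_date,quarter,ticket_id,score\n"
    "2013-07-31 00:00:00,Q3,21160742,1.0\n"
    "2013-08-31 00:00:00,Q3,24854524,nan\n"
)

# Column types are inferred; the literal text "nan" is read as a missing value.
df = pd.read_csv(io.StringIO(sample), parse_dates=["flight_date"])
print(df.dtypes)
```

Inference works per column rather than per value, so no type has to be declared by hand for the columns that hold plain numbers or text.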

+0

Thanks! +1 for the pandas suggestion. – Diolor

3

They aren't converted to strings; they are strings to begin with. But you can try converting them to floats after reading the fields:

Assuming row contains one row of data, you can do

newrow = []
for item in row:
    try:
        newrow.append(float(item))
    except ValueError:
        newrow.append(item)
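The same loop can be applied to each row from `csv.DictReader`. A small sketch, where `to_float_or_keep` and the sample columns are names invented for this example; note that integer-looking fields such as `21160742` also come back as floats, and empty fields stay as empty strings:

```python
import csv
import io

def to_float_or_keep(item):
    """Return item as a float if it parses as one, otherwise unchanged."""
    try:
        return float(item)
    except ValueError:
        return item

# Hypothetical sample with a header row and one data row.
sample = "quarter,ticket_id,score,note\nQ3,21160742,nan,\n"

for record in csv.DictReader(io.StringIO(sample)):
    # Convert every value in the row, keeping the header names as keys.
    converted = {key: to_float_or_keep(value) for key, value in record.items()}
    print(converted)
```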