2015-08-09 133 views

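Converting the data type of a pandas column

If the values in the timestampMs column of a pandas DataFrame are of type unicode and we want to convert them to float, what is the difference between the following two methods?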

df['timestampMs'].map(lambda x: float(x)/1000) 

df['timestampMs'].astype('float')/1000 

Since both seem to give the same result, which is the preferred method?


Read https://github.com/pydata/pandas/blob/a7437430b5cb62e49a79b64d18eccfb2b4d6367f/pandas/core/internals.py#L375 and decide on the "preferred" method according to your own criteria. For example, if one of the values is "-", the first example will fail. – jonnybazookatone
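To illustrate the point about a value such as "-": as far as I can tell, a plain float() call raises ValueError on that string, and astype('float') does as well. A minimal sketch of one way to tolerate such placeholders, assuming it is acceptable to turn them into NaN, is pd.to_numeric with errors="coerce". The sample values and the to_seconds helper are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data: two millisecond timestamps stored as strings
# and one "-" placeholder that cannot be parsed as a number.
s = pd.Series(["1438992000000", "-", "1438992060000"], name="timestampMs")

def to_seconds(series):
    # errors="coerce" replaces unparseable entries with NaN instead of raising.
    return pd.to_numeric(series, errors="coerce") / 1000

result = to_seconds(s)
print(result)
```

Whether NaN is the right outcome depends on the data; raising an error may be preferable if a "-" indicates a real problem upstream.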

Answer


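Well... if you care about speed, the lambda method is slightly faster for small datasets. For large datasets, go for the .astype() method (I also personally find it more readable):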

import time 
import timeit 
import pandas as pd 

num_elements = 100 
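# On Python 3, str is the Unicode string type and time.perf_counter() stands in for time.clock()
times = [str(time.perf_counter()) for _ in range(num_elements)]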

df = pd.DataFrame(times) 

def first_method(): 
    df[0].map(lambda x: float(x)/1000) 

def second_method(): 
    df[0].astype('float')/1000 

num_reps = 15000 

print("First method time for {} reps: {}".format(num_reps, timeit.timeit(first_method, number=num_reps))) 
print("Second method time for {} reps: {}".format(num_reps, timeit.timeit(second_method, number=num_reps))) 

With num_elements = 100 I get:

First method time for 15000 reps: 1.95685731342 
Second method time for 15000 reps: 2.22381265566 

With num_elements = 1000 I get:

First method time for 15000 reps: 12.0774245498 
Second method time for 15000 reps: 6.77670391568
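To see where the crossover falls on a given machine, the same comparison can be repeated over several sizes. The following is only a rough sketch for Python 3: the compare helper, the synthetic timestamps, and the repetition count are my own choices for illustration, and the timings will differ between pandas versions and hardware. It also checks that both methods produce the same values.

```python
import timeit

import pandas as pd

def compare(num_elements, num_reps=200):
    # Hypothetical helper: build a column of string timestamps, confirm both
    # conversions agree, then time each one for num_reps repetitions.
    values = [str(1438992000000 + i) for i in range(num_elements)]
    col = pd.DataFrame({"timestampMs": values})["timestampMs"]

    def via_map():
        return col.map(lambda x: float(x) / 1000)

    def via_astype():
        return col.astype("float") / 1000

    same = bool((via_map() == via_astype()).all())
    t_map = timeit.timeit(via_map, number=num_reps)
    t_astype = timeit.timeit(via_astype, number=num_reps)
    return same, t_map, t_astype

results = {n: compare(n) for n in (100, 1000, 10000)}
for n, (same, t_map, t_astype) in results.items():
    print("n={}: same={}, map={:.4f}s, astype={:.4f}s".format(n, same, t_map, t_astype))
```

If the results follow the pattern above, the map time should grow faster with the number of elements than the astype time.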