2017-06-01

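Trying to convert a column of strings to floats

I have a column in a DataFrame that currently holds strings. I need to convert this data to floats and extract it as an array so that I can work with the coordinate pairs.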

In [55]: apt_data['geotag']
Out[55]:
0      (40.7763, -73.9529)
1    (40.72785, -73.983307)
2      (40.7339, -74.0054)
3   (40.771731, -73.956313)
4     (40.8027, -73.949187)
Name: geotag, dtype: object

First I tried:

apt_loc = apt_data['geotag'] 
apt_loc_ar = np.array(apt_loc['geotag'], dtype=dt) 

but it throws this error:

Traceback (most recent call last):
  File "<ipython-input-60-3a853e355c9a>", line 1, in <module>
    apt_loc_ar = np.array(apt_loc['geotag'], dtype=dt)
  File "/python3.5/site-packages/pandas/core/series.py", line 603, in __getitem__
    result = self.index.get_value(self, key)
  File "/python3.5/site-packages/pandas/indexes/base.py", line 2169, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 98, in pandas.index.IndexEngine.get_value (pandas/index.c:3557)
  File "pandas/index.pyx", line 106, in pandas.index.IndexEngine.get_value (pandas/index.c:3240)
  File "pandas/index.pyx", line 156, in pandas.index.IndexEngine.get_loc (pandas/index.c:4363)
KeyError: 'geotag'
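The KeyError comes from indexing twice: apt_data['geotag'] already returns a Series, and on a Series ['geotag'] is a lookup by row label, which fails because the rows are labelled 0, 1, 2, .... A minimal sketch reproducing this, assuming a hypothetical two-row stand-in for apt_data (the dtype dt from the question is not shown, so it is left out):

```python
import pandas as pd

# Hypothetical two-row stand-in for the asker's apt_data.
apt_data = pd.DataFrame(
    {"geotag": ["(40.7763, -73.9529)", "(40.72785, -73.983307)"]}
)

# Selecting the column returns a Series whose index is the row labels 0, 1, ...
apt_loc = apt_data["geotag"]

try:
    # On a Series, ['geotag'] is a row-label lookup; no row is labelled
    # 'geotag', so this raises the same KeyError as in the traceback.
    apt_loc["geotag"]
except KeyError as exc:
    print("KeyError:", exc)

# apt_loc already holds the strings; no second ['geotag'] is needed.
print(apt_loc.tolist())
```

Dropping the second ['geotag'] removes the KeyError, but the values are still strings, so they still need parsing before they can become floats.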

我試着使用 apt_data['geotag'] = pd.to_numeric(apt_data['geotag'], errors='coerce')

這給了我NaN的所有條目。
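pd.to_numeric expects each value to be a single number. A string such as "(40.7763, -73.9529)" contains two numbers plus parentheses and a comma, so it cannot be parsed, and errors='coerce' turns every such value into NaN. A small sketch, using a two-row Series built from the data above:

```python
import pandas as pd

s = pd.Series(["(40.7763, -73.9529)", "(40.72785, -73.983307)"], name="geotag")

# With the default errors='raise', the first unparseable string raises.
try:
    pd.to_numeric(s)
except ValueError as exc:
    print("ValueError:", exc)

# errors='coerce' silently replaces each unparseable value with NaN instead.
coerced = pd.to_numeric(s, errors="coerce")
print(coerced.isna().all())  # True
```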

Thanks.


So each observation is a tuple inside a string? –


'type(apt_loc.get_value(0,'geotag'))' returns 'str'. I think the parentheses and the comma are just part of the string value too. – epic556

Answers


You can use literal_eval from the ast module and apply it to your DataFrame, as shown below:

import pandas as pd 
from ast import literal_eval as le 

df = pd.DataFrame(["(40.7763, -73.9529)","(40.72785, -73.983307)"], columns=["geotag"]) 

df["geotag"] = df["geotag"].apply(func=lambda x: le(x)) 

Output:

>>> for k in df["geotag"]:
...     for j in k: print(type(j))
...
<class 'float'> 
<class 'float'> 
<class 'float'> 
<class 'float'> 
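Since the question asks for an array of coordinate pairs, the parsed tuples can be stacked into a 2-D NumPy float array with one row per point. A sketch building on the literal_eval approach above; the name coords is just illustrative:

```python
from ast import literal_eval

import numpy as np
import pandas as pd

df = pd.DataFrame(["(40.7763, -73.9529)", "(40.72785, -73.983307)"], columns=["geotag"])
df["geotag"] = df["geotag"].apply(literal_eval)  # each cell is now a tuple of two floats

# Stack the tuples into a 2-D float array: one row per point, columns (lat, lon).
coords = np.array(df["geotag"].tolist(), dtype=float)
print(coords.shape)  # (2, 2)
print(coords[:, 0])  # first element of every pair
```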

Thanks, this worked for me. – epic556


You're welcome. If you have any questions, don't hesitate to leave a comment. Happy coding. –


A shorter version of Chiheb's answer (no import needed):

apt_data.geotag.apply(eval) 

'eval' is unsafe; using it is not recommended. –
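The difference is that eval executes arbitrary expressions, whereas ast.literal_eval only accepts Python literals (numbers, strings, tuples, lists, dicts, and so on). A small sketch; the malicious string is a harmless, made-up example of code that eval would run:

```python
from ast import literal_eval

# A well-formed coordinate string is a plain tuple literal, so it parses fine.
print(literal_eval("(40.7763, -73.9529)"))

# Anything that would call a function is rejected instead of executed.
# eval() would run this expression; literal_eval raises ValueError.
malicious = "__import__('os').getcwd()"
try:
    literal_eval(malicious)
except ValueError as exc:
    print("rejected:", exc)
```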


Consider the Series g:

g = pd.Series(
    [ 
     '(40.7763, -73.9529)', 
     '(40.72785, -73.983307)', 
     '(40.7339, -74.0054)', 
     '(40.771731, -73.956313)', 
     '(40.8027, -73.949187)' 
    ], name='geotag' 
) 

Option 1
literal_eval

from ast import literal_eval 
import pandas as pd 

g.apply(literal_eval) 

0  (40.7763, -73.9529) 
1  (40.72785, -73.983307) 
2  (40.7339, -74.0054) 
3 (40.771731, -73.956313) 
4  (40.8027, -73.949187) 
Name: geotag, dtype: object 
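If separate latitude and longitude columns are more convenient than a Series of tuples, the result of Option 1 can be expanded into a two-column float DataFrame. A sketch on two rows of g; the column names 'lat' and 'lon' are illustrative assumptions:

```python
from ast import literal_eval

import pandas as pd

g = pd.Series(['(40.7763, -73.9529)', '(40.72785, -73.983307)'], name='geotag')

pairs = g.apply(literal_eval)  # Series of (lat, lon) tuples, as in Option 1
# Expand the tuples into two float columns, keeping g's index.
latlon = pd.DataFrame(pairs.tolist(), index=g.index, columns=['lat', 'lon'])
print(latlon.dtypes)
```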

Option 2
literal_eval in a comprehension, then rebuild the Series

pd.Series([literal_eval(v) for v in g.values.tolist()], g.index, name=g.name) 

0  (40.7763, -73.9529) 
1  (40.72785, -73.983307) 
2  (40.7339, -74.0054) 
3 (40.771731, -73.956313) 
4  (40.8027, -73.949187) 
Name: geotag, dtype: object 

Option 3
apply with str methods

g.apply(lambda x: [float(y) for y in x.strip('()').split(', ')]) 

0  [40.7763, -73.9529] 
1  [40.72785, -73.983307] 
2  [40.7339, -74.0054] 
3 [40.771731, -73.956313] 
4  [40.8027, -73.949187] 
Name: geotag, dtype: object 

Option 4
str methods in a comprehension

pd.Series([[float(x) for x in v.strip('()').split(', ')] for v in g.values.tolist()], g.index, name=g.name) 

0  [40.7763, -73.9529] 
1  [40.72785, -73.983307] 
2  [40.7339, -74.0054] 
3 [40.771731, -73.956313] 
4  [40.8027, -73.949187] 
Name: geotag, dtype: object 
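Another variant, not one of the four options above and not timed here, uses the pandas .str accessor to do the same strip/split/float steps column-wise and returns a two-column float DataFrame directly. A sketch on two rows of g; the column names are illustrative:

```python
import pandas as pd

g = pd.Series(['(40.7763, -73.9529)', '(40.72785, -73.983307)'], name='geotag')

# Strip the parentheses, split on ", " into two columns, then cast to float.
latlon = (
    g.str.strip('()')
     .str.split(', ', expand=True)
     .astype(float)
)
latlon.columns = ['lat', 'lon']  # illustrative column names
print(latlon)
```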

Timing

%timeit g.apply(literal_eval) 
10000 loops, best of 3: 158 µs per loop 

%timeit g.apply(lambda x: [float(y) for y in x.strip('()').split(', ')]) 
10000 loops, best of 3: 107 µs per loop 

%timeit pd.Series([literal_eval(v) for v in g.values.tolist()], g.index, name=g.name) 
10000 loops, best of 3: 119 µs per loop 

%timeit pd.Series([[float(x) for x in v.strip('()').split(', ')] for v in g.values.tolist()], g.index, name=g.name) 
10000 loops, best of 3: 65.3 µs per loop 
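%timeit is an IPython magic. To repeat the comparison in a plain Python script, the standard timeit module can time the same expressions. A sketch comparing Options 1 and 4 on the same five-row g; absolute numbers will depend on the machine and the pandas version, so no expected output is given:

```python
import timeit
from ast import literal_eval

import pandas as pd

g = pd.Series(
    ['(40.7763, -73.9529)', '(40.72785, -73.983307)', '(40.7339, -74.0054)',
     '(40.771731, -73.956313)', '(40.8027, -73.949187)'],
    name='geotag',
)

def option1():
    return g.apply(literal_eval)

def option4():
    return pd.Series([[float(x) for x in v.strip('()').split(', ')] for v in g.values.tolist()],
                     g.index, name=g.name)

# Both options must agree on the parsed values before timing means anything.
assert [list(t) for t in option1()] == option4().tolist()

for fn in (option1, option4):
    # Best of 3 repeats of 200 calls each, reported per call in microseconds.
    best = min(timeit.repeat(fn, number=200, repeat=3))
    print(f"{fn.__name__}: {best / 200 * 1e6:.1f} µs per call")
```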

I really enjoyed reading your answer. You could write a book about Python/pandas. :-) You deserve an upvote for this answer. –


Thanks for the comment @ChihebNexus – piRSquared
