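Removing characters in a pandas data frame

My goal is to (1) import Twitter JSON, (2) extract the data of interest, and (3) create a pandas data frame for the variables of interest. Here is my code: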

import json
import pandas as pd

tweets = []
with open('00.json') as f:
    for line in f:
        try:
            tweets.append(json.loads(line))
        except ValueError:
            # Skip blank or malformed lines
            continue

# Tweets often have missing data, therefore use -if- when extracting "keys" 

tweet = tweets[0] 

ids = [tweet['id_str'] for tweet in tweets if 'id_str' in tweet] 
text = [tweet['text'] for tweet in tweets if 'text' in tweet] 
lang = [tweet['lang'] for tweet in tweets if 'lang' in tweet] 
geo = [tweet['geo'] for tweet in tweets if 'geo' in tweet] 
place = [tweet['place'] for tweet in tweets if 'place' in tweet] 

# Create a data frame (using pd.Index may be "incorrect", but I am a noob) 
df = pd.DataFrame({'Ids': pd.Index(ids),
                   'Text': pd.Index(text),
                   'Lang': pd.Index(lang),
                   'Geo': pd.Index(geo),
                   'Place': pd.Index(place)})

# Create a data frame satisfying conditions: 
df2 = df[(df['Lang'] == 'en') & (df['Geo'].notna())]

So far, everything seems to work fine.

Now, the Geo column yields values like the one in the following example:

df2.loc[1921,'Geo'] 
{'coordinates': [39.11890951, -84.48903638], 'type': 'Point'}

To get rid of everything except the coordinates inside the square brackets, I tried:

df2.Geo.str.replace("[({':]", "") ### results in NaN 
# and also this: 
df2['Geo'] = df2['Geo'].map(lambda x: x.lstrip('{'coordinates': [').rstrip('], 'type': 'Point'')) ### results in syntax error

Please advise on the correct way to obtain only the coordinate values.

2017-03-07 kiton

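The following lines in your question suggest that this is a matter of understanding the underlying data type of the returned object.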

df2.loc[1921,'Geo'] 
{'coordinates': [39.11890951, -84.48903638], 'type': 'Point'}

You are getting back a Python dictionary, not a string! If you only want the coordinate values, just use the 'coordinates' key to return them.

df2.loc[1921,'Geo']['coordinates'] 
[39.11890951, -84.48903638]

The object returned in this case is a Python list containing the two coordinate values. If you only need one of the values, you can index into the list, e.g.

df2.loc[1921,'Geo']['coordinates'][0] 
39.11890951

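This workflow is much easier to deal with than casting the dictionary to a string, parsing the string, and recapturing the coordinate values, as you were trying to do.

So, let's say you want to create a new column called "geo_coord0" that contains all of the coordinates in the first position (as shown above). You could use something like the following: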
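To illustrate why the string-based attempt came back as NaN, here is a minimal sketch on a made-up frame (the name `toy` and its values are hypothetical, not from the question). It assumes each cell holds a dictionary shaped like the one shown above: the `.str` accessor yields NaN for elements that are not strings, while key access on each dictionary works directly.

```python
import pandas as pd

# Hypothetical toy data mimicking the 'Geo' column from the question.
toy = pd.DataFrame({
    "Geo": [
        {"coordinates": [39.11890951, -84.48903638], "type": "Point"},
        {"coordinates": [40.0, -83.0], "type": "Point"},
    ]
})

# Each cell is a dict, not a string.
cell_types = toy["Geo"].map(type)

# .str methods produce NaN for non-string elements, hence the NaN result.
replaced = toy["Geo"].str.replace("{", "", regex=False)

# Key access on each dict returns the coordinate list directly.
coords = toy["Geo"].map(lambda d: d["coordinates"])
```

Checking `cell_types` first is a quick way to confirm what a column really contains before reaching for string methods.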

df2["geo_coord0"] = [x['coordinates'][0] for x in df2['Geo']]

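This uses a Python list comprehension to iterate over all entries in the df2['Geo'] column, and for each entry it uses the same syntax we used above to return the first coordinate value. It then assigns these values to a new column in df2.

For more details on the data structures discussed above, see the Python documentation on data structures.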
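If some rows might carry no geo data, indexing into a missing entry would raise an error. The following is a hedged sketch rather than part of the approach above: the helper `coord_at` and the toy frame are hypothetical, and it assumes missing entries are `None` (or NaN) while present ones are dictionaries with a 'coordinates' key.

```python
import pandas as pd

# Hypothetical toy frame; the second row has no geo data.
df2 = pd.DataFrame({
    "Geo": [
        {"coordinates": [39.11890951, -84.48903638], "type": "Point"},
        None,
    ]
})

def coord_at(geo, position):
    """Return the coordinate at `position`, or None if geo is not a dict."""
    if isinstance(geo, dict):
        return geo["coordinates"][position]
    return None

# One new column per coordinate position; missing rows become NaN.
df2["geo_coord0"] = [coord_at(g, 0) for g in df2["Geo"]]
df2["geo_coord1"] = [coord_at(g, 1) for g in df2["Geo"]]
```

Keeping the two positions in separate numeric columns makes later filtering or plotting simpler than working with a column of lists.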

2017-03-07 17:57:41 Brian

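Brian, thanks for the explanation. Indeed, I had tripped over exactly the issue you highlighted. With your suggested code, the problem is now solved. I will definitely dig into the documentation on data structures to understand it properly. – kiton

@kiton Glad I could help! Please consider [accepting my answer as a solution](http://stackoverflow.com/help/someone-answers) if you feel it solves the problem. – Brian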
