Pandas DataFrame replaces strings with NaN when using pd.concat

I have a pandas DataFrame made up of strings, i.e. 'P1', 'P2', 'P3', ..., null.

When I try to concatenate this DataFrame with another one, all of the strings are replaced with NaN.

See my code below:

import operator
import pandas as pd

# Load the descriptions and keep the 'what' field of the first record in each cell
descriptions = pd.read_json('https://raw.githubusercontent.com/ansymo/msr2013-bug_dataset/master/data/v02/eclipse/short_desc.json')
descriptions = descriptions.reset_index(drop=True)
descriptions['desc'] = descriptions.short_desc.apply(operator.itemgetter(0)).apply(operator.itemgetter('what'))
f1 = pd.DataFrame(descriptions['desc'])

# Do the same for the priorities
bugPrior = pd.read_json('https://raw.githubusercontent.com/ansymo/msr2013-bug_dataset/master/data/v02/eclipse/priority.json')
bugPrior = bugPrior.reset_index(drop=True)
bugPrior['priority'] = bugPrior.priority.apply(operator.itemgetter(0)).apply(operator.itemgetter('what'))
f2 = pd.DataFrame(bugPrior['priority'])

# Concatenate with the default axis (this is the call that produces the NaN column)
df = pd.concat([f1, f2])
print(df.head())

The output is as follows:

   desc          priority 
0 Usability issue with external editors (1GE6IRL)  NaN 
1    API - VCM event notification (1G8G6RR)  NaN 
2 Would like a way to take a write lock on a tea...  NaN 
3 getter/setter code generation drops "F" in ".....  NaN 
4 Create Help Index Fails with seemingly incorre...  NaN

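Any ideas on how I can stop this from happening?

Ultimately, my goal is to get everything into one DataFrame so that I can drop all rows that have a 'null' value. That would also help with the code that comes later.

Thanks.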

2017-08-29 JohnWayne360

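Assuming you want to concatenate these columns horizontally, you need to pass axis=1 to pd.concat, because by default the concatenation is vertical (axis=0). A vertical concat stacks the rows of f2 below those of f1, so every row that came from f1 has no value in the priority column and shows NaN there.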

df = pd.concat([f1,f2], axis=1)
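
A minimal sketch of the difference, using two small made-up frames in place of f1 and f2 (the sample values are assumptions, not rows from the Eclipse dataset):

```python
import pandas as pd

# Hypothetical stand-ins for f1 and f2: single-column frames sharing the index 0..2
f1 = pd.DataFrame({"desc": ["bug a", "bug b", "bug c"]})
f2 = pd.DataFrame({"priority": ["P3", None, "P1"]})

# Default axis=0 stacks the rows; the columns are unioned, so the rows
# that came from f1 have nothing in 'priority' and are filled with NaN
vertical = pd.concat([f1, f2])
print(vertical.shape)                             # (6, 2)
print(vertical["priority"].iloc[:3].isna().all())  # True

# axis=1 aligns the frames on their index and puts the columns side by side
horizontal = pd.concat([f1, f2], axis=1)
print(horizontal.shape)                           # (3, 2)
```

Since df.head() only shows the first five rows, a vertical concat of two long frames will show nothing but NaN in the second column, which matches the output in the question.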

To drop the rows that contain NaN, you should be able to use df.dropna. Call df.reset_index afterwards so the index is renumbered from 0.

df = pd.concat([f1, f2], axis=1)
df = df.dropna().reset_index(drop=True)
print(df.head(10))
               desc priority 
0 Create Help Index Fails with seemingly incorre...  P3 
1 Internal compiler error when compiling switch ...  P3 
2 Default text sizes in org.eclipse.jface.resour...  P3 
3 [Presentations] [ViewMgmt] Holding mouse down ...  P3 
4 Parsing of function declarations in stdio.h is...  P2 
5 CCE in RenameResourceAction while renaming ele...  P3 
6 Option to prevent cursor from moving off end o...  P3 
7  Tasks section in the user doc is very stale  P3 
8 Importing existing project with different case...  P3 
9 Workspace in use --> choose new workspace but ...  P3
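
The dropna / reset_index step can be tried in isolation on a small frame. This is only a sketch; the frame named sample and its values are invented for illustration:

```python
import pandas as pd

# Hypothetical frame with missing priorities in rows 0 and 2
sample = pd.DataFrame({
    "desc": ["bug a", "bug b", "bug c", "bug d"],
    "priority": [None, "P3", None, "P2"],
})

# dropna() removes every row that has at least one missing value,
# but the surviving rows keep their original index labels
cleaned = sample.dropna()
print(cleaned.index.tolist())        # [1, 3]

# reset_index(drop=True) renumbers from 0 and discards the old labels
cleaned = cleaned.reset_index(drop=True)
print(cleaned.index.tolist())        # [0, 1]
print(cleaned["priority"].tolist())  # ['P3', 'P2']
```

If only one column should decide which rows are dropped, dropna also accepts a subset argument, for example sample.dropna(subset=["priority"]).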

Printing df.priority.unique(), we see that there are 5 unique priorities:

print(df.priority.unique()) 
array(['P3', 'P2', 'P4', 'P1', 'P5'], dtype=object)

2017-08-29 15:53:54

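Thanks for your help, this dataset has been driving me nuts, and this is only the data import! – JohnWayne360

I think it is better not to create DataFrames from the columns, and to work with the Series directly: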

import operator
import pandas as pd

descriptions = pd.read_json('https://raw.githubusercontent.com/ansymo/msr2013-bug_dataset/master/data/v02/eclipse/short_desc.json')
descriptions = descriptions.reset_index(drop=True)

# extract the 'what' field of the first record into the Series f1
f1 = descriptions.short_desc.apply(operator.itemgetter(0)).apply(operator.itemgetter('what'))
print(f1.head())

bugPrior = pd.read_json('https://raw.githubusercontent.com/ansymo/msr2013-bug_dataset/master/data/v02/eclipse/priority.json')
bugPrior = bugPrior.reset_index(drop=True)

# extract the 'what' field of the first record into the Series f2
f2 = bugPrior.priority.apply(operator.itemgetter(0)).apply(operator.itemgetter('what'))
print(f2.head())
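
To see what the two chained apply calls do, here is a small sketch on made-up data. It assumes each cell holds a list of dicts with a 'what' key, which is what the code above implies; the 'who' key and all values are invented:

```python
import operator
import pandas as pd

# Hypothetical column: every cell is a list of change records (dicts)
raw = pd.Series(
    [
        [{"what": "P3", "who": 1}, {"what": "P2", "who": 2}],
        [{"what": "P1", "who": 3}],
    ],
    name="priority",
)

# itemgetter(0) picks the first record of each list
first = raw.apply(operator.itemgetter(0))

# itemgetter('what') then picks the 'what' value out of that record
what = first.apply(operator.itemgetter("what"))
print(what.tolist())  # ['P3', 'P1']
print(what.name)      # priority
```

The result is a Series that keeps the name of the original column, so no intermediate DataFrame is needed.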

Then use the same solution as in cᴏʟᴅsᴘᴇᴇᴅ's answer:

df = pd.concat([f1, f2], axis=1).dropna().reset_index(drop=True)
print(df.head())
              short_desc priority 
0 Create Help Index Fails with seemingly incorre...  P3 
1 Internal compiler error when compiling switch ...  P3 
2 Default text sizes in org.eclipse.jface.resour...  P3 
3 [Presentations] [ViewMgmt] Holding mouse down ...  P3 
4 Parsing of function declarations in stdio.h is...  P2
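
When Series are concatenated with axis=1, each Series name becomes a column name, which is why the header above reads short_desc rather than desc. A minimal sketch with invented values:

```python
import pandas as pd

# Hypothetical named Series standing in for f1 and f2
f1 = pd.Series(["bug a", "bug b", "bug c"], name="short_desc")
f2 = pd.Series(["P3", None, "P2"], name="priority")

# The Series names are used as the column labels of the resulting frame
df = pd.concat([f1, f2], axis=1).dropna().reset_index(drop=True)
print(df.columns.tolist())  # ['short_desc', 'priority']
print(len(df))              # 2
```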

2017-08-29 16:03:30 jezrael

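That's exactly my answer. :) –

Never mind. You don't have to edit, but thank you, I appreciate it. –

@jezrael Thanks for your answer. I think I might apply your suggestion and create the columns. – JohnWayne360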
