Reading a txt file with pandas using a delimiter creates columns of NaNs

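I want to read a text file into pandas, but it creates NaNs in every row. I tried using a delimiter to split the variables, which appear to be separated by \, but this does not work properly. Here is what the data looks like in the text file.

Data: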

Date   Name   Group Direction 
2015-01-01 Smith.John  -   In 
2015-01-01 Smith.Jan  Claims  Out 
2015-01-01  -   Claims  In 
2015-01-01 Smith.Jessica Other  In

This was my first attempt at reading the data:

pd.read_csv(r'C:\Users\Desktop\skills.txt', 
     names=['Date','AgentName','Group','Direction'])

However, this produces

Date AgentID  AssignedWorkGroup CallDirection 
0 Date\tAgentID\tAssignedWorkGroup\tCallDire... NaN  NaN  NaN 
1 2015-09-01\Smith.John\t-\tIn     NaN  NaN  NaN

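So I tried to get rid of the \ by doing:

pd.read_csv(r'C:\Users\Desktop\skills.txt', 
     names=['Date','AgentName','Group','Direction'],delimiter='\\')

However, this still produces the same result. So there are a few issues. One is that I cannot split on '\'. Also, it looks like the first row being read is the header. I tried using header=None to get rid of it, but that did not work for me either. It also looks like there is a t (for text, I assume?) appearing before each variable.

I feel like I am handling this wrong.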

Asked 2015-11-03 by user3120266

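You are specifying column names, but this confuses the parser because it then interprets the first line as data. It also looks like you have tab-separated values. Try this: pd.read_csv(r'C:\Users\Desktop\skills.txt', names=['Date','AgentName','Group','Direction'], skiprows=1, sep='\t') – EdChum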

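Because you are passing alternative column names, the CSV parser interprets the first line as a valid data row, so you need to pass skiprows=1 to skip the header. In addition, the default separator is a comma, but your data looks tab-separated or multi-space-separated, so you can pass sep='\t' or sep='\s+'.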
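A minimal sketch of why the NaN columns appear, assuming the file is tab-separated. The `sample` string is a hypothetical in-memory stand-in for skills.txt, not the asker's actual file. With the default comma separator each line is a single field, so only the first named column is filled:

```python
import io

import pandas as pd

# Hypothetical stand-in for skills.txt, assuming tab-separated fields.
sample = (
    "Date\tName\tGroup\tDirection\n"
    "2015-01-01\tSmith.John\t-\tIn\n"
    "2015-01-01\tSmith.Jan\tClaims\tOut\n"
)
names = ['Date', 'AgentName', 'Group', 'Direction']

# Default sep=',': the data has no commas, so each whole line lands in 'Date',
# the other three columns are NaN, and the header line is read as a data row.
broken = pd.read_csv(io.StringIO(sample), names=names)

# Tab separator plus skiprows=1: four real columns, header line dropped.
fixed = pd.read_csv(io.StringIO(sample), names=names, sep='\t', skiprows=1)

print(broken)
print(fixed)
```

Comparing the two printed frames shows that the separator, not the column names, is what causes the NaNs.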

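It is unclear whether your data is tab-separated or space-separated, but the following works for me:

In [18]: 
import io
import pandas as pd

t="""Date   Name   Group Direction
2015-01-01 Smith.John  -   In
2015-01-01 Smith.Jan  Claims  Out
2015-01-01  -   Claims  In
2015-01-01 Smith.Jessica Other  In"""
# names replaces the header, skiprows=1 drops the original header line,
# and the raw-string regex r'\s+' splits on any run of whitespace
pd.read_csv(io.StringIO(t), names=['Date','AgentName','Group','Direction'], skiprows=1, sep=r'\s+')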

Out[18]: 
         Date      AgentName   Group Direction
0  2015-01-01     Smith.John       -        In
1  2015-01-01      Smith.Jan  Claims       Out
2  2015-01-01              -  Claims        In
3  2015-01-01  Smith.Jessica   Other        In

So I would expect either

pd.read_csv(r'C:\Users\Desktop\skills.txt', names=['Date','AgentName','Group','Direction'], skiprows=1, sep='\t')

or

pd.read_csv(r'C:\Users\Desktop\skills.txt', names=['Date','AgentName','Group','Direction'], skiprows=1, sep=r'\s+')

to work for you.
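If it is not obvious which of the two separators applies, one way to check is to look at the repr of the file's first line, where tab characters show up as \t. A small sketch follows. `guess_sep` is a hypothetical helper (not part of pandas), and the two sample strings are assumed stand-ins for the file contents:

```python
import io

import pandas as pd


def guess_sep(first_line):
    # Hypothetical helper, not a pandas function: pick the tab separator if
    # the line contains a literal tab, otherwise fall back to any whitespace.
    return '\t' if '\t' in first_line else r'\s+'


# Stand-ins for the file contents; the real file's layout is an assumption.
tab_text = "Date\tName\tGroup\tDirection\n2015-01-01\tSmith.John\t-\tIn\n"
space_text = "Date   Name   Group Direction\n2015-01-01 Smith.John  -   In\n"

names = ['Date', 'AgentName', 'Group', 'Direction']
frames = []
for text in (tab_text, space_text):
    first_line = text.splitlines()[0]
    print(repr(first_line))  # repr makes tab characters visible as \t
    frames.append(pd.read_csv(io.StringIO(text), names=names,
                              skiprows=1, sep=guess_sep(first_line)))
```

With a real file, the first line would come from `open(...).readline()` instead of the in-memory strings used here.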

Answered 2015-11-03 21:55:10 by EdChum

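Using whitespace as the delimiter works:

# The raw string keeps the backslashes in the Windows path literal.
# Note: delim_whitespace is deprecated since pandas 2.2; sep=r'\s+' is the equivalent.
df = pd.read_csv(r'C:\Users\Desktop\skills.txt', delim_whitespace=True) 
df.columns = ['Date','AgentName','Group','Direction']

Output: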

         Date      AgentName   Group Direction
0  2015-01-01     Smith.John       -        In
1  2015-01-01      Smith.Jan  Claims       Out
2  2015-01-01              -  Claims        In
3  2015-01-01  Smith.Jessica   Other        In
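The read and the rename can also be folded into a single call. A sketch, assuming whitespace-separated data like the sample in the question: passing header=0 together with names tells pandas to consume the first line as the header and replace its labels with the given names.

```python
import io

import pandas as pd

# Sample copied from the question; assumed to be whitespace-separated.
t = """Date   Name   Group Direction
2015-01-01 Smith.John  -   In
2015-01-01 Smith.Jan  Claims  Out
2015-01-01  -   Claims  In
2015-01-01 Smith.Jessica Other  In"""

# header=0 consumes the first line as the header row; names then replaces
# those header labels, so no separate df.columns assignment is needed.
df = pd.read_csv(io.StringIO(t), sep=r'\s+', header=0,
                 names=['Date', 'AgentName', 'Group', 'Direction'])
print(df)
```

This gives the same four data rows as skiprows=1 with names; which form to use is mostly a matter of taste.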

Answered 2015-11-03 21:55:49
