Deleting rows with string values from a pandas DataFrame in Python 3.4.1

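I have read a CSV file with 8 columns using pandas read_csv. Each column may contain int, string, or float values. I want to remove the rows that contain string values and get back a DataFrame with only numeric values. A sample CSV is attached below.
I tried to run the following code: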

import pandas as pd 
import numpy as np 
df = pd.read_csv('new200_with_errors.csv',dtype={'Geo_Level_1' : int,'Geo_Level_2' : int,'Geo_Level_3' : int,'Product_Level_1' : int,'Product_Level_2' : int,'Product_Level_3' : int,'Total_Sale' : float}) 
print(df)

But I got the following error:

TypeError: unorderable types: NoneType() > int()

I am running Python 3.4.1. Here is the sample CSV:

Geo_L_1,Geo_L_2,Geo_L_3,Pro_L_1,Pro_L_2,Pro_L_3,Date,Sale 
1, 2, 3, 129, 1, 5193316745, 1/1/2012, 9 
1 ,2, 3, 129, 1, 5193316745, 1/1/2013, 
1, 2, 3, 129, 1, 5193316745, , 8 
1, 2, 3, 129, NA, 5193316745, 1/10/2012, 10 
1, 2, 3, 129, 1, 5193316745, 1/10/2013, 4 
1, 2, 3, ghj, 1, 5193316745, 1/10/2014, 6 
1, 2, 3, 129, 1, 5193316745, 1/11/2012, 4 
1, 2, 3, 129, 1, ghgj, 1/11/2013, 2 
1, 2, 3, 129, 1, 5193316745, 1/11/2014, 6 
1, 2, 3, 129, 1, 5193316745, 1/12/2012, ghgj 
1, 2, 3, 129, 1, 5193316745, 1/12/2013, 5
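To reproduce the problem without the original file, the sample rows can be loaded from a string. This is a minimal sketch assuming a reasonably recent pandas, not code from the question. `SAMPLE_CSV` is a hypothetical name for the pasted data, and reading with `skipinitialspace=True` is an assumption about how the file should be parsed. The example shows that every column holding stray text comes back with a non-numeric dtype, while the blank fields become NaN:

```python
import io

import pandas as pd

# The sample rows from the question, pasted inline (hypothetical variable name).
SAMPLE_CSV = """Geo_L_1,Geo_L_2,Geo_L_3,Pro_L_1,Pro_L_2,Pro_L_3,Date,Sale
1, 2, 3, 129, 1, 5193316745, 1/1/2012, 9
1 ,2, 3, 129, 1, 5193316745, 1/1/2013,
1, 2, 3, 129, 1, 5193316745, , 8
1, 2, 3, 129, NA, 5193316745, 1/10/2012, 10
1, 2, 3, 129, 1, 5193316745, 1/10/2013, 4
1, 2, 3, ghj, 1, 5193316745, 1/10/2014, 6
1, 2, 3, 129, 1, 5193316745, 1/11/2012, 4
1, 2, 3, 129, 1, ghgj, 1/11/2013, 2
1, 2, 3, 129, 1, 5193316745, 1/11/2014, 6
1, 2, 3, 129, 1, 5193316745, 1/12/2012, ghgj
1, 2, 3, 129, 1, 5193316745, 1/12/2013, 5
"""

# skipinitialspace=True drops the blank after each comma, so " NA" is read as NA
# and the empty Date / Sale fields become NaN instead of one-space strings.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), skipinitialspace=True)

# Pro_L_1, Pro_L_3, Sale (and Date) contain text, so pandas cannot give them a numeric dtype.
print(df.dtypes)
```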


2014-10-27 sayak_SIBIA

I only count 5 columns. Where are Geo_Level_1..3? – fredtantini 2014-10-27 08:05:34

You need to post the raw data for the full df. You will have to clean the CSV either before or after reading it into pandas. – EdChum 2014-10-27 08:17:44
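One way to take the "clean before reading" route is to filter the raw rows with Python's standard `csv` module and pass only the surviving rows to pandas. This is a hedged sketch, not code from the thread. `RAW_CSV`, `is_number`, and `clean_rows` are hypothetical names. It also assumes that every column except `Date` should be numeric and that `Date` is in month/day/year form:

```python
import csv
import io

import pandas as pd

# The sample rows from the question (hypothetical variable name).
RAW_CSV = """Geo_L_1,Geo_L_2,Geo_L_3,Pro_L_1,Pro_L_2,Pro_L_3,Date,Sale
1, 2, 3, 129, 1, 5193316745, 1/1/2012, 9
1 ,2, 3, 129, 1, 5193316745, 1/1/2013,
1, 2, 3, 129, 1, 5193316745, , 8
1, 2, 3, 129, NA, 5193316745, 1/10/2012, 10
1, 2, 3, 129, 1, 5193316745, 1/10/2013, 4
1, 2, 3, ghj, 1, 5193316745, 1/10/2014, 6
1, 2, 3, 129, 1, 5193316745, 1/11/2012, 4
1, 2, 3, 129, 1, ghgj, 1/11/2013, 2
1, 2, 3, 129, 1, 5193316745, 1/11/2014, 6
1, 2, 3, 129, 1, 5193316745, 1/12/2012, ghgj
1, 2, 3, 129, 1, 5193316745, 1/12/2013, 5
"""


def is_number(text):
    """Hypothetical helper: True if the field parses as a float.
    Note that strings such as "nan" or "inf" also parse as floats."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def clean_rows(raw_text, date_col="Date"):
    """Hypothetical helper: keep the header plus every row whose non-date
    fields are all numeric and whose date field is not blank."""
    reader = csv.reader(io.StringIO(raw_text), skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    date_idx = header.index(date_col)
    kept = [header]
    for row in reader:
        fields = [field.strip() for field in row]
        if len(fields) != len(header) or fields[date_idx] == "":
            continue  # skip ragged rows and rows with a blank date
        if all(is_number(f) for i, f in enumerate(fields) if i != date_idx):
            kept.append(fields)
    out = io.StringIO()
    csv.writer(out).writerows(kept)
    out.seek(0)  # rewind so pandas reads from the start
    return out


df = pd.read_csv(clean_rows(RAW_CSV))
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
print(df)
```

Because the bad rows never reach pandas, every remaining column is inferred as an integer dtype. The trade-off is that the filtering rules live in plain Python rather than in pandas.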

The sample data has these errors in the columns above. That is why I gave only sample data, which consists of all 8 columns. @fredtantini – 2014-10-27 08:50:00

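The way I would approach this is to convert the columns to int using a user-defined function with a try/except. The except branch handles values that cannot be cast to an int by setting them to NaN. Then drop the rows that have an empty value. For some reason the empty Date value actually had a length of 1 when I tested this with your data; with your real file it may work using len 0.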

In [42]: 
import numpy as np
import pandas as pd

# simple function to try to convert a value to an int; returns NaN if it cannot be coerced
def func(x):
    try:
        return int(x)
    except (ValueError, TypeError):
        return np.nan

# convert the columns that contain stray strings
for col in ['Pro_L_1', 'Pro_L_3', 'Sale']:
    df[col] = df[col].apply(func)
# drop the 'empty' date row (in the sample it holds a single space, so its length is 1); take a copy() so we don't get a warning
df = df.loc[df['Date'].str.len() > 1].copy()
# convert the string to a datetime; if we didn't drop the row, the blank value would be set to today's date
df['Date'] = pd.to_datetime(df['Date'])
# now convert the remaining columns to numeric dtypes
# (the original used df.convert_objects(convert_numeric=True), which later pandas versions removed)
num_cols = df.columns.drop('Date')
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors='coerce')
# check the dtypes
df.dtypes

Out[42]:
Geo_L_1             int64
Geo_L_2             int64
Geo_L_3             int64
Pro_L_1           float64
Pro_L_2           float64
Pro_L_3           float64
Date       datetime64[ns]
Sale              float64
dtype: object
In [43]: 
# display the current situation 
df 
Out[43]: 
    Geo_L_1  Geo_L_2  Geo_L_3  Pro_L_1  Pro_L_2     Pro_L_3        Date  Sale
0         1        2        3      129        1  5193316745  2012-01-01     9
1         1        2        3      129        1  5193316745  2013-01-01   NaN
3         1        2        3      129      NaN  5193316745  2012-01-10    10
4         1        2        3      129        1  5193316745  2013-01-10     4
5         1        2        3      NaN        1  5193316745  2014-01-10     6
6         1        2        3      129        1  5193316745  2012-01-11     4
7         1        2        3      129        1         NaN  2013-01-11     2
8         1        2        3      129        1  5193316745  2014-01-11     6
9         1        2        3      129        1  5193316745  2012-01-12   NaN
10        1        2        3      129        1  5193316745  2013-01-12     5
In [44]: 
# drop the rows 
df.dropna() 
Out[44]: 
    Geo_L_1  Geo_L_2  Geo_L_3  Pro_L_1  Pro_L_2     Pro_L_3        Date  Sale
0         1        2        3      129        1  5193316745  2012-01-01     9
4         1        2        3      129        1  5193316745  2013-01-10     4
6         1        2        3      129        1  5193316745  2012-01-11     4
8         1        2        3      129        1  5193316745  2014-01-11     6
10        1        2        3      129        1  5193316745  2013-01-12     5

dropna() returns a new DataFrame, so for the last line assign the result back: df = df.dropna()
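On current pandas versions the same pipeline can be written without the custom helper. `pd.to_numeric(errors='coerce')` turns unparseable strings into NaN, and the `dropna()` result is assigned back. This is a sketch under stated assumptions, not EdChum's original code. It swaps in `pd.to_numeric` for the `func` helper and `convert_objects` and reads the file with `skipinitialspace=True`. It also assumes month/day/year dates and that every non-Date column should end up as int64. `SAMPLE_CSV` is a hypothetical name for the question's sample data:

```python
import io

import pandas as pd

# The sample rows from the question (hypothetical variable name).
SAMPLE_CSV = """Geo_L_1,Geo_L_2,Geo_L_3,Pro_L_1,Pro_L_2,Pro_L_3,Date,Sale
1, 2, 3, 129, 1, 5193316745, 1/1/2012, 9
1 ,2, 3, 129, 1, 5193316745, 1/1/2013,
1, 2, 3, 129, 1, 5193316745, , 8
1, 2, 3, 129, NA, 5193316745, 1/10/2012, 10
1, 2, 3, 129, 1, 5193316745, 1/10/2013, 4
1, 2, 3, ghj, 1, 5193316745, 1/10/2014, 6
1, 2, 3, 129, 1, 5193316745, 1/11/2012, 4
1, 2, 3, 129, 1, ghgj, 1/11/2013, 2
1, 2, 3, 129, 1, 5193316745, 1/11/2014, 6
1, 2, 3, 129, 1, 5193316745, 1/12/2012, ghgj
1, 2, 3, 129, 1, 5193316745, 1/12/2013, 5
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV), skipinitialspace=True)

num_cols = df.columns.drop("Date")
# Anything that is not a number (e.g. "ghj", "ghgj") becomes NaN.
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors="coerce")
# Blank or malformed dates become NaT, which dropna() also treats as missing.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y", errors="coerce")

# dropna() returns a new frame, so assign it back, then restore integer dtypes.
df = df.dropna().astype({col: "int64" for col in num_cols})
print(df)
```

The surviving rows should be the same five as in Out[44] above (index 0, 4, 6, 8, 10). Casting back to int64 is only safe after `dropna()`, because integer columns cannot hold NaN.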


2014-10-27 10:10:27 EdChum

This is great, thanks a lot! It worked for me. @Edchum – 2014-10-27 11:20:07
