2017-03-29

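pandas assert_frame_equal error

I am building test cases and want to compare two DataFrames. Although both DataFrames have the same columns and values, assert_frame_equal reports that they are not equal. The column order differs, and I tried reordering the columns without any success.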

In my test case I am using the following function:

testing.assert_frame_equal(expected, tested, check_dtype=False) 

The first DataFrame is declared like this:

df2 = pandas.DataFrame({ 
    'artista': [u'Beyoncé', 'Radiolab', 'Xmas', 'Beyonce'], 
    'mid_sugerido': ['/g/11bz0dg4b_', '/g/11bt_6j9dk', '/g/11c2nz8jc2', '/g/11bt_6jXXX'], 
    'texto': ['Lemonade', 'Radiolab', 'Merry Christmas Lil Mama', 'Beyonce'], 
    'busqueda': [u'Beyoncé', 'Radiolab', 'Xmas', 'Beyonce'], 
    'texto_sugerido': ['Lemonade', 'Radiolab', 'Merry Christmas Lil Mama', 'Beyonce'], 
    'artista_sugerido': [u'Beyoncé', 'Radiolab', None, 'Beyonce'], 
    'media_sugerido': ['album', 'album', 'track', 'album'], 

}) 

The pandas DataFrame df2:

    artista  artista_sugerido  busqueda  media_sugerido   mid_sugerido  \
0   Beyoncé           Beyoncé   Beyoncé           album  /g/11bz0dg4b_
1  Radiolab          Radiolab  Radiolab           album  /g/11bt_6j9dk
2      Xmas              None      Xmas           track  /g/11c2nz8jc2
3   Beyonce           Beyonce   Beyonce           album  /g/11bt_6jXXX

                      texto            texto_sugerido
0                  Lemonade                  Lemonade
1                  Radiolab                  Radiolab
2  Merry Christmas Lil Mama  Merry Christmas Lil Mama
3                   Beyonce                   Beyonce

The second DataFrame is the one returned by the function (result):

    artista  busqueda   mid_sugerido                     texto  \
0   Beyoncé   Beyoncé  /g/11bz0dg4b_                  Lemonade
1  Radiolab  Radiolab  /g/11bt_6j9dk                  Radiolab
2      Xmas      Xmas  /g/11c2nz8jc2  Merry Christmas Lil Mama
3   Beyonce   Beyonce  /g/11bt_6jXXX                   Beyonce

             texto_sugerido  artista_sugerido  media_sugerido
0                  Lemonade           Beyoncé           album
1                  Radiolab          Radiolab           album
2  Merry Christmas Lil Mama              None           track
3                   Beyonce           Beyonce           album

I get the following error when I run assert_frame_equal(df2, result):

Traceback (most recent call last): 
    File "/Users/spicyramen/Documents/Development/parzee/python/coverage/experimental/pandas_creation.py", line 158, in <module> 
    assert_frame_equal(df6, _Normalize(df5, test_dict)) 
    File "/Users/spicyramen/Documents/Development/parzee/python/coverage/experimental/pandas_creation.py", line 16, in assert_frame_equal 
    testing.assert_frame_equal(expected, tested, check_dtype=False) 
    File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 1142, in assert_frame_equal 
    obj='{0}.columns'.format(obj)) 
    File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 761, in assert_index_equal 
    obj=obj, lobj=left, robj=right) 
    File "pandas/src/testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas/src/testing.c:3887) 
    File "pandas/src/testing.pyx", line 147, in pandas._testing.assert_almost_equal (pandas/src/testing.c:2769) 
    File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 915, in raise_assert_detail 
    raise AssertionError(msg) 
AssertionError: DataFrame.columns are different 

DataFrame.columns values are different (85.71429 %) 
[left]: Index([u'artista', u'artista_sugerido', u'busqueda', u'media_sugerido', 
     u'mid_sugerido', u'texto', u'texto_sugerido'], 
     dtype='object') 
[right]: Index([u'artista', u'busqueda', u'mid_sugerido', u'texto', u'texto_sugerido', 
     u'artista_sugerido', u'media_sugerido'], 
     dtype='object') 

The columns are the same but in a different order. If I reorder the columns with df.sort_index(axis=1), I get:

Traceback (most recent call last): 
    File "/Users/spicyramen/Documents/Development/parzee/python/coverage/experimental/pandas_creation.py", line 154, in <module> 
    assert_frame_equal(df6.sort_index(axis=1), _Normalize(df5, test_dict).sort_index(axis=1)) 
    File "/Users/spicyramen/Documents/Development/parzee/python/coverage/experimental/pandas_creation.py", line 16, in assert_frame_equal 
    testing.assert_frame_equal(expected, tested, check_dtype=False, check_like=False) 
    File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 1166, in assert_frame_equal 
    obj='DataFrame.iloc[:, {0}]'.format(i)) 
    File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 1049, in assert_series_equal 
    check_less_precise, obj='{0}'.format(obj)) 
    File "pandas/src/testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas/src/testing.c:3887) 
    File "pandas/src/testing.pyx", line 147, in pandas._testing.assert_almost_equal (pandas/src/testing.c:2769) 
    File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 914, in raise_assert_detail 
    [right]: {3}""".format(obj, message, left, right) 
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128) 

Answer


I solved it by replacing:

assert_frame_equal(df2.sort_index(axis=1), myfunction(df1).sort_index(axis=1)) 

with:

l = myfunction(df1) 
assert_frame_equal(df2.sort_index(axis=1), l.sort_index(axis=1)) 
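
For reference, here is a minimal sketch of the two usual ways to make the comparison independent of column order: sorting both frames with sort_index(axis=1), or passing check_like=True. The small frames and the frames_match helper are hypothetical stand-ins, not the data or code from the question, and assert_frame_equal is imported from pandas.testing (its public location in recent pandas releases) rather than the older pandas.util.testing path seen in the tracebacks.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical miniature of the frames in the question:
# identical data, different column order.
expected = pd.DataFrame({
    "artista": ["Beyoncé", "Radiolab"],
    "texto": ["Lemonade", "Radiolab"],
    "media_sugerido": ["album", "album"],
})
result = expected[["texto", "media_sugerido", "artista"]]


def frames_match(left, right, **kwargs):
    """Return True if assert_frame_equal passes, False if it raises."""
    try:
        assert_frame_equal(left, right, check_dtype=False, **kwargs)
    except AssertionError:
        return False
    return True


# Default behaviour: column order is part of the comparison.
strict = frames_match(expected, result)

# Normalise the column order on both sides before comparing.
sorted_cols = frames_match(expected.sort_index(axis=1),
                           result.sort_index(axis=1))

# Ask pandas to ignore the order of index and columns.
like = frames_match(expected, result, check_like=True)

print(strict, sorted_cols, like)
```

With these toy frames only the strict comparison should fail. Note that the second traceback in the question shows check_like=False being passed explicitly, so column order was still being enforced by the wrapper there.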

Why is there any difference between these two approaches? Shouldn't they produce the same result? – pansen


I think so too. I still don't understand why it works and will debug further. – spicyramen
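
One possible starting point for that debugging: judging from the second traceback, the UnicodeEncodeError is raised inside raise_assert_detail, which pandas calls only after a column comparison has already failed, so some column probably did differ and the error merely hid the message. The sketch below lists the columns whose values differ without relying on the assertion message. The differing_columns helper and the sample frames are hypothetical, not code from the question.

```python
import pandas as pd


def differing_columns(left, right):
    """List the columns whose values differ, ignoring column order."""
    left = left.sort_index(axis=1)
    right = right.sort_index(axis=1)
    if list(left.columns) != list(right.columns):
        raise ValueError("frames do not have the same column names")
    # Series.equals treats missing values in the same position as equal.
    return [col for col in left.columns if not left[col].equals(right[col])]


# Hypothetical data: one value deliberately lacks the accent.
expected = pd.DataFrame({
    "artista": ["Beyoncé", "Xmas"],
    "artista_sugerido": ["Beyoncé", None],
})
result = pd.DataFrame({
    "artista_sugerido": ["Beyonce", None],
    "artista": ["Beyoncé", "Xmas"],
})

bad = differing_columns(expected, result)
print(bad)
```

Once the offending column is known, printing repr() of both columns side by side can show whether the difference is in the text itself or only in how the strings are typed.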