The expanding_corr function in pandas gives NaN
The data file is here.
I just want to compute the pairwise correlations between the columns of two DataFrames:
In [7]: import os
In [8]: import pandas as pd
In [9]: import numpy as np
In [10]: from pandas import Series, DataFrame
In [12]: blog_dat = pd.read_table("blogdata.txt", index_col="Blog")
In [13]: blog_dat = blog_dat.astype(float)
In [14]: all(blog_dat.notnull())
Out[14]: True
In [15]: x = DataFrame(np.random.randn(99*4).reshape((99, 4)))
In [16]: pd.expanding_corr(blog_dat.iloc[:, :4], blog_dat.iloc[:, :4], pairwise=True)[-1, :, :]
Out[16]:
          china      kids     music     yahoo
china  1.000000  0.053069  0.026599  0.246957
kids   0.053069  1.000000  0.409978  0.094636
music  0.026599  0.409978  1.000000  0.055923
yahoo  0.246957  0.094636  0.055923  1.000000
In [17]: pd.expanding_corr(blog_dat.iloc[:, :4], x, pairwise=True)[-1, :, :]
/usr/local/lib/python3.4/site-packages/pandas/core/index.py:1240: RuntimeWarning: unorderable types: str() < int(), sort order is undefined for incomparable objects
"incomparable objects" % e, RuntimeWarning)
/usr/local/lib/python3.4/site-packages/pandas/core/index.py:1240: RuntimeWarning: unorderable types: int() < str(), sort order is undefined for incomparable objects
"incomparable objects" % e, RuntimeWarning)
/usr/local/lib/python3.4/site-packages/pandas/core/index.py:1254: RuntimeWarning: unorderable types: str() > int(), sort order is undefined for incomparable objects
"incomparable objects" % e, RuntimeWarning)
/usr/local/lib/python3.4/site-packages/pandas/core/index.py:1254: RuntimeWarning: unorderable types: int() > str(), sort order is undefined for incomparable objects
"incomparable objects" % e, RuntimeWarning)
Out[17]:
         0    1    2    3
china  NaN  NaN  NaN  NaN
kids   NaN  NaN  NaN  NaN
music  NaN  NaN  NaN  NaN
yahoo  NaN  NaN  NaN  NaN
The NaNs do not go away, even if I give x an index and column names.
Cool, it turns out only the index needs to match that of blog_dat. But why even that is necessary is beyond me. – qed
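A minimal sketch of the index fix, using synthetic stand-ins because blogdata.txt is not reproduced here. `left` plays the role of `blog_dat.iloc[:, :4]`, `right` plays the role of `x`, the row labels are made up, and `cross_corr` is a hypothetical helper, not a pandas function. It computes full-sample correlations with `Series.corr`, which should correspond to the last slice of the expanding correlation in the session above, since that window covers every row.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Stand-in for blog_dat.iloc[:, :4]: rows labelled by (made-up) blog names.
blogs = [f"blog_{i}" for i in range(99)]
left = pd.DataFrame(rng.standard_normal((99, 4)), index=blogs,
                    columns=["china", "kids", "music", "yahoo"])

# Stand-in for x: same shape, but row labels that never occur in `left`
# (x's default 0..98 index likewise shares no label with the blog names).
right = pd.DataFrame(rng.standard_normal((99, 4)),
                     index=[f"row_{i}" for i in range(99)])


def cross_corr(a, b):
    """Correlate each column of `a` with each column of `b` (hypothetical helper).

    Series.corr aligns the two columns on their index labels first.
    """
    return pd.DataFrame(
        {cb: [a[ca].corr(b[cb]) for ca in a.columns] for cb in b.columns},
        index=a.columns,
    )


before = cross_corr(left, right)  # no shared row labels: all NaN

right.index = left.index          # only the index must match; columns stay 0..3
after = cross_corr(left, right)   # finite correlations
```

Applied to the session above, the corresponding step would be `x.index = blog_dat.index` before calling the correlation function.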
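Many operations in pandas align on the index. The data points of two Series are not paired up by integer position when their correlation is computed (as NumPy would do). Instead, they are aligned by index label. If the indexes share no labels, the data points miss each other entirely, the correlation is undefined, and NaN is returned. – unutbu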
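The difference between label alignment and positional pairing can be seen in a small sketch. The Series below are invented for illustration. `Series.corr` aligns on labels, while `np.corrcoef` only sees the raw arrays.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])

# Same length, but no label in common with s1.
s2 = pd.Series([2.0, 4.0, 6.0, 9.0], index=["w", "x", "y", "z"])

by_label = s1.corr(s2)  # nothing aligns, so the result is NaN
by_position = np.corrcoef(s1.to_numpy(), s2.to_numpy())[0, 1]  # pairs by position

# Same labels as s1, listed in reverse order.
s3 = pd.Series([4.0, 3.0, 2.0, 1.0], index=["d", "c", "b", "a"])

# pandas pairs a with a, b with b, ...: the aligned values are identical.
reordered_by_label = s1.corr(s3)

# NumPy pairs first with first, second with second, ...: the values run in
# opposite directions.
reordered_by_position = np.corrcoef(s1.to_numpy(), s3.to_numpy())[0, 1]
```

Here `by_label` is NaN while `by_position` is a finite number. `reordered_by_label` comes out at 1 and `reordered_by_position` at -1, up to floating-point rounding.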
@qed: Thanks for the correction. – unutbu