Pandas treats integer values as strings when sorting? Why?

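I want to sort a pandas DataFrame by the values of two columns. For some reason it treats the integers as strings when sorting, even though a few lines of code earlier the values were still integers. I don't know what caused the change, but either way: why does pandas treat integer values as strings when sorting?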

df = 

contig pos ref haplotype_block hap_X hap_Y odds_ratio My_hap Sp_hap 
2 5207 T 1856 T A 167.922 T A 
2 5238 G 1856 C G - C G 
2 5723 A 1856 A T - A T 
2 5867 C 1856 T C - T C 
2 155667 G 2816 G * 1.0 N N 
2 155670 T 2816 T * - N N 
2 67910 C 2 C T 0.21600000000000003 T C 
2 67941 A 2 A T - T A 
2 68016 A 2 A G - G A 
2 118146 C 132 T C 1369.0 T C 
2 118237 A 132 C A - C A 
2 118938 A 1157 T A 0.002 A T 


df.sort_values(by=['contig', 'pos'], inplace=True, ascending=False) 

print(df) #is giving me 


contig pos ref haplotype_block hap_X hap_Y odds_ratio My_hap Sp_hap 
2 118146 C 132 T C 1369.0 T C 
2 118237 A 132 C A - C A 
2 118938 A 1157 T A 0.002 A T 
2 155667 G 2816 G * 1.0 N N 
2 155670 T 2816 T * - N N 
2 5207 T 1856 T A 167.922 T A 
2 5238 G 1856 C G - C G 
2 5723 A 1856 A T - A T 
2 5867 C 1856 T C - T C 
......

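So the data is sorted only by the leading digits of the values in the two columns (contig and pos). Why does this happen, and is there a simple, memory-efficient way to fix it?

Thanks,

Post edit details: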

print(df.info()) 

<class 'pandas.core.frame.DataFrame'> 
RangeIndex: 333 entries, 0 to 332 
Data columns (total 9 columns): 
contig    333 non-null int64 
pos    333 non-null object 
ref    333 non-null object 
haplotype_block 333 non-null int64 
hap_X    333 non-null object 
hap_Y    333 non-null object 
odds_ratio   333 non-null object 
My_hap    333 non-null object 
Sp_hap    333 non-null object 
dtypes: int64(2), object(7) 
memory usage: 23.5+ KB 
None

Source

2017-04-13 everestial007

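What are the dtypes here? What does `df.info()` show for these columns? Have you tried casting contig to int? `df['contig'] = df['contig'].astype(int)`, and likewise for `pos` – EdChum

Thanks! – everestial007

Convert the values to integers: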

df['contig'] = df['contig'].astype(int) 
df['pos'] = df['pos'].astype(int)

Then sort in place:

df.sort_values(by=['contig', 'pos'], inplace=True, ascending=True)
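
To see why the cast matters, here is a minimal sketch. It assumes, as the `df.info()` output in the question suggests, that `pos` is an object column holding strings. The four-row frame is a hypothetical miniature of the real data, not the original file. Strings are compared character by character, so '118146' sorts before '5207'; once the column is cast to integers, the comparison is numeric.

```python
import pandas as pd

# Hypothetical miniature of the question's frame; 'pos' is deliberately
# stored as strings to mimic the object dtype reported by df.info().
df = pd.DataFrame({
    "contig": [2, 2, 2, 2],
    "pos": ["5207", "155667", "67910", "118146"],
})

# Strings compare character by character, so '1...' sorts before '5...'.
lexical = df.sort_values(by=["contig", "pos"])["pos"].tolist()
print(lexical)  # ['118146', '155667', '5207', '67910']

# After casting to int the same sort is numeric.
df["pos"] = df["pos"].astype(int)
numeric = df.sort_values(by=["contig", "pos"])["pos"].tolist()
print(numeric)  # [5207, 67910, 118146, 155667]
```

If the column might contain values that are not valid integers, `astype(int)` raises an error; `pd.to_numeric(df['pos'], errors='coerce')` is one alternative, turning such values into NaN instead.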

Thanks,

Source

2017-04-13 20:13:05 everestial007
