Sort function in pandas returns messy data

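I want to sort the data in a CSV file with the pandas sort function, using the code below. The original file has 229 rows, but the sorted output has 245 lines, because some of the data in a field is printed on the next line and some lines have no values at all.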

import pandas as pd

sample = pd.read_csv("sample.csv", encoding='latin-1', skipinitialspace=True)
sample_sorted = sample.sort_values(by=['rating'])
sample_sorted.to_csv("sample_sorted.csv")

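I think the problem occurs because the data in some cells was entered with line breaks. For example, this is the content of one cell in the original file. When I sort the original file, the second line of the cell is printed on a line of its own, with empty lines left between it and the first line.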

"Side effects are way to extreme. 



E-mail me if you have experianced the same things."
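
The symptom can be reproduced with a small sketch. The two-row data and the column name `review` below are made up for illustration; only `rating` comes from the question. It suggests that pandas reads the quoted multi-line cell as one row and writes its line breaks back inside the quotes, so the output has more physical lines than rows even though no rows were added.

```python
import csv
import io

import pandas as pd

# Hypothetical two-row file; the first "review" cell contains line breaks.
raw = (
    "rating,review\n"
    '5,"Side effects are way to extreme.\n\n\nE-mail me."\n'
    "1,Works fine\n"
)

sample = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
n_rows = len(sample)  # the quoted multi-line cell is parsed as a single row

out = io.StringIO()
sample.sort_values(by=["rating"]).to_csv(out, index=False)
written = out.getvalue()

# Physical lines in the output: the cell's line breaks are written back
# inside the quotes, so this count is larger than the number of rows.
n_lines = len(written.splitlines())

# A CSV-aware reader still sees the original number of records.
n_records = len(list(csv.reader(io.StringIO(written)))) - 1  # minus header

print(n_rows, n_lines, n_records)
```

If this is what happens with the real file, the extra lines only show up in tools that count physical lines, such as a text editor, not in a CSV parser.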

Any suggestions? Thanks!

2016-09-05 Mary

Can you post the output of `print(sample.shape)`? – MaxU

@MaxU, the output of print(sample.shape) is (229, 10) – Mary

@Merlin, I think it might be other characters inside the file, such as Arabic characters. Yes, the file has a header. – Mary

You can try removing the line breaks in the problem column.

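import pandas as pd

sample = pd.read_csv("sample.csv", encoding='latin-1', skipinitialspace=True)
# Collapse every run of whitespace (including line breaks) into one space.
# Non-string cells such as NaN are left unchanged.
sample["problem_column"] = sample["problem_column"].apply(
    lambda x: " ".join(x.split()) if isinstance(x, str) else x
)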

See if that helps. It is hard to tell why this happens without a reproducible sample.
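
Putting the suggestion together with the sort, a minimal end-to-end sketch might look like the following. The inline data, the column name `review`, and the helper `collapse_whitespace` are hypothetical stand-ins for the real file and its problem column. Passing `index=False` is also an assumption; the question's code writes the index as well.

```python
import io

import pandas as pd


def collapse_whitespace(value):
    """Join the whitespace-separated words of a string with single spaces."""
    # Non-string cells (for example NaN for empty fields) pass through as-is.
    return " ".join(value.split()) if isinstance(value, str) else value


# Hypothetical stand-in for sample.csv: one multi-line cell, one empty cell.
raw = (
    "rating,review\n"
    '5,"Side effects are way to extreme.\n\n\nE-mail me."\n'
    "3,\n"
    "1,Works fine\n"
)
sample = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# Remove the line breaks from the problem column, then sort.
sample["review"] = sample["review"].apply(collapse_whitespace)
sample_sorted = sample.sort_values(by=["rating"])

out = io.StringIO()
sample_sorted.to_csv(out, index=False)

# After cleaning, each row occupies exactly one physical line (plus the header).
n_lines = len(out.getvalue().splitlines())
print(n_lines, len(sample_sorted) + 1)
```

Comparing the line count of the written file with `len(sample_sorted) + 1` is a quick way to check whether any cell still contains a line break.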

2016-09-05 23:12:49
