2017-10-14 88 views

How to convert these datasets into valid datasets for further case study: converting a raw dataset into a standardized dataset

I scraped these datasets and now want to put them into a standardized form. I am a beginner in data science, so how can I do this, and go further with the data, using Python code?

is_mobile,n_products_viewed,visit_duration,is_returning_visitor,time_of_day,user_action
1,0,0.657509946,0,3,0
1,1,0.568571234,0,2,1
1,0,0.042245997,1,1,0
1,1,1.659793381,1,1,2
0,1,2.014744849,1,1,2
1,1,0.512447387,1,1,2
0,0,1.440327098,1,1,0
1,0,0.035260233,0,3,0
0,1,1.490764094,0,0,1
0,0,0.005837521,1,3,0
0,4,2.04604049,1,0,3
0,0,0.955889466,0,3,0
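If "standardized" here means feature scaling, a common step before modelling data like this, a minimal pandas sketch is below. That reading of the question is an assumption, and so is the choice to scale only `n_products_viewed` and `visit_duration` while leaving the 0/1 flags, the `time_of_day` code and the `user_action` label untouched.

```python
import io

import pandas as pd

# The twelve rows posted in the question, as CSV text (stray spaces removed).
RAW = """is_mobile,n_products_viewed,visit_duration,is_returning_visitor,time_of_day,user_action
1,0,0.657509946,0,3,0
1,1,0.568571234,0,2,1
1,0,0.042245997,1,1,0
1,1,1.659793381,1,1,2
0,1,2.014744849,1,1,2
1,1,0.512447387,1,1,2
0,0,1.440327098,1,1,0
1,0,0.035260233,0,3,0
0,1,1.490764094,0,0,1
0,0,0.005837521,1,3,0
0,4,2.04604049,1,0,3
0,0,0.955889466,0,3,0
"""

df = pd.read_csv(io.StringIO(RAW))

# Assumption: "standardized" means z-score scaling (mean 0, std 1).
# Only the count/duration columns are scaled; flags, the time_of_day
# code and the user_action label keep their original values.
num_cols = ["n_products_viewed", "visit_duration"]
df[num_cols] = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std(ddof=0)

print(df.round(3))
```

`std(ddof=0)` is the population standard deviation, which is what scikit-learn's `StandardScaler` uses; plain `.std()` in pandas defaults to `ddof=1`.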

It would be better if you posted your raw dataset and the expected dataset... in text form – RomanPerekhrest

Yes, OK sir. The expected result should be continuous and arranged column-wise. – user8747401

Also post the *standardized form* – RomanPerekhrest

Answer

I assume you are trying to tidy your data. Here is a general rundown of the definition of tidy data:

Each variable you measure should be in one column. 
Each different observation of that variable should be in a different row. 
There should be one table for each "kind" of variable. 
If you have multiple tables, they should include a column in the table that allows them to be linked. 

https://en.wikipedia.org/wiki/Tidy_data

I don't see any problem with using commas as the separator. pandas can load a CSV file with pandas.read_csv().
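A minimal sketch of loading comma-separated rows with `pandas.read_csv()`. It uses a few rows from the question, including the stray spaces around commas that appeared in the original paste ("1,0, 0.04..." and "0 ,0,..."). The regex separator is one way, chosen here as an assumption, to make sure those fields still parse as numbers; for a real file, pass its path instead of `io.StringIO(RAW)`.

```python
import io

import pandas as pd

# A few rows from the question, with the stray spaces kept on purpose.
RAW = """is_mobile,n_products_viewed,visit_duration,is_returning_visitor,time_of_day,user_action
1,0,0.657509946,0,3,0
1,0, 0.042245997,1,1,0
0 ,0,0.005837521,1,3,0
0,4,2.04604049,1,0,3
"""

# A regex separator (python engine only) tolerates whitespace on either
# side of each comma, so every column is inferred as int or float.
df = pd.read_csv(io.StringIO(RAW), sep=r"\s*,\s*", engine="python")

print(df.dtypes)
print(df)
```

If the file has no stray spaces, the default `sep=","` and the faster C engine are enough.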

If you want to do some cleaning and rearranging of the data, you can use the pivot_table and melt methods of the pandas library.
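A minimal sketch of both methods on the first six rows from the question. Which columns to pivot and melt is an illustrative choice, not something the question specifies, and the `visit_id` column is a hypothetical identifier added only so the long-format rows can be linked back to their original visit.

```python
import io

import pandas as pd

# First six rows from the question.
RAW = """is_mobile,n_products_viewed,visit_duration,is_returning_visitor,time_of_day,user_action
1,0,0.657509946,0,3,0
1,1,0.568571234,0,2,1
1,0,0.042245997,1,1,0
1,1,1.659793381,1,1,2
0,1,2.014744849,1,1,2
1,1,0.512447387,1,1,2
"""

df = pd.read_csv(io.StringIO(RAW))

# pivot_table: mean visit_duration for each time_of_day (rows) and
# is_mobile value (columns); missing combinations become NaN.
summary = df.pivot_table(values="visit_duration", index="time_of_day",
                         columns="is_mobile", aggfunc="mean")
print(summary)

# melt: wide to long, one row per (visit, feature) pair.
df["visit_id"] = df.index  # hypothetical id column linking rows back to visits
long_df = df.melt(id_vars=["visit_id", "user_action"],
                  value_vars=["n_products_viewed", "visit_duration"],
                  var_name="feature", value_name="value")
print(long_df)
```

`pivot_table` aggregates (several mobile visits in the afternoon slot are averaged into one cell), while `melt` only reshapes: the six rows and two melted columns become twelve rows.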