Importing data into a dictionary and finding column patterns in Python

If I have data like this:

Code, data_1, data_2, data_3, [....], data204700 

a,1,1,0, ... , 1 
b,1,0,0, ... , 1 
a,1,1,0, ... , 1 
c,0,1,0, ... , 1 
b,1,0,0, ... , 1 
etc. The same code can appear with different values (0, 1, or ? for unknown).

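I need to build a large matrix that I want to analyze.

How can I import the data into a dictionary?

I want to use a dictionary to get back the pattern of each column (204,700 + 1 columns).

Is there a built-in function (or package) for this?

(I expect the pattern as a percentage.) I mean something like: 90% of the values in column 1 are 1, and likewise for column 2, and so on.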

Source

2013-05-21 GFede - Udacian

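A dictionary is a collection of key-value pairs. What do you want as your keys, and what do you want as your values?

If a value is an integer, does it have to be 0 or 1? Or could there be a 2 or some other number? Basically, are all columns other than the first either 0 or 1?

OK, so I'm going to assume you want this in a dictionary for storage purposes, and I'll tell you that you don't want a dictionary for this kind of data. Use a pandas DataFrame.

This is how you would get your data into a DataFrame: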

import pandas as pd 
my_file = 'file_name' 
df = pd.read_csv(my_file)
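Since the question mentions unknown values written as ?, here is a minimal sketch of loading such a file. It assumes the unknowns are literally the character `?` and that the header has a space after each comma, as in the sample above. The inline `sample` text is made up for illustration and stands in for the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file, shaped like the sample in the question.
sample = """Code, data_1, data_2, data_3
a,1,1,0
b,1,0,?
a,1,1,0
c,0,1,0
"""

# skipinitialspace drops the blank after each comma in the header;
# na_values turns the "?" markers into NaN so the columns stay numeric.
df = pd.read_csv(io.StringIO(sample), skipinitialspace=True, na_values="?")

print(df.dtypes)
```

With the unknowns stored as NaN, a column that contains them is read as floating point rather than as text, which keeps numeric operations on it working.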

Now, you don't need a package to return the pattern you're looking for; just write a simple function that returns it!

def one_percentage(data):
    # total number of rows, used as the denominator for the percentages
    size = len(data)
    # dtype of the first data column; only columns with this dtype are
    # counted, which skips the text-valued code column
    data_dtype = data[data.columns[1]].dtype
    # map each matching column name to the percentage of 1s it contains
    my_dict = {}
    for name in data:
        if data[name].dtype == data_dtype:
            my_dict[name] = 100.0 * (data[name] == 1).sum() / size
    return my_dict
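With 204,700 columns, a Python-level loop may be slow. The same per-column percentage of 1s can probably be computed with pandas' vectorized operations instead. This is only a sketch: the small frame below is hypothetical, and it assumes the first column is named `Code`.

```python
import pandas as pd

# Hypothetical small frame standing in for the real data.
df = pd.DataFrame({
    "Code": ["a", "b", "a", "c"],
    "data_1": [1, 1, 1, 0],
    "data_2": [1, 0, 1, 1],
})

# Compare every data column with 1, then average the booleans per column.
# The mean of a True/False column is the fraction of True values.
percent_ones = (df.drop(columns="Code").eq(1).mean() * 100).to_dict()

print(percent_ones)
```

The result is an ordinary dictionary keyed by column name, so it can be used the same way as the return value of `one_percentage`.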

Now, if you want the percentage of 1s in any given column, this is what you do:

percentages = one_percentage(df) 
column_name = 'any_column_name' 
print(percentages[column_name])

Now, if you want to do it for every column, you can grab all of the column names and loop over them:

columns = [name for name in percentages]
for name in columns:
    print(str(percentages[name]) + "% of 1 in column " + name)

Let me know if you need anything else!

Source

2013-05-21 18:17:28

Thanks. I had the same solution in mind. We also have to add a third value (i.e. ?, unknown). I'll try the Orange package as well!
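One possible way to account for the third value is to report the share of every distinct value per column, not just the 1s. The sketch below assumes the unknowns were kept as the string `?`, so each data column holds strings; the frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical frame where "?" marks an unknown value, so columns hold strings.
df = pd.DataFrame({
    "Code": ["a", "b", "a", "c"],
    "data_1": ["1", "1", "?", "0"],
    "data_2": ["1", "0", "1", "1"],
})

# For each data column, compute the share of each distinct value.
# Building a DataFrame from a dict of Series aligns them on the union of
# values; a value that never occurs in a column comes out as NaN, so fill with 0.
shares = pd.DataFrame({
    name: df[name].value_counts(normalize=True)
    for name in df.columns
    if name != "Code"
}).fillna(0) * 100

print(shares)
```

Each row of `shares` is a value (0, 1, or ?) and each column is a data column, so `shares.loc["?", "data_1"]` gives the percentage of unknowns in that column.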
