Python - Filter a column of a .dat file and return the matching values from another column

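I'm new to Python and have been working with a (150-line) file I created containing student ID numbers, grades, ages, class_code, area_code, and so on. What I want to do with the data is not only filter by a certain column (by grade, age, etc.) but also build a list from a different column of the matching rows (the student ID). I've managed to work out how to isolate the column I need to search for a particular value, but I can't figure out how to create the list of values I need returned.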

So here is a sample of the data (6 rows):

1/A/15/13/43214 
2/I/15/21/58322 
3/C/17/89/68470 
4/I/18/6/57362 
5/I/14/4/00000 
6/A/16/23/34567

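I need a list of the first column (student ID) based on filtering the second column (grade)... (and eventually the third column, fourth column, etc., but if I see how it looks for just the second, I think I can figure out the others). Also note: I am not using headers in the .dat file.

I have figured out how to isolate/view the second column.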

import numpy 

data = numpy.genfromtxt('/testdata.dat', delimiter='/', dtype='unicode') 

grades = data[:,1] 
print (grades)

This prints:

['A' 'I' 'C' 'I' 'I' 'A']

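But now, how can I pull just the values in the first column that correspond to the A's, C's, and I's into separate lists?

So I'd like to see a list of the column 1 values, with commas between the integers, for each of the A's, C's, and I's: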

list from A = [1, 6] 
list from C = [3] 
list from I = [2, 4, 5]

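Again, if I can see how it's done for just the second column and just one value (say, A), I think I can work out how to do it for the B's, C's, D's, etc., and for the other columns. I just need to see one example of how the syntax is applied, and then I can handle the rest the same way.

Also, I've been using numpy, but I've also read about pandas and csv, and I think those libraries could work too. But like I said, I've been using numpy to handle .dat files. I don't know whether one of the other libraries would be easier to use?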

Source

2017-06-04 chitown88

A pandas solution:

import pandas as pd 

df = pd.read_csv('data.txt', header=None, sep='/') 
dfs = {k:v for k,v in df.groupby(1)}

So we have a dictionary of DataFrames:

In [59]: dfs.keys()
Out[59]: dict_keys(['I', 'C', 'A'])

In [60]: dfs['I']
Out[60]:
   0  1   2   3      4
1  2  I  15  21  58322
3  4  I  18   6  57362
4  5  I  14   4      0

In [61]: dfs['C']
Out[61]:
   0  1   2   3      4
2  3  C  17  89  68470

In [62]: dfs['A']
Out[62]:
   0  1   2   3      4
0  1  A  15  13  43214
5  6  A  16  23  34567

If you want grouped lists of the first column:

In [67]: dfs['I'].iloc[:, 0].tolist() 
Out[67]: [2, 4, 5] 

In [68]: dfs['C'].iloc[:, 0].tolist() 
Out[68]: [3] 

In [69]: dfs['A'].iloc[:, 0].tolist() 
Out[69]: [1, 6]
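
The same grouping can also be collected into a single dictionary of ID lists in one pass. The following is only a sketch: the sample rows are inlined through io.StringIO so that it runs without a file, and the name `ids_by_grade` is illustrative. It also passes `dtype={4: str}` (an addition not in the answer above) so that the last column keeps its leading zeros instead of being parsed as the integer 0, as happened in the output shown earlier.

```python
import io

import pandas as pd

# Inline copy of the six sample rows from the question (assumption: used
# here instead of a file so the sketch is self-contained).
sample = (
    "1/A/15/13/43214\n"
    "2/I/15/21/58322\n"
    "3/C/17/89/68470\n"
    "4/I/18/6/57362\n"
    "5/I/14/4/00000\n"
    "6/A/16/23/34567\n"
)

# header=None because the file has no header row; column 4 is read as text
# so that '00000' is not collapsed to 0.
df = pd.read_csv(io.StringIO(sample), header=None, sep="/", dtype={4: str})

# Map each grade (column 1) to the list of student IDs (column 0).
ids_by_grade = {grade: group[0].tolist() for grade, group in df.groupby(1)}

print(ids_by_grade)
# {'A': [1, 6], 'C': [3], 'I': [2, 4, 5]}
```

Each list is then available by key, for example `ids_by_grade['A']`.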

Source

2017-06-04 16:08:18 MaxU

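You can loop over the grades and build a boolean mask to select the rows of the array that match a particular grade. This could probably use some refinement.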

import numpy as np 

grades = np.genfromtxt('data.txt', delimiter='/', skip_header=0, dtype='unicode') 


res = {} 
for grade in set(grades[:, 1].tolist()): 
    res[grade] = grades[grades[:, 1]==grade][:,0].tolist() 

print(res)

Source

2017-06-04 15:45:08

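So I've been playing around with the different solutions posted so far. I like your solution. It shows res as a set of lists. I've tried looking it up, and I'm still searching, but is there a way to separate the lists out from the set? So that I could basically have a list for the 'A' grades from res, another for the 'C' grades from res, and so on? All I've found is how to add a list to a set, remove a list from a list, or take subsets of lists. But I can't seem to find anything about a set of multiple lists. – chitown88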
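
On the follow-up in the comment above: `res` as built in that answer is a dictionary keyed by grade, not a set, so each list can be read out by its key. A minimal sketch of that, together with the single-value case (just 'A') the question asks about. It assumes the six sample rows inlined through io.StringIO so it runs without a file, uses `dtype=str` in place of `'unicode'`, and the names `ids`, `a_ids` and `ids_by_grade` are illustrative only:

```python
import io

import numpy as np

# Inline copy of the six sample rows from the question (assumption: used
# here instead of a file so the sketch is self-contained).
sample = io.StringIO(
    "1/A/15/13/43214\n"
    "2/I/15/21/58322\n"
    "3/C/17/89/68470\n"
    "4/I/18/6/57362\n"
    "5/I/14/4/00000\n"
    "6/A/16/23/34567\n"
)

data = np.genfromtxt(sample, delimiter="/", dtype=str)

ids = data[:, 0].astype(int)  # first column as integers
grades = data[:, 1]           # second column as strings

# Single value: a boolean mask picks the IDs whose grade is 'A'.
a_ids = ids[grades == "A"].tolist()
print(a_ids)
# [1, 6]

# Every grade at once: one list per distinct grade, stored in a dict.
ids_by_grade = {g: ids[grades == g].tolist() for g in np.unique(grades).tolist()}
print(ids_by_grade["C"])
# [3]
print(ids_by_grade["I"])
# [2, 4, 5]
```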

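You actually don't need any additional modules for such a simple task. A pure-Python solution would read the file line by line and "parse" each line with str.split(), which gives you your lists, and then you can filter on pretty much any parameter. Something like: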

students = {} # store for our students by grade 
with open("testdata.dat", "r") as f: # open the file 
    for line in f: # read the file line by line 
        row = line.strip().split("/") # split the line into individual columns 
        # you can now directly filter your row, or you can store the row in a list for later 
        # let's split them by grade: 
        grade = row[1] # second column of our row is the grade 
        # create/append the sublist in our `students` dict keyed by the grade 
        students[grade] = students.get(grade, []) + [row] 
# now your students dict contains all students split by grade, e.g.: 
a_students = students["A"] 
# [['1', 'A', '15', '13', '43214'], ['6', 'A', '16', '23', '34567']] 

# if you want only to collect the A-grade student IDs, you can get a list of them as: 
student_ids = [entry[0] for entry in students["A"]] 
# ['1', '6']
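
Since the question also mentions the csv module: the standard library's csv.reader accepts a custom delimiter, so it can do the splitting as well. The sketch below is only one possible variant, not part of the answer above. It inlines the sample rows through io.StringIO so it runs without a file, converts the IDs to integers to match the lists shown in the question, and uses the illustrative name `ids_by_grade`:

```python
import csv
import io
from collections import defaultdict

# Inline copy of the six sample rows from the question (assumption: used
# here instead of open("testdata.dat") so the sketch is self-contained).
sample = io.StringIO(
    "1/A/15/13/43214\n"
    "2/I/15/21/58322\n"
    "3/C/17/89/68470\n"
    "4/I/18/6/57362\n"
    "5/I/14/4/00000\n"
    "6/A/16/23/34567\n"
)

ids_by_grade = defaultdict(list)  # grade -> list of student IDs
for row in csv.reader(sample, delimiter="/"):
    if not row:  # skip blank lines, if any
        continue
    ids_by_grade[row[1]].append(int(row[0]))

print(dict(ids_by_grade))
# {'A': [1, 6], 'I': [2, 4, 5], 'C': [3]}
```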

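But let's take a few steps back. If you want a more general solution, you should just store your rows as lists and then create a function that filters them by the parameters you pass in, so: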

# define a filter function 
# filters should contain a list of filters whereas a filter would be defined as: 
# [position, [values]] 
# and you can define as many as you want 
def filter_sublists(source, filters=None): 
    result = [] # store for our result 
    filters = filters or [] # in case no filters are passed 
    for element in source: # go through every element of our source data 
        try: 
            if all(element[f[0]] in f[1] for f in filters): # check if all our filters match 
                result.append(element) # add the element 
        except IndexError: # invalid filter position or data position, ignore 
            pass 
    return result # return the result 

# now we can use it to filter our data, first lets load our data: 

with open("testdata.dat", "r") as f: # open the file 
    students = [line.strip().split("/") for line in f] # store all our students as a list 

# now we have all the data in the `students` list and we can filter it by any element 
a_students = filter_sublists(students, [[1, ["A"]]]) 
# [['1', 'A', '15', '13', '43214'], ['6', 'A', '16', '23', '34567']] 

# or again, if you just need the IDs: 
a_student_ids = [entry[0] for entry in filter_sublists(students, [[1, ["A"]]])] 
# ['1', '6'] 

# but you can filter by any parameter, for example: 
age_15_students = filter_sublists(students, [[2, ["15"]]]) 
# [['1', 'A', '15', '13', '43214'], ['2', 'I', '15', '21', '58322']] 

# or you can get all I-grade students aged 14 or 15: 
i_students = filter_sublists(students, [[1, ["I"]], [2, ["14", "15"]]]) 
# [['2', 'I', '15', '21', '58322'], ['5', 'I', '14', '4', '00000']]
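
The value lists above match exact strings, so a condition such as "aged 15 or older" would need every age spelled out. One possible variation, sketched here under the assumption that each filter is a (position, test function) pair instead of a (position, values) pair; `filter_rows` is a hypothetical helper, not part of the answer's code:

```python
def filter_rows(rows, predicates=None):
    """Return the rows for which every (position, test) pair passes.

    Rows that are too short for a given position are skipped.
    """
    predicates = predicates or []  # no predicates means every row passes
    return [
        row
        for row in rows
        if all(pos < len(row) and test(row[pos]) for pos, test in predicates)
    ]


# Inline copy of the six sample rows from the question, split into columns.
sample = """1/A/15/13/43214
2/I/15/21/58322
3/C/17/89/68470
4/I/18/6/57362
5/I/14/4/00000
6/A/16/23/34567"""
students = [line.split("/") for line in sample.splitlines()]

# IDs of all I-grade students aged 15 or older:
older_i_ids = [
    row[0]
    for row in filter_rows(
        students,
        [(1, lambda grade: grade == "I"), (2, lambda age: int(age) >= 15)],
    )
]
print(older_i_ids)
# ['2', '4']
```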

Source

2017-06-04 16:19:55 zwer
