Ensuring column data has a 1-to-1 match

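In the table below, I am trying to make sure that each Student_ID has exactly one Name. For example, Student_ID 101 has two names associated with it (Adam and Bob), so I want to retrieve that Student_ID.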

The result I need is ID 101 (because it has two names associated with it).

Student_ID  Name  Text
101         Adam  234
200         Cat   45645
101         Adam  5476456
200         Cat   34
101         Bob   456
200         Cat   456
200         Cat   4356
300         Cat   356

How should I approach this? I don't think we can use a dictionary. I just need a pointer in the right direction.

2017-04-10 Sam

So, did the suggested solutions help? – IanS

Group by Student_ID and apply nunique to count the number of distinct names per ID:

df.groupby('Student_ID')['Name'].nunique()

You can then filter that result, or filter the original DataFrame directly:

df.groupby('Student_ID').filter(lambda group: group['Name'].nunique() > 1)
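
To make the two snippets above concrete, here is a minimal sketch that rebuilds the question's table as a DataFrame and applies both. The hand-typed DataFrame and the variable names name_counts, ambiguous_ids and ambiguous_rows are assumptions for illustration; they are not from the question.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Student_ID": [101, 200, 101, 200, 101, 200, 200, 300],
    "Name": ["Adam", "Cat", "Adam", "Cat", "Bob", "Cat", "Cat", "Cat"],
    "Text": [234, 45645, 5476456, 34, 456, 456, 4356, 356],
})

# Number of distinct names seen for each Student_ID.
name_counts = df.groupby("Student_ID")["Name"].nunique()

# IDs that map to more than one name.
ambiguous_ids = name_counts[name_counts > 1].index.tolist()
print(ambiguous_ids)  # [101]

# Alternatively, keep every original row belonging to such an ID.
ambiguous_rows = df.groupby("Student_ID").filter(
    lambda group: group["Name"].nunique() > 1
)
print(ambiguous_rows)
```

The first form gives just the offending IDs, which is what the question asks for; the second is handy if you want to inspect the conflicting rows themselves.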

2017-04-10 15:16:48 IanS

A dictionary is a good idea. Use it to map each student name to the number of times it has been seen.

import csv

students = {}

with open('test.csv') as fp:
    next(fp)  # skip header
    for row in csv.reader(fp, delimiter=' ', skipinitialspace=True):
        if row:  # ignore blank lines
            student = row[1]
            if student in students:
                students[student] += 1
            else:
                students[student] = 1

for student, count in students.items():
    if count > 1:
        print(student, "present multiple times")

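It's such a good idea that Python implements one in collections.Counter. Give this class an iterable and it will build a dictionary counting the occurrences of each value in that iterable.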

import collections
import csv

with open('test.csv') as fp:
    next(fp)  # skip header
    students = collections.Counter(
        row[1]
        for row in csv.reader(fp, delimiter=' ', skipinitialspace=True)
        if row)  # ignore blank lines

for student, count in students.items():
    if count > 1:
        print(student, "present multiple times")
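
Note that counting how often each name appears is not quite the same as counting distinct names per ID: on the sample data, Cat appears five times and would be reported, even though ID 200 only ever has one name. A minimal sketch of a dictionary-based variant that maps each ID to the set of names seen for it might look like the following. The function name ids_with_multiple_names and the in-memory SAMPLE string are assumptions for illustration; with a real file you would pass the opened file object instead.

```python
import csv
import io
from collections import defaultdict

# Sample data from the question, held in memory instead of test.csv.
SAMPLE = """Student_ID Name Text
101 Adam 234
200 Cat 45645
101 Adam 5476456
200 Cat 34
101 Bob 456
200 Cat 456
200 Cat 4356
300 Cat 356
"""


def ids_with_multiple_names(fp):
    """Return {student_id: set_of_names} for IDs with more than one name."""
    names_by_id = defaultdict(set)
    reader = csv.reader(fp, delimiter=" ", skipinitialspace=True)
    next(reader)  # skip header
    for row in reader:
        if row:  # ignore blank lines
            student_id, name = row[0], row[1]
            names_by_id[student_id].add(name)
    return {sid: names for sid, names in names_by_id.items() if len(names) > 1}


result = ids_with_multiple_names(io.StringIO(SAMPLE))
for student_id, names in result.items():
    print(student_id, "has multiple names:", sorted(names))
```

Using a set per ID means repeated rows such as the two "101 Adam" entries collapse into one, so only genuinely different names are counted. The IDs stay as strings here because csv.reader does not convert types.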

2017-04-10 15:24:10 tdelaney
