Group unique column values in Python

I have a two-column dataset, and I need to change it from this format:

10 1
10 5
10 3
11 5
11 4
12 6
12 2

to this:

10 1 5 3 
11 5 4 
12 6 2

I need each unique value in the first column to be on its own row.

I am a Python beginner, and beyond reading in my text file I don't know how to proceed.


2017-06-17 Ben Hemingway

What is the field separator in the file? – RomanPerekhrest

You can use a pandas DataFrame.

import pandas as pd 

df = pd.DataFrame({'A':[10,10,10,11,11,12,12],'B':[1,5,3,5,4,6,2]}) 
print(df)

Output:

    A  B
0  10  1
1  10  5
2  10  3
3  11  5
4  11  4
5  12  6
6  12  2

Let's use groupby and join:

df.groupby('A')['B'].apply(lambda x:' '.join(x.astype(str)))

Output:

A
10    1 5 3
11      5 4
12      6 2
Name: B, dtype: object


2017-06-17 15:52:56

If this answer helped you, would you [accept](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work?answertab=active#tab-top) this answer? –

Accepting answers is what makes this site popular. Thanks. –

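An example using only itertools.groupby; everything here is in the Python standard library (although the pandas version is more concise!).

Assuming the keys you want to group are adjacent, this can all be done lazily (there is no need to hold all your data in memory at any time):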

from io import StringIO 
from itertools import groupby 

text = '''10 1 
10 5 
10 3 
11 5 
11 4 
12 6 
12 2''' 

# read and group data:
with StringIO(text) as file:
    keys = []
    res = {}

    # lazily split each line into [key, value]
    data = (line.strip().split() for line in file)

    # groupby only merges consecutive rows that share the same key
    for k, g in groupby(data, key=lambda x: x[0]):
        keys.append(k)
        res[k] = [item[1] for item in g]

print(keys)  # ['10', '11', '12']
print(res)   # {'10': ['1', '5', '3'], '11': ['5', '4'], '12': ['6', '2']} (insertion order on Python 3.7+)

# write grouped data (each field is left-aligned and padded to width 3):
with StringIO() as out_file:
    for key in keys:
        out_file.write('{:3s}'.format(key))
        out_file.write(' '.join(['{:3s}'.format(item) for item in res[key]]))
        out_file.write('\n')
    print(out_file.getvalue())
    # 10 1   5   3
    # 11 5   4
    # 12 6   2

You can then replace with StringIO(text) as file: with something like with open('infile.txt', 'r') as file: so that the program reads your actual file (and similarly use open('outfile.txt', 'w') for the output file).
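As a minimal sketch of that substitution, the block below runs the same grouping against real files. The helper name group_file, the temporary directory and the sample contents are assumptions made for illustration so that the example is self-contained; only the file names infile.txt and outfile.txt come from the text above. Fields are joined with single spaces here rather than padded.

```python
import tempfile
from itertools import groupby
from pathlib import Path


def group_file(in_path, out_path):
    """Write one line per key, assuming equal keys sit on adjacent lines."""
    with open(in_path, 'r') as file, open(out_path, 'w') as out_file:
        # lazily split each non-empty line into [key, value]
        data = (line.split() for line in file if line.strip())
        for k, g in groupby(data, key=lambda x: x[0]):
            values = ' '.join(item[1] for item in g)
            out_file.write('{} {}\n'.format(k, values))


# demo in a throwaway directory with hypothetical sample data
with tempfile.TemporaryDirectory() as tmp:
    infile = Path(tmp) / 'infile.txt'
    outfile = Path(tmp) / 'outfile.txt'
    infile.write_text('10 1\n10 5\n10 3\n11 5\n11 4\n12 6\n12 2\n')
    group_file(infile, outfile)
    result = outfile.read_text()

print(result)
```

With your own data you would call group_file directly on the real input and output paths.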

Then again, you could of course write directly to the output file each time a key is found; that way you never need all the data in memory at once:

with StringIO(text) as file, StringIO() as out_file: 

    data = (line.strip().split() for line in file) 

    for k, g in groupby(data, key=lambda x: x[0]):
        # write each group as soon as it is complete
        out_file.write('{:3s}'.format(k))
        out_file.write(' '.join(['{:3s}'.format(item[1]) for item in g]))
        out_file.write('\n')

    print(out_file.getvalue())
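The adjacency assumption matters because itertools.groupby only merges consecutive items with the same key. The sketch below uses hypothetical rows (not from the question) in which a key reappears later, and shows one possible workaround: sorting by key first. The helper name group_pairs is made up for this illustration.

```python
from itertools import groupby

# hypothetical rows where the key '10' shows up again after '11'
rows = [['10', '1'], ['11', '5'], ['10', '3'], ['11', '4']]


def group_pairs(pairs):
    """Return (key, values) tuples for each run of consecutive equal keys."""
    return [(k, [item[1] for item in g])
            for k, g in groupby(pairs, key=lambda x: x[0])]


# without sorting, each key produces one group per run
unsorted_groups = group_pairs(rows)

# sorted() is stable, so values keep their original order within each key
sorted_groups = group_pairs(sorted(rows, key=lambda x: x[0]))

print(unsorted_groups)
print(sorted_groups)
```

Note that sorting needs all rows in memory, so it gives up the lazy behaviour, and the keys are compared as strings here.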


2017-06-17 16:18:07

Using the collections.defaultdict subclass:

import collections 
with open('yourfile.txt', 'r') as f: 
    d = collections.defaultdict(list) 
    for k, v in (l.split() for l in f.read().splitlines()):  # processing each line
        d[k].append(v)  # accumulating values for the same 1st column
    for k, v in sorted(d.items()):  # outputting grouped sequences
        print('%s %s' % (k, ' '.join(v)))

Output:

10 1 5 3 
11 5 4 
12 6 2


2017-06-17 16:21:13 RomanPerekhrest

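Question: I tried to avoid 'defaultdict' and replace it with 'dict.setdefault', as in 'd.setdefault(k, []).append(v)'. Do you have an opinion on that? –

@hiroprotagonist, from the Python docs: 'This technique is simpler and faster than an equivalent technique using dict.setdefault():' https://docs.python.org/3.6/library/collections.html?highlight=defaultdict#defaultdict-examples – RomanPerekhrest

Ah! Learned something. Thanks! –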
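For reference, the two idioms discussed in these comments can be put side by side. This is only a sketch on hypothetical in-memory lines (the list named lines is an assumption standing in for the file contents); both variants build the same mapping.

```python
import collections

# hypothetical sample lines in the same "key value" layout as the question
lines = ['10 1', '10 5', '10 3', '11 5', '11 4', '12 6', '12 2']

# variant 1: collections.defaultdict creates the empty list on first access
d_default = collections.defaultdict(list)
for k, v in (l.split() for l in lines):
    d_default[k].append(v)

# variant 2: a plain dict with setdefault, as proposed in the comment
d_plain = {}
for k, v in (l.split() for l in lines):
    d_plain.setdefault(k, []).append(v)

# both approaches produce the same grouping
print(dict(d_default) == d_plain)
```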

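Using pandas may be easier. You can use the read_csv function to read a txt file in which the values are separated by one or more whitespace characters.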

import pandas as pd 

df = pd.read_csv("input.txt", header=None, delimiter=r"\s+")  # raw string avoids an invalid-escape warning
# setting column names 
df.columns = ['col1', 'col2'] 
df

This gives the following dataframe output:

   col1  col2
0    10     1
1    10     5
2    10     3
3    11     5
4    11     4
5    12     6
6    12     2

After reading the txt file into a dataframe, you can also use aggregate and join, similar to the apply approach in the other answer above:

df_combine = df.groupby('col1')['col2'].agg(lambda col: ' '.join(col.astype('str'))).reset_index() 
df_combine

Output:

   col1   col2
0    10  1 5 3
1    11    5 4
2    12    6 2


2017-06-17 16:42:03 0p3n5ourcE

I found this solution using dictionaries:

with open("data.txt", encoding='utf-8') as data:
    file = data.readlines()

dic = {}
for line in file:
    list1 = line.split()
    try:
        # key already seen: append the value to its string
        dic[list1[0]] += list1[1] + ' '
    except KeyError:
        # first occurrence of this key
        dic[list1[0]] = list1[1] + ' '

for k, v in dic.items():
    print(k, v)

OUTPUT

Something more functional:

def getdata(datafile):
    with open(datafile, encoding='utf-8') as data:
        file = data.readlines()

    dic = {}
    for line in file:
        list1 = line.split()
        try:
            dic[list1[0]] += list1[1] + ' '
        except KeyError:
            dic[list1[0]] = list1[1] + ' '

    for k, v in dic.items():
        v = v.split()  # turn the space-separated string into a list
        print(k, ':', v)

getdata("data.txt")

OUTPUT

11 : ['5', '4']
12 : ['6', '2']
10 : ['1', '5', '3']
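A possible variation on this answer, sketched under a few assumptions: instead of concatenating strings and splitting them again, the values can be collected in lists from the start, and the function can return the dictionary rather than print it. The name group_lines and the in-memory sample list are hypothetical; an open file object could be passed in the same way.

```python
def group_lines(lines):
    """Map each first-column value to the list of its second-column values."""
    dic = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        try:
            dic[fields[0]].append(fields[1])
        except KeyError:
            # first occurrence of this key
            dic[fields[0]] = [fields[1]]
    return dic


# hypothetical sample input standing in for the contents of data.txt
sample = ['10 1', '10 5', '10 3', '11 5', '11 4', '12 6', '12 2']
grouped = group_lines(sample)
for k, v in grouped.items():
    print(k, ':', v)
```

On Python 3.7 and later, dictionaries keep insertion order, so the keys print in the order in which they first appear in the input.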


2017-06-18 04:45:39
