A Python version could look like this:
import io
from collections import defaultdict

# In-memory stand-ins for the real input and output files.
fobj_in = io.StringIO("""Name1, Surname1 Team1
Team2
Team3
Name2, Surname2 Team2
Team4
Name3, Surname3 Team1
Team5""")
fobj_out = io.StringIO()

teams = defaultdict(list)
for line in fobj_in:
    items = line.split()
    if len(items) == 3:
        # A "Name, Surname Team" line starts a new person.
        name = items[:2]
        team = items[2]
    else:
        # A team-only line belongs to the most recent person.
        team = items[0]
    teams[team].append(name)

for team_name in sorted(teams.keys()):
    fobj_out.write(team_name + ', ')
    # Every name but the last is followed by a separator.
    for name in teams[team_name][:-1]:
        fobj_out.write('{} {}, '.format(name[0], name[1]))
    name = teams[team_name][-1]
    fobj_out.write('{} {}\n'.format(name[0], name[1]))

fobj_out.seek(0)
print(fobj_out.read())
Output:
Team1, Name1, Surname1, Name3, Surname3
Team2, Name1, Surname1, Name2, Surname2
Team3, Name1, Surname1
Team4, Name2, Surname2
Team5, Name3, Surname3
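The writing loop above treats the last name separately to avoid a trailing comma. As a minimal sketch, the same line can be built with str.join instead. The helper name format_team_line is hypothetical and not part of the original answer; it assumes each name is a two-item sequence such as ['Name1,', 'Surname1'], which is what the parsing loop produces.

```python
def format_team_line(team_name, names):
    """Build one output line: the team first, then every name.

    `names` is assumed to hold two-item sequences like ['Name1,', 'Surname1'].
    """
    parts = [team_name]
    parts.extend('{} {}'.format(first, second) for first, second in names)
    # join() puts the separator only between items, so the last
    # name needs no special handling.
    return ', '.join(parts) + '\n'
```

With this helper, the body of the outer loop would shrink to fobj_out.write(format_team_line(team_name, teams[team_name])).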
To read from and write to actual files, just do this:
fobj_in = open('in_file.txt')
fobj_out = open('out_file.txt', 'w')
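A possible variation, sketched here under a few assumptions: opening both files in a with statement closes them even if the processing raises an error. The function name process_file and its transform parameter are hypothetical, and the utf-8 default is an assumption about the files (the real data contains accented characters, so an explicit encoding seems worthwhile).

```python
def process_file(in_path, out_path, transform, encoding='utf-8'):
    """Open both files, run `transform(fobj_in, fobj_out)`, then close them.

    `transform` is any callable that reads from the first file object
    and writes to the second, for example the loops shown above
    wrapped in a function.
    """
    with open(in_path, encoding=encoding) as fobj_in, \
            open(out_path, 'w', encoding=encoding) as fobj_out:
        transform(fobj_in, fobj_out)
```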
EDIT
Note: The sample data does not seem to contain a case that would lead to multiple names on one output line.
With this input data, we need to change the code:
from collections import defaultdict

teams = defaultdict(list)
for line in fobj_in:
    # Skip blank lines.
    if not line.strip():
        continue
    # Split on tabs and drop empty or whitespace-only cells.
    items = [entry.strip() for entry in line.split('\t') if entry.strip()]
    if len(items) == 2:
        # A "name<TAB>team" line starts a new person.
        name = items[0]
        team = items[1]
    else:
        # A team-only line belongs to the most recent person.
        team = items[0]
    teams[team].append(name)

for team_name in sorted(teams.keys()):
    fobj_out.write(team_name + ', ')
    for name in teams[team_name][:-1]:
        fobj_out.write('{}, '.format(name))
    name = teams[team_name][-1]
    fobj_out.write('{}\n'.format(name))
The content of the resulting file looks like this:
"Décore ta vie" (2003), Boilard, Naggy
"Mouki" (2010), Boileau, Sonia
A chacun sa place (2011), Boinem, Victor Emmanuel
Absence (2009) (V), Boillat, Patricia
C.A.L.L.E. (2005), Boillat, Patricia
Comment devenir un trou de cul et enfin plaire aux femmes (2004), Boire, Roger
Couleur de peau: Miel (2012), Boileau, Laurent
Hergé:Les aventures de Tintin (2004), Boillot, Olivier
Isola, là dove si parla la lingua di Bacco (2011) (co-director), Boillat, Patricia
L'île (2011), Boillot, Olivier
La beauté fatale et féroce... (1996), Boire, Roger
Last Call Indian (2010), Boileau, Sonia
Le Temple Oublié (2005), Boillot, Olivier
Le pied tendre (1988), Boire, Roger
Legit (2006), Boinski, James W.
Nubes (2010), Boira, Francisco
Questions nationales (2009), Boire, Roger
Reconciling Rwanda (2007), Boiko, Patricia
Soviet Gymnasts (1955), Boikov, Vladimir
The Corporal's Diary (2008) (V) (head director), Boiko, Patricia
Un gars ben chanceux (1977), Boire, Roger
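For experimenting with the tab-separated layout without touching files, the parsing step can be sketched as a standalone function. The names group_tab_separated and format_groups are hypothetical. The sketch assumes each record is a "name<TAB>entry" line followed by entry-only lines, and it skips entries that appear before the first name, which is an added assumption rather than behaviour of the code above.

```python
from collections import defaultdict


def group_tab_separated(lines):
    """Map each second-column entry to the names it appeared under."""
    groups = defaultdict(list)
    name = None  # most recent name seen in the first column
    for line in lines:
        # Keep only non-empty cells so stray tabs are ignored.
        items = [cell.strip() for cell in line.split('\t') if cell.strip()]
        if not items:
            continue  # blank line
        if len(items) == 2:
            name, key = items
        else:
            key = items[0]
        if name is not None:  # assumption: ignore entries before any name
            groups[key].append(name)
    return groups


def format_groups(groups):
    """Return one 'entry, name, name, ...' string per entry, sorted."""
    return ['{}, {}'.format(key, ', '.join(groups[key]))
            for key in sorted(groups)]
```

Because several names can end up under the same entry, this also covers the case of multiple names on one output line.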
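How exactly is your input structured? One line, multiple lines, and where are the line breaks? – Johannes
Can the surname and/or team name contain spaces? Are there tabs in between, or are the team names in a fixed column? –
@Johannes: The input is very messy. The only "structured" part is "Name1, Surname1", which always has a comma and one space. As for the teams, they are usually placed in a fixed column; however, the first team reported (on the name-surname line) is often not aligned with the team column, depending on the "Name, Surname" it contains – user2447387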