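Group and check-mark using Python
I have several files, each of which has data like this (filename: the data inside, separated by newlines):
1. Mike: Plane\nCar
2. Paula: Plane\nTrain\nBoat\nCar
3. Bill: Boat\nTrain
4. Scott: Car
How can I use Python to create a CSV file that groups all the different vehicles and then puts an X against each person they apply to?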
Assuming those line numbers aren't actually in the files (easy enough to fix if they are), and with an input file like the following:
Mike: Plane
Car
Paula: Plane
Train
Boat
Car
Bill: Boat
Train
Scott: Car
The solution can be found here: https://gist.github.com/999481
import sys
from collections import defaultdict
import csv

# see http://stackoverflow.com/questions/6180609/group-and-check-mark-using-python

def main():
    # files = ["group.txt"]
    files = sys.argv[1:]
    if len(files) < 1:
        print("usage: ./python_checkmark.py file1 [file2 ... filen]")
        return
    name_map = defaultdict(set)  # person name -> set of items
    for f in files:
        with open(f, "r") as file_handle:
            process_file(file_handle, name_map)
    print_csv(sys.stdout, name_map)

def process_file(input_file, name_map):
    cur_name = ""
    for line in input_file:
        if ":" in line:
            # a "Name: Item" line starts a new person
            cur_name, item = [x.strip() for x in line.split(":", 1)]
        else:
            item = line.strip()
        if item:  # skip blank lines
            name_map[cur_name].add(item)

def print_csv(output_file, name_map):
    names = list(name_map.keys())
    items = set()
    for item_set in name_map.values():
        items |= item_set
    writer = csv.writer(output_file, quoting=csv.QUOTE_MINIMAL)
    writer.writerow([""] + names)  # header row: one column per person
    for item in sorted(items):
        row_contents = ["X" if item in name_map[name] else "" for name in names]
        writer.writerow([item] + row_contents)

if __name__ == '__main__':
    main()
Output:
,Mike,Bill,Scott,Paula
Boat,,X,,X
Car,X,,X,X
Plane,X,,,X
Train,,X,,X
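The only thing this script doesn't do is preserve the order of the column names. You could keep a separate list to maintain the order, since maps/dicts were inherently unordered in the Python versions of the time (from Python 3.7 on, dicts preserve insertion order).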
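The separate-list idea mentioned above could look roughly like the following. This is only a sketch: `parse_ordered` and the `sample` lines are hypothetical names made up for illustration, not part of the gist.

```python
from collections import defaultdict

def parse_ordered(lines):
    """Return (names in first-seen order, mapping of name -> set of items)."""
    order = []                   # hypothetical helper list that remembers column order
    name_map = defaultdict(set)
    cur_name = None
    for line in lines:
        if ":" in line:
            cur_name, item = [x.strip() for x in line.split(":", 1)]
            if cur_name not in order:
                order.append(cur_name)
        else:
            item = line.strip()
        if cur_name is None or not item:
            continue             # ignore blank lines and items before any name
        name_map[cur_name].add(item)
    return order, name_map

# Sample data assumed to match the input file shown earlier
sample = ["Mike: Plane", "Car", "Paula: Plane", "Train", "Boat", "Car",
          "Bill: Boat", "Train", "Scott: Car"]
order, name_map = parse_ordered(sample)
print(order)
```

Using `order` instead of `name_map.keys()` for the header row would then give the columns in the order the names appear in the files, on any Python version.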
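Here is an example of how to parse these kinds of files.
Note that the dictionary is unordered here. You can use an ordered dict from the standard library (in the case of Python 3.2/2.7), find an available implementation/backport if you have an older version of Python, or just keep the order in an additional list :)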
file_path = 'input.txt'  # placeholder: path to one input file

data = {}
name = None
with open(file_path) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if ':' in line:  # we have a name here
            name, first_vehicle = [part.strip() for part in line.split(':', 1)]
            data[name] = set([first_vehicle])  # a set of vehicles per name
        elif name:
            data[name].add(line)

# now a dictionary with names/vehicles is available
# let's convert it to simple csv-formatted string..

# all available vehicles, sorted so the column order is stable
vehicles = sorted(set(v for vlist in data.values()
                      for v in vlist))

for name in data:
    name_vehicles = data[name]
    csv_vehicles = ''
    for v in vehicles:
        if v in name_vehicles:
            csv_vehicles += v
        csv_vehicles += ','
    csv_line = name + ',' + csv_vehicles
    print(csv_line)
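As a small illustration of the ordered-dict suggestion in this answer, here is a sketch using `collections.OrderedDict`. The `(name, vehicle)` pairs are made-up sample data, not taken from the real files.

```python
from collections import OrderedDict

# Hypothetical (name, vehicle) pairs in the order they would be read
pairs = [("Mike", "Plane"), ("Paula", "Train"), ("Bill", "Boat"), ("Mike", "Car")]

data = OrderedDict()
for name, vehicle in pairs:
    # setdefault inserts the name only the first time it is seen,
    # so the key order follows first appearance
    data.setdefault(name, []).append(vehicle)

print(list(data.keys()))
```

The keys come back in first-seen order, which is what the column layout needs when a plain dict cannot be relied on for ordering.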
Assuming the input looks like this:
Mike: Plane
Car
Paula: Plane
Train
Boat
Car
Bill: Boat
Train
Scott: Car
This Python script then stores the vehicles in a dictionary, indexed by person:
#!/usr/bin/env python3
persons = {}
vehicles = set()
p = None  # current person
with open('input') as fd:
    for line in fd:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if ':' in line:
            tmp = line.split(':', 1)
            p = tmp[0].strip()
            v = tmp[1].strip()
            persons[p] = [v]
            vehicles.add(v)
        elif p is not None:
            persons[p].append(line)
            vehicles.add(line)
for k, v in persons.items():
    print(k, v)
print('vehicles', sorted(vehicles))
Result (on Python 3.7 or later, where dicts keep insertion order):
Mike ['Plane', 'Car']
Paula ['Plane', 'Train', 'Boat', 'Car']
Bill ['Boat', 'Train']
Scott ['Car']
vehicles ['Boat', 'Car', 'Plane', 'Train']
Now all the data you need is in data structures. The CSV part is left as an exercise for the reader :-)
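For anyone who wants to compare notes on that exercise, one possible approach uses the standard `csv` module. This is a sketch, not the answerer's own solution: the `persons` literal below is hard-coded sample data standing in for the parsed result, and `build_rows` is a hypothetical helper name.

```python
import csv
import sys

# Sample data standing in for the dictionary built by the script above
persons = {
    "Mike": ["Plane", "Car"],
    "Paula": ["Plane", "Train", "Boat", "Car"],
    "Bill": ["Boat", "Train"],
    "Scott": ["Car"],
}

def build_rows(persons):
    """Build a header row of names, then one row per vehicle with X marks."""
    names = list(persons)
    vehicles = sorted({v for owned in persons.values() for v in owned})
    rows = [[""] + names]
    for vehicle in vehicles:
        marks = ["X" if vehicle in persons[name] else "" for name in names]
        rows.append([vehicle] + marks)
    return rows

rows = build_rows(persons)
csv.writer(sys.stdout).writerows(rows)
```

Building the rows as plain lists first keeps the layout logic separate from the writing, so the same rows could go to a file opened with `newline=""` instead of `sys.stdout`.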
The most elegant and simplest way would be something like this:
import os

vehiclesToPeople = {}
people = []
for root, dirs, files in os.walk('/path/to/folder/with/files'):
    for file in files:
        person = file  # each file is named after a person
        people += [person]
        path = os.path.join(root, file)
        with open(path) as f:
            for vehicle in f:
                vehicle = vehicle.strip()
                if vehicle:  # skip blank lines
                    vehiclesToPeople.setdefault(vehicle, set()).add(person)
people.sort()
table = [[''] + people]  # header row: one column per person
for vehicle, owners in sorted(vehiclesToPeople.items()):
    table.append([vehicle] + [('X' if p in owners else '') for p in people])
csv = '\n'.join(','.join(row) for row in table)
You can also do pprint.pprint(table) to take a look at it.
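Joining cells with ',' by hand works for simple names, but it would produce a broken row if a file name ever contained a comma. A hedged alternative is to let `csv.writer` do the quoting and write into an `io.StringIO` buffer. The `table` below is invented sample data, including one deliberately awkward name.

```python
import csv
import io

# Invented sample table; "Smith, Jo" is there to show why quoting matters
table = [
    ["", "Bill", "Smith, Jo"],
    ["Boat", "X", ""],
    ["Car", "", "X"],
]

buf = io.StringIO()
csv.writer(buf).writerows(table)  # quotes any cell that contains a comma
csv_text = buf.getvalue()
print(csv_text)
```

The name containing a comma comes out wrapped in double quotes, so a CSV reader still sees three columns in the header row.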
Are the line numbers in your files too? – 2011-05-30 20:45:17
No, that's just to show that there are separate files. – mike 2011-05-30 21:39:30