2011-05-30 63 views
1

Group and check mark using Python

I have several files, each containing data like this (file name: the data inside, separated by newlines):

  1. Mike: Plane\nCar
  2. Paula: Plane\nTrain\nBoat\nCar
  3. Bill: Boat\nTrain
  4. Scott: Car

How can I use Python to group all the different vehicles and then create a CSV file that puts an X under each person a vehicle applies to, like:

output

+1

Are the line numbers in your files as well? – 2011-05-30 20:45:17

+0

No, they are only there to show that these are separate files. – mike 2011-05-30 21:39:30

Answers

1

Assuming the line numbers are not actually there (easy enough to handle if they are), and given an input file like the following:

Mike: Plane 
Car 
Paula: Plane 
Train 
Boat 
Car 
Bill: Boat 
Train 
Scott: Car 

The solution can be found here: https://gist.github.com/999481

import sys
from collections import defaultdict
import csv

# see http://stackoverflow.com/questions/6180609/group-and-check-mark-using-python
def main():
    # files = ["group.txt"]
    files = sys.argv[1:]
    if len(files) < 1:
        print("usage: ./python_checkmark.py file1 [file2 ... filen]")
        sys.exit(1)  # nothing to process without input files

    name_map = defaultdict(set)

    for f in files:
        with open(f, "r") as file_handle:
            process_file(file_handle, name_map)

    print_csv(sys.stdout, name_map)

def process_file(input_file, name_map):
    cur_name = ""
    for line in input_file:
        if not line.strip():
            continue  # skip blank lines
        if ":" in line:
            # "Name: Item" starts a new person; split on the first colon only
            cur_name, item = [x.strip() for x in line.split(":", 1)]
        else:
            item = line.strip()
        if item:
            name_map[cur_name].add(item)


def print_csv(output_file, name_map):
    names = list(name_map.keys())
    items = set()
    for item_set in name_map.values():
        items = items.union(item_set)

    writer = csv.writer(output_file, quoting=csv.QUOTE_MINIMAL)
    writer.writerow([""] + names)  # header row: one column per person
    for item in sorted(items):
        row_contents = ["X" if item in name_map[name] else "" for name in names]
        row = [item] + row_contents
        writer.writerow(row)


if __name__ == '__main__':
    main()

Output:

,Mike,Bill,Scott,Paula 
Boat,,X,,X 
Car,X,,X,X 
Plane,X,,,X 
Train,,X,,X 

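The only thing this script does not do is preserve the order of the column names. You can keep a separate list to maintain the order, since maps/dicts are inherently unordered (true of the Python versions current when this was written; plain dicts preserve insertion order from Python 3.7 on).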
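A minimal sketch of the separate-list idea, assuming the columns should appear in the order the names are first seen in the input. The helper name `parse_ordered` and the inline sample data are illustrative and not part of the gist.

```python
from collections import defaultdict

def parse_ordered(lines):
    """Return (names in first-seen order, mapping of name -> set of items)."""
    order = []                   # separate list that remembers the column order
    name_map = defaultdict(set)
    cur_name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue             # ignore blank lines
        if ":" in line:
            cur_name, item = [x.strip() for x in line.split(":", 1)]
            if cur_name not in order:
                order.append(cur_name)
        else:
            item = line
        if cur_name is not None and item:
            name_map[cur_name].add(item)
    return order, name_map

# Sample lines copied from the input shown in the answer.
sample = ["Mike: Plane", "Car", "Paula: Plane", "Train", "Boat", "Car",
          "Bill: Boat", "Train", "Scott: Car"]
order, name_map = parse_ordered(sample)
print(order)  # ['Mike', 'Paula', 'Bill', 'Scott']
```

Passing `order` instead of the dictionary keys when writing the header and rows would then give the columns in input order.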

+0

This works great; the only thing is that the output file ends up with an extra newline after every row. – mike 2011-05-30 22:47:42

+0

Hmm.. don't you want each row on its own line? – I82Much 2011-05-30 23:50:03

+1

Actually, the problem was that I was not opening the output CSV file in binary mode, as described in this [post](http://stackoverflow.com/questions/1170214/pythons-csv-writer-produce-wrong-line-terminator) – mike 2011-05-31 13:04:12
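For context, the csv writer ends rows with "\r\n" by default, and a file opened in text mode on Windows translates the "\n" again, which shows up as a blank line after every row. On Python 2 the fix was to open the file with mode "wb"; on Python 3 the counterpart is `newline=""`. A small sketch of the Python 3 form follows; the temporary output path and the sample rows are made up for illustration.

```python
import csv
import os
import tempfile

# Sample rows, made up for this illustration.
rows = [["", "Mike", "Paula"], ["Car", "X", "X"], ["Boat", "", "X"]]

# Hypothetical output location, used only for this example.
out_path = os.path.join(tempfile.mkdtemp(), "vehicles.csv")

# newline="" leaves line endings to the csv module, so no extra "\r" is added.
# On Python 2 the equivalent was open(out_path, "wb").
with open(out_path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Read the file back without newline translation to inspect the terminators.
with open(out_path, newline="") as f:
    raw = f.read()
print(repr(raw))
```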

0

Here is an example of how to parse files of this kind.

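Note that the dictionary here is unordered. You can use an ordered dictionary (collections.OrderedDict) from the standard library if you are on Python 3.2/2.7, find an available implementation/backport if you have an older version of Python, or simply keep the order in an additional list :)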

data = {}
name = None
file_path = 'input'  # assumed file name; point this at your data file

with open(file_path) as f:
    for line in f:
        line = line.strip()  # drop the trailing newline and stray spaces
        if not line:
            continue
        if ':' in line:  # we have a name here
            name, first_vehicle = [x.strip() for x in line.split(':', 1)]
            data[name] = set([first_vehicle])  # a set of vehicles per name
        elif name:
            data[name].add(line)

# now a dictionary with names/vehicles is available
# let's convert it to simple csv-formatted string..

# all available vehicles, sorted so the column order is stable
vehicles = sorted(set(v for vlist in data.values()
                      for v in vlist))

for name in data:
    name_vehicles = data[name]
    csv_vehicles = ''
    for v in vehicles:
        if v in name_vehicles:
            csv_vehicles += v
        csv_vehicles += ','

    csv_line = name + ',' + csv_vehicles
    print(csv_line)  # one line per person

0

Assuming the input is as follows:

Mike: Plane 
Car 
Paula: Plane 
Train 
Boat 
Car 
Bill: Boat 
Train 
Scott: Car 

This Python script then stores the vehicles in a dictionary, indexed by person:

#!/usr/bin/python

persons = {}
vehicles = set()
p = None  # current person; stays None until the first "Name: Vehicle" line

with open('input') as fd:
    for line in fd:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if ':' in line:
            tmp = line.split(':', 1)
            p = tmp[0].strip()
            v = tmp[1].strip()
            persons[p] = [v]
            vehicles.add(v)
        elif p is not None:
            persons[p].append(line)
            vehicles.add(line)

for k, v in persons.items():
    print("%s %s" % (k, v))

print("vehicles %s" % (vehicles,))

Result (from the original Python 2 run):

Mike ['Plane', 'Car'] 
Bill ['Boat', 'Train'] 
Scott ['Car'] 
Paula ['Plane', 'Train', 'Boat', 'Car'] 
vehicles set(['Train', 'Car', 'Plane', 'Boat']) 

Now all the required data is in the data structures. The CSV part is left as an exercise for the reader :-)
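One possible way to do that exercise, sketched with the standard csv module. The `persons` and `vehicles` values below are copied from the result shown above; sorting the names and vehicles alphabetically is an assumption made here so that the output is stable.

```python
import csv
import io

# Same shape as the structures built by the script (values copied from the result).
persons = {
    "Mike": ["Plane", "Car"],
    "Paula": ["Plane", "Train", "Boat", "Car"],
    "Bill": ["Boat", "Train"],
    "Scott": ["Car"],
}
vehicles = {"Train", "Car", "Plane", "Boat"}

names = sorted(persons)            # assumed column order: alphabetical
buf = io.StringIO()                # write to memory; a real file works the same way
writer = csv.writer(buf, lineterminator="\n")
writer.writerow([""] + names)      # header row: empty corner cell, then the names
for vehicle in sorted(vehicles):
    marks = ["X" if vehicle in persons[n] else "" for n in names]
    writer.writerow([vehicle] + marks)

csv_text = buf.getvalue()
print(csv_text)
```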

0

The most elegant and simplest way is something like this:

import os

vehiclesToPeople = {}
people = []

for root, dirs, files in os.walk('/path/to/folder/with/files'):
    for file in files:
        person = file  # each file is named after a person
        people += [person]
        path = os.path.join(root, file)

        with open(path) as f:
            for vehicle in f:
                vehicle = vehicle.strip()  # drop the trailing newline
                if vehicle:
                    vehiclesToPeople.setdefault(vehicle, set()).add(person)

people.sort()
table = [[''] + people]  # header row
for vehicle, owners in sorted(vehiclesToPeople.items()):
    # first cell is the vehicle name, then one X per owner
    table.append([vehicle] + [('X' if p in owners else '') for p in people])

csv = '\n'.join(','.join(row) for row in table)

You can also do pprint.pprint(table) to take a look at it.
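One caveat with joining cells by hand: a cell that itself contains a comma is not quoted. The sketch below compares the manual join with the standard csv writer on a made-up table in the same layout; the name "Smith, Mike" is hypothetical and chosen only to trigger quoting.

```python
import csv
import io

# Hypothetical table in the same layout as above; one name contains a comma.
table = [
    ["", "Bill", "Smith, Mike"],
    ["Boat", "X", ""],
    ["Car", "", "X"],
]

# Manual join: the comma inside the name silently creates an extra column.
naive = "\n".join(",".join(row) for row in table)

# csv.writer quotes any field that contains the delimiter.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(table)
quoted = buf.getvalue()

print(naive.splitlines()[0])   # ,Bill,Smith, Mike
print(quoted.splitlines()[0])  # ,Bill,"Smith, Mike"
```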