2012-10-09 42 views

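CSV: insert missing rows

I have a huge file that is missing some rows. The data needs to be rooted at the country level.

The input data looks like this: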

csv_str = """Type,Country,State,County,City, 
1,USA,,, 
2,USA,OH,, 
3,USA,OH,Franklin, 
4,USA,OH,Franklin,Columbus 
4,USA,OH,Franklin,Springfield 
4,USA,WI,Dane,Madison 
""" 

It needs to become:

csv_str = """Type,Country,State,County,City, 
1,USA,,, 
2,USA,OH,, 
3,USA,OH,Franklin, 
4,USA,OH,Franklin,Columbus 
4,USA,OH,Franklin,Springfield 
4,USA,WI,, 
4,USA,WI,Dane, 
4,USA,WI,Dane,Madison 
""" 

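The key to my logic is the Type field: if I can't find a county row (type 3) for a city (type 4), I insert a row with the fields filled in up to the county.

The same goes for counties: if I can't find a state row (type 2) for a county (type 3), I insert a row with the fields filled in up to the state.

Because I don't know Python's facilities well, I have been trying a more brute-force approach. That is a bit of a problem, because I need to iterate over the same file many times.

I also tried Google Refine, but couldn't get it to work. Doing it by hand is very error-prone.

Any help is appreciated.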

import csv 
import io 

csv_str = """Type,Country,State,County,City, 
1,USA,,, 
2,USA,OH,, 
3,USA,OH,Franklin, 
4,USA,OH,Franklin,Columbus 
4,USA,OH,Franklin,Springfield 
4,USA,WI,Dane,Madison 
""" 
found_county = []
missing_county = []

def check_missing_county(row):
    # csv.reader yields plain lists, so the Type field is row[0], not row.Type
    found = False
    for elm in found_county:
        if elm[0] == row[0]:
            found = True
    if not found:
        missing_county.append(row)
        print(row)

reader = csv.reader(io.StringIO(csv_str)) 
for row in reader: 
    check_missing_county(row) 

So you just want to generate a list of the missing counties? – martineau

Answers


I knocked up the following based on my understanding of the question:

import csv 
import io 

csv_str = u"""Type,Country,State,County,City, 
1,USA,,, 
2,USA,OH,, 
3,USA,OH,Franklin, 
4,USA,OH,Franklin,Columbus 
4,USA,OH,Franklin,Springfield 
4,USA,WI,Dane,Madison 
""" 

# (country, state) and (country, state, county) keys that already have a row
counties = []
states = []


def handle_missing_data(row):
    # tolerate stray whitespace around the values
    row = [cell.strip() for cell in row]
    try:
        int(row[0])
    except (ValueError, IndexError):
        # header or blank line: nothing to emit
        return []

    rtype = row[0]
    country = row[1]
    state = row[2]
    county = row[3]

    rows = []
    # if a state is present and doesn't have a row of its own yet
    if state and (country, state) not in states:
        rows.append([rtype, country, state, '', ''])
        states.append((country, state))

    # if a county is present and doesn't have a row of its own yet
    if county and (country, state, county) not in counties:
        rows.append([rtype, country, state, county, ''])
        counties.append((country, state, county))

    # if the row hasn't already been added, add it now
    if row not in rows:
        rows.append(row)

    return rows

csvf = io.StringIO(csv_str) 
reader = csv.reader(csvf) 
for row in reader: 
    new_rows = handle_missing_data(row) 
    for new_row in new_rows:
        print(new_row)
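
If the file can contain several countries, or the same county name in more than one state, it may be safer to track each level by its full path rather than by name alone, and to write the result back out as CSV instead of printing lists. The following is only a sketch of that idea, not a tested solution for the real file. The name `fill_missing_rows` is made up for illustration, and it assumes the columns are always Type, Country, State, County, City in that order, that parent rows should carry the Type of the row that triggered them (as in the desired output in the question), and that non-numeric rows such as the header are passed through unchanged.

```python
import csv
import io


def fill_missing_rows(csv_text):
    """Hypothetical helper: insert a parent row before any row whose
    country/state/county path has not been seen yet."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")

    seen = set()  # paths already written, e.g. ('USA', 'OH', 'Franklin')
    for row in reader:
        row = [cell.strip() for cell in row]
        if not row or not row[0].isdigit():
            writer.writerow(row)  # header or blank line: pass through
            continue

        rtype = row[0]
        levels = row[1:5]  # country, state, county, city
        while levels and levels[-1] == "":
            levels.pop()  # drop the empty trailing levels

        # every proper prefix of the path needs a row of its own
        for depth in range(1, len(levels)):
            prefix = tuple(levels[:depth])
            if prefix not in seen:
                writer.writerow([rtype] + list(prefix) + [""] * (4 - depth))
                seen.add(prefix)

        writer.writerow(row)
        seen.add(tuple(levels))

    return out.getvalue()


sample = """Type,Country,State,County,City,
1,USA,,,
2,USA,OH,,
3,USA,OH,Franklin,
4,USA,OH,Franklin,Columbus
4,USA,OH,Franklin,Springfield
4,USA,WI,Dane,Madison
"""
print(fill_missing_rows(sample))
```

Because it makes a single pass and keeps only the set of seen paths in memory, this should avoid iterating over the same file repeatedly. For a file on disk, the two `io.StringIO` objects could be replaced with open file handles.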

Thanks for the effort, John, this works great. – bsr