2017-06-13 81 views
2

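Reading a CSV file into a dictionary?

First, I want to say that I am not asking you to write code. I only want to discuss this and get feedback on the best way to write the program, because I have been stuck working out how to approach the problem.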

My program should open a CSV file that contains 7 columns:

Name of the state,Crop,Crop title,Variety,Year,Unit,Value. 

Here is part of the file:

Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2012,Percent of all corn planted,60 
Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2013,Percent of all corn planted,73 
Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2014,Percent of all corn planted,78 
Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2015,Percent of all corn planted,76 
Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2016,Percent of all corn planted,75 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2000,Percent of all corn planted,11 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2001,Percent of all corn planted,12 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2002,Percent of all corn planted,13 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2003,Percent of all corn planted,16 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2004,Percent of all corn planted,21 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2005,Percent of all corn planted,26 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2006,Percent of all corn planted,40 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2007,Percent of all corn planted,59 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2008,Percent of all corn planted,78 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2009,Percent of all corn planted,79 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2010,Percent of all corn planted,83 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2011,Percent of all corn planted,85 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2012,Percent of all corn planted,84 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2013,Percent of all corn planted,85 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2014,Percent of all corn planted,88 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2015,Percent of all corn planted,88 
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2016,Percent of all corn planted,86 

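Then it should read each row into a dictionary. The file has many rows, and the only rows I want are those whose Variety column is "All GE varieties". Note that each state also has multiple rows. The next step is to take user input for the crop and look only at the data for that crop. The final step is to find, for each state, the maximum and minimum values and their corresponding years, and print them.

The way I was thinking of doing this was to create a set for each row, check whether "All GE varieties" is in the set, and if it is, add the row to the dictionary. Then do something similar for the crop?

My biggest points of confusion are probably: 1) I don't know how to ignore the rows that don't contain "All GE varieties". Would I do that before or after creating the dictionary? And 2) I know how to create a dictionary with one key and one value, but how can I add the remaining values to the key? Do you use sets? Or lists?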

+1

What is the key, and what is the value?

+0

You can use the 'csv' module from the standard library.

+0

@DmitryPolonskiy The key should be the state name, and the value should be the crop name, variety, year, and value.

Answers

0

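As already mentioned, you can use the csv module to read the CSV file. I'm not exactly sure how you want to structure the data under the state key, but I think it is probably better to be able to look up each specific crop_title and then access the value for each year separately.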

In[33]: from collections import defaultdict 
    ...: from csv import reader 
    ...: 
    ...: crops = defaultdict(lambda: defaultdict(dict)) 
    ...: with open('hmm.csv', 'r') as csvfile: 
    ...:  cropreader = reader(csvfile) 
    ...:  for row in cropreader: 
    ...:   state, crop_type, crop_title, variety, year, unit, value = row 
    ...:   if variety == 'All GE varieties': 
    ...:    crops[state][crop_title][year] = value 
    ...: 
In[34]: crops 
Out[34]: 
defaultdict(<function __main__.<lambda>>, 
      {'Indiana': defaultdict(dict, 
         {'Genetically engineered (GE) corn': {'2000': '11', 
          '2001': '12', 
          '2002': '13', 
          '2003': '16', 
          '2004': '21', 
          '2005': '26', 
          '2006': '40', 
          '2007': '59', 
          '2008': '78', 
          '2009': '79', 
          '2010': '83', 
          '2011': '85', 
          '2012': '84', 
          '2013': '85', 
          '2014': '88', 
          '2015': '88', 
          '2016': '86'}})}) 
In[35]: crops['Indiana']['Genetically engineered (GE) corn']['2000'] 
Out[35]: '11' 
In[36]: crops['Indiana']['Genetically engineered (GE) corn']['2015'] 
Out[36]: '88' 

You could also convert year and value to integers, like this: crops[state][crop_title][int(year)] = int(value). That would let you make calls like this (where the return value is an integer):

In[38]: crops['Indiana']['Genetically engineered (GE) corn'][2015] 
Out[38]: 88 
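To cover the last step of the question (minimum and maximum with their years) on top of this nested structure, one possible sketch follows. It assumes years and values were stored as integers, as in the conversion just shown. The helper name min_max_by_year is made up for this illustration, and the small inline crops dict is a shortened stand-in for the real one, using values from the sample data.

```python
def min_max_by_year(year_to_value):
    """Return ((min_year, min_value), (max_year, max_value)) for a {year: value} dict."""
    # items() yields (year, value) pairs; compare them by value only.
    lowest = min(year_to_value.items(), key=lambda pair: pair[1])
    highest = max(year_to_value.items(), key=lambda pair: pair[1])
    return lowest, highest


# Shortened, hypothetical stand-in for `crops`, with int years and values.
crops = {
    'Indiana': {
        'Genetically engineered (GE) corn': {2000: 11, 2014: 88, 2015: 88, 2016: 86},
    },
}

for state, titles in crops.items():
    for crop_title, year_to_value in titles.items():
        (min_year, min_value), (max_year, max_value) = min_max_by_year(year_to_value)
        print(state, '|', crop_title)
        print('min', min_value, min_year)
        print('max', max_value, max_year)
```

When several years share the same value, min and max return the first matching pair in the dictionary's iteration order, so ties are not reported by this sketch.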
0

Figuring out whether "All GE varieties" is in a string is relatively simple: use the in keyword.
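As a small illustration of that test, assuming each variable holds one raw line of the file (the two sample lines are copied from the question, and the variable names are only for this example):

```python
ge_line = ('Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,'
           '2000,Percent of all corn planted,11')
stacked_line = ('Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,'
                '2012,Percent of all corn planted,60')

# `in` performs a substring test against the whole line.
print('All GE varieties' in ge_line)
print('All GE varieties' in stacked_line)
```

Because this is a substring test on the whole line, it would also match if the phrase appeared in some other column; comparing the Variety field after splitting is stricter.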

For the data structure, I'm partial to a list of dictionaries, where each dictionary has a defined set of keys:

myList = [ {}, {}, {}, ... ] 

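In this case, the problem is that I don't know what you would use for the keys if every field is a value. Also remember that the split() method can help: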

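# datafile is assumed to hold the path to the CSV file
varieties = []
with open(datafile, 'r') as infile:
    # Iterate over the open file object (the original looped over the undefined name `file`)
    for line in infile:
        if "All GE varieties" in line:
            # Strip the trailing newline before splitting into fields
            varieties.append(line.strip().split(','))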

This gives you a list (varieties) of lists, each of which holds the fields from one row.

Something like this:

varieties = [['Indiana','Corn','Genetically engineered (GE) corn','All GE varieties','2000','Percent of all corn planted','11'], ['Indiana','Corn','Genetically engineered (GE) corn','All GE varieties','2001','Percent of all corn planted','12'], ... ] 

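From here it would be fairly easy to pick out the state, the year, and so on by indexing or slicing (it is effectively a two-dimensional array).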
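A rough sketch of that indexing, and of turning the rows into the list-of-dictionaries form mentioned earlier. The column positions follow the header order given in the question; the constant names and the short field_names labels are assumptions made for this example.

```python
varieties = [
    ['Indiana', 'Corn', 'Genetically engineered (GE) corn', 'All GE varieties',
     '2000', 'Percent of all corn planted', '11'],
    ['Indiana', 'Corn', 'Genetically engineered (GE) corn', 'All GE varieties',
     '2001', 'Percent of all corn planted', '12'],
]

# Column positions, following the header order in the question.
STATE, YEAR, VALUE = 0, 4, 6

first_state = varieties[0][STATE]
years = [row[YEAR] for row in varieties]
year_value_pairs = [(int(row[YEAR]), int(row[VALUE])) for row in varieties]

# List-of-dictionaries form: pair each field with a (hypothetical) short label.
field_names = ['Name', 'Crop', 'Title', 'Variety', 'Year', 'Unit', 'Value']
records = [dict(zip(field_names, row)) for row in varieties]

print(first_state, years, year_value_pairs)
print(records[0]['Year'], records[0]['Value'])
```

With the dictionary form, each field can be looked up by name rather than by position, which tends to be easier to read once there are seven columns.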

0

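I put your data into a file named "crop_data.csv". Below is some code that uses the standard csv module to read each row into its own dictionary. We use a simple if test to make sure we only keep rows where 'Variety' == 'All GE varieties', and we store each state's data in all_data, which is a dictionary of lists, one list per state. Since the state 'Name' is used as the key in all_data, we don't need to keep it in the row dictionary, and likewise we can discard 'Variety', since we no longer need that information.

After collecting all the data, we can use the json module to print it nicely.

Then we loop over all_data, computing the maximum and minimum values state by state.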

import csv 
from collections import defaultdict 
import json 

filename = 'crop_data.csv' 

fieldnames = 'Name,Crop,Title,Variety,Year,Unit,Value'.split(',') 

all_data = defaultdict(list) 

with open(filename) as csvfile:
    reader = csv.DictReader(csvfile, fieldnames=fieldnames)
    for row in reader:
        # We only want 'All GE varieties'
        if row['Variety'] == 'All GE varieties':
            state = row['Name']
            # Get rid of unneeded fields
            del row['Name'], row['Variety']
            # Store it as a plain dict
            all_data[state].append(dict(row))

# Show all the data 
print(json.dumps(all_data, indent=4)) 

#Find minimums & maximums 

# Extract the 'Value' field from dict d and convert it to a number 
def value_key(d): 
    return int(d['Value']) 

for state, data in all_data.items(): 
    print(state) 
    row = min(data, key=value_key) 
    print('min', row['Value'], row['Year']) 

    row = max(data, key=value_key) 
    print('max', row['Value'], row['Year']) 

Output

{ 
    "Indiana": [ 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2000", 
      "Unit": "Percent of all corn planted", 
      "Value": "11" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2001", 
      "Unit": "Percent of all corn planted", 
      "Value": "12" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2002", 
      "Unit": "Percent of all corn planted", 
      "Value": "13" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2003", 
      "Unit": "Percent of all corn planted", 
      "Value": "16" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2004", 
      "Unit": "Percent of all corn planted", 
      "Value": "21" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2005", 
      "Unit": "Percent of all corn planted", 
      "Value": "26" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2006", 
      "Unit": "Percent of all corn planted", 
      "Value": "40" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2007", 
      "Unit": "Percent of all corn planted", 
      "Value": "59" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2008", 
      "Unit": "Percent of all corn planted", 
      "Value": "78" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2009", 
      "Unit": "Percent of all corn planted", 
      "Value": "79" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2010", 
      "Unit": "Percent of all corn planted", 
      "Value": "83" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2011", 
      "Unit": "Percent of all corn planted", 
      "Value": "85" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2012", 
      "Unit": "Percent of all corn planted", 
      "Value": "84" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2013", 
      "Unit": "Percent of all corn planted", 
      "Value": "85" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2014", 
      "Unit": "Percent of all corn planted", 
      "Value": "88" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2015", 
      "Unit": "Percent of all corn planted", 
      "Value": "88" 
     }, 
     { 
      "Crop": "Corn", 
      "Title": "Genetically engineered (GE) corn", 
      "Year": "2016", 
      "Unit": "Percent of all corn planted", 
      "Value": "86" 
     } 
    ] 
} 
Indiana 
min 11 2000 
max 88 2014 

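Note that in this data there are 2 years with a value of 88. If you want, you can use a fancier key function than value_key to break ties by year. Or you can use value_key to sort the whole state data list, so that you can easily extract all the lowest and highest records. For example, in the for state, data loop, do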

data.sort(key=value_key) 
print(json.dumps(data, indent=4)) 

and it will print all the records for that state in numerical order.
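A sketch of both ideas, assuming each record is a dict with string 'Year' and 'Value' fields as in the output above. The names value_then_year and records_with_value are invented for this illustration, and the three-record data list is a shortened stand-in for one state's list.

```python
def value_key(d):
    # Same key function as above: the numeric value of a record.
    return int(d['Value'])


def value_then_year(d):
    # Compare by value first; among equal values, the later year compares higher.
    return int(d['Value']), int(d['Year'])


def records_with_value(data, target):
    """Return every record whose numeric value equals target."""
    return [d for d in data if value_key(d) == target]


# Shortened stand-in for one state's list of records.
data = [
    {'Year': '2014', 'Value': '88'},
    {'Year': '2015', 'Value': '88'},
    {'Year': '2016', 'Value': '86'},
]

# Tie broken in favor of the most recent year.
latest_max = max(data, key=value_then_year)
print('max', latest_max['Value'], latest_max['Year'])

# Alternatively, report every year that reaches the maximum value.
top_value = value_key(max(data, key=value_key))
for d in records_with_value(data, top_value):
    print('max', d['Value'], d['Year'])
```

Whether ties should favor the earliest year, the latest year, or list all of them depends on what the report is meant to show; the tuple key only changes which single record max picks.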

+0

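I guess it depends on what the OP wants, but this seems like a lot of repetition. The only thing that changes in each inner dictionary is the 'Year'/'Value' pair.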

+0

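@DeliriousLettuce That's true, but only for the sample data given; the real data may have more variation. But if they want to drop some of the other fields, I've shown how to do that.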