Reading a tab-delimited .txt file in Python 3

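I'm having some trouble getting started on an assignment. We were given a tab-delimited .txt file with 6 columns of data and about 50 rows. I need help getting started on a list to store this data for later recall. Eventually I need to be able to list all the contents of any particular column and sort it, count it, etc. Any help would be appreciated.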

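Edit: I really haven't done much on this beyond researching it. I know I'll need to look into csv, and I've handled single-column .txt files before, but I'm not sure how to tackle this one. How would I give names to the separate columns? How would I tell the program when one row ends and the next begins?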

2012-03-29 James May

Take a look at the `csv` module. – Dikei 2012-03-29 03:38:34

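Sounds like a job better suited to a database. You could just use PostgreSQL's COPY FROM operation to import the CSV data into a table, then use Python + SQL to handle all your sorting, searching and matching needs.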

If you feel a real database is overkill, there are still options like SQLite and BerkeleyDB, both of which have Python modules.
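A minimal sketch of the SQLite route, using the `sqlite3` module from the standard library. The table name `records`, the column names `col1` to `col6`, the helper `load_rows` and the sample rows are all assumptions made for illustration; none of them come from the question.

```python
import csv
import io
import sqlite3

# Hypothetical column names; the real file has no header row.
COLUMNS = ["col1", "col2", "col3", "col4", "col5", "col6"]

# Stand-in for open("data.txt", newline=""); the contents are made up.
SAMPLE = (
    "3\tred\ta\tb\tc\td\n"
    "1\tblue\ta\tb\tc\td\n"
    "2\tred\ta\tb\tc\td\n"
)


def load_rows(handle, connection):
    """Copy every tab-delimited row from handle into the records table."""
    column_list = ", ".join(COLUMNS)
    placeholders = ", ".join("?" for _ in COLUMNS)
    connection.execute("CREATE TABLE records ({})".format(column_list))
    connection.executemany(
        "INSERT INTO records VALUES ({})".format(placeholders),
        csv.reader(handle, delimiter="\t"),
    )


connection = sqlite3.connect(":memory:")
load_rows(io.StringIO(SAMPLE), connection)

# One column, sorted by SQL. Values are stored as text, so the sort is lexical.
sorted_col1 = [row[0] for row in connection.execute(
    "SELECT col1 FROM records ORDER BY col1")]

# How often each value occurs in col2.
counts = dict(connection.execute(
    "SELECT col2, COUNT(*) FROM records GROUP BY col2"))

print(sorted_col1)
print(counts)
```

Because the values are stored as text, a numeric sort would need something like `ORDER BY CAST(col1 AS INTEGER)`.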

Edit: BerkeleyDB is deprecated, but anydbm (renamed `dbm` in Python 3) is similar in concept.

2012-03-29 03:41:33 SpliFF

Yes, I could easily do this in ArcGIS, but I need to do it with Python 3. Any ideas? – 2012-03-29 03:42:59

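You can still use Python to drive the queries. If you mean using only Python, then you would just be implementing a database in Python, which isn't a very efficient use of time and resources. What do you think of the BerkeleyDB/SQLite options? – SpliFF 2012-03-29 03:45:52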

Ah, it has to be in Python 3. I know there are far more efficient ways to do this kind of thing, haha. But sadly it has to be with Python 3... :/ – 2012-03-29 03:50:34

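The DataFrame structure in pandas is basically exactly what you want. It is similar to the data frame in R, if you are familiar with that. It has built-in options for subsetting, sorting and otherwise manipulating tabular data.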

It reads directly from CSV and even picks up the column names automatically. You would call `read_csv` with your filename.

Works in Python 3.

2012-03-29 03:58:08

My data has no headers in the .txt file; can I create them for the 6 columns? – 2012-03-29 05:35:34

Yes, in that case you just call `read_csv(yourfilename, sep='\t', names=['header1', 'header2', ...])`. The docs are here: http://pandas.sourceforge.net/generated/pandas.io.parsers.read_csv.html – 2012-03-29 17:25:43
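The call from the comment above can be sketched roughly as follows. This assumes pandas is installed; the column names `col1` to `col6` and the in-memory sample are placeholders for illustration, not the asker's real data.

```python
import io

import pandas as pd

# Hypothetical column names for a header-less file.
COLUMNS = ["col1", "col2", "col3", "col4", "col5", "col6"]

# Stand-in for the real .txt path; read_csv also accepts a filename here.
SAMPLE = io.StringIO(
    "3\tred\ta\tb\tc\td\n"
    "1\tblue\ta\tb\tc\td\n"
    "2\tred\ta\tb\tc\td\n"
)

# names= supplies the headers, so the first line is treated as data.
frame = pd.read_csv(SAMPLE, sep="\t", names=COLUMNS)

# List one column in sorted order, and count the values in another.
col1_sorted = frame["col1"].sort_values().tolist()
col2_counts = frame["col2"].value_counts().to_dict()

print(col1_sorted)
print(col2_counts)
```

Unlike the `csv` module, pandas infers column types, so `col1` comes back as integers here rather than strings.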

Let's say you have a tab-delimited CSV like the following.

1  2  3  4  5  6 
1  2  3  4  5  6 
1  2  3  4  5  6 
1  2  3  4  5  6 
1  2  3  4  5  6

You can read it into dictionaries like this:

>>> import csv
>>> reader = csv.DictReader(open('test.csv', newline=''), fieldnames=['col1', 'col2', 'col3', 'col4', 'col5', 'col6'], dialect='excel-tab')
>>> for row in reader:
...     print(dict(row))  # print is a function in Python 3
...
{'col1': '1', 'col2': '2', 'col3': '3', 'col4': '4', 'col5': '5', 'col6': '6'}
{'col1': '1', 'col2': '2', 'col3': '3', 'col4': '4', 'col5': '5', 'col6': '6'}
{'col1': '1', 'col2': '2', 'col3': '3', 'col4': '4', 'col5': '5', 'col6': '6'}
{'col1': '1', 'col2': '2', 'col3': '3', 'col4': '4', 'col5': '5', 'col6': '6'}
{'col1': '1', 'col2': '2', 'col3': '3', 'col4': '4', 'col5': '5', 'col6': '6'}
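Once the rows come back as dictionaries, pulling out one column and sorting or counting it takes only a few lines. This is a sketch assuming the same `col1` to `col6` field names; the sample data and the helper name `column` are made up for illustration.

```python
import csv
import io
from collections import Counter

FIELDNAMES = ["col1", "col2", "col3", "col4", "col5", "col6"]

# Stand-in for open('test.csv', newline=''); the contents are made up.
SAMPLE = (
    "3\tred\ta\tb\tc\td\n"
    "1\tblue\ta\tb\tc\td\n"
    "2\tred\ta\tb\tc\td\n"
)


def column(rows, name):
    """Return every value stored under one column name, in file order."""
    return [row[name] for row in rows]


reader = csv.DictReader(io.StringIO(SAMPLE), fieldnames=FIELDNAMES,
                        dialect="excel-tab")
rows = list(reader)  # keep the rows so they can be reused

# The csv module returns strings, so sort numerically with key=int.
col1_sorted = sorted(column(rows, "col1"), key=int)

# Counter tallies how often each value appears in a column.
col2_counts = Counter(column(rows, "col2"))

print(col1_sorted)
print(col2_counts.most_common())
```

Reading the rows into a list first matters because a `DictReader` can only be iterated once.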

But the pandas library may be better suited to this. http://pandas.pydata.org/pandas-docs/stable/io.html#csv-text-files

2012-03-29 04:24:41 Amjith

Thank you, this really helps! But my data has no headers; is there a way to assign headers to the columns without changing the original .txt file? – 2012-03-29 05:16:50

Yes, DictReader has an optional parameter that can be used to pass the field names. I've edited the answer to reflect this. – Amjith 2012-03-29 13:19:18

I think using a DB for 50 lines and 6 columns is overkill, so here is my idea:

from __future__ import print_function
import os
from operator import itemgetter


def get_records_from_file(path_to_file):
    """
    Read a tab-delimited file and return a
    list of dictionaries representing the data.
    """
    records = []
    with open(path_to_file, 'r') as f:
        # Use the first line to get names for columns;
        # strip the line ending so the last name stays clean
        fields = [e.lower() for e in f.readline().rstrip('\r\n').split('\t')]

        # Iterate over the rest of the lines and store records
        for line in f:
            record = {}
            for i, field in enumerate(line.rstrip('\r\n').split('\t')):
                record[fields[i]] = field
            records.append(record)

    return records


if __name__ == '__main__':
    path = os.path.join(os.getcwd(), 'so.txt')
    records = get_records_from_file(path)

    print('Number of records: {0}'.format(len(records)))

    # Assumes the header line names one of the columns 'id'
    s = sorted(records, key=itemgetter('id'))
    print('Sorted: {0}'.format(s))

For storing the records for later use, take a look at Python's pickle library. That'll let you save them as Python objects.
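A minimal sketch of saving and reloading records with `pickle`. The file name `records.pkl`, the temporary directory and the sample records are assumptions for illustration. Only unpickle files you created yourself, since `pickle.load` can execute arbitrary code from untrusted data.

```python
import os
import pickle
import tempfile

# Made-up records in the same shape as a list of row dictionaries.
records = [
    {"id": "2", "name": "b"},
    {"id": "1", "name": "a"},
]

# Hypothetical location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "records.pkl")

# Pickle files must be opened in binary mode.
with open(path, "wb") as f:
    pickle.dump(records, f)

# Later, possibly in another run of the program, load them back.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == records)
```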

Also, note that I don't have Python 3 installed on the computer I'm using right now, but I'm pretty sure this will work on Python 2 or 3.

2012-03-29 04:26:01 modocache
