How to parse an input file in Python

I have a file of tab-separated values like this:

"1" "12345" "abc" "def" 
"2" "67890" "abc" "ghi" 
"3" "13578" "jkl" "mno"

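I can't figure out how to take an arbitrary number from an input file and, if its first 5 digits match what is in the second column of the data file, export everything on that line to another file.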

For example: input file: "67890123"

output file: "2" "67890" "abc" "ghi"

Source

2017-09-14 Shawn Sharp

What is your code so far, and where does it not work? –

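How large is the first file? I'd suggest reading each line into a dictionary keyed by the 5-digit number: '{"12345": ("1", "12345", "abc", "def"), "67890": ("2", "67890", ...) ...}' and then simply indexing the dictionary with the first 5 digits of the input. – Aaron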
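
A minimal sketch of the dictionary idea from this comment, not a definitive implementation. The helper names `build_index` and `lookup` are hypothetical, the rows are the sample data from the question, and it assumes the 5-digit values in the second column are unique (with duplicates, the later row would overwrite the earlier one).

```python
# Sample rows copied from the question; in practice they would be read from the data file.
rows = [
    ("1", "12345", "abc", "def"),
    ("2", "67890", "abc", "ghi"),
    ("3", "13578", "jkl", "mno"),
]


def build_index(rows):
    """Map each row's second column (the 5-digit key) to the whole row."""
    return {row[1]: row for row in rows}


def lookup(index, number):
    """Return the row keyed by the first 5 characters of number, or None if absent."""
    return index.get(number[:5])


index = build_index(rows)
match = lookup(index, "67890123")
print(match)
```

Building the dictionary costs one pass over the data file, after which each lookup takes constant time, which is why the size of the first file matters for this approach.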

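This is a basic problem in Python, and you can handle it easily. Read the file line by line and split each line to create a list. If the condition is true, store the required data in a data structure. When you have finished reading, write the data structure to a file. This example may help: http://interactivepython.org/runestone/static/thinkcspy/Files/Iteratingoverlinesinafile.html – diek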
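
The steps in this comment could be sketched roughly as follows. `filter_lines` is a hypothetical helper, and the sketch assumes the fields are separated by whitespace and wrapped in double quotes, as in the question's sample. It works on any iterable of lines, so open file objects can be passed in directly.

```python
def filter_lines(data_lines, numbers):
    """Return the data lines whose second field matches the first 5 characters of any number."""
    # Collect the 5-character prefixes once, skipping blank input lines.
    prefixes = {n.strip()[:5] for n in numbers if n.strip()}
    kept = []
    for line in data_lines:
        fields = line.split()  # splits on any whitespace, tabs included
        # Skip lines with no second column, and drop the surrounding quotes before comparing.
        if len(fields) > 1 and fields[1].strip('"') in prefixes:
            kept.append(line)
    return kept


data_lines = [
    '"1"\t"12345"\t"abc"\t"def"\n',
    '"2"\t"67890"\t"abc"\t"ghi"\n',
    '"3"\t"13578"\t"jkl"\t"mno"\n',
]
result = filter_lines(data_lines, ["67890123\n"])
```

To finish the last step, the returned lines can be written to the output file with `writelines`, since each kept line still ends with its newline.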

You can use the pandas package to read and write your data files.

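import pandas as pd

inputFileName = "D:/tmp/inputfile.txt"
dataFileName = "D:/tmp/data.csv"
outputFileName = "D:/tmp/outputfile.txt"

# Quoted fields are unquoted on read, so the second column is parsed as integers.
# Use sep='\t' instead if the data file is tab-separated.
data = pd.read_csv(dataFileName, sep=' ', header=None)

# Take the first 5 digits of each non-blank input line.
with open(inputFileName) as f:
    prefixes = [int(line[0:5]) for line in f if line.strip()]

# DataFrame.append was removed in pandas 2.0, so collect the matches and concatenate them.
matches = [data[data[data.columns[1]] == value] for value in prefixes]
output = pd.concat(matches) if matches else pd.DataFrame()

output.to_csv(outputFileName, sep=' ', header=None, index=False)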

So if your input file contains

67890
13578010

and the data is still

"1" "12345" "abc" "def" 
"2" "67890" "abc" "ghi" 
"3" "13578" "jkl" "mno"

the output file will be:

2 67890 abc ghi 
3 13578 jkl mno

Source

2017-09-14 20:23:31

I like it... – johnashu

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-i', '--input', required=True)
args = parser.parse_args()

with open('input.txt') as infile:
    # Keep the trailing newline on each line so it can be written back unchanged.
    entries = infile.readlines()

with open('output.txt', 'w') as outfile:
    for entry in entries:
        components = entry.split('\t')
        # The slice runs from 1 to 6 rather than 0 to 5 because each field
        # starts with a literal quote character in the data file.
        if len(components) > 1 and components[1][1:6] == args.input[:5]:
            outfile.write(entry)

To run this code: > python my_file.py --input='67890'

The code is self-explanatory; let me know if you need more explanation.
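
As a possible alternative to slicing past the quote character, the standard `csv` module can strip the quotes while parsing. This is only a sketch: `matching_rows` is a hypothetical helper, and it assumes the data really is tab-delimited with double-quoted fields, as described in the question.

```python
import csv
import io


def matching_rows(tsv_text, number):
    """Yield parsed rows whose second column equals the first 5 characters of number."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quotechar='"')
    for row in reader:
        # The reader has already removed the quotes, so a plain comparison is enough.
        if len(row) > 1 and row[1] == number[:5]:
            yield row


sample = (
    '"1"\t"12345"\t"abc"\t"def"\n'
    '"2"\t"67890"\t"abc"\t"ghi"\n'
    '"3"\t"13578"\t"jkl"\t"mno"\n'
)
rows = list(matching_rows(sample, "67890123"))
```

The parsed rows no longer carry their quotes. If the output file should keep them, writing the rows with `csv.writer` and `quoting=csv.QUOTE_ALL` is one way to put them back.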

Source

2017-09-14 20:03:26 user1190882

E2A: multiple inputs..

Assuming you have loaded the input from a TSV file

You can use a simple boolean comparison

A simple Python approach is:

import csv

inputs = ['67890231', '12345065']

with open("so.tsv") as tsv:
    for line in csv.reader(tsv, dialect="excel-tab"):
        # Compare every field on the line with the first 5 digits of each input.
        for item in inputs:
            match = [line for x in line if x == item[:5]]
            if match:
                print(match)

Returns:

[['1', '12345', 'abc']] 
[['2', '67890', 'def']]

Source

2017-09-14 20:13:55 johnashu

Try this:

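import re
import argparse as ap

p = ap.ArgumentParser()
p.add_argument('-i', '--input', required=True)
args = p.parse_args()

# Open the output file once so that later matches do not overwrite earlier ones.
with open('file.txt', 'r') as f, open('output.txt', 'w') as out:
    for value in f.read().split('\n'):
        fields = re.split(r'\s+', value.strip())
        # Skip blank or short lines, which have no second column.
        if len(fields) > 1 and fields[1].replace('"', '') == args.input[:5]:
            out.write(value + '\n')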

Source

2017-09-14 20:24:56 Abe
