2015-10-13 61 views

Iterating over the contents of a text file using a Python script

I have a text file named "triple_response.txt" that contains the following text:

(1,(db_name,string),DSP) 
(1,(rel, id),2) 
(2,(rel_name, string),DataSource) 
(2,(tuple, id),201) 
(2,(tuple, id),202) 
(2,(tuple, id),203) 
(201,(src_id,varchar),Pos201510070) 
(201,(src_name,varchar),Postgres) 
(201,(password,varchar),root) 
(201,(host,varchar),localhost) 
(201,(created_date,date),2015-10-07) 
(201,(user_name,varchar),postgres) 
(201,(src_type,varchar),Structured) 
(201,(db_name,varchar),postgres) 
(201,(port,numeric),None) 
(202,(src_id,varchar),pos201510060) 
(202,(src_name,varchar),Postgres) 
(202,(password,varchar),root) 
(202,(host,varchar),localhost) 
(202,(created_date,date),2015-10-06) 
(202,(user_name,varchar),postgres) 
(202,(src_type,varchar),Structured) 
(202,(db_name,varchar),DSP) 
(202,(port,numeric),5432) 
(203,(src_id,varchar),pos201510060) 
(203,(src_name,varchar),Postgres) 
(203,(password,varchar),root) 
(203,(host,varchar),localhost) 
(203,(created_date,date),2015-10-06) 
(203,(user_name,varchar),postgres) 
(203,(src_type,varchar),Structured) 
(203,(db_name,varchar),maindb) 
(203,(port,numeric),5432) 

I want to convert this content into JSON:

import re
import collections
import json, jsonpickle


def convertToJSON(File):
    word_list = []
    row_list = []
    try:
        with open(File, 'r') as f:
            for word in f:
                word_list.append(word)

        with open(File, 'r+') as f:
            for row in f:
                print row
                row_list.append(row.split())

        column_list = zip(*row_list)
    except IOError:
        print "Error in opening file.."
    triple = ""
    for t in word_list:
        triple += t

    tripleList = re.findall(r"\([^\(^\)]*\)", triple)
    idList = re.split(r"\([^\(^\)]*\)", triple)

    i = 0
    jsonDummy = []
    jsonData = {}
    for trip in tripleList:
        nameAndType = re.split(r",|:", trip)

        if(i == 0):
            key = re.compile("[^\w']|_").sub("", idList[i])
        else:
            try:
                key = re.compile("[^\w']|_").sub("", idList[i].split("(")[1])
            except IndexError:
                pass
        i = i + 1
        if(idList[i].find('(') != -1):
            try:
                content = re.compile("[^\w']|_").sub("", idList[i].split(")")[0])
            except IndexError:
                pass
        else:
            content = re.compile("[^\w']|_").sub("", idList[i])
        try:
            trip = trip[1:-1]
            tripKey = trip[1]
        except IndexError:
            tripKey = ''
        name = re.compile("[^\w']").sub("", nameAndType[0])
        try:
            typeName = re.compile("[^\w']|_").sub("", nameAndType[1])
        except IndexError:
            typeName = 'String'

        tripDict = dict()
        value = dict()

        value[name] = content
        tripDict[key] = value

        jsonDummy.append(tripDict)

    for j in jsonDummy:
        for k, v in j.iteritems():
            jsonData.setdefault(k, []).append(v)

    data = dict()
    data['data'] = jsonData
    obj = {}
    obj = jsonpickle.encode(data, unpicklable=False)

    return obj

I call the function convertToJSON() from the same file like this:

print convertToJSON("triple_response.txt")

and I get the output I expect:

{"data": {"1": [{"db_name": "DSP"}, {"rel": "2"}], "201": [{"src_id": "Pos201510070"}, {"src_name": "Postgres"}, {"password": "root"}, {"host": "localhost"}, {"created_date": "20151007"}, {"user_name": "postgres"}, {"src_type": "Structured"}, {"db_name": "postgres"}, {"port": "None"}], "203": [{"src_id": "pos201510060"}, {"src_name": "Postgres"}, {"password": "root"}, {"host": "localhost"}, {"created_date": "20151006"}, {"user_name": "postgres"}, {"src_type": "Structured"}, {"db_name": "maindb"}, {"port": "5432"}], "2": [{"rel_name": "DataSource"}, {"tuple": "201"}, {"tuple": "202"}, {"tuple": "203"}], "202": [{"src_id": "pos201510060"}, {"src_name": "Postgres"}, {"password": "root"}, {"host": "localhost"}, {"created_date": "20151006"}, {"user_name": "postgres"}, {"src_type": "Structured"}, {"db_name": "DSP"}, {"port": "5432"}]}} 
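For comparison, the mapping from triple lines to the grouped JSON shown above can be sketched more compactly. This is only an illustrative sketch in Python 3 syntax, not the code from the question: the names TRIPLE_RE and triples_to_json are hypothetical, and it assumes each line has exactly the form (id,(name,type),value).

```python
import json
import re
from collections import OrderedDict

# Matches lines such as "(201,(src_name,varchar),Postgres)".
TRIPLE_RE = re.compile(r"^\((\w+),\((\w+),\s*(\w+)\),(.*)\)$")


def triples_to_json(lines):
    """Group (id,(name,type),value) triples by id and return a JSON string."""
    grouped = OrderedDict()
    for line in lines:
        match = TRIPLE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or malformed lines
        key, name, _type_name, value = match.groups()
        grouped.setdefault(key, []).append({name: value})
    return json.dumps({"data": grouped})


sample = [
    "(1,(db_name,string),DSP)",
    "(1,(rel, id),2)",
    "(201,(created_date,date),2015-10-07)",
]
print(triples_to_json(sample))
```

Unlike the function in the question, this sketch keeps values verbatim, so a date such as 2015-10-07 retains its hyphens instead of becoming 20151007.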

Now here is the problem I am facing. I call it from outside the class like this:

def extractConvertData(self):
    triple_response = SPO(source, db_name, table_name, response)
    try:
        _triple_file = open('triple_response.txt', 'w+')
        _triple_file.write(triple_response)
        print "written data in file.."
        with open('triple_response.txt', 'r+') as f:
            for word in f:
                print word
        jsonData = convertToJSON(str('triple_response.txt'))
    except IOError:
        print "Not able to open a file"
    print "Converted into JSON"
    print jsonData
    pass

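The same convertToJSON() code does not work here.

It produces neither output nor an error. It fails to read the contents of the 'triple_response.txt' file at these lines:

with open('triple_response.txt','r+') as f:
    for word in f:
        print word

Can anyone tell me how to solve this?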

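"Calling this from outside the class"? I don't see any class definition.

Is the script that contains 'extractConvertData' in the same directory as 'triple_response.txt'?

Your file cannot be found because you are addressing it with a relative path. This is my standard answer on relative versus absolute paths: http://stackoverflow.com/questions/30621233/python-configparser-cannot-search-ini-file-correctly-ubuntu-14-python-3-4/30625670#30625670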
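The relative-path point in the last comment can be illustrated with a small sketch. It is written in Python 3 syntax, and the helper name path_beside and the example directory are hypothetical. The idea is to build the file path from the script's own location instead of relying on the current working directory.

```python
import os


def path_beside(script_path, filename):
    """Return an absolute path to filename in the same directory as script_path."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, filename)


# In a real script you would typically pass __file__ as script_path.
example_script = os.path.join(os.sep, "srv", "app", "convert.py")
example = path_beside(example_script, "triple_response.txt")
print(example)
```

With a path built this way, both the writer and the reader open the same file regardless of the directory the process was started from.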

Answer

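_triple_file is never closed (except implicitly when you end the Python process, which is a terrible practice).

When you leave a file handle dangling like this, you can get platform-specific behavior (what is your platform? Unix? Windows?). The writes to _triple_file are probably not being flushed. So don't leave it dangling. Make sure you close it after you write to it (_triple_file.write(triple_response)). In fact, then assert that the file length is non-zero, using os.stat(), and raise an exception otherwise.

Also, you have only one big try...except clause to catch all the errors, which bites off too much at once. Split it into two separate try...except clauses, one for writing _triple_file and one for reading it back. (By the way, you might like to use the tempfile library, to sidestep needing to know the pathname of your intermediate file.)

Something like the following untested pseudocode: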

import os

triple_response = SPO(source, db_name, table_name, response)
try:
    _triple_file = open('triple_response.txt', 'w+')
    _triple_file.write(triple_response)
    # Close right after writing so the data is flushed to disk
    _triple_file.close()
except IOError:
    print "Not able to write intermediate JSON file"
    raise

# Fail early if nothing was actually written
assert os.stat('triple_response.txt').st_size > 0, "Error: intermediate JSON file was empty"

try:
    with open('triple_response.txt', 'r+') as f:
        for word in f:
            print word
    jsonData = convertToJSON(str('triple_response.txt'))
except IOError:
    print "Not able to read back intermediate JSON file"
    #raise # if you want to reraise the exception

...
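
The tempfile suggestion above could look roughly like the following. This is a minimal sketch in Python 3 syntax rather than part of the original answer. The function name write_then_read is hypothetical, and the sketch assumes the intermediate file is only needed for the duration of the call.

```python
import os
import tempfile


def write_then_read(text):
    """Write text to a temporary file, close it, then read it back line by line."""
    # delete=False keeps the file on disk after the with block closes it.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(text)
        path = tmp.name
    try:
        # The with block above has closed (and therefore flushed) the file.
        if os.stat(path).st_size == 0:
            raise ValueError("intermediate file is empty: " + path)
        with open(path, "r") as f:
            return [line.rstrip("\n") for line in f]
    finally:
        os.remove(path)  # clean up the temporary file


lines = write_then_read("(1,(db_name,string),DSP)\n(1,(rel, id),2)\n")
print(lines)
```

Because the with statement closes the handle before the file is reopened, the read sees everything that was written, which is the behavior the original extractConvertData() was missing.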

Thank you very much, smci.