2016-11-18

Parse a text file into CSV

Below is sample data from my text file (only part of it). I'm trying to extract three keys, timestamp, dataFrame and rssi, into a CSV file.

packet"{\"test\":{\"id\":1479238177559,\"deveui\":\"0000000033035032\",\"timestamp\":\"2016-11-15T19:29:37.559Z\",\"dataFrame\":\"ABzuPdVNxrSEAV8=\",\"fcnt\":81,\"port\":5,\"rssi\":6,\"snr\":9.5,\"sf_used\":10,\"cr_used\":\"4/5\",\"device_redundancy\":0,\"time_on_air_ms\":288.76800000000003,\"decrypted\":true}}" 
Received message in at 2016-11-15 14:29:43.611000 
packet"{\"test\":{\"id\":1479238184069,\"deveui\":\"0000000033035032\",\"timestamp\":\"2016-11-15T19:29:44.069Z\",\"dataFrame\":\"ABzuPdVNxrSEAV8=\",\"fcnt\":82,\"port\":5,\"rssi\":6,\"snr\":8.5,\"sf_used\":10,\"cr_used\":\"4/5\",\"device_redundancy\":0,\"time_on_air_ms\":288.76800000000003,\"decrypted\":true}}" 
Received message in at 2016-11-15 14:29:49.225000 
packet"{\"test\":{\"id\":1479238189685,\"deveui\":\"0000000033035032\",\"timestamp\":\"2016-11-15T19:29:49.685Z\",\"dataFrame\":\"ABzuPdVNxrSEAV8=\",\"fcnt\":83,\"port\":5,\"rssi\":7,\"snr\":9.5,\"sf_used\":10,\"cr_used\":\"4/5\",\"device_redundancy\":0,\"time_on_air_ms\":288.76800000000003,\"decrypted\":true}}" 
Received message in at 2016-11-15 14:29:56.410000 
packet"{\"testl\":{\"id\":1479238196868,\"deveui\":\"0000000033035032\",\"timestamp\":\"2016-11-15T19:29:56.868Z\",\"dataFrame\":\"ABzuPdVNxrSEAV8=\",\"fcnt\":84,\"port\":5,\"rssi\":3,\"snr\":9.8,\"sf_used\":10,\"cr_used\":\"4/5\",\"device_redundancy\":0,\"time_on_air_ms\":288.76800000000003,\"decrypted\":true}}" 

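Could you update your question with the exact contents of your text file? (It looks like you're parsing JSON, but the syntax isn't valid.) What exactly are you trying to achieve? Take the same data from a JSON file and dump it into a CSV? – mabe02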


The exact contents of my text file are very large, but they're exactly like the data shown above. It just repeats, like packet"{.......}" packet"{...}" – James


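Deleting your code isn't a good way to get help: people want to see that you've tried to solve the problem yourself, not that you're asking us to do it for you. –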

Answers


This data has clearly been JSON-encoded twice by accident, so decoding it twice gives you a nice dictionary:

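import json

with open('log.txt') as infile:
    for line in infile:
        if line.startswith('packet"{'):
            # Remove the 'packet' prefix, leaving a JSON string literal
            line = line[len('packet'):]
            # Decode twice: JSON string literal -> str -> dict
            packet = json.loads(json.loads(line))
            print('Packet:')
            print(packet)
            # The single outer key varies ('test', 'testl'), so take its value;
            # next(iter(...)) works on both Python 2 and Python 3
            packet = next(iter(packet.values()))
            print('Values:')
            print(packet['timestamp'], packet['dataFrame'], packet['rssi'])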

Output (the u'' prefixes show it was run under Python 2):

Packet: 
{u'test': {u'decrypted': True, u'fcnt': 81, u'timestamp': u'2016-11-15T19:29:37.559Z', u'dataFrame': u'ABzuPdVNxrSEAV8=', u'id': 1479238177559, u'sf_used': 10, u'snr': 9.5, u'cr_used': u'4/5', u'deveui': u'0000000033035032', u'device_redundancy': 0, u'rssi': 6, u'port': 5, u'time_on_air_ms': 288.76800000000003}} 
Values: 
(u'2016-11-15T19:29:37.559Z', u'ABzuPdVNxrSEAV8=', 6) 
Packet: 
{u'test': {u'decrypted': True, u'fcnt': 82, u'timestamp': u'2016-11-15T19:29:44.069Z', u'dataFrame': u'ABzuPdVNxrSEAV8=', u'id': 1479238184069, u'sf_used': 10, u'snr': 8.5, u'cr_used': u'4/5', u'deveui': u'0000000033035032', u'device_redundancy': 0, u'rssi': 6, u'port': 5, u'time_on_air_ms': 288.76800000000003}} 
Values: 
(u'2016-11-15T19:29:44.069Z', u'ABzuPdVNxrSEAV8=', 6) 
Packet: 
{u'test': {u'decrypted': True, u'fcnt': 83, u'timestamp': u'2016-11-15T19:29:49.685Z', u'dataFrame': u'ABzuPdVNxrSEAV8=', u'id': 1479238189685, u'sf_used': 10, u'snr': 9.5, u'cr_used': u'4/5', u'deveui': u'0000000033035032', u'device_redundancy': 0, u'rssi': 7, u'port': 5, u'time_on_air_ms': 288.76800000000003}} 
Values: 
(u'2016-11-15T19:29:49.685Z', u'ABzuPdVNxrSEAV8=', 7) 
Packet: 
{u'testl': {u'decrypted': True, u'fcnt': 84, u'timestamp': u'2016-11-15T19:29:56.868Z', u'dataFrame': u'ABzuPdVNxrSEAV8=', u'id': 1479238196868, u'sf_used': 10, u'snr': 9.8, u'cr_used': u'4/5', u'deveui': u'0000000033035032', u'device_redundancy': 0, u'rssi': 3, u'port': 5, u'time_on_air_ms': 288.76800000000003}} 
Values: 
(u'2016-11-15T19:29:56.868Z', u'ABzuPdVNxrSEAV8=', 3) 
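To see the double encoding concretely, here is a minimal sketch on one log line, shortened to two fields for brevity (the raw string keeps the file's backslash escaping). The first json.loads only unwraps the outer JSON string literal; the second parses the object inside it:

```python
import json

# One log line, trimmed to two fields but with the file's original escaping
line = r'packet"{\"test\":{\"timestamp\":\"2016-11-15T19:29:37.559Z\",\"rssi\":6}}"'

# First decode: the text after 'packet' is a JSON *string* literal
first = json.loads(line[len('packet'):])
print(type(first).__name__)  # str
print(first)  # {"test":{"timestamp":"2016-11-15T19:29:37.559Z","rssi":6}}

# Second decode: that string is itself a JSON object
second = json.loads(first)
print(second['test']['rssi'])  # 6
```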

You could try this approach:

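import io
import json

import pandas as pd

content = []
with open('log.txt') as infile:
    for c in infile:
        if c.startswith('packet"{'):
            # Strip the 'packet' prefix and un-escape the JSON string literal
            # (replaces the Python 2-only str.decode('string_escape'))
            content.append(json.loads(c[len('packet'):]))

# orient='index' turns {"test": {...}} into one row per packet
df = pd.concat([pd.read_json(io.StringIO(s), orient='index') for s in content])

df[['dataFrame', 'rssi', 'timestamp']].to_csv('out.csv', index=False, header=False)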

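The resulting out.csv contains the following (read_json has parsed timestamp into a datetime, so its exact formatting may vary between pandas versions):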

ABzuPdVNxrSEAV8=,6,2016-11-15 19:29:37.559 
ABzuPdVNxrSEAV8=,6,2016-11-15 19:29:44.069 
ABzuPdVNxrSEAV8=,7,2016-11-15 19:29:49.685 
ABzuPdVNxrSEAV8=,3,2016-11-15 19:29:56.868
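If pandas isn't available, here is a minimal sketch using only the standard library's csv module, assuming one packet entry per line as in the sample. The names extract_rows and write_csv are hypothetical helpers, not taken from either answer. Unlike the pandas version, it writes timestamp exactly as it appears in the log (e.g. 2016-11-15T19:29:37.559Z) rather than as a parsed datetime:

```python
import csv
import json


def extract_rows(lines):
    """Yield (dataFrame, rssi, timestamp) for every packet line."""
    for line in lines:
        # Skip the "Received message in at ..." lines and anything else
        if not line.startswith('packet"{'):
            continue
        # Decode twice: JSON string literal -> str -> dict
        packet = json.loads(json.loads(line[len('packet'):]))
        # The single outer key varies ('test', 'testl'), so take its value
        fields = next(iter(packet.values()))
        yield fields['dataFrame'], fields['rssi'], fields['timestamp']


def write_csv(lines, outfile):
    """Write one CSV row per packet, in the same column order as out.csv."""
    csv.writer(outfile).writerows(extract_rows(lines))


# Usage (assumes log.txt is in the working directory):
# with open('log.txt') as infile, open('out.csv', 'w', newline='') as outfile:
#     write_csv(infile, outfile)
```

Opening the output file with newline='' follows the csv module's documented recommendation and avoids blank lines between rows on Windows.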