我想提取字段,並從文件中所有字段頻繁更改順序的行中進行下一個匹配。例如,在這些線路,提取多個特定字段及其下一個字段
123_abc/previous-2016-04-16-022850/smpp_411_0.log.1.tar.gz:2016-04-14 18:11:46+0100 [WorkerAMQClient,client] Processed outbound message for 123_abc_411: {"transport_name": "123_abc_411", "from_addr_type": null, "group": null, "from_addr": "*123#", "timestamp": "2016-04-14 17:11:46.348000", "helper_metadata": {}, "to_addr": "0007031975326", "to_addr_type": null, "session_id": "1861570762", "transport_metadata": {"abc_mongolia_smpp": {"session_id": "1861570762", "starCode": "123", "requestId": "1534318080", "phase": "2", "clientId": "441", "dcs": "15"}}, "content": "Communication error\n\n0>Back", "session_event": "resume", "routing_metadata": {}, "message_version": "20110921", "transport_type": "smpp", "provider": "abc_mongolia", "in_reply_to": "b613a5fc5c0c4b1b8e1108bb8bd7b946", "message_type": "user_message", "message_id": "1206f738-f0d2-4a8e-9beb-2efeaada77d9"}
123_abc/previous-2016-04-16-022850/smpp_411_0.log.1.tar.gz:2016-04-14 18:11:46+0100 [WorkerAMQClient,client] Working with: <Message payload="{'transport_name': u'123_abc_411', 'from_addr_type': None, 'group': None, 'from_addr': u'*123#', 'timestamp': datetime.datetime(2016, 4, 14, 17, 11, 46, 348000), 'helper_metadata': {}, 'to_addr': u'0007031975326', 'to_addr_type': None, 'session_id': u'1861570762', 'transport_metadata': {u'abc_mongolia_smpp': {u'session_id': u'1861570762', u'starCode': u'123', u'requestId': u'1534318080', u'phase': u'2', u'clientId': u'441', u'dcs': u'15'}}, 'content': u'Communication error\n\n0>Back', 'session_event': u'resume', 'routing_metadata': {}, 'message_version': u'20110921', 'transport_type': u'smpp', 'provider': u'abc_mongolia', 'in_reply_to': u'b613a5fc5c0c4b1b8e1108bb8bd7b946', 'message_type': u'user_message', 'message_id': u'1206f738-f0d2-4a8e-9beb-2efeaada77d9'}">
123_abc/previous-2016-04-16-022850/smpp_411_0.log.1.tar.gz:2016-04-14 18:11:46+0100 [abracRedis,client] abcmongoliasmppTransport sending outbound message: <Message payload="{'transport_name': u'123_abc_411', 'from_addr_type': None, 'group': None, 'from_addr': u'*123#', 'timestamp': datetime.datetime(2016, 4, 14, 17, 11, 46, 348000), 'helper_metadata': {}, 'to_addr': u'0007031975326', 'to_addr_type': None, 'session_id': u'1861570762', 'transport_metadata': {u'abc_mongolia_smpp': {u'session_id': u'1861570762', u'starCode': u'123', u'requestId': u'1534318080', u'phase': u'2', u'clientId': u'441', u'dcs': u'15'}}, 'content': u'Communication error\n\n0>Back', 'session_event': u'resume', 'routing_metadata': {}, 'message_version': u'20110921', 'transport_type': u'smpp', 'provider': u'abc_mongolia', 'in_reply_to': u'b613a5fc5c0c4b1b8e1108bb8bd7b946', 'message_type': u'user_message', 'message_id': u'1206f738-f0d2-4a8e-9beb-2efeaada77d9'}">
123_dfg/smpp_37_0.log:2016-04-16 16:24:59+0100 [abracRedis,client] OUTGOING >> {'body': {'mandatory_parameters': {'priority_flag': 0, 'dest_addr_npi': 1, 'source_addr': '123', 'protocol_id': 0, 'replace_if_present_flag': 0, 'registered_delivery': True, 'dest_addr_ton': 1, 'source_addr_npi': 0, 'schedule_delivery_time': '', 'sm_default_msg_id': 0, 'sm_length': 0, 'esm_class': 0, 'data_coding': 0, 'service_type': '', 'source_addr_ton': 0, 'validity_period': '', 'destination_addr': '0007084023687', 'short_message': 'Communication error\n\n0>Back'}, 'optional_parameters': [{'length': 0, 'tag': 'smpp_service_op', 'value': '02'}, {'length': 0, 'tag': 'its_session_info', 'value': '3522'}]}, 'header': {'command_status': 'ESME_ROK', 'command_length': 0, 'sequence_number': 835674, 'command_id': 'submit_sm'}}
123_dfg/smpp_37_0.log:2016-04-16 17:02:40+0100 [WorkerAMQClient,client] Processed outbound message for 123_dfg_37: {"transport_name": "123_dfg_37", "from_addr_type": null, "group": null, "from_addr": "123", "timestamp": "2016-04-16 16:02:40.832000", "helper_metadata": {}, "to_addr": "0008081741472", "to_addr_type": null, "session_id": "d9dac229dec5499286890fa1c81aa16a", "transport_metadata": {"session_info": "5070"}, "content": "Communication error\n\n0>Back", "session_event": "resume", "routing_metadata": {}, "message_version": "20110921", "transport_type": "smpp", "provider": "123_dfg_37", "in_reply_to": "fd3e29028fb04f089fe764ba94d7d9af", "message_type": "user_message", "message_id": "b33737d5-39a7-478b-9680-31c54c15e3b2"}
123_dfg/smpp_37_0.log:2016-04-16 17:02:40+0100 [abracRedis,client] OUTGOING >> {'body': {'mandatory_parameters': {'priority_flag': 0, 'dest_addr_npi': 1, 'source_addr': '123', 'protocol_id': 0, 'replace_if_present_flag': 0, 'registered_delivery': True, 'dest_addr_ton': 1, 'source_addr_npi': 0, 'schedule_delivery_time': '', 'sm_default_msg_id': 0, 'sm_length': 0, 'esm_class': 0, 'data_coding': 0, 'service_type': '', 'source_addr_ton': 0, 'validity_period': '', 'destination_addr': '0008081741472', 'short_message': 'Communication error\n\n0>Back'}, 'optional_parameters': [{'length': 0, 'tag': 'smpp_service_op', 'value': '02'}, {'length': 0, 'tag': 'its_session_info', 'value': '5070'}]}, 'header': {'command_status': 'ESME_ROK', 'command_length': 0, 'sequence_number': 835793, 'command_id': 'submit_sm'}}
我需要能夠提取這些字段:
'to_addr': u'0007031975326' (or 000
'timestamp': datetime.datetime(2016, 4, 14, 17, 11, 46, 348000),
'content': u'Communication error\n\n0>Back'
,並給他們在一條線上的原始文件排列,每行。
我一直在嘗試這種AWK線的不同變化:
awk '{ for(i=1;i<=NF;i++) if ($i == "'content:'") print $(i+1) }' testfile.txt
但我真的來了短。請有人告訴我如何在awk和sed中做到這一點?
謝謝!
不錯的工作!但它不捕獲整個時間戳字段。並且我爲某些行獲得了一些「無法處理行:」錯誤。 – Sina
已更新,以處理嵌套對象。 – xxfelixxx
謝謝,但是如果在提取字段前沒有「u」前綴(例如'to_addr':'0007031975326'而不是'to_addr':u'0007031975326'),它仍然會中斷。 – Sina