2017-04-07 96 views
2

我想通過Kafka發送一個非常簡單的JSON對象,並使用Python和kafka-python將其讀出。但是,我總是看到以下錯誤:使用Kafka-Python的解串器無法使用來自Kafka的JSON消息

2017-04-07 10:28:52,030.30.9998989105:kafka.future:8228:ERROR:10620:Error processing callback 
Traceback (most recent call last): 
    File "C:\Anaconda2\lib\site-packages\kafka\future.py", line 79, in _call_backs 
    f(value) 
    File "C:\Anaconda2\lib\site-packages\kafka\consumer\fetcher.py", line 760, in _handle_fetch_response 
    unpacked = list(self._unpack_message_set(tp, messages)) 
    File "C:\Anaconda2\lib\site-packages\kafka\consumer\fetcher.py", line 539, in _unpack_message_set 
    tp.topic, msg.value) 
    File "C:\Anaconda2\lib\site-packages\kafka\consumer\fetcher.py", line 570, in _deserialize 
    return f(bytes_) 
    File "C:\Users\myUser\workspace\PythonKafkaTest\src\example.py", line 55, in <lambda> 
    value_deserializer=lambda m: json.loads(m).decode('utf-8')) 
    File "C:\Anaconda2\lib\json\__init__.py", line 339, in loads 
    return _default_decoder.decode(s) 
    File "C:\Anaconda2\lib\json\decoder.py", line 364, in decode 
    obj, end = self.raw_decode(s, idx=_w(s, 0).end()) 
    File "C:\Anaconda2\lib\json\decoder.py", line 382, in raw_decode 
    raise ValueError("No JSON object could be decoded") 
ValueError: No JSON object could be decoded 

我做了一些研究,但此錯誤的最常見的原因是JSON是錯誤的。我已經嘗試打印出JSON,然後通過將以下內容添加到我的代碼中並將JSON打印出來而沒有錯誤。

while True: 
     json_obj1 = json.dumps({"dataObjectID": "test1"}) 
     print json_obj1 
     producer.send('my-topic', {"dataObjectID": "test1"}) 
     producer.send('my-topic', {"dataObjectID": "test2"}) 
     time.sleep(1) 

這使我懷疑我可以產生json,但不會消耗它。

這裏是我的代碼:

import threading 
import logging 
import time 
import json 

from kafka import KafkaConsumer, KafkaProducer 


class Producer(threading.Thread): 
    daemon = True 

    def run(self): 
     producer = KafkaProducer(bootstrap_servers='localhost:9092', 
           value_serializer=lambda v: json.dumps(v).encode('utf-8')) 

     while True: 
      producer.send('my-topic', {"dataObjectID": "test1"}) 
      producer.send('my-topic', {"dataObjectID": "test2"}) 
      time.sleep(1) 


class Consumer(threading.Thread): 
    daemon = True 

    def run(self): 
     consumer = KafkaConsumer(bootstrap_servers='localhost:9092', 
           auto_offset_reset='earliest', 
           value_deserializer=lambda m: json.loads(m).decode('utf-8')) 
     consumer.subscribe(['my-topic']) 

     for message in consumer: 
      print (message) 


def main(): 
    threads = [ 
     Producer(), 
     Consumer() 
    ] 

    for t in threads: 
     t.start() 

    time.sleep(10) 

if __name__ == "__main__": 
    logging.basicConfig(
     format='%(asctime)s.%(msecs)s:%(name)s:%(thread)d:' + 
       '%(levelname)s:%(process)d:%(message)s', 
     level=logging.INFO 
    ) 
    main() 

我可以成功地發送和接收字符串,如果我刪除value_serializer和value_deserializer。當我運行代碼,我可以看到我在發送JSON這裏是一個簡短snipit:

ConsumerRecord(topic=u'my-topic', partition=0, offset=5742, timestamp=None, timestamp_type=None, key=None, value='{"dataObjectID": "test1"}', checksum=-1301891455, serialized_key_size=-1, serialized_value_size=25) 
ConsumerRecord(topic=u'my-topic', partition=0, offset=5743, timestamp=None, timestamp_type=None, key=None, value='{"dataObjectID": "test2"}', checksum=-1340077864, serialized_key_size=-1, serialized_value_size=25) 
ConsumerRecord(topic=u'my-topic', partition=0, offset=5744, timestamp=None, timestamp_type=None, key=None, value='test', checksum=1495943047, serialized_key_size=-1, serialized_value_size=4) 
ConsumerRecord(topic=u'my-topic', partition=0, offset=5745, timestamp=None, timestamp_type=None, key=None, value='\xc2Hello, stranger!', checksum=-1090450220, serialized_key_size=-1, serialized_value_size=17) 
ConsumerRecord(topic=u'my-topic', partition=0, offset=5746, timestamp=None, timestamp_type=None, key=None, value='test', checksum=1495943047, serialized_key_size=-1, serialized_value_size=4) 
ConsumerRecord(topic=u'my-topic', partition=0, offset=5747, timestamp=None, timestamp_type=None, key=None, value='\xc2Hello, stranger!', checksum=-1090450220, serialized_key_size=-1, serialized_value_size=17) 

所以,我試圖從消費者移除value_deserializer,而代碼執行,但沒有消息傳出解串器作爲一個字符串,這不是我所需要的。那麼,爲什麼value_deserializer沒有工作?是否有不同的方式從我應該使用的Kafka消息中獲取JSON?

回答

1

原來問題是value_deserializer=lambda m: json.loads(m).decode('utf-8')的解碼部分,當我將其更改爲value_deserializer=lambda m: json.loads(m),然後我看到從Kafka讀取的對象的類型現在是字典。其基於從Python的JSON文件,以下信息是正確的:

|---------------------|------------------| 
|  JSON   |  Python  | 
|---------------------|------------------| 
|  object   |  dict  | 
|---------------------|------------------| 
|  array   |  list  | 
|---------------------|------------------| 
|  string   |  unicode  | 
|---------------------|------------------| 
|  number (int) |  int, long | 
|---------------------|------------------| 
|  number (real) |  float  | 
|---------------------|------------------| 
|  true   |  True  | 
|---------------------|------------------| 
|  false   |  False  | 
|---------------------|------------------| 
|  null   |  None  | 
|---------------------|------------------| 
4

我的問題是第一解碼消息爲UTF-8後問題,然後json.load /轉儲:

value_deserializer=lambda m: json.loads(m.decode('utf-8')) 

而不是:

value_deserializer=lambda m: json.loads(m).decode('utf-8') 

希望這也將工作的製片人方

相關問題