
Unknown Magic Byte! How to Address Magic Byte Errors in Apache Kafka



If you work with Kafka Streams, Apache Kafka® clients, and Schema Registry, you’ve likely come across this error: 

```
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
```

This error might be frustrating and mysterious at first, but hopefully, after you review this blog post, you’ll feel more confident handling it.

What is a magic byte?

Magic bytes are the first few bytes of data in a file, meant to help identify the file’s contents. They’re also known as “magic numbers.” So, what causes the “unknown magic byte” error in Kafka Streams?
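To make the idea concrete, here’s a minimal Node.js sketch (the function name is my own, not a library API) that sniffs a buffer’s format from its opening bytes:

```javascript
// These signatures are real: zip archives begin with "PK" (0x50 0x4B)
// and PDF files begin with "%PDF" (0x25 0x50 0x44 0x46).
function sniffFormat(buffer) {
  if (buffer[0] === 0x50 && buffer[1] === 0x4b) return "zip";
  if (buffer.slice(0, 4).toString("ascii") === "%PDF") return "pdf";
  return "unknown";
}

console.log(sniffFormat(Buffer.from("PK\x03\x04..."))); // → "zip"
console.log(sniffFormat(Buffer.from("%PDF-1.7 ...")));  // → "pdf"
```

Real tools like the Unix `file` command do essentially this, just with a much larger table of signatures.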

In general, there’s an essential problem within the publish/subscribe data architecture pattern: How do you make sure that the data formats of the publishers (in Kafka’s case, producers) match the data formats of the subscribers (in Kafka’s case, consumers)? 

Kafka solves this problem with Schema Registry. Schema Registry lives outside the Kafka brokers that host your topics, separate from both producers and consumers. Messages are validated against a registered schema before being sent. The Confluent documentation includes an illustration of how this works.

Schemas are available in three formats: Avro, Protobuf, and JSON Schema. Serializers that integrate with Schema Registry prepend an identifying byte sequence to each message; when a deserializer reads a message that doesn’t start with the expected magic byte, the dreaded “unknown magic byte” error is thrown.
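In Kafka’s case, the “signature” is the Confluent wire format: a single magic byte, 0x00, followed by a four-byte schema ID and then the serialized payload. Here’s a sketch of the check, mimicking (but in no way replacing) what a Schema Registry-aware deserializer does first:

```javascript
// Confluent wire format: [0x00][4-byte big-endian schema ID][payload].
const MAGIC_BYTE = 0x00;

function readWireFormat(message) {
  if (message.length < 5 || message[0] !== MAGIC_BYTE) {
    throw new Error("Unknown magic byte!");
  }
  return {
    schemaId: message.readInt32BE(1), // used to fetch the schema from the registry
    payload: message.slice(5),
  };
}

// A value produced as plain JSON, with no registry-aware serializer,
// fails the check -- which is exactly what the Kafka error reports:
try {
  readWireFormat(Buffer.from('{"plain":"json"}'));
} catch (e) {
  console.log(e.message); // → "Unknown magic byte!"
}
```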

How to address “unknown magic byte”

Generally, the “unknown magic byte” error means that you must reconcile serialization methods and check that your schemas use the same format on the production and consumption ends. You might have to do this in the client, in ksqlDB, or in Kafka Streams, depending on your project. 

I ran into the error myself recently in a Shakespeare app that I created. I was pulling data from the Folger Shakespeare API and trying to join, in ksqlDB, two topics built from the API results. I had never set up Schema Registry in a client before, so I didn’t realize that it’s necessary when you use ksqlDB. Nothing showed up in my streams after I created them!

Luckily, ksqlDB has something called a “processing log,” which is a stream of metadata about your instance. Querying it resulted in an error with an “unknown magic byte” string. 
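The processing log is exposed as a ksqlDB stream, named `KSQL_PROCESSING_LOG` by default, so you can query it like any other stream; deserialization failures land there along with their error text:

```sql
-- Each record describes one processing error, including deserialization
-- failures such as "unknown magic byte."
SELECT message FROM KSQL_PROCESSING_LOG EMIT CHANGES;
```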

It turns out that when you create a stream with ksqlDB in Confluent Cloud, you need to declare the value format so that values can be deserialized.
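For example, a stream over Avro-encoded values might be declared like this (the stream, topic, and column names here are placeholders, not my app’s actual ones):

```sql
CREATE STREAM shakespeare_lines (
  title VARCHAR,
  line_text VARCHAR
) WITH (
  KAFKA_TOPIC = 'shakespeare-lines',
  -- VALUE_FORMAT tells ksqlDB how to deserialize each message's value.
  VALUE_FORMAT = 'AVRO'
);
```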


Since I created the stream using the “AVRO” format, I needed to set up an Avro schema in my client using confluent-schema-registry. You can view the full code on GitHub, but here’s the gist:

```javascript
const { SchemaRegistry, SchemaType } = require("@kafkajs/confluent-schema-registry");

const registry = new SchemaRegistry({
  host: "", // your Schema Registry URL goes here
  auth: {
    username: `${process.env.SCHEMA_USERNAME}`,
    password: `${process.env.SCHEMA_PASSWORD}`,
  },
});

const schema = `
...schema definition here
`;

const { id } = await registry.register({
  type: SchemaType.AVRO,
  schema,
});
```

From there, `registry.encode(id, payload)` returns the value with the magic byte and schema ID prepended, ready to produce to the topic.

Magic byte tutorial

If you’re still curious about magic bytes, the following tutorial helps to solidify the concept of a magic byte. 


1. `git clone && cd magic-byte-illustration`

2. Open the zip file in your text editor. You’ll see that it starts with the characters “PK”.

3. PK is the file signature, or magic byte, for the zip file format. You can verify it by running `file`.

4. Let's change the file signature bytes so that `file` reads this file as a PDF.

Erase “PK” on line 1 and replace it with “%PDF”.

5. Run `file` to confirm. Note that while the extension is still ".zip", the file signature is for a PDF, so it's identified as a PDF. Pretty cool huh?
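The same swap can be scripted; this Node.js sketch forges the signature in memory instead of in an editor:

```javascript
// Every zip archive opens with the two magic bytes "PK" (0x50 0x4B).
const zipBytes = Buffer.from("PK\x03\x04 ...rest of archive...");
console.log(zipBytes.slice(0, 2).toString("ascii")); // → "PK"

// Replace the signature with "%PDF", exactly as the editor step above does.
const forged = Buffer.concat([Buffer.from("%PDF"), zipBytes.slice(2)]);

// Signature-sniffing tools like `file` now identify the bytes as a PDF.
console.log(forged.slice(0, 4).toString("ascii")); // → "%PDF"
```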


This post uses Node.js for the solution, but if you’re working in another language, take a look at the documentation for the Java client, or ask in the Confluent Community Slack.

