The S3 connector, currently available as a sink, allows you to export data from Kafka topics to S3 objects in either Avro or JSON format. In addition, for certain data layouts, the S3 connector exports data with exactly-once delivery semantics for consumers of the S3 objects it produces.
As a sink, the S3 connector periodically polls data from Kafka and uploads it to S3.
A partitioner splits the data of every Kafka partition into chunks. Each chunk is stored as an S3 object whose key name encodes the topic, the Kafka partition, and the start offset of the chunk. If no partitioner is specified in the configuration, the default partitioner, which preserves Kafka partitioning, is used. The size of each chunk is determined by the number of records written to S3 and by schema compatibility: a chunk is closed once it holds the configured number of records, or earlier when a schema change forces the connector to start a new object.
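The Bash sketch below illustrates how chunk boundaries and key names relate. It assumes the default partitioner, a top-level directory named topics, a + delimiter in file names, and a start offset zero-padded to 10 digits; these reflect our understanding of the connector's defaults and should be checked against the documentation. The function names are hypothetical, and rotation caused by schema changes is not modeled.

```shell
#!/usr/bin/env bash
# Sketch only. Assumed layout (default partitioner):
#   topics/<topic>/partition=<p>/<topic>+<p>+<zero-padded start offset>.<ext>

# Hypothetical helper: build the S3 object key for one chunk.
# Args: topic, Kafka partition, start offset of the chunk, file extension.
s3_object_key() {
  local topic=$1 partition=$2 start_offset=$3 ext=$4
  printf 'topics/%s/partition=%d/%s+%d+%010d.%s\n' \
    "$topic" "$partition" "$topic" "$partition" "$start_offset" "$ext"
}

# Hypothetical helper: list the keys of the chunks committed for a run of
# consecutive offsets, assuming a chunk is committed only once it holds
# flush_size records. Leftover records stay buffered and produce no key.
committed_chunk_keys() {
  local topic=$1 partition=$2 first_offset=$3 record_count=$4 flush_size=$5 ext=$6
  local start
  for (( start = first_offset; start + flush_size <= first_offset + record_count; start += flush_size )); do
    s3_object_key "$topic" "$partition" "$start" "$ext"
  done
}
```

For example, `s3_object_key my-topic 0 42 avro` prints `topics/my-topic/partition=0/my-topic+0+0000000042.avro`. With 10 records starting at offset 0 and a chunk size of 3, `committed_chunk_keys` lists chunks starting at offsets 0, 3, and 6, while the record at offset 9 waits for more data.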
Use the Confluent Hub client to install this connector with:
confluent-hub install confluentinc/kafka-connect-s3:5.0.0
Alternatively, download the ZIP file and extract it into one of the directories listed in the Connect worker's plugin.path configuration property. This must be done on every installation where Connect will run. See here for more detailed instructions.
Once the connector is installed, create a connector configuration file with the connector's settings and deploy it to a Connect worker. See here for more detailed instructions.
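As a starting point, the Bash sketch below writes a minimal configuration file for the S3 sink. The connector name, topic, bucket, region, and flush size are hypothetical placeholders; the class names are those we understand the connector to use, so verify them against the documentation for your version. The final comment shows one way to deploy the file with a standalone worker, assuming a worker configuration file named worker.properties.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal S3 sink connector configuration.
# Name, topic, bucket, region, and flush size are hypothetical placeholders.
config_file=s3-sink.properties

cat > "$config_file" <<'EOF'
name=s3-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
topics=my-topic
s3.bucket.name=my-bucket
s3.region=us-west-2
# Multipart upload part size in bytes (5 MiB).
s3.part.size=5242880
# Number of records per chunk before it is committed to S3.
flush.size=1000
storage.class=io.confluent.connect.s3.storage.S3Storage
# Use io.confluent.connect.s3.format.json.JsonFormat for JSON output.
format.class=io.confluent.connect.s3.format.avro.AvroFormat
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
schema.compatibility=NONE
EOF

echo "Wrote $config_file"
# To deploy with a standalone worker (worker file name assumed):
#   connect-standalone worker.properties "$config_file"
```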
The source code is located in this repository.
For more information, see the documentation.