Introducing Kafka Connect Auto-Restart in Kadeck

To make life easier for infrastructure and operations teams running Kafka Connect connectors, and to increase uptime, Kadeck can now automatically restart crashed Kafka Connect connectors. The feature is part of a free update to the Kadeck Teams Enterprise package.
Users configure the number of restart attempts and a grace period between restarts. Kadeck then monitors the Kafka Connect connectors and restarts crashed tasks accordingly. Each restart is recorded in the audit log. This guide shows how to set up and use the feature.
If a Kafka Connect connector instance monitored by the auto-restart feature fails, Kadeck restarts it as soon as the failure is detected.
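Kadeck's internal detection logic is not described in this guide. As a rough illustration only, the following Python sketch assumes that detection relies on the standard Kafka Connect REST API: it inspects the JSON returned by `GET /connectors/{name}/status` and builds the `POST /connectors/{name}/tasks/{id}/restart` paths for failed tasks. The helper names are hypothetical, and treating only the `FAILED` state as "crashed" is an assumption of the sketch.

```python
from typing import Any
from urllib.parse import quote

# Sketch assumption: a connector or task counts as "crashed" only when the
# Kafka Connect REST API reports its state as FAILED.
FAILED = "FAILED"


def failed_task_ids(status: dict[str, Any]) -> list[int]:
    """IDs of all tasks reported as FAILED in a /connectors/{name}/status response."""
    return [t["id"] for t in status.get("tasks", []) if t.get("state") == FAILED]


def connector_failed(status: dict[str, Any]) -> bool:
    """True if the connector itself or any of its tasks is FAILED."""
    connector_state = status.get("connector", {}).get("state")
    return connector_state == FAILED or bool(failed_task_ids(status))


def restart_paths(status: dict[str, Any]) -> list[str]:
    """REST paths (to be called with POST) that would restart each failed task."""
    name = quote(status["name"], safe="")  # connector names may need URL-encoding
    return [f"/connectors/{name}/tasks/{i}/restart" for i in failed_task_ids(status)]
```

For a status response in which task 1 of a (hypothetical) connector named orders-sink is FAILED, `restart_paths` returns `["/connectors/orders-sink/tasks/1/restart"]`.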
Checks run every 60 seconds by default; the interval can be changed in the settings. Depending on the number of connectors to check and the latency involved, a single run may take longer than the configured interval.
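To illustrate how an overrunning check affects the schedule, here is a minimal Python sketch. It assumes (this is not confirmed by the guide) that runs never overlap: the next run starts one interval after the previous run started, or immediately after the previous run finishes if that run took longer than the interval. `run_start_times` is a hypothetical helper.

```python
def run_start_times(durations: list[float], interval: float = 60.0) -> list[float]:
    """Start time (in seconds) of each check run, given how long each run takes.

    Sketch assumption: runs never overlap, so the gap between two starts is
    the configured interval or the previous run's duration, whichever is longer.
    """
    starts: list[float] = []
    t = 0.0
    for duration in durations:
        starts.append(t)
        t += max(interval, duration)
    return starts
```

For example, with the default 60-second interval and run durations of 5, 75 and 10 seconds, the runs start at 0, 60 and 135 seconds: the second run overran, which pushed the third run back by 15 seconds.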
When a crashed Kafka Connect connector instance is detected, Kadeck attempts a restart and starts a grace period countdown. Once the grace period expires, the next restart is attempted if the connector is still failing; this repeats until the maximum number of restart attempts is reached.
During a restart, the connector changes to the "Restart attempt" status. Each restart is logged in the audit log with error information.
If the last restart attempt fails, a log entry with level CRITICAL is written and the connector changes to the "Persistent Failure" state. No further restarts are attempted.
The connector is still monitored, however, and its status is reset as soon as all of its tasks are running correctly again (for example, after being restarted manually).
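The restart behavior described above can be modeled as a small state machine. The Python sketch below is a hypothetical model, not Kadeck's implementation: `AutoRestartState`, its fields and the `OK` label for a healthy connector are assumptions; only the "Restart attempt" and "Persistent Failure" states come from this guide. `on_check` is meant to be called once per monitoring run and returns whether a restart should be issued at that moment.

```python
from dataclasses import dataclass
from typing import Optional

OK = "OK"  # hypothetical label for a healthy connector
RESTART_ATTEMPT = "Restart attempt"
PERSISTENT_FAILURE = "Persistent Failure"


@dataclass
class AutoRestartState:
    max_attempts: int
    grace_period: float  # seconds to wait after a restart before the next one
    status: str = OK
    attempts: int = 0
    last_restart: Optional[float] = None

    def on_check(self, now: float, healthy: bool) -> bool:
        """Process one monitoring check; return True if a restart should be issued now."""
        if healthy:
            # All tasks are running again (e.g. after a manual restart): reset.
            self.status, self.attempts, self.last_restart = OK, 0, None
            return False
        if self.status == PERSISTENT_FAILURE:
            return False  # still monitored, but no more automatic restarts
        if self.last_restart is not None and now - self.last_restart < self.grace_period:
            return False  # grace period after the previous restart is still running
        if self.attempts >= self.max_attempts:
            # The last attempt also failed: give up. A CRITICAL log would be written here.
            self.status = PERSISTENT_FAILURE
            return False
        self.attempts += 1
        self.last_restart = now
        self.status = RESTART_ATTEMPT
        return True
```

With `max_attempts=2` and a 120-second grace period, a connector that never recovers is restarted at 0 s and 120 s, and switches to "Persistent Failure" at the first check once the second grace period has expired (240 s). A later healthy check resets it.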
An administrator can change the monitoring interval in the settings via the entry "Kafka Connect Auto-Restart Interval". The default is 60s.
When Kadeck runs as a cluster, the check is executed by the different Kadeck nodes in turn (round-robin).
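A minimal sketch of round-robin distribution, assuming (hypothetically) that each check run as a whole is handed to the next node in a fixed rotation. The node names and the `assign_runs` helper are illustrative only; how Kadeck actually coordinates its nodes is not described here.

```python
from itertools import cycle


def assign_runs(nodes: list[str], runs: int) -> list[str]:
    """Node that executes each of the next `runs` check runs, in round-robin order."""
    if not nodes:
        raise ValueError("at least one node is required")
    rotation = cycle(nodes)  # node-a, node-b, ..., then back to node-a
    return [next(rotation) for _ in range(runs)]
```

With three nodes, five consecutive runs are executed by node-a, node-b, node-c, node-a and node-b.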
KafkaConnectManage rights are needed to configure auto-restarts.