To preface, I'm not a Kubernetes or Mosquitto expert by any means.
I'm confused about one point. A k8s Service sends traffic to pods that match the selector and are in the "Ready" state, so wouldn't you accomplish HA without the pseudocontroller by just putting both pods in the Service? The Mosquitto bridge mechanism is bi-directional, so you're already getting data re-sync no matter where a client writes.
edit: I'm also curious if you could use a headless service and use an init container on the secondary to set up the bridge to the primary by selecting the IP that isn't its own.
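To make the headless-service idea concrete, here's a rough sketch of the "pick the IP that isn't my own" step. It's only an illustration under assumptions: the function names are made up, the pod's own IP is assumed to come from something like the Downward API, and the generated bridge stanza is a minimal guess at what you'd want, not a tested setup. One caveat: by default a headless Service only publishes Ready pods, so from inside an init container the pod's own IP may not show up in DNS at all.

```python
import socket


def resolve_pod_ips(headless_name, port=1883):
    """Return the sorted, de-duplicated IPv4 addresses behind a DNS name.

    headless_name is a hypothetical headless Service DNS name.
    """
    infos = socket.getaddrinfo(
        headless_name, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def pick_peer(addresses, own_ip):
    """Return the one address that is not our own, or None if ambiguous."""
    others = [a for a in addresses if a != own_ip]
    return others[0] if len(others) == 1 else None


def bridge_config(peer_ip, port=1883):
    """Render a minimal Mosquitto bridge stanza pointing at the peer.

    The connection name "peer-bridge" is an arbitrary placeholder.
    """
    return (
        "connection peer-bridge\n"
        f"address {peer_ip}:{port}\n"
        "topic # both 0\n"
    )
```

The init container would call `resolve_pod_ips`, pass the result and its own IP to `pick_peer`, and write the output of `bridge_config` into the Mosquitto config directory before the broker starts. Returning `None` when there isn't exactly one other address keeps it from bridging to the wrong pod.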
> so wouldn't you accomplish HA without the pseudocontroller by just putting both pods in the Service?
I'm not sure how fast that would be; the extra controller container is needed for the almost-instant failover.
As for your second question (why not an init container on the secondary): this way we can scale the failover controller across multiple nodes. If the (fairly stateless) controller ran on a single node and that node went down, we'd have to wait until k8s schedules another pod instead of failing over almost instantly.
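For anyone wondering what a "fairly stateless" failover check could look like, here's a minimal sketch. It is not the actual controller from the article, just the general idea: probe the primary over TCP with a short timeout and fall back to the secondary when the probe fails. The names and the 0.5 s timeout are assumptions, and how the decision gets applied (for example, repointing a Service) is left out.

```python
import socket


def is_reachable(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_active(primary, secondary, probe=is_reachable):
    """Prefer the primary; fall back to the secondary when the probe fails.

    primary and secondary are (host, port) tuples.
    """
    host, port = primary
    return primary if probe(host, port) else secondary
```

Because the function keeps no state between calls, you can run several copies on different nodes and they'll reach the same decision as long as they see the same network.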
When dealing with long-lasting TCP connections, why add that extra layer of network complexity with k8s? I work for a big IoT company and we have 1.8M connections spread across 15 EC2 c8g.xlarge boxes. Not even using an NLB, just round-robin DNS. We wrote our own broker with https://github.com/lesismal/nbio and use a Packer .hcl file to build the AMI that each EC2 box boots. We use https://github.com/lesismal/llib/tree/master/std/crypto/tls to make nbio work with TLS.
Ops type here who deals with this around Kafka.
It comes down to how much you use Kubernetes. At my company, just about everything is in Kubernetes except for databases, which are hosted by Azure. So having random VMs means we'd need Ansible, SSH keys and the extra SOC2 compliance annoyance. The effort to get VMs running may end up higher than with Kubernetes, even if you have to put in extra hacks.
Wouldn't more modern implementations like EMQX be better suited for HA?
I built a high-scale MQTT ingestion system by utilising the MQTT protocol handler for Apache Pulsar (https://github.com/streamnative/mop). I ran a forked version and contributed back some of the non-proprietary bits.
A lot more work than Mosquitto but obviously HA/distributed and some tradeoffs w.r.t features. Worth it if you want to run Pulsar anyway for other reasons.
Would they be as performant and use as few resources (almost nothing)? I've run Mosquitto clusters with tens of thousands of connected clients and thousands of messages per second on 2 cores and 2 GB of RAM, while mostly idling. (Without retention, using clean sessions and only QoS 0)...
EMQX just locked HA/clustering behind a paywall: https://www.emqx.com/en/blog/adopting-business-source-licens...
Sigh, that's annoying.
Edit: it's not a paywall. It's the standard BSL with a 4-year Apache revert. I personally have zero issue with this.
Oh, can you comment on what this means? I'm not too familiar with it. Thanks!
BSL is a source-available license that by default forbids production use. After a certain period after the date of any particular release, not to exceed four years, that release automatically converts to an open source license, typically the Apache license.
Projects can add additional license grants to the base BSL. EMQX, for example, adds a grant for commercial production use of single-node installations, as well as production use for non-commercial applications.
It is a paywall: clustering won't work unless you have a license key.
Yeah I see that now. Ugh.
VerneMQ also has built-in clustering and message replication, which would make this easy.
Have you tried both EMQX and VerneMQ, and would you specifically recommend one over the other? I don't have experience with VerneMQ.