Cambium cnPilot 190W: New life for old hardware, improving reliability & performance with OpenWrt, fq_codel & cake

This post shows how new software can make an old piece of hardware more reliable and better performing, which ultimately improved customer satisfaction.

After a couple of years operating a WISP network with Cambium cnPilot 19XY devices, I’ve observed that no matter how many improvements are made to the network, the router plays a critical role: it is the last piece of equipment in the communication chain and the one actually serving traffic to customers.

Observation

At times, customers using Cambium cnPilot 190W routers complained about the quality of the service, especially during peak traffic. Even though I had been improving the performance and observability of the network all the way from the border router to the subscriber module, these seemingly random complaints persisted. The router was the only black box left, the one piece of equipment in the communication chain that lacked observability.

Cambium cnPilot 190W stock firmware version 4.8-R15 was released on Oct 21, 2022. However, it comes with Linux kernel version 2.6.36, which was released on Oct 20, 2010, so it is missing more than 10 years of innovation (security, performance, etc.).

# uname -a
Linux <hostname> 2.6.36 #1 Wed Sep 28 16:01:45 CST 2022 mips unknown

# ip addr | grep " eth2: "
2: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
  • Linux kernel Version: 2.6.36
  • Default queue discipline for network interfaces: pfifo_fast.
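
To check the queue discipline on many routers without reading the output by eye, the qdisc name can be pulled out of a line like the one above. This is a minimal sketch: the helper name qdisc_of is made up for illustration, and the sample line is the stock-firmware output copied from this post.

```shell
#!/bin/sh
# Print the qdisc named in `ip addr` / `ip link` output read from stdin.
# Prints nothing if no "qdisc" field is present.
# (qdisc_of is a hypothetical helper name, not part of any tool.)
qdisc_of() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "qdisc") { print $(i + 1); exit } }'
}

# Sample line copied from the stock-firmware output above.
sample='2: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000'
printf '%s\n' "$sample" | qdisc_of   # prints: pfifo_fast
```

On a live router the same helper could be fed directly, for example with `ip addr | grep " eth2: " | qdisc_of`.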

We tried all the typical troubleshooting techniques, such as 1) checking for RF interference, either by manually selecting the channel or by relocating the router, 2) upgrading to the latest stock firmware available, 3) enabling DMZ for gamers to avoid an additional NAT behind the router, 4) rebooting the routers and asking customers to try again (I have to admit it), mostly because of the “auto” channel selection behavior that picks the least noisy RF channel available, and 5) finally replacing the router with a new one. The list goes on, as we and the customers became more creative over time. (To keep this short, I omitted electrical, network cabling, and all the troubleshooting that happened from the subscriber module all the way to the border router.)

At that point, we had a few hundred devices deployed, and replacing all of them with brand new equipment wasn’t financially attractive.

So I wondered: was there anything else that could be done to rescue the equipment and improve the quality of service for those customers without throwing it away? The only option I could think of was to try new software on the same hardware.

Given that I only had the random customer complaints as a reference and the stock firmware was a black box, I had to find a quantitative measurement to validate whether a significant improvement could be made. So I put bufferbloat (latency increase under load) at the center of this investigation. It is a well-known and documented technical challenge that has been addressed with advanced/smart queue management techniques such as fq_codel and, most recently, cake. Although we were already shaping traffic in both the up and down directions with a LibreQoS middle-box, Dave Täht suggested that we shape the up-link at the CPE:

Running cake on the actual bottleneck prevents malignant traffic from escaping the home network. A mere ping flood can be controlled by libreqos, but it is better to slow that down and mix it up with all the other traffic at the cpe. The algorithms for ack-filtering and congestion management are always more accurate when put on the actual bottleneck.
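
As a rough illustration of that suggestion, here is a minimal sketch that builds a tc command to put cake on the up-link. Everything specific in it is an assumption for illustration, not the configuration actually deployed: the helper name cake_uplink_cmd, the eth2 interface (borrowed from the stock-firmware output), the 4000 kbit/s upload plan, and the 95% headroom. The function only prints the command so it can be reviewed before it is run as root on the router.

```shell
#!/bin/sh
# Build (but do not run) a tc command that puts cake on the up-link.
#   $1 = WAN interface                      (assumed)
#   $2 = upload plan in kbit/s              (assumed)
#   $3 = percent of the plan to shape at    (assumed default: 95)
cake_uplink_cmd() {
    dev=$1
    plan_kbit=$2
    pct=${3:-95}
    rate_kbit=$(( plan_kbit * pct / 100 ))
    # "nat" makes cake look up pre-NAT addresses for per-host fairness;
    # "ack-filter" thins redundant TCP ACKs on the slow up-link.
    echo "tc qdisc replace dev $dev root cake bandwidth ${rate_kbit}kbit nat ack-filter"
}

cake_uplink_cmd eth2 4000
# prints: tc qdisc replace dev eth2 root cake bandwidth 3800kbit nat ack-filter
```

Shaping slightly below the plan rate is a common way to keep the queue inside the CPE, where cake can manage it, instead of in a device further up the chain; the right percentage depends on the link.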

I measured the latency under load (bufferbloat) on a couple of customers with the stock firmware, as shown in the following images:

<Placeholder - I will include a couple of bufferbloat tests of some customers that haven’t upgraded yet>

Hypothesis & Goals

As the stock firmware was running an old Linux kernel version, my hypothesis was that running a newer one would bring 10+ years of improvements to the old hardware (at least that was my hope). My goals were:

  1. Shape upload traffic with cake in the CPE router, closer to where traffic originates.
  2. Single-digit latency increase under load.
  3. Increase hardware reliability by using a modern Linux kernel version and wireless driver.
  4. Increase observability by having direct access to the router to effectively measure CPU usage, increase log verbosity, etc.
  5. Upgrade firmware remotely.
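
Goal 2 can be checked from plain ping output: average the round-trip time while the link is idle, average it again while the link is saturated, and subtract. The sketch below assumes the single-digit goal is measured in milliseconds; the helper names avg_rtt_ms and latency_increase are made up for illustration.

```shell
#!/bin/sh
# Average the "time=<ms>" values found in ping output read from stdin.
avg_rtt_ms() {
    awk -F'time=' '/time=/ { split($2, a, " "); sum += a[1]; n++ }
                   END { if (n) printf "%.1f\n", sum / n }'
}

# Latency increase under load = loaded RTT - idle RTT (both in ms).
# Prints the increase and whether it meets the single-digit goal (< 10 ms).
latency_increase() {
    awk -v idle="$1" -v loaded="$2" 'BEGIN {
        inc = loaded - idle
        printf "%.1f ms %s\n", inc, (inc < 10 ? "ok" : "bloated")
    }'
}
```

A possible use: run `ping -c 20 <target> | avg_rtt_ms` once on an idle link and once during a speed test, then pass both averages to latency_increase, idle value first.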

I would consider the experiment successful if every single goal was met!

Firmware

This was the most challenging part of the experiment, as I had no previous experience developing firmware, and nobody in the community had built open-source firmware for the Cambium cnPilot 190W specifically. However, the community has done a good job porting other devices that use the same MediaTek MT7628 chipset. I chose OpenWrt because it’s well supported by an active community and was also used by the CeroWrt project from Bufferbloat.net as a platform for their research and improvements on fq_codel and cake.

After weeks of learning, trial and error, fine-tuning the Linux kernel options, patching the master branch with wireless driver code that wasn’t yet merged into the latest version, and a lot of support from the OpenWrt and LibreQoS communities, I was able to build a custom, stable OpenWrt image that I called NafiuxWrt for fun. A high degree of reverse engineering was needed, but that is another story.

Results

The first production deployment exceeded my expectations! I was able to flash the CPE router remotely, configure up-link shaping with cake, and end up with a brand new, modern, full-fledged Linux machine that gave me a high degree of observability.

So I invited the customer to join a video call (connected to the router with the new firmware) and had him run this Waveform speed test:
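
As one example of what that observability allows, CPU load can be read straight from /proc/stat once there is shell access to the router. This is a minimal sketch, not a tool I shipped: the helper name cpu_busy_pct is made up, and the two sample lines are invented numbers used only to show the arithmetic.

```shell
#!/bin/sh
# Busy CPU percentage between two "cpu ..." summary lines from /proc/stat.
# Fields after "cpu": user nice system idle iowait irq softirq steal
# (later fields are guest time, already counted in user, so they are skipped).
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0
          for (i = 2; i <= NF && i <= 9; i++) total += $i
          idle = $5 + $6                      # idle + iowait
          if (NR == 2) {
              dt = total - ptotal; di = idle - pidle
              if (dt > 0) printf "%d\n", (dt - di) * 100 / dt
          }
          ptotal = total; pidle = idle }'
}

# Invented sample values, only to illustrate the calculation.
cpu_busy_pct 'cpu 100 0 50 800 50 0 0 0' 'cpu 150 0 70 860 60 0 0 0'   # prints: 50
```

On the router, the two samples could be taken with `head -n1 /proc/stat` a second apart and passed to the function in order.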

(View complete report)

Wow! With modernized software on ancient hardware, it is possible to have a great Internet experience and totally usable videoconferencing under load, even on a 10Mbit/4Mbit link.

I purposely talked the whole time while the test was running, to check with the customer whether there was any loss or distortion in the audio, which fortunately never happened.

We’re now in the process of deploying the firmware across the whole network, which will also give us the flexibility to keep customizing the firmware to better serve our customers.