Perf fixes #8399
Merged
Currently disabled as my local setup has blown up and I need to gently repair the local dev env with a blow torch.
The main thread dumps a 2MB payload for writing to the broker every time the write queue is empty. Currently we are thrashing the LWS event loop to see what it does, and interleaving it with the mosquitto event loop. It takes between 40 and 60s for the payload to arrive at the inspection point. The write queue in the agent stays full for this period.
Message disassembly before transmission. It looks as if the MQTT broker reassembles the messages from the stream of independent websocket messages, which would make sense. There is a massive delay in the lws library between asking for the writable callback and actually receiving it; this is related to the size of the SND_BUFFER option on the socket. The fix is a bit CPU heavy, as it cancels the poll() to force a reattempt, and it still needs to be properly integrated into our poll loop.
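For readers unfamiliar with the "cancel the poll() to force a reattempt" idea, here is a minimal, self-contained sketch of the general technique using a self-pipe. It is not the code in this PR and does not use libwebsockets (which offers `lws_cancel_service()` for interrupting its own service loop); the `wake_pipe` type and the `wake_pipe_*` function names are hypothetical and exist only for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: a pipe whose read end sits in the poll() set. */
struct wake_pipe {
    int read_fd;
    int write_fd;
};

/* Create the pipe and make both ends non-blocking. Returns 0 on success. */
int wake_pipe_init(struct wake_pipe *wp)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(fds[i], F_GETFL, 0);
        if (flags < 0 || fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) < 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    wp->read_fd = fds[0];
    wp->write_fd = fds[1];
    return 0;
}

/* Called by the producer after queueing data: makes a pending or
 * future poll() on the read end return immediately. */
int wake_pipe_signal(const struct wake_pipe *wp)
{
    char byte = 1;
    if (write(wp->write_fd, &byte, 1) == 1)
        return 0;
    /* A full pipe means a wake-up is already pending, which is fine. */
    return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
}

/* Wait up to timeout_ms for a wake-up.
 * Returns 1 if woken, 0 on timeout, -1 on error. */
int wake_pipe_wait(const struct wake_pipe *wp, int timeout_ms)
{
    struct pollfd pfd = { .fd = wp->read_fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc;

    /* Drain every pending byte so repeated signals collapse into one wake-up. */
    char buf[64];
    while (read(wp->read_fd, buf, sizeof buf) > 0)
        ;
    return 1;
}
```

In a real event loop the read end would be one entry among the socket descriptors passed to poll(), so the writer thread can interrupt a long timeout as soon as data is queued. Waking the loop on every enqueue is what costs CPU, which is consistent with the overhead noted above.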
87a0559
into
netdata:master
33 of 36 checks passed
Saruspete added a commit to Saruspete/netdata that referenced this pull request May 21, 2020
Add an inspection point for VerneMQ in the local dev env. Remove the bottleneck in sending websocket messages, at the expense of increased CPU load. Fixed the message encoding. Added support for stress testing; it is still enabled in the main loop, so it will fire stress-testing payloads when the ACLK is established. The next patch will integrate the socket polling properly to reduce the CPU overhead and remove the stress-testing payloads.


Summary
Initial work on fixing the performance problems in LWS / MQTT. Reduces the visible latency for the initial on-connect payloads from 120s to 10-20s. May increase CPU usage. Fixes the encoding errors for the JSON messages on the link, allowing the alarms to work.
Component Name
ACLK
Test Plan
Additional Information
Although this is not a complete fix for the performance problems, it does improve performance and allows other PRs to be merged on top.