Crash #1035
Comments
|
Thanks. Nothing fancy in your config. We're currently in the process of backporting plenty of fixes to emit a new 2.3 and a new 2.2 hopefully this week, and given the number of bugs fixed, it's possible that yours is as well. If you have the core, please open it in gdb and issue "t a a bt full" to get the full backtrace. It should provide important information about where it happened. |
|
@wtarreau maybe it's important to note that it's running inside of a linux container on a RaspberryPi. I will try to get a core dump. Give me a few minutes. |
|
@wtarreau using |
If using systemd, you should also take a look at /etc/systemd/coredump.conf and coredumpctl utility |
|
By the way, coredumps will be automatically disabled by the kernel if you happen to have a "user" statement or various other ones. There is a special |
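For background: on Linux, the kernel clears a process's "dumpable" flag when its effective user or group ID changes, which is why dropping privileges silently disables core dumps. The following is a minimal sketch, assuming Linux and glibc, of how a program can turn them back on after such a change. `enable_core_dumps` is a hypothetical helper name used for illustration only; it is not a haproxy function.

```c
#include <assert.h>
#include <sys/prctl.h>
#include <sys/resource.h>

/* Hypothetical helper: re-enable core dumps for the calling process.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    /* Raise the soft core-size limit up to the hard limit. */
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    /* Mark the process dumpable again; the kernel resets this flag
     * when the effective uid/gid changes. */
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0)
        return -1;

    /* Sanity check: the flag should now read back as 1. */
    assert(prctl(PR_GET_DUMPABLE, 0, 0, 0, 0) == 1);
    return 0;
}
```

Such a call only makes sense after the privilege drop, since that is the point where the flag gets cleared. A hard limit of zero on the core size would still prevent a dump.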
|
By the way, stupid question since this is an extremely common issue on RPi, are you certain of your power supply's stability ? |
|
@wtarreau I use the original RPi power supply for my version of the RPi. Do you have something concrete in mind which I could check? |
|
The original PSU usually is OK. Look at the red LED. If you see it flashing/blinking, it definitely indicates a power supply issue. Not seeing it flash doesn't necessarily indicate everything is OK though, but that's a good start. With that said, I was essentially asking "just in case", as there are definitely bugs from time to time in haproxy which can cause crashes, but PSU issues are also a known cause on RPi, so that's not easy to tell :-) |
|
I run haproxy on raspberry without any issue. Can you describe repro steps ? |
|
Of course. You install it in a fresh Ubuntu 20.04.1 LTS installation and use the config from above. Once the server starts, nothing bad happens. |
|
Could you please post your global and default sections ? They definitely contain the most important parts and it's impossible to try the config without them. Thanks! |
As an option, can you try to run it in gdb? Run "gdb --args /path/to/haproxy -f /path/to/config -d"; after it fails, it should drop back to the gdb prompt, where you can debug as usual, i.e. "bt full" and so on. |
|
Here you go |
I gave it another try but was unsuccessful. :-( Will try the gdb idea. |
|
Well, sorry to bother you again, but what's necessary for gdb to open up the port? |
|
I remembered that you pushed out a new version last week. I installed 2.3.4 but the error remains. |
|
Alright. In the meantime, I boiled the problematic part of the config down to this line: And when removing the |
|
in the config above there's nothing like can you please post the full config ? I tried to repro on amd64 on latest master branch using works as expected. I will try on Raspberry Pi 400 tomorrow |
|
Somebody here to give me a hint? |
|
Not yet, sadly, it looks like various more reproducible bugs are already keeping everyone 100% busy :-( |
|
Well, I see. :-) Just give me a sign, when you have time for it. It's still reproducible on my machine. :-D |
|
I don't see any reason that may explain why gdb is blocking the port. There is nothing to do to "open" a port from gdb, except running the program. Are you sure it is running ? Just in case, you should enter In addition, it is probably a good idea to not use the master-worker mode (removing |
|
I'm noticing that your arch is set to "armv8l" which is the 32-bit version of ARMv8, and there might be a slight difference here. Could you please post the output of "gcc -v" ? In any case, getting the gdb output at the moment of the crash would immensely help. |
|
Alright here we go, regarding the architecture. Not sure if these things should be mixed up anyway, but I can ask at LXD how they intend it to be used Regarding gdb, no luck so far. I tried: Sample output but port 443 still not open: The worst thing about this is that it's 100% reproducible without gdb but I cannot shoot an HTTP request to trigger it when using gdb. Output without gdb (just ignore the Layer4 connection problem, that is because I would need to reconfigure the backend servers for the test scenario but that is not necessary to reproduce the bug) gcc version |
|
So in short, this environment doesn't seem to work well enough for even something as standard as gdb to work. The bus error is at least a new indication that makes things a bit more precise. It usually is an alignment issue, though there should not be any on armv8, except maybe for atomic ops or double-word loads/stores. Could you please show the output of When you get this error there, do you have any extra info in |
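To make the alignment remark concrete: an access is aligned when its address is a multiple of the access size, so a double-word (8-byte) load from an address such as base+1 is misaligned. A minimal sketch of that check follows; `is_aligned` and `misalignment` are hypothetical helper names, not taken from haproxy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if ptr is a multiple of `align` bytes, 0 otherwise.
 * `align` must be a non-zero power of two. */
int is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}

/* Return how many bytes ptr lies past the previous `align` boundary. */
size_t misalignment(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size_t)((uintptr_t)ptr & (align - 1));
}
```

A string or pattern stored at an arbitrary offset inside a larger buffer will often fail `is_aligned(p, 8)` while still passing `is_aligned(p, 1)`, which is the situation where a double-word load can fault.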
|
arch-specific defines |
I still don't want to drop the gdb idea completely because it would make life much easier. But the port is closed when starting it via gdb run.... :-/ |
|
Port 80 does not open, and neither does 8000. I don't understand this. |
|
But do you type "run" under gdb or do you see the gdb prompt ? It's not been very clear from the beginning. Once at the gdb prompt you have to type run, then you have no prompt anymore because gdb's waiting for haproxy to quit (or for you to Ctrl-C). Also make sure to remove -Ws from your command line under gdb. |
|
I used all those variants (with and without -Ws) |
|
|
Could you press ctrl-C when in this situation in gdb, then issue "t a a bt" ? It looks like the old issue affecting locks on macs that makes the process spin-loop on startup, except it's not a mac. |
|
|
I think I need the debug symbols for this. 1 second. |
|
Not much of a difference... :-/ |
|
It's completely bogus, it didn't even start. I sincerely think gdb doesn't manage to work at all in this environment, so you can't count on it, unfortunately. |
|
By the way, given that it didn't even start, I guess you'll get the same with any other program (e.g. In my opinion that's enough to consider that this setup cannot be trusted, you should stick to the native architecture. |
Good catch. You are right.
It seems so. I will open another issue for lxd about the gdb problem because I think that should work, right? Maybe, then I can come back. What do you think? PS: For production use, I can use arm64 in lxd and everything works just fine. Thanks for your help. I learned a lot so far :-) |
|
OK, it could likely help the LXC team to get this report, maybe it's a simple issue that was not identified, maybe it's a deeper issue that cannot easily be addressed and they'll need to at least emit a warning on such a setup. As long as it works fine for you on a native setup, you should stay with this. At this point I think we can close the issue, but feel free to feed this ticket once you get some feedback there. |
|
So I could install all of this into a chroot and found that the crash happens in XXH64 when hashing an unaligned pattern to be used as a cache key: The crash happens when reading the 64-bit value: XXH64 uses an ldrd instruction here, which doesn't support unaligned accesses: It's not easy though, as the unaligned limitation only applies to 64-bit accesses. I have no idea why it doesn't fail on a plain armhf system. Maybe it's just caught by the kernel and emulated. |
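The failing pattern can be illustrated in isolation. The sketch below is not the xxHash code; `read64_cast` and `read64_any` are hypothetical names. The first function dereferences a cast pointer, which lets the compiler assume 8-byte alignment (and, on 32-bit ARM, choose a double-word load). The second goes through `memcpy()`, which makes no alignment assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Only valid when p is suitably aligned for uint64_t: the compiler is
 * free to emit a load instruction that requires that alignment. */
uint64_t read64_cast(const void *p)
{
    assert(p != NULL);
    return *(const uint64_t *)p;
}

/* Valid for any address: memcpy() carries no alignment assumption, so
 * the generated loads must work for an arbitrary pointer. */
uint64_t read64_any(const void *p)
{
    uint64_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof(v));
    return v;
}
```

Calling `read64_cast()` on an address such as `buf + 1` is undefined behaviour in C, which is why it may appear to work on one system and raise a bus error on another.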
|
OK so I could address it. The patch splits 32-bit and 64-bit alignment checks. I kept it minimal and sent it to XXH as well. It works for me with this, thus I'm going to merge it. |
|
In the end, the real difference is the kernel: arm64 kernels don't implement alignment trap emulation, yet these systems can run 100% valid 32-bit ARM code that usually relies on kernel emulation for 64-bit unaligned accesses, which is not available on such kernels. As such, the 32-bit ARM compatibility on such kernels is not 100%. |
|
@wtarreau Wow, that's really interesting. So, everything works by accident except a few bits here and there. Just wow. I'd like to thank you for all the work you've done here. Amazing. Should we make the maintainers of the kernel aware of it? |
|
I doubt this will be sufficient to get armv8 kernels to implement emulation, since it used to be disabled by default in the past. There are very few programs which make use of unaligned accesses, all those focusing on extreme performance, and it's well known that these approaches are often tricky. We don't yet know when the compiler decides to emit such instructions so it's not even trivial to provide a solid reproducer. |
Not understanding everything but I gather that haproxy is such a tool. So, you think that at this level everybody is on his/her own, right? |
There was a special case made to allow ARMv6 to use unaligned accesses via a cast in xxHash when __ARM_FEATURE_UNALIGNED is defined. But while ARMv6 (and v7) does support unaligned accesses, it's only for 32-bit pointers, not 64-bit ones, leading to bus errors when the compiler emits an ldrd instruction and the input (e.g. a pattern) is not aligned, as in issue #1035. Note that v7 was properly using the packed approach here and was safe, however haproxy versions 2.3 and older use the old r39 xxhash code which has the same issue for armv7. A slightly different fix is required there, by using a different definition of packed for 32 and 64 bits. The problem is really visible when running v7 code on a v8 kernel because such kernels do not implement alignment trap emulation, and the process dies when this happens. This is why in the issue above it was only detected under lxc. The emulation could have been disabled on v7 as well by writing zero to /proc/cpu/alignment though. This commit is a backport of xxhash commit a470f2ef ("update default memory access for armv6"). Thanks to @srkunze for the report and tests, @stgraber for his help on setting up an easy reproducer outside of lxc, and @Cyan4973 for the discussion around the best way to fix this. Details and alternate patches available on Cyan4973/xxHash#490.
(cherry picked from commit 4acb99f)
[wt: used the different version suitable for backporting, using the distinct packed settings]
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 59ad20e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 5b1f60d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 77eed6c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit c02437c)
Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>
(cherry picked from commit 97cee32)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 939467c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
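The commit message refers to /proc/cpu/alignment, the file through which 32-bit ARM kernels expose their alignment trap settings and counters. A minimal sketch for inspecting it is shown below; `dump_alignment_info` is a hypothetical helper name, and the function simply reports failure on kernels that do not provide the file.

```c
#include <assert.h>
#include <stdio.h>

/* Copy the contents of `path` (typically "/proc/cpu/alignment") to `out`.
 * Returns the number of chunks copied (one per line for lines shorter
 * than the buffer), or -1 if the file cannot be opened, for example on a
 * kernel without this interface. */
int dump_alignment_info(const char *path, FILE *out)
{
    char line[256];
    int chunks = 0;
    FILE *f;

    assert(path != NULL && out != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    while (fgets(line, sizeof(line), f) != NULL) {
        fputs(line, out);
        chunks++;
    }
    fclose(f);
    return chunks;
}
```

Calling `dump_alignment_info("/proc/cpu/alignment", stdout)` inside the 32-bit container and on a native 32-bit system is one way to compare the two environments; a return value of -1 indicates that the interface is not available there.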


OK, guys. Crashed it again - this time with a bug report ;-) Change in the config was to add some awareness of static files. Config does not work with 2.2.6 and 2.3.2
Starting config (didn't work with 2.0.13; confirmed working with 2.2.6):