Message boards : Questions and problems : Compute error - SIGSEGV: segmentation violation
Author | Message |
---|---|
Joined: 17 Sep 19 Posts: 12 |
I am sure others are having this issue or have had it, but a search of the forum didn’t turn up any recent posts or resolutions. I decided to add my web server to my computers as it sits mostly idle; it’s a Debian 10 (Buster) Linux box (details below). First I tried the repository install with apt. Then people on the Seti@Home forums suggested using a Berkeley version that they claimed would be more efficient and include everything needed. Once I got that running I was back to the same errors. A pretty knowledgeable friend had me look at a number of things and convinced me to return to the repository version, which I did and am currently running. Any help to get past this, or at least understand it, would be appreciated. My skill level on Linux is just enough to be dangerous, so please be a bit more -verbose in explanations of how to do something. Radjin~
====== A typical error as listed on my account's computers/tasks page:
Task: 8058757637; Name: blc11_2bit_guppi_58692_04223_HIP79568_0125.25756.0.21.44.68.vlar_0; Workunit: 3657386096; Created: 18 Sep 2019, 11:11:31 UTC; Sent: 18 Sep 2019, 16:48:51 UTC; Report deadline: 21 Nov 2019, 3:42:30 UTC; Received: 18 Sep 2019, 16:56:17 UTC; Server state: Over; Outcome: Computation error; Client state: Compute error; Exit status: 11 (0x0000000B) Unknown error code; Computer ID: 8816958; Run time: (blank); CPU time: (blank); Validate state: Invalid; Credit: 0.00; Device peak FLOPS: 4.14 GFLOPS; Application version: SETI@home v8 v8.00 x86_64-pc-linux-gnu
Stderr output: <core_client_version>7.14.2</core_client_version> <![CDATA[ <message> process got signal 11</message> <stderr_txt> SIGSEGV: segmentation violation </stderr_txt> ]]>
====== My computer:
CPU type: GenuineIntel Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz [Family 6 Model 42 Stepping 7]; Number of processors: 8; Coprocessors: ---; Virtualization: None; Operating System: Linux Debian Debian GNU/Linux 10 (buster) [4.19.0-6-amd64|libc 2.28 (Debian GLIBC 2.28-10)]; BOINC version: 7.14.2; Memory: 31.3 GB; Cache: 8192 KB; Swap space: 15.89 GB; Total disk space: 884.49 GB; Free disk space: 751.27 GB; Measured floating point speed: 4.14 billion ops/sec; Measured integer speed: 63.45 billion ops/sec; Average upload rate: 147 KB/sec; Average download rate: 2478.34 KB/sec; Average turnaround time: 0 days; Tasks: 307; Number of times client has contacted server: 37; Last time contacted server: 20 Sep 2019, 11:08:20 UTC; Fraction of time BOINC is running: 98.97%; While BOINC is running, fraction of time computing is allowed: 100.00%; While BOINC is running, fraction of time GPU computing is allowed: 100.00%; Task duration correction factor: 1 |
Joined: 29 Aug 05 Posts: 15632 |
From https://setiathome.berkeley.edu/forum_thread.php?id=84658&postid=2012358 Keith Myers wrote: “Sigsegv errors are usually caused by unstable CPU clocks or unstable memory clocks. Something is corrupting memory addresses. This is an OS issue and not a BOINC or Seti issue.” He's right, you know. |
Joined: 17 Sep 19 Posts: 12 |
I am not disputing that it is a memory issue, either corruption or a fixed memory location that is out of range; everything I have read says that. However, it only happens with seti/BOINC. I have also read, in posts as late as 2018, that with vsyscall disabled in later kernels this error is a known issue with BOINC. What I am asking is: since this is a known issue, how does one diagnose the cause of the error (memory, OS, BOINC)? How did others resolve the issue? I can find dozens of references to the issue, all with a BOINC project, but only two resolutions, where vsyscall was put into emulate mode. This is not a bash post; it’s a call to the experts to help explain and resolve a problem that affects a number of people. At the moment all I have gotten is the equivalent of: dump your computer and build a new one; dump your OS and do a clean install of this OS; don’t use the stable repository to install BOINC as recommended by the BOINC literature, install this custom package instead (which gave me the same error). I’m open to trying new things, except running unstable software. At this point the only time I get this error is with BOINC. Is the reason that everyone goes silent on solving this issue that it’s unsolvable? |
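
One hedged way to approach the "how does one diagnose the cause" question: when a program calls into the legacy vsyscall page on an x86-64 kernel running with vsyscall=none, the kernel is expected to log a line containing "vsyscall attempted with vsyscall=none" before the process receives SIGSEGV. The sketch below filters a kernel log for that message. The function name is made up for illustration, and the exact message wording is an assumption based on the upstream kernel source; it may differ between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print only the
# lines that report a blocked legacy vsyscall.
# Assumption: the kernel words the message exactly as in the pattern below.
find_vsyscall_faults() {
    grep -F 'vsyscall attempted with vsyscall=none'
}

# Usage (reading the kernel log may require root):
#   sudo dmesg | find_vsyscall_faults
#   sudo journalctl -k | find_vsyscall_faults
```

If the filter prints nothing after a task has failed, that would point away from vsyscall and toward the hardware or application explanations discussed in this thread, though a rate-limited or rotated log could also hide the message.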
Joined: 29 Aug 05 Posts: 15632 |
Actually, the error occurs with Seti, as it is their science application that runs the work and gives the error. BOINC is the managing software; it doesn't do any of the calculations, heavy RAM use or anything else intensive. As you have shown in your other post, you can run BOINC until you press Ctrl+C without problems. So BOINC isn't causing the SIGSEGV error. What you can test is running another project. See if that project's science application(s) return the same error; if they don't, it's something specific about Seti's application or work format that reacts this way on your system, and you'll have to go back to them to work that out. If another project returns the same errors, it may be your hardware/OS after all, or the programming code used is similar to Seti's. The one I know that differs wildly is Climateprediction.net, as they used to use Fortran for their code. See https://boinc.berkeley.edu/projects.php for a list of projects and whether they support Linux. You can also try software outside BOINC, such as Prime95 or Folding@Home. |
Joined: 17 Sep 19 Posts: 12 |
Thanks for the suggestions, I will add the project tonight to see what happens. |
Joined: 28 Jun 10 Posts: 2842 |
“Sigsegv errors are usually caused by unstable CPU clocks or unstable memory clocks. Something is corrupting memory addresses. This is an OS issue and not a BOINC or Seti issue.” There have been batches of work in the past from CPDN where computers with very high success rates on completing tasks have produced over 25% of these errors. If SETI is producing a significant number of these on any particular type of task, it doesn't preclude a hardware problem, but it certainly suggests that, at the very least, these tasks stress the hardware in a way the other tasks don't. My other argument against it being hardware is that the tasks on CPDN would mostly fail at the same point, e.g. just before creation of the first zip or at the end of the first model day, almost always at a point that could be pinpointed. So yes, checking with other projects that stress the hardware to the same level is a good idea. I haven't seen it with CPDN for over a year now, but unfortunately those at Oxford have had problems with the latest batch due to go out under Linux, so I doubt there will be any work there before Monday at the earliest. |
Joined: 23 Apr 12 Posts: 77 |
“What I am asking is since this is a known issue, how does one diagnose the cause of the error, memory, OS, BOINC? How did others resolve the issue? I can find dozens of references to the issue, all with a BOINC project, but only two resolutions, where vsyscall was put into emulate mode.” So you have identified a possible (and IMO very likely) cause, and you know a workaround. But you don't mention whether you have tried it, or what the outcome was. What about that? |
Joined: 28 Jun 10 Posts: 2842 |
With CPDN, it stopped, I think, after a newer version of the particular model type was released, so there is not a lot you can do at the user end. I can't comment on Seti@home because I have never seen it with them. |
Joined: 17 Sep 19 Posts: 12 |
“What I am asking is since this is a known issue, how does one diagnose the cause of the error, memory, OS, BOINC? How did others resolve the issue? I can find dozens of references to the issue, all with a BOINC project, but only two resolutions, where vsyscall was put into emulate mode.” “So you have identified a possible (and IMO very likely) cause, and you know a workaround. But you don't mention that you have tried it, or an outcome. What about that?” This is another interesting conundrum. I added to grub:
GRUB_CMDLINE_LINUX_DEFAULT="VSYSCALL=EMULATE"
sudo update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.19.0-6-amd64
Found initrd image: /boot/initrd.img-4.19.0-6-amd64
Found linux image: /boot/vmlinuz-4.19.0-5-amd64
Found initrd image: /boot/initrd.img-4.19.0-5-amd64
Found memtest86+ image: /boot/memtest86+.bin
Found memtest86+ multiboot image: /boot/memtest86+_multiboot.bin
done
The grub update seemed to go OK.
cat /usr/src/linux-headers-$(uname -r)/.config | grep
cat: /usr/src/linux-headers-4.19.0-6-amd64/.config: No such file or directory
For some reason I don’t get the confirmation of the emulation mode. |
Joined: 17 Sep 19 Posts: 12 |
“With CPDN, it stopped I think after a newer version of the particular model type so not a lot you can do at the user end. I can't comment on the Seti@home because I have never seen it with them.” I haven’t seen any work being downloaded, but I will wait a week and see what happens. Aside from trying different projects to see what happens, how can I test for possible hardware issues? I see there is a memtest86+, but it comes with mixed reviews. |
Joined: 27 Jun 08 Posts: 642 |
“With CPDN, it stopped I think after a newer version of the particular model type so not a lot you can do at the user end. I can't comment on the Seti@home because I have never seen it with them.” That memtest works fine. I have used it on everything from the latest Dell Area51 back to old dual Opteron servers. Usually the Ubuntu install comes with it. |
Joined: 23 Apr 12 Posts: 77 |
“I added to grub: GRUB_CMDLINE_LINUX_DEFAULT="VSYSCALL=EMULATE"” Better make that "vsyscall=emulate". I wouldn't be surprised to see that the upper case version doesn't work. “The grub update seemed to go ok.” So it seems. Of course you rebooted after that? “For some reason I don’t get the confirmation of the emulation mode.” I'm not aware of a way to query the current mode. You could look at /proc/cmdline. And of course the best confirmation would be if your application didn't segfault any longer. Be aware that vsyscall is just a run-time parameter; it overrides the kernel default but doesn't change it permanently. |
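
The /proc/cmdline suggestion above can be sketched as a small check. This is only an illustration: the function name is hypothetical, and it reports what the kernel was booted with, not whether the application will stop segfaulting. The match is case-sensitive on purpose, so an upper-case VSYSCALL=EMULATE is reported as unset.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the vsyscall= boot parameter in a
# kernel command line string, or "unset" if it is absent.
# If the parameter appears more than once, the last occurrence is printed.
vsyscall_from_cmdline() {
    mode=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^vsyscall=//p' | tail -n 1)
    echo "${mode:-unset}"
}

# Usage after a reboot:
#   vsyscall_from_cmdline "$(cat /proc/cmdline)"
```

Seeing "emulate" here would confirm only that GRUB passed the parameter to the running kernel; the real test remains whether a SETI task runs past the point where it used to fail.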
Joined: 17 Nov 16 Posts: 906 |
In your case the answer is yes, since you aren't open to any of the suggestions. Those posts you referenced were from years ago, and the problems have been solved by updating the OS, the science apps or BOINC. The vsyscall workaround was only needed at Einstein for a little while, until they produced science apps compatible with the older distributions and put a computer preference in the science app selection in Project preferences. OTHER SETTINGS |
Joined: 17 Sep 19 Posts: 12 |
“I added to grub: GRUB_CMDLINE_LINUX_DEFAULT="VSYSCALL=EMULATE"” “Better make that "vsyscall=emulate". I wouldn't be surprised to see that the upper case version doesn't work.” Yes, thank you. I understand it is a temporary thing. |
Joined: 17 Sep 19 Posts: 12 |
“OTHER SETTINGS” Is this a specific version of BOINC, or some sort of library I don’t already have? If I add or switch to this library, will it break the apt update process? |
Joined: 23 Apr 12 Posts: 77 |
“What does cat /usr/src/linux-headers-$(uname -r)/.config | grep VSYSCALL tell me? I have seen this suggested in a number of posts where they received a reply of:
cat /usr/src/linux-headers-$(uname -r)/.config | grep VSYSCALL
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_X86_VSYSCALL_EMULATION=y
CONFIG_LEGACY_VSYSCALL_EMULATE=y
# CONFIG_LEGACY_VSYSCALL_NONE is not set”
If you installed the kernel headers for your currently running kernel - $(uname -r) is the version string - this shows you how the kernel is configured regarding VSYSCALL. This is purely informational. In this example the vsyscall emulation is built in and enabled by default (VSYSCALL_EMULATE=y). For your kernel, it is built in and disabled by default (VSYSCALL_NONE=y). I suspect that's what the Seti application can't cope with, so you override it with the vsyscall=emulate boot parameter and then it's time for a test to see if we're on the right path. |
Joined: 25 Nov 05 Posts: 1654 |
Radjin, if you're going to try models from CPDN, you need to be aware that they are 32-bit, and sometimes the libraries they need aren't installed by default. This is the usual culprit: libstdc++.so.6. If it's not there, the models will crash at about 6 seconds. |
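
A rough way to check for the 32-bit library mentioned above, as a sketch only: it assumes the usual `ldconfig -p` listing on a 64-bit x86 system, where 64-bit entries carry an "x86-64" tag and 32-bit entries do not. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ldconfig -p` output on stdin and succeed only if
# a libstdc++.so.6 entry WITHOUT the x86-64 tag (taken to be 32-bit) is listed.
# Assumption: the listing format matches a typical 64-bit x86 Linux system.
has_32bit_libstdcxx() {
    grep -F 'libstdc++.so.6 ' | grep -qv 'x86-64'
}

# Usage:
#   if /sbin/ldconfig -p | has_32bit_libstdcxx; then
#       echo "32-bit libstdc++.so.6 found"
#   else
#       echo "32-bit libstdc++.so.6 missing"
#   fi
```

If it is missing, the package that provides it depends on the distribution; on Debian a 32-bit libstdc++ package would need to be installed through apt, and the exact package name should be looked up rather than taken from this sketch.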
Joined: 17 Sep 19 Posts: 12 |
“What does cat /usr/src/linux-headers-$(uname -r)/.config | grep VSYSCALL tell me? I have seen this suggested in a number of posts where they received a reply of:
cat /usr/src/linux-headers-$(uname -r)/.config | grep VSYSCALL
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_X86_VSYSCALL_EMULATION=y
CONFIG_LEGACY_VSYSCALL_EMULATE=y
# CONFIG_LEGACY_VSYSCALL_NONE is not set”
“If you installed the kernel headers for your currently running kernel - $(uname -r) is the version string - this shows you how the kernel is configured regarding VSYSCALL. This is purely informational. In this example the vsyscall emulation is built in and enabled by default (VSYSCALL_EMULATE=y). For your kernel, it is built in and disabled by default (VSYSCALL_NONE=y). I suspect that's what the Seti application can't cope with, so you override it with the vsyscall=emulate boot parameter and then it's time for a test to see if we're on the right path.”
Thanks. A prior post suggested there may be no way to check if the option was activated. When I run:
cat /usr/src/linux-headers-$(uname -r)/.config | grep
I get:
cat: /usr/src/linux-headers-4.19.0-6-amd64/.config: No such file or directory
even though I should have activated it in grub with:
GRUB_CMDLINE_LINUX_DEFAULT="vsyscall=emulate"
and:
sudo update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.19.0-6-amd64
Found initrd image: /boot/initrd.img-4.19.0-6-amd64
Found linux image: /boot/vmlinuz-4.19.0-5-amd64
Found initrd image: /boot/initrd.img-4.19.0-5-amd64
Found memtest86+ image: /boot/memtest86+.bin
Found memtest86+ multiboot image: /boot/memtest86+_multiboot.bin
done
sudo reboot
This is likely a moot point, as some pretty knowledgeable people have told me the issue I am trying to resolve is likely not with the vsyscall at all. I am just trying all options in order of complexity, given I am pretty much a noob learning as I go along.
I can’t be certain I have activated the option if I don’t get the expected output when I use the cat command. I have always run Debian Linux via the command line, always used apt, and never had to step into the realm of compiling or updating outside of apt. So every time I get a suggestion beyond that realm I spend hours reading about what I am doing and what has happened to others who did it. I’m quite thankful there are others out there who can help, even if we noobs irritate them. |
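
The GRUB steps described in this post can be double-checked before rebooting. The following is a minimal sketch, assuming the Debian default location /etc/default/grub for the line quoted above; the function name is hypothetical, and the "quiet" option in the comment is only an example of another option that may already be on that line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given GRUB defaults file sets
# vsyscall=emulate (lower case) on an uncommented
# GRUB_CMDLINE_LINUX_DEFAULT line.
grub_default_has_vsyscall() {
    grep -Eq '^[[:space:]]*GRUB_CMDLINE_LINUX_DEFAULT=.*vsyscall=emulate' "$1"
}

# Usage on Debian, as root (file location is an assumption):
#   1. Edit /etc/default/grub so the line reads, for example:
#        GRUB_CMDLINE_LINUX_DEFAULT="quiet vsyscall=emulate"
#   2. grub_default_has_vsyscall /etc/default/grub && update-grub
#   3. Reboot, then confirm with: cat /proc/cmdline
```

The check only confirms what is written in the configuration file; the running kernel picks it up after update-grub and a reboot.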
Joined: 17 Sep 19 Posts: 12 |
Thanks for that piece of information. I added CPDN just to test the vsyscall on something other than seti@home; I haven’t downloaded any work units as yet, having heard there was some bug with creating them for Linux and that there may be some ready this coming week. Nothing just works... |
Joined: 23 Apr 12 Posts: 77 |
“A prior post suggested there may be no way to check if the option was activated. When I run:
cat /usr/src/linux-headers-$(uname -r)/.config | grep
I get:
cat: /usr/src/linux-headers-4.19.0-6-amd64/.config: No such file or directory
even though I should have activated it in grub with:
GRUB_CMDLINE_LINUX_DEFAULT="vsyscall=emulate"
and:
sudo update-grub”
Most likely you don't have the kernel headers installed. You don't need them. If you wish you can do
grep VSYSCALL /boot/config-$(uname -r)
instead, but both will only show you the default values compiled into the kernel, not the current state as you seem to think.
“This is likely a moot point as some pretty knowledgeable people have told me the issue I am trying to resolve is likely not with the vsyscall at all.” They may be right. Another reason to verify my theory soon.
“I am just trying all options in order of complexity” And this one - besides the expected effects matching what you see - is easily tested. One changed configuration line, one update command; all you need to do now is run BOINC as usual and see if something has changed. Before, all your SETI tasks segfaulted immediately. If you now see one run for some minutes, you have most likely identified the issue. Also, I see you have another computer running Debian 10 at SETI. Activate that; if the problem is caused by vsyscall, it must show the same errors. |
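
The grep on /boot/config-$(uname -r) suggested above can be wrapped so it names the compiled-in default directly. This is a sketch that handles only the two option names quoted in this thread (CONFIG_LEGACY_VSYSCALL_EMULATE and CONFIG_LEGACY_VSYSCALL_NONE); other kernels may use different options, in which case it prints "unknown". The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the legacy vsyscall default compiled into the
# kernel whose config file is given: "emulate", "none", or "unknown".
# This is the build-time default only, not the mode currently in effect.
vsyscall_compiled_default() {
    if grep -q '^CONFIG_LEGACY_VSYSCALL_EMULATE=y' "$1"; then
        echo "emulate"
    elif grep -q '^CONFIG_LEGACY_VSYSCALL_NONE=y' "$1"; then
        echo "none"
    else
        echo "unknown"
    fi
}

# Usage:
#   vsyscall_compiled_default "/boot/config-$(uname -r)"
```

A result of "none" combined with vsyscall=emulate in /proc/cmdline would mean the boot parameter is overriding the compiled default, which is the situation the workaround aims for.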
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.