All GPU BOINC projects return computation errors?
Joined: 8 Nov 19, Posts: 718
My PC runs Linux (Lubuntu), and has been working flawlessly for the past few weeks. From a working condition, I closed BOINC and turned off the PC, and a few days later I restarted BOINC; now all my projects return computation errors. The NVIDIA drivers are found, and everything was exactly the same as before. I ended up needing to reinstall BOINC, which fixed the issue on only SOME projects. Asteroids and Einstein both worked fine before; now they just error out :( Meanwhile, new downloads of Collatz and Milkyway seem to work fine... What could be the cause of this?
Joined: 27 Jun 08, Posts: 642
> My PC runs Linux (Lubuntu), and has been working flawlessly for the past few weeks.

It is up to each project to implement a restart mechanism. Some projects have a robust method and others, to put it nicely, do not. Some projects (like "A") take longer than others to write checkpoints (recovery files). If you "close the lid" on your laptop or tell the OS to shut down, there is a good chance that project "B" will not get to write its checkpoints, and when you restart, some of "A"'s work units will have errored out as well as all of "B"'s.

GPUgrid: If you have two GPUs and they are different, there is a 50% chance that GPU0 will use GPU1's checkpoint and GPU1 will use GPU0's. This causes both work units to report computation errors. With three different GPUs, the chance that every work unit resumes on its original GPU is far less than 33%.

Depending on which system I need to power down, I do the following:

1. Issue the command for NO NEW TASKS.
2. Suspend all work units that have not started.
3. Wait for all GPU tasks to finish. I have not had a problem with CPU-bound tasks like WCG, but you may want to suspend CPU tasks if there is a problem.
4. Exit the Gridcoin "research" program. This is a must, as it has a really terrible handler for SIGTERM or a Windows shutdown.
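The first two steps of the shutdown routine above can also be scripted with `boinccmd`, the command-line tool that ships with the BOINC client. This is a minimal sketch, assuming `boinccmd` is on the PATH and can talk to the local client. The helper names, the example project URL and the task name are hypothetical, and you have to supply the list of unstarted tasks yourself (for example by reading `boinccmd --get_tasks`). Waiting for GPU tasks and exiting Gridcoin are left as manual steps. By default it only prints the commands (dry run).

```python
import shlex
import subprocess


def shutdown_plan(project_urls, unstarted_tasks):
    """Build boinccmd argument lists for the first two shutdown steps.

    project_urls: project master URLs (hypothetical example values).
    unstarted_tasks: (project_url, task_name) pairs for tasks that have
    not started yet; collecting them is left to the user.
    """
    plan = []
    # Step 1: "No new tasks" for every attached project.
    for url in project_urls:
        plan.append(["boinccmd", "--project", url, "nomorework"])
    # Step 2: suspend work units that have not started.
    for url, name in unstarted_tasks:
        plan.append(["boinccmd", "--task", url, name, "suspend"])
    return plan


def run(plan, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for argv in plan:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    projects = ["https://example-project.org/"]  # hypothetical URL
    waiting = [("https://example-project.org/", "example_task_0")]  # hypothetical
    run(shutdown_plan(projects, waiting))
    # Step 3 (manual): wait for running GPU tasks to finish.
    # Step 4 (manual): exit the Gridcoin program before powering off.
```

Running it once in dry-run mode lets you check the generated commands before passing `dry_run=False`.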
The only time a reinstall of BOINC is needed is if there is a disk drive problem and BOINC does not start. On rare occasions (GPUgrid comes to mind) there is a bug in the project startup, such as a null account or a null (empty) reply file caused by the power going off while the file was being written. Very likely all subsequent work units will then error out. Just reset the project instead of reinstalling BOINC.
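Resetting a project can also be done from the command line instead of BOINC Manager. A small sketch, assuming `boinccmd` is available; the function names and the URL are hypothetical. A reset discards that project's tasks in progress on this host, so it is safer to let finished results report first.

```python
import shlex
import subprocess


def reset_project_cmd(project_url):
    """Return the boinccmd arguments that reset one project."""
    return ["boinccmd", "--project", project_url, "reset"]


def reset_project(project_url, dry_run=True):
    """Print (dry run) or execute the reset command; return its arguments."""
    argv = reset_project_cmd(project_url)
    if dry_run:
        # Only show what would be executed.
        print(shlex.join(argv))
    else:
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    reset_project("https://example-project.org/")  # hypothetical URL
```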
Joined: 8 Nov 19, Posts: 718
> My PC runs Linux (Lubuntu), and has been working flawlessly for the past few weeks.

Thank you for the explanation. I can understand the last project erroring out, but all the tasks (about 20 of them) had computation errors. I'm not sure whether newly downloaded tasks had the same error; I'll have to take a look now.
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.