Project's work units wasting large amounts of time
Joined: 2 Oct 05 · Posts: 415
If a project's work units normally run to completion in just over an hour, but odd units suddenly start running for enormous amounts of time and private messages to the admin go unanswered, is there another way of contacting them?

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Joined: 25 May 09 · Posts: 1326
Without knowing which project is concerned, there are several possibilities. The content of the tasks being delivered to you may have changed. The project administrator could be occupied with other work. He may be one who doesn't read his private messages very often. And a whole load more.

Are the tasks actually taking a very long time to complete, or is it that the estimated time to completion is wrong?

Some projects have a "back door" to the project administrator, but without knowing which project you are talking about, nobody can point you in the right direction.
Joined: 2 Oct 05 · Posts: 415
The work units normally run from 0 to 100% in just over an hour on this machine (4 GHz i7, no overclock) and then upload. My results page shows, for example, values from 4600 to 6401 seconds for jobs returned today. The one I noticed, and have suspended, has elapsed 17:38:40 and shows 100% complete, remaining 00:00:00. When I resume it, the elapsed time increases, but that is all. Other crunchers in a thread on the project's message board are reporting similar stories.

Looking at the status page for the work unit I have suspended, it has a download error, which I discount, and two "Timed out - no reply" tasks, both sent on 6 Nov. The implication is that these just ran on until they reached the cutoff point and were reported as no reply; they may still be running! I checked my CPU usage, and when the job is running, all cores and threads show 100%, so the implication is that the job has entered a loop from which it has no way out at termination.

I've not aborted the WU, as it would simply go to someone else and run for hours on their machine. I've sent a private message, but have no way of seeing whether Radim has read it. The project is Asteroids.
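To put those numbers in perspective, here is a minimal Python sketch (not part of BOINC or the Asteroids application) that converts the elapsed time to seconds and flags a task whose run time is far beyond the normal range. The function names and the 3× threshold are hypothetical choices for illustration only.

```python
from typing import Sequence


def hms_to_seconds(hms: str) -> int:
    """Convert an elapsed time shown as HH:MM:SS into whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def looks_runaway(elapsed: str, normal_runtimes_s: Sequence[int], factor: float = 3.0) -> bool:
    """Return True if `elapsed` exceeds `factor` times the longest normal run.

    `factor` is a hypothetical threshold chosen for illustration, not a BOINC setting.
    """
    return hms_to_seconds(elapsed) > factor * max(normal_runtimes_s)


if __name__ == "__main__":
    normal = [4600, 6401]        # run times (seconds) from the results page quoted above
    stuck = "17:38:40"           # elapsed time of the suspended task
    print(hms_to_seconds(stuck))         # 63520
    print(looks_runaway(stuck, normal))  # True
```

With the figures above, 17:38:40 is 63,520 seconds, roughly ten times the longest normal run of 6,401 seconds.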
Joined: 2 Oct 05 · Posts: 415
The forum thread is here... http://asteroidsathome.net/boinc/forum_thread.php?id=715
Joined: 25 May 09 · Posts: 1326
Two possible actions. First, "reset" the project concerned; this can trigger problematic tasks into finishing properly, albeit with a very long run time. Second, "abort" the task. It may be a problematic one, or it might just be stalled on your PC; either way, the project will send it out to another user, for whom it will either end normally or fail.

Many project administrators only work on the project a few hours a week, so it may take them a long time to get into a position where they can respond.
Joined: 2 Oct 05 · Posts: 415
What does resetting actually do?

max # of error/total/success tasks: 7, 20, 20

This could affect a good number of people and waste a large number of hours before it dies by itself.

>>> Many project administrators only work on the project a few hours a week, so it may take them a long time to actually get to be in a position where they can respond. <<<

Hence my original question.
Joined: 25 May 09 · Posts: 1326
Sadly, until it dies by itself, most project teams will not interfere with a task that is running, or trying to run, just in case it does eventually run properly to completion. Resetting a project clears various pointers and internal (invisible) caches used by the applications, potentially helping them to run more smoothly.

If the project administrator is unreachable for whatever reason, alternative routes are unlikely to work. Looking at the Asteroids forum, a number of people are having the same problem; indeed, you have raised the question there. Across the various threads over there, the "best advice" is to abandon any stuck tasks and hope that the administrator realises there is a batch of damaged tasks around.

Edit: It looks as if someone knows about this problem and is trying to resolve it: http://asteroidsathome.net/boinc/forum_thread.php?id=713&postid=6037#6037
Joined: 2 Oct 05 · Posts: 415
Surely, aborting the job simply makes it available to someone else, who may waste days of CPU time on it. And then the next person, and the next...

And resetting the project has killed the damn job and sent it back, exactly what I didn't want to do.
Joined: 28 Jun 10 · Posts: 2842
>>> Surely, aborting the job simply makes it available to someone else, who may waste days of CPU time on it. And then the next person, and the next... <<<

Projects tend to set a maximum number of times a task will be resent. For CPDN this is normally three; a suffix of _0, _1 or _2 on the task name indicates whether a task is a retread or not. On some projects, such as CPDN, many tasks fail because of problems with the computer rather than with the task itself, hence the reason for three (or however many) goes at it.
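The suffix convention can be illustrated with a small Python sketch. It assumes a task name ends in "_<n>" and that, as the CPDN description suggests, only one copy is sent initially, so any n above 0 is a resend; on projects that send two or more copies up front, _1 would not be a resend, which is why the initial copy count is a parameter. The helper names and the example task names are hypothetical.

```python
import re
from typing import Optional

# Assumed naming scheme: "<work unit name>_<n>", where n numbers the copies
# issued for that work unit, starting at 0.
_COPY_SUFFIX = re.compile(r"_(\d+)$")


def copy_index(task_name: str) -> Optional[int]:
    """Return the trailing copy number of a task name, or None if it has none."""
    match = _COPY_SUFFIX.search(task_name)
    return int(match.group(1)) if match else None


def is_resend(task_name: str, initial_copies: int = 1) -> bool:
    """Return True if the task is a resend ("retread").

    Assumes the first `initial_copies` copies (default 1) are originals;
    any later copy number counts as a resend.
    """
    index = copy_index(task_name)
    return index is not None and index >= initial_copies


if __name__ == "__main__":
    for name in ("example_wu_0", "example_wu_2", "example_wu"):  # hypothetical names
        print(name, copy_index(name), is_resend(name))
```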
Joined: 2 Oct 05 · Posts: 415
Yes, I know, but his is set...

max # of error/total/success tasks: 7, 20, 20

...and the problem causes tasks to run for ages, and they KNOW the problem exists. He is wasting many hours of crunchers' CPU time. I have dropped the project from my machines. This is totally unacceptable. It should be blacklisted for at least a month.
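For a rough idea of the worst case, here is a Python sketch of how many copies of a bad work unit could go out under the 7, 20, 20 limits. As I understand the BOINC server documentation, these are the work unit's max_error_results, max_total_results and max_success_results, and the job is abandoned once one of them is exceeded. The model is simplified and assumed: copies go out one at a time, every copy ends in an error (counting a "Timed out - no reply" as an error is itself an assumption), and the success limit never comes into play. The function names are hypothetical.

```python
def failing_copies_issued(max_error_results: int, max_total_results: int) -> int:
    """Count copies issued before the limits stop a work unit whose copies all fail.

    Simplified, assumed model (not BOINC server code): copies go out one at a
    time, every copy ends in an error, and the work unit is abandoned as soon as
    the error count or the total count exceeds its limit.
    """
    issued = 0
    errors = 0
    while errors <= max_error_results and issued <= max_total_results:
        issued += 1  # send another copy to another host
        errors += 1  # ...which also fails (assumption)
    return issued


def wasted_cpu_hours(copies: int, seconds_per_copy: int) -> float:
    """Total CPU hours if each failed copy ran for `seconds_per_copy` seconds."""
    return copies * seconds_per_copy / 3600


if __name__ == "__main__":
    copies = failing_copies_issued(7, 20)
    print(copies)                                     # 8
    print(round(wasted_cpu_hours(copies, 63520), 1))  # 141.2
```

Under these assumptions, limits of 7 and 20 allow 8 failed copies; if each ran at least the 17:38:40 (63,520 seconds) reported earlier in this thread, that is over 140 CPU hours for a single work unit.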
Joined: 28 Jun 10 · Posts: 2842
>>> Yes, I know, but his is set... <<<

On CPDN, where most of my crunching is, 3 seems a bit on the low side sometimes, especially for Linux tasks, where lots of crunchers don't know how to ensure they have all the requisite 32-bit libs and so crash everything. But his setting seems to me way over the top.
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.