World Community Grid Tasks Crash my Machine
Author | Message |
---|---|
Joined: 6 Feb 10 Posts: 13 |
World Community Grid keeps crashing my machine. The machine hangs and doesn't recover, so I have to reboot. This is a recent problem: I had to suspend a WCG task last week for the same issue. I've checked the ActiveX, etc. All are good. The current project is Help Cure MD Phase 2 6.14. I haven't had any problems before, and Rosetta@Home works fine. I believe the failing task was CMD2_0315-MYH14.clustersOccur-2Z5X_A.clustersOccur_224_300775_301137_1_0. I've suspended WCG, and BOINC is working fine again with Rosetta@Home. If you let me know which log I should send, I will. Thanks |
Joined: 6 Feb 10 Posts: 13 |
Here's the output, but I regenerated it because it went back to August. I have plenty of memory and disk space, and I'm running fine with Rosetta@Home. I've been using this same machine with Rosetta and WCG for at least a couple of years. This is the third time in the last 2 weeks that a WCG task has crashed this machine.
06-Feb-2010 19:42:53 [---] Starting BOINC client version 6.10.18 for windows_intelx86
06-Feb-2010 19:42:53 [---] log flags: file_xfer, sched_ops, task
06-Feb-2010 19:42:53 [---] Libraries: libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3
06-Feb-2010 19:42:53 [---] Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
06-Feb-2010 19:42:53 [---] Running under account Eric
06-Feb-2010 19:42:53 [---] Processor: 1 AuthenticAMD AMD Athlon(tm) XP 2800+ [x86 Family 6 Model 10 Stepping 0]
06-Feb-2010 19:42:53 [---] Processor: 512.00 KB cache
06-Feb-2010 19:42:53 [---] Processor features: fpu tsc sse 3dnow mmx
06-Feb-2010 19:42:53 [---] OS: Microsoft Windows XP: Home x86 Edition, Service Pack 3, (05.01.2600.00)
06-Feb-2010 19:42:53 [---] Memory: 2.00 GB physical, 4.35 GB virtual
06-Feb-2010 19:42:53 [---] Disk: 74.52 GB total, 38.77 GB free
06-Feb-2010 19:42:53 [---] Local time is UTC -5 hours
06-Feb-2010 19:42:53 [---] No usable GPUs found
06-Feb-2010 19:42:53 [---] Not using a proxy
06-Feb-2010 19:42:54 [rosetta@home] URL http://boinc.bakerlab.org/rosetta/; Computer ID 372014; resource share 90
06-Feb-2010 19:42:54 [World Community Grid] URL http://www.worldcommunitygrid.org/; Computer ID 91106; resource share 90
06-Feb-2010 19:42:54 [rosetta@home] General prefs: from rosetta@home (last modified 13-Dec-2006 10:29:55)
06-Feb-2010 19:42:54 [rosetta@home] Host location: none
06-Feb-2010 19:42:54 [rosetta@home] General prefs: using your defaults
06-Feb-2010 19:42:54 [---] Reading preferences override file
06-Feb-2010 19:42:54 [---] Preferences limit memory usage when active to 1023.74MB
06-Feb-2010 19:42:54 [---] Preferences limit memory usage when idle to 1842.74MB
06-Feb-2010 19:42:54 [---] Preferences limit disk usage to 25.00GB
BOINC initialization completed, beginning process execution...
06-Feb-2010 19:42:57 [rosetta@home] Restarting task 2cgq_Jan25_2cgq_1ise_26Jan2010_17412_49_0 using minirosetta version 205 |
Joined: 6 Feb 10 Posts: 13 |
I will get you the output when it hangs again. I turned off WCG for now... Which file holds the client's startup message log (about the first 30 lines)? |
Joined: 20 Dec 07 Posts: 1069 |
You already found it. What you posted is what Sekerob asked for. What is missing are the lines around a hang of your machine. You don't need to wait for a new one if you know the (approx.) time of the last hang. Regards, Gundolf Computers aren't everything in life. (Just kidding) |
Joined: 6 Feb 10 Posts: 13 |
I'm not sure whether this contains anything useful. The computer hung around 20:35. It doesn't take any input from the keyboard or mouse, so I have to do a hard reboot (on/off switch). The system started up again around 20:47, and that's when I suspended WCG again and switched back to Rosetta.
06-Feb-2010 19:55:34 [World Community Grid] resumed by user
06-Feb-2010 19:55:35 [World Community Grid] Sending scheduler request: Requested by project.
06-Feb-2010 19:55:35 [World Community Grid] Not reporting or requesting tasks
06-Feb-2010 19:55:41 [World Community Grid] Scheduler request completed
06-Feb-2010 19:55:43 [World Community Grid] task CMD2_0315-MYH14.clustersOccur-2Z5X_A.clustersOccur_224_300775_301137_1 resumed by user
06-Feb-2010 19:56:02 [rosetta@home] update requested by user
06-Feb-2010 19:56:06 [rosetta@home] Sending scheduler request: Requested by user.
06-Feb-2010 19:56:06 [rosetta@home] Reporting 1 completed tasks, not requesting new tasks
06-Feb-2010 19:56:12 [rosetta@home] Scheduler request completed
06-Feb-2010 19:56:46 [---] Resuming computation
06-Feb-2010 19:57:15 [---] Suspending computation - user request
06-Feb-2010 19:57:52 [---] Resuming computation
06-Feb-2010 20:26:14 [rosetta@home] task 2cgq_Jan25_2cgq_1ise_26Jan2010_17412_49_0 suspended by user
06-Feb-2010 20:26:15 [World Community Grid] Restarting task CMD2_0315-MYH14.clustersOccur-2Z5X_A.clustersOccur_224_300775_301137_1 using hcmd2 version 614
06-Feb-2010 20:47:30 [---] Starting BOINC client version 6.10.18 for windows_intelx86
06-Feb-2010 20:47:30 [---] log flags: file_xfer, sched_ops, task
06-Feb-2010 20:47:30 [---] Libraries: libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3
06-Feb-2010 20:47:30 [---] Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
06-Feb-2010 20:47:30 [---] Running under account Eric
06-Feb-2010 20:47:30 [---] Processor: 1 AuthenticAMD AMD Athlon(tm) XP 2800+ [x86 Family 6 Model 10 Stepping 0]
06-Feb-2010 20:47:30 [---] Processor: 512.00 KB cache
06-Feb-2010 20:47:30 [---] Processor features: fpu tsc sse 3dnow mmx
06-Feb-2010 20:47:30 [---] OS: Microsoft Windows XP: Home x86 Edition, Service Pack 3, (05.01.2600.00)
06-Feb-2010 20:47:30 [---] Memory: 2.00 GB physical, 4.35 GB virtual
06-Feb-2010 20:47:30 [---] Disk: 74.52 GB total, 38.78 GB free
06-Feb-2010 20:47:30 [---] Local time is UTC -5 hours
06-Feb-2010 20:47:30 [---] No usable GPUs found
06-Feb-2010 20:47:31 [---] Not using a proxy
06-Feb-2010 20:47:31 [rosetta@home] URL http://boinc.bakerlab.org/rosetta/; Computer ID 372014; resource share 90
06-Feb-2010 20:47:31 [World Community Grid] URL http://www.worldcommunitygrid.org/; Computer ID 91106; resource share 90
06-Feb-2010 20:47:32 [rosetta@home] General prefs: from rosetta@home (last modified 13-Dec-2006 10:29:55)
06-Feb-2010 20:47:32 [rosetta@home] Host location: none
06-Feb-2010 20:47:32 [rosetta@home] General prefs: using your defaults
06-Feb-2010 20:47:32 [---] Reading preferences override file
06-Feb-2010 20:47:32 [---] Preferences limit memory usage when active to 1023.74MB
06-Feb-2010 20:47:32 [---] Preferences limit memory usage when idle to 1842.74MB
06-Feb-2010 20:47:42 [---] Preferences limit disk usage to 25.00GB
BOINC initialization completed, beginning process execution...
06-Feb-2010 20:47:46 [World Community Grid] Restarting task CMD2_0315-MYH14.clustersOccur-2Z5X_A.clustersOccur_224_300775_301137_1 using hcmd2 version 614
06-Feb-2010 20:47:55 [---] Suspending computation - user request
06-Feb-2010 20:48:08 [World Community Grid] suspended by user
06-Feb-2010 20:48:14 [rosetta@home] task 2cgq_Jan25_2cgq_1ise_26Jan2010_17412_49_0 resumed by user
06-Feb-2010 20:48:19 [World Community Grid] task CMD2_0315-MYH14.clustersOccur-2Z5X_A.clustersOccur_224_300775_301137_1 suspended by user |
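One way to narrow down what was active before a hang is to look at the last entry logged before each client restart. The following is a minimal Python sketch, not part of BOINC: the function name `last_entry_before_restart` and the regular expression are assumptions based only on the log lines posted in this thread, and the month names are parsed with `%b`, which assumes an English locale. A clean restart would show up the same way, so the result is only a hint about what was running.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the log excerpts in this thread:
#   DD-Mon-YYYY HH:MM:SS [project] message
LINE_RE = re.compile(r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[(.*?)\] (.*)$")
STAMP_FORMAT = "%d-%b-%Y %H:%M:%S"  # %b assumes English month abbreviations
START_MARKER = "Starting BOINC client version"


def last_entry_before_restart(lines):
    """Return ((time, project, message), silent_gap) for each client restart."""
    results = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines without a timestamp
        when = datetime.strptime(match.group(1), STAMP_FORMAT)
        entry = (when, match.group(2), match.group(3))
        if entry[2].startswith(START_MARKER) and previous is not None:
            # The entry just before a restart is the last thing the client logged.
            results.append((previous, when - previous[0]))
        previous = entry
    return results


SAMPLE = [
    "06-Feb-2010 20:26:14 [rosetta@home] task 2cgq_Jan25_2cgq_1ise_26Jan2010_17412_49_0 suspended by user",
    "06-Feb-2010 20:26:15 [World Community Grid] Restarting task CMD2_0315-MYH14.clustersOccur-2Z5X_A.clustersOccur_224_300775_301137_1 using hcmd2 version 614",
    "06-Feb-2010 20:47:30 [---] Starting BOINC client version 6.10.18 for windows_intelx86",
]

for (when, project, message), gap in last_entry_before_restart(SAMPLE):
    print(f"{when} [{project}] {message} (nothing logged for {gap})")
```

On the three sample lines, the last entry before the 20:47:30 restart is the WCG task restart at 20:26:15, which fits the reported hang around 20:35.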
Joined: 6 Feb 10 Posts: 13 |
Sekerob, I thought the SS (screensaver) was the problem. Yes, I do run the SS, but Rosetta@Home runs OK and WCG worked fine before. I did what you recommended: I checked my nVidia drivers, and I have the latest. I disabled the SS and ran WCG with BOINC 6.10.18, and it ran fine. It then went into SS mode and came out without any problem. I downloaded 6.10.32, and WCG went into the BOINC SS and out of it fine. BUT... while I was typing this reply the first time, the system crashed. No input... I couldn't do CTRL-ALT-DEL to shut down or even see what process was taking 100% of the CPU. I had to do a hard reboot again. This was while running the WCG task in this thread. I've started back up, and Rosetta@Home doesn't crash the system. So I thought we had an answer with the SS, but it seems that's not the problem. |
Joined: 6 Feb 10 Posts: 13 |
Sekerob, this is interesting. At the time of the crash I noticed this in the dae.txt log:
07-Feb-2010 10:58:18 [rosetta@home] suspended by user
07-Feb-2010 10:58:19 [World Community Grid] Restarting task CMD2_0315-MYH14.clustersOccur-2Z5X_A.clustersOccur_224_300775_301137_1 using hcmd2 version 614
07-Feb-2010 10:58:29 [---] Suspending computation - CPU usage is too high
07-Feb-2010 10:58:39 [---] Resuming computation
Then I looked in the std err.txt log and found this:
GLE: Another instance of BOINC is running. GLE: Another instanc Another instance of BOINC is running. GLE: Another instance of BOINC is running. GLE: Another instanc Another instance of BOINC is running. GLE: Another instance of BOINC is running. GLE: Another instanc
Since it doesn't finish the word "instance", I wonder if this is where the crashes are occurring? |
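To check whether a cut-off "GLE:" entry really marks the point where the machine stopped, it helps to see whether each cut-off entry is the last thing in the file or is followed by more output. The following is a small Python sketch under stated assumptions: the function names are hypothetical, and the sample string is copied from the stderr excerpt in this post rather than read from a real file.

```python
MESSAGE = "Another instance of BOINC is running."
PREFIX = "GLE: "


def shared_prefix(text, reference):
    """Return the leading part of text that matches reference character by character."""
    length = 0
    for a, b in zip(text, reference):
        if a != b:
            break
        length += 1
    return text[:length]


def truncated_entries(stderr_text):
    """Return (fragment, is_last_entry) for every 'GLE: ' entry that was cut short."""
    parts = stderr_text.split(PREFIX)[1:]  # text before the first prefix is not an entry
    found = []
    for index, part in enumerate(parts):
        if part.startswith(MESSAGE):
            continue  # complete entry
        found.append((shared_prefix(part, MESSAGE), index == len(parts) - 1))
    return found


# Excerpt as posted in this thread.
SAMPLE = (
    "GLE: Another instance of BOINC is running. GLE: Another instanc "
    "Another instance of BOINC is running. GLE: Another instance of BOINC is running. "
    "GLE: Another instanc Another instance of BOINC is running. "
    "GLE: Another instance of BOINC is running. GLE: Another instanc"
)

for fragment, is_last in truncated_entries(SAMPLE):
    print(repr(fragment), "last entry" if is_last else "followed by more output")
```

In the posted excerpt, two of the three cut-off entries are followed by further output and all three stop at the same character, so a cut-off entry by itself does not necessarily show where a crash happened.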
Joined: 6 Feb 10 Posts: 13 |
I checked Programs in the Control Panel, and there's only one BOINC. I did notice that there are 2 BOINC screensaver entries. This is kind of weird, since I follow the standard BOINC install process. You would think it would have removed the previous version. I've reset WCG and downloaded my first Rice task. I'll let you know how it goes. |
Joined: 6 Feb 10 Posts: 13 |
Update. Some good news... It looks like after the last hard reboot there were no longer two BOINC entries in the screen saver drop-down. I ran a couple of Rice tasks successfully. Then I decided to go back to Help Cure MD Phase 2. HCMD2 seems to be running OK now. I've finished 1 task and am almost done with another. Both Rice and HCMD2 were working while we were doing other things on the PC. No hangs or freeze-ups. Either that one HCMD2 task was a problem, or the 2 instances of the BOINC screen saver were causing the issue. I'll go with the latter. |
Joined: 6 Feb 10 Posts: 13 |
Sekerob, spoke too soon... I've had 2 crashes with HCMD Phase 2 since my last post. I'm aborting HCMD and deselecting it in my projects on WCG. If you want the task I was running, I can post it. |
Joined: 6 Feb 10 Posts: 13 |
Sekerob, I found these errors from Friday:
2007-02-08 13:49:12 [rosetta@home] Unrecoverable error for result 1wit__BOINC_ABINITIO_TRIM2__1546_685_0 ( - exit code -1073741819 (0xc0000005))
2007-02-08 15:47:25 [rosetta@home] Unrecoverable error for result 1c9oA_BOINC_ABINITIO_TRIM2__1546_904_0 ( - exit code -1073741674 (0xc0000096))
I'm also seeing this in the error log:
Another instance of BOINC is running. GLE: Another instance of BOINC is running. GLE: Another instanc
The BOINC screen saver is not running the application screen saver, and there's only one instance. I'm going to change to the generic Windows screen saver and see if that helps. These are the processes that are running: boinc.exe, boincmgr.exe, boinctray.exe. I'm going to restart and try again. |
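The negative exit codes in the two Rosetta errors are the signed 32-bit form of the hex values shown next to them, which are Windows NTSTATUS codes. The following is a minimal Python sketch of that conversion. The helper names are hypothetical, and the lookup table covers only the two codes from this post, using the standard NTSTATUS names for those values.

```python
# Only the two codes that appear in this post; names are the standard NTSTATUS names.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000096: "STATUS_PRIVILEGED_INSTRUCTION",
}


def to_unsigned32(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def describe(exit_code):
    """Format an exit code as hex plus a name, if the name is in the table."""
    value = to_unsigned32(exit_code)
    name = KNOWN_STATUS.get(value, "unknown")
    return f"{value:#010x} ({name})"


for code in (-1073741819, -1073741674):
    print(code, "->", describe(code))
```

Both codes describe a fault inside the application process (an invalid memory access and a privileged instruction), which says what failed but not why.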
Joined: 6 Feb 10 Posts: 13 |
Here are my BOINC SS settings: I run the plain BOINC SS after 10 minutes. After 30 minutes I go to a blank screen. I also turn the monitor off after 45 minutes. I'm not sure whether these would affect it. They didn't before... |
Joined: 6 Feb 10 Posts: 13 |
I decided to run for a while to rule out the power save on the screen saver after 45 mins. I turned it off, and it seemed to run OK for a day or so... Had another crash this afternoon. This is the task that was running:
12-Feb-2010 18:44:05 [World Community Grid] Restarting task CMD2_0339-1MBM_D.clustersOccur-1RLY_A.clustersOccur_0_0 using hcmd2 version 614
I also see this in the std error file:
Another instance of BOINC is running. GLE: Another instance of BOINC is running. GLE: Another instanc
Unfortunately it doesn't give me a time, so I can't tell whether the last statement, "GLE: Another instanc", marks a failure. One would think it does, since it didn't finish writing the line "GLE: Another instance of BOINC is running." |
Joined: 6 Feb 10 Posts: 13 |
Hi Sekerob, I would agree with you regarding the hw/sw, but I've checked the drivers, and this node is running Rosetta@Home and other WCG tasks like Nutritious Rice without any problems. All hw drivers are up to date. Since I've been running Rosetta@Home after suspending HCMD, I haven't had any of those errors written to stderrdae.txt. There's nothing in the file, not even this: Another instance of BOINC is running. GLE: Another instance of BOINC is running. GLE: Another instanc If it were a hw/sw issue, I would be having problems with Rosetta@Home and Nutritious Rice too. Since we can't find the issue, I'm going to deselect HCMD2 on WCG and press on. Thanks. |
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.