BOINC (starting with 6.10.17?) unlinks binaries after an unknown period
Joined: 27 Mar 07, Posts: 11
This is starting to get frustrating. I have been running BOINC on multiple Linux boxes for years. For the most part they have been CentOS/Red Hat, but sometimes Fedora or Ubuntu. I use a single home directory for my logins, mounted from an NFS v4 server, and in order to keep a separate BOINC instance for each machine, I push the BOINC installs one level deeper in the tree. That is, instead of /home/user/BOINC, it is installed as /home/user/_%SYSTEM_NAME%_/BOINC, where %SYSTEM_NAME% is the name of the individual Linux machine it runs on.

Now, here is the problem. After a period of time (not short, but days to weeks), either the Linux system or BOINC unlinks the binaries in the BOINC directory. The inodes are still there and BOINC keeps running, but I can no longer start a boincmgr session. The RPC key for the GUI is removed as well, so even if I reinstall a binary copy from another system, I can't use it.

I have tried moving the BOINC instance off the NFS home directory, installing it instead in /tmp/user/BOINC and using a symbolic link back to the install directory. This does not help; the binaries still get unlinked.

This would not be a problem if I could just stop the running instance, reinstall BOINC and restart, but doing so screws up my client information and forces a detach from all my projects. I have re-attached a dozen times trying to find the cause of this issue. I do not want to resort to local physical storage for every BOINC install, as that is really difficult to do in a cluster-type environment, which this is.

So, what could possibly be the reason that the unlink primitive gets used on the running binary? It's as if BOINC is trying to uninstall itself.

To forestall the inevitable questions: it happens with many different kernel versions and with all the versions and installs of Linux being used. I don't think the OS is the problem.
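Because the client keeps running after its binaries are unlinked, one low-effort way to narrow down when it happens (short of kernel auditing) is to poll for it. The Python sketch below is an illustration only: the per-host path, the watched file names (assumed here to be boinc, boincmgr and gui_rpc_auth.cfg, the last being where the GUI RPC password normally lives) and the five-minute interval are assumptions, not anything taken from the BOINC sources. It relies on the Linux behaviour that /proc/<pid>/exe for a process whose executable has been unlinked reads back with a " (deleted)" suffix.

```python
import os
import socket
import time

# Hypothetical per-host layout from the post: ~/_<hostname>_/BOINC
DEFAULT_DIR = os.path.expanduser(f"~/_{socket.gethostname()}_/BOINC")

# Assumed set of files worth watching: client, manager, GUI RPC password file.
WATCHED = ("boinc", "boincmgr", "gui_rpc_auth.cfg")


def missing_files(boinc_dir, names=WATCHED):
    """Return the watched names that no longer exist in boinc_dir."""
    return [n for n in names if not os.path.exists(os.path.join(boinc_dir, n))]


def deleted_executables(proc_root="/proc"):
    """Return (pid, target) for processes whose exe link ends in ' (deleted)'."""
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as 'self' or 'meminfo'
        try:
            target = os.readlink(os.path.join(proc_root, entry, "exe"))
        except OSError:
            # Process exited, kernel thread, or no permission to inspect it.
            continue
        if target.endswith(" (deleted)"):
            hits.append((int(entry), target))
    return hits


def watch(boinc_dir=DEFAULT_DIR, interval=300):
    """Poll forever, printing a timestamped line whenever a watched file is gone."""
    while True:
        gone = missing_files(boinc_dir)
        if gone:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} missing in {boinc_dir}: {', '.join(gone)}", flush=True)
            # Also report any BOINC process still running from an unlinked binary.
            for pid, target in deleted_executables():
                if "boinc" in os.path.basename(target):
                    print(f"{stamp}   pid {pid} still running {target}", flush=True)
        time.sleep(interval)
```

For example, calling watch("/home/user/_node01_/BOINC") from a small wrapper started under nohup on each host would leave a timestamped trail to correlate with cron jobs, NFS server events or client log entries. The host name node01 is made up.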
I do Linux support for a rather large company and have been supporting various platforms for 30 years. That does not mean I am incapable of a brain cramp, but it's not likely. The problem also occurs regardless of the SELinux state: enforcing or not, enabled or disabled, it happens either way. Some systems are real Intel- and AMD-based machines; others are virtual machines on a big Opteron/VMware ESX system, so all the ESX-based machines are AMD. Some are 32-bit, but most are 64-bit. Some have GeForce GPUs installed. It happens on all of them. The only common thread is that every instance uses a network-mounted home directory (and I need to do it that way).

I am thinking of turning on auditing for unlink calls so that I can discover when this happens, but I think knowing when will be of limited value in this case.

The network is all Cisco and Foundry enterprise gear (all Gigabit, over CAT6 copper and fiber), so I doubt the networking has much to do with it. The NFS server is a multi-drive RAID 6 system with hot spares (OpenFiler with Areca SATA controllers, 40+ drives), and it has been rock stable for years; the only downtime has been for upgrades. I also tried the storage on a Network Appliance filer, with no difference.

In many ways, the install here is very similar to, if not identical to, the install a BOINC server would use. There are probably as many individual installs as there are projects, so small variations don't usually cause problems like this.

I have seen some problems that could be vaguely related to this, and I suspect a connection because NFS is particularly notorious for file-locking issues. However, I don't know which locking mechanism the BOINC client itself uses (flock, lock, lockf, etc.), and I don't even know that this is a file-locking problem, but it acts like one.

Does anyone have any cogent thoughts on this? I am at a loss as to where to look next.

Looking for a team??? Join BoincSynergy!!
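On the locking question: on Linux, flock() locks and POSIX record locks taken through fcntl() (which Python's fcntl.lockf wraps) are separate mechanisms, and how each behaves over NFS has varied with kernel version and mount options. A quick way to check both on a given mount is a throwaway probe like the Python sketch below. It is a hypothetical diagnostic, not anything BOINC ships: the scratch file names are made up, it only shows whether a non-blocking exclusive lock can be taken and released at all, and it reveals neither which call the BOINC client actually uses nor how locks behave under contention between two hosts.

```python
import errno
import fcntl
import os


def try_lock(directory, name, kind):
    """Create a scratch file in `directory` and try to take, then release,
    an exclusive non-blocking lock of the given kind ("flock" or "lockf").
    Returns "ok" or a short error string; the scratch file is always removed."""
    path = os.path.join(directory, name)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if kind == "flock":
            # BSD-style whole-file lock.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.flock(fd, fcntl.LOCK_UN)
        elif kind == "lockf":
            # Python's lockf() is implemented with fcntl() record locks.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(fd, fcntl.LOCK_UN)
        else:
            raise ValueError(f"unknown lock kind: {kind}")
        return "ok"
    except OSError as exc:
        # e.g. ENOLCK if the NFS lock service is unavailable.
        return f"failed: {errno.errorcode.get(exc.errno, exc.errno)} ({exc.strerror})"
    finally:
        os.close(fd)
        os.unlink(path)


def probe(directory):
    """Try both lock kinds in `directory`, using one scratch file per kind."""
    return {
        kind: try_lock(directory, f".locktest_{kind}_{os.getpid()}", kind)
        for kind in ("flock", "lockf")
    }
```

For example, comparing print(probe("/home/user/_node01_/BOINC")) with print(probe("/tmp")) on the same machine would show whether either lock type behaves differently on the NFS mount than on local disk. Both paths are illustrative.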
Joined: 27 Mar 07, Posts: 11
BTW, this started in 6.10.x and is still present in 6.10.44. I didn't want anyone to think it was only 6.10.17, as the subject line suggests. It does not seem to be present as far back as 6.5.x, but I don't run much of that these days.
Joined: 27 Mar 07, Posts: 11
Well... I guess no one uses BOINC on clusters or with NFS-based home directories... So...
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.