Why am I being forced into two consecutive 24-hour delays with no work returned yet?
Author | Message |
---|---|
Joined: 17 Nov 16 Posts: 906 |
Why am I being forced into two consecutive 24-hour delays with no work returned yet? I am trying to deploy a new app and, after returning all the sent work as errors, I was forced into a 24-hour delay. However, after the first 24-hour delay expired and I requested work again, I was forced into another 24-hour delay. WHY? I just got the same "reached daily quota of 11 tasks" message. This is the host: https://einsteinathome.org/host/12775352 |
Joined: 5 Oct 06 Posts: 5149 |
Einstein says you've exceeded your daily quota: 2019-04-29 01:11:56.3583 [PID=29900] [send] stopping work search - daily quota exceeded (24>=11) 2019-04-29 01:11:56.3622 [PID=29900] Sending reply to [HOST#12775352]: 0 results, delay req 85052.00 It's "until tomorrow", not quite the full 24 hours. Your quota is low because you've errored every one of the 43 attempts so far: try to use the remaining time to fix the error, then you can start increasing the quota each time you return a successful completion. Edit - you're missing a library: ../../projects/einstein.phys.uwm.edu/einsteinbinary_cuda64: error while loading shared libraries: libcufft.so.8.0: cannot open shared object file: No such file or directory |
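As a quick check of the "not quite the full 24" remark, the `delay req` value from the server log can be converted to hours, minutes and seconds. This is only a sketch: the helper name `format_delay` is made up for illustration, and the log timestamp is treated as a naive time in whatever zone the server logs in, which is an assumption.

```python
from datetime import datetime, timedelta


def format_delay(seconds: float) -> str:
    """Render a delay in seconds as 'Hh MMm SSs' (hypothetical helper)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Value copied from the "delay req" field in the scheduler log above.
delay_req = 85052.00
print(format_delay(delay_req))  # 23h 37m 32s

# Assumption: the log timestamp is used as-is, without any timezone conversion.
logged_at = datetime(2019, 4, 29, 1, 11, 56)
retry_after = logged_at + timedelta(seconds=delay_req)
print(retry_after)  # 2019-04-30 00:49:28
```

The delay is about 22 minutes short of a full day, which fits the idea that it runs to a fixed point on the following day rather than being a flat 24 hours.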
Joined: 17 Nov 16 Posts: 906 |
I think I fixed the issue of the missing libraries a couple of days ago by using the trick of putting the libcufft and libcudart libraries directly in the project directory, like we did with the CUDA80 Seti MB app. None of the normal and supposedly proper ways to link the system's stock libcudart and libcufft libraries to those required by the app worked. Neither did exporting LD_LIBRARY_PATH. But without any work to test with, I still don't know if putting the files directly in the same directory as the application will work. The app is working for the developer; I just haven't figured out why it won't work on my system yet. I didn't realize that the total of failed work units was what was applied to the daily quota limit. Question: what could I possibly do if I had downloaded 500 tasks the first time and instantly errored them out? Would I have had to wait 45 days before getting work for the project again? I only have my normal 0.5 days of work cache for the host, same as all my hosts. [Edit] I just reduced the host's venue down to 0.1 day of cache. Hope that only retrieves 1 or 2 tasks in case they fail again. |
Joined: 5 Oct 06 Posts: 5149 |
By default, the daily quota reduces to a minimum of one task per day - that gives you an escape route. I imagine the same applies to Einstein, although they have heavily customised their code. Is this the machine you are running under Anonymous Platform (app_info.xml)? If so, ensure that any extra file you put in the project directory is properly declared and referenced - otherwise it may not be available when the app is actually run from the slot directory. |
Joined: 17 Nov 16 Posts: 906 |
And that is a big thank you. I had forgotten to do that. I am updating the app_info right now to add those file references to the existing ones. At least I did this before the next attempt at running the app when I can get new work again. |
Joined: 5 Oct 06 Posts: 5149 |
When I was preparing the AIstub files under Windows, I used Dependency Walker to see what libraries were needed: I don't know the name of the equivalent tool under Linux, but there must be one. The results of not checking can still be seen at SETI Beta message 39386 |
Joined: 17 Nov 16 Posts: 906 |
Help Richard! Maybe you can tell me why I am still unable to run any gpu tasks. I can start them but they error out with a disk limit exceeded. Even though I increased the Home venue disk usage twice now. The amount needed didn't change.
keith@Nano:~/boinc$ ./boinc
29-Apr-2019 19:51:32 [---] Starting BOINC client version 7.9.3 for aarch64-unknown-linux-gnu
29-Apr-2019 19:51:32 [---] log flags: file_xfer, sched_ops, task, cpu_sched, sched_op_debug
29-Apr-2019 19:51:32 [---] Libraries: libcurl/7.58.0 OpenSSL/1.1.0g zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
29-Apr-2019 19:51:32 [---] Data directory: /home/keith/boinc
29-Apr-2019 19:51:32 [---] CUDA: NVIDIA GPU 0: NVIDIA Tegra X1 (driver version unknown, CUDA version 10.0, compute capability 5.3, 3957MB, 2179MB available, 236 GFLOPS peak)
29-Apr-2019 19:51:32 [Einstein@Home] Found app_info.xml; using anonymous platform
29-Apr-2019 19:51:33 [---] [libc detection] gathered: 2.27, Ubuntu GLIBC 2.27-3ubuntu1
29-Apr-2019 19:51:33 [---] Host name: Nano
29-Apr-2019 19:51:33 [---] Processor: 4 ARM ARMv8 Processor rev 1 (v8l) [Impl 0x41 Arch 8 Variant 0x1 Part 0xd07 Rev 1]
29-Apr-2019 19:51:33 [---] Processor features: fp asimd evtstrm aes pmull sha1 sha2 crc32
29-Apr-2019 19:51:33 [---] OS: Linux Ubuntu: Ubuntu 18.04.2 LTS [4.9.140-tegra|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
29-Apr-2019 19:51:33 [---] Memory: 3.86 GB physical, 0 bytes virtual
29-Apr-2019 19:51:33 [---] Disk: 29.21 GB total, 17.65 GB free
29-Apr-2019 19:51:33 [---] Local time is UTC -7 hours
29-Apr-2019 19:51:33 [---] Config: GUI RPC allowed from any host
29-Apr-2019 19:51:33 [---] Config: GUI RPCs allowed from:
29-Apr-2019 19:51:33 [---] 192.168.2.34
29-Apr-2019 19:51:33 [---] Config: report completed tasks immediately
29-Apr-2019 19:51:33 [Einstein@Home] URL http://einstein.phys.uwm.edu/; Computer ID 12775352; resource share 25
29-Apr-2019 19:51:33 [SETI@home] URL http://setiathome.berkeley.edu/; Computer ID 8707387; resource share 75
29-Apr-2019 19:51:33 [Einstein@Home] General prefs: from Einstein@Home (last modified ---)
29-Apr-2019 19:51:33 [Einstein@Home] Computer location: home
29-Apr-2019 19:51:33 [---] General prefs: using separate prefs for home
29-Apr-2019 19:51:33 [---] Reading preferences override file
29-Apr-2019 19:51:33 [---] Preferences:
29-Apr-2019 19:51:33 [---] max memory usage when active: 1978.28 MB
29-Apr-2019 19:51:33 [---] max memory usage when idle: 3560.91 MB
29-Apr-2019 19:51:33 [---] max disk usage: 10.00 GB
29-Apr-2019 19:51:33 [---] max CPUs used: 2
29-Apr-2019 19:51:33 [---] suspend work if non-BOINC CPU load exceeds 25%
29-Apr-2019 19:51:33 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
29-Apr-2019 19:51:33 [---] Setting up project and slot directories
29-Apr-2019 19:51:33 [---] Checking active tasks
29-Apr-2019 19:51:33 [---] Setting up GUI RPC socket
29-Apr-2019 19:51:33 [---] gui_rpc_auth.cfg is empty - no GUI RPC password protection
29-Apr-2019 19:51:33 [---] Checking presence of 63 project files
29-Apr-2019 19:51:33 Initialization completed
29-Apr-2019 19:51:33 [Einstein@Home] [sched_op] Starting scheduler request
29-Apr-2019 19:51:33 [Einstein@Home] Sending scheduler request: To report completed tasks.
29-Apr-2019 19:51:33 [Einstein@Home] Reporting 5 completed tasks
29-Apr-2019 19:51:33 [Einstein@Home] Requesting new tasks for NVIDIA GPU
29-Apr-2019 19:51:33 [Einstein@Home] [sched_op] CPU work request: 0.00 seconds; 0.00 devices
29-Apr-2019 19:51:33 [Einstein@Home] [sched_op] NVIDIA GPU work request: 9504.00 seconds; 1.00 devices
29-Apr-2019 19:51:37 [Einstein@Home] Scheduler request completed: got 6 new tasks
29-Apr-2019 19:51:37 [Einstein@Home] [sched_op] Server version 611
29-Apr-2019 19:51:37 [Einstein@Home] Project requested delay of 60 seconds
29-Apr-2019 19:51:37 [Einstein@Home] New computer location: home
29-Apr-2019 19:51:37 [Einstein@Home] General prefs: from Einstein@Home (last modified ---)
29-Apr-2019 19:51:37 [Einstein@Home] Computer location: home
29-Apr-2019 19:51:37 [---] General prefs: using separate prefs for home
29-Apr-2019 19:51:37 [---] Reading preferences override file
29-Apr-2019 19:51:37 [---] Preferences:
29-Apr-2019 19:51:37 [---] max memory usage when active: 1978.28 MB
29-Apr-2019 19:51:37 [---] max memory usage when idle: 3560.91 MB
29-Apr-2019 19:51:37 [---] max disk usage: 10.00 GB
29-Apr-2019 19:51:37 [---] Number of usable CPUs has changed from 2 to 3.
29-Apr-2019 19:51:37 [---] max CPUs used: 3
29-Apr-2019 19:51:37 [---] suspend work if non-BOINC CPU load exceeds 25%
29-Apr-2019 19:51:37 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
29-Apr-2019 19:51:37 [Einstein@Home] [sched_op] estimated total CPU task duration: 0 seconds
29-Apr-2019 19:51:37 [Einstein@Home] [sched_op] estimated total NVIDIA GPU task duration: 7473 seconds
29-Apr-2019 19:51:37 [Einstein@Home] [sched_op] handle_scheduler_reply(): got ack for task p2030.20170414.G44.61-02.33.N.b6s0g0.00000_1388_0
29-Apr-2019 19:51:37 [Einstein@Home] [sched_op] handle_scheduler_reply(): got ack for task p2030.20170414.G44.61-02.33.N.b6s0g0.00000_1389_0
29-Apr-2019 19:51:37 [Einstein@Home] [sched_op] handle_scheduler_reply(): got ack for task p2030.20170414.G44.61-02.33.N.b6s0g0.00000_1387_1
29-Apr-2019 19:51:37 [Einstein@Home] [sched_op] handle_scheduler_reply(): got ack for task p2030.20170414.G44.61-02.33.N.b6s0g0.00000_1391_0
29-Apr-2019 19:51:37 [Einstein@Home] [sched_op] handle_scheduler_reply(): got ack for task p2030.20170414.G44.61-02.33.N.b6s0g0.00000_1390_0
29-Apr-2019 19:51:37 [Einstein@Home] [sched_op] Deferring communication for 00:01:00
29-Apr-2019 19:51:37 [Einstein@Home] [sched_op] Reason: requested by project
29-Apr-2019 19:51:39 [Einstein@Home] Started download of p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1646.bin4
29-Apr-2019 19:51:39 [Einstein@Home] Started download of p2030.20170414.G44.77-01.19.C.b0s0g0.00000.zap
29-Apr-2019 19:51:42 [Einstein@Home] Finished download of p2030.20170414.G44.77-01.19.C.b0s0g0.00000.zap
29-Apr-2019 19:51:42 [Einstein@Home] Started download of p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1647.bin4
29-Apr-2019 19:51:44 [Einstein@Home] Finished download of p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1646.bin4
29-Apr-2019 19:51:44 [Einstein@Home] Started download of p2030.20170414.G44.61-02.33.N.b5s0g0.00000_99.bin4
29-Apr-2019 19:51:44 [Einstein@Home] Starting task p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1646_1
29-Apr-2019 19:51:44 [Einstein@Home] [cpu_sched] Starting task p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1646_1 using einsteinbinary_BRP4 version 999 in slot 3
29-Apr-2019 19:51:46 [Einstein@Home] Aborting task p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1646_1: exceeded disk limit: 127.11MB > 19.07MB
29-Apr-2019 19:51:46 [Einstein@Home] [sched_op] Deferring communication for 00:01:28
29-Apr-2019 19:51:46 [Einstein@Home] [sched_op] Reason: Unrecoverable error for task p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1646_1
29-Apr-2019 19:51:47 [Einstein@Home] Finished download of p2030.20170414.G44.61-02.33.N.b5s0g0.00000_99.bin4
29-Apr-2019 19:51:47 [Einstein@Home] Started download of p2030.20170414.G44.61-02.33.N.b5s0g0.00000.zap
29-Apr-2019 19:51:47 [Einstein@Home] Computation for task p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1646_1 finished
29-Apr-2019 19:51:47 [Einstein@Home] Output file p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1646_1_0 for task p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1646_1 absent
29-Apr-2019 19:51:48 [Einstein@Home] Finished download of p2030.20170414.G44.61-02.33.N.b5s0g0.00000.zap
29-Apr-2019 19:51:48 [Einstein@Home] Started download of p2030.20170414.G44.61-02.33.N.b5s0g0.00000_100.bin4
29-Apr-2019 19:51:48 [Einstein@Home] Starting task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_99_0
29-Apr-2019 19:51:48 [Einstein@Home] [cpu_sched] Starting task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_99_0 using einsteinbinary_BRP4 version 999 in slot 3
29-Apr-2019 19:51:50 [Einstein@Home] Aborting task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_99_0: exceeded disk limit: 127.11MB > 19.07MB
29-Apr-2019 19:51:50 [Einstein@Home] [sched_op] Deferring communication for 00:03:55
29-Apr-2019 19:51:50 [Einstein@Home] [sched_op] Reason: Unrecoverable error for task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_99_0
29-Apr-2019 19:51:51 [Einstein@Home] Finished download of p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1647.bin4
29-Apr-2019 19:51:51 [Einstein@Home] Started download of p2030.20170414.G44.61-02.33.N.b5s0g0.00000_101.bin4
29-Apr-2019 19:51:51 [Einstein@Home] Computation for task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_99_0 finished
29-Apr-2019 19:51:51 [Einstein@Home] Output file p2030.20170414.G44.61-02.33.N.b5s0g0.00000_99_0_0 for task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_99_0 absent
29-Apr-2019 19:51:51 [Einstein@Home] Starting task p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1647_0
29-Apr-2019 19:51:51 [Einstein@Home] [cpu_sched] Starting task p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1647_0 using einsteinbinary_BRP4 version 999 in slot 3
29-Apr-2019 19:51:52 [Einstein@Home] Aborting task p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1647_0: exceeded disk limit: 127.11MB > 19.07MB
29-Apr-2019 19:51:52 [Einstein@Home] [sched_op] Deferring communication for 00:07:33
29-Apr-2019 19:51:52 [Einstein@Home] [sched_op] Reason: Unrecoverable error for task p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1647_0
29-Apr-2019 19:51:52 [Einstein@Home] Finished download of p2030.20170414.G44.61-02.33.N.b5s0g0.00000_100.bin4
29-Apr-2019 19:51:52 [Einstein@Home] Started download of p2030.20170414.G44.61-02.33.N.b5s0g0.00000_102.bin4
29-Apr-2019 19:51:52 [Einstein@Home] Starting task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_100_0
29-Apr-2019 19:51:52 [Einstein@Home] [cpu_sched] Starting task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_100_0 using einsteinbinary_BRP4 version 999 in slot 4
29-Apr-2019 19:52:01 [Einstein@Home] Aborting task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_100_0: exceeded disk limit: 127.11MB > 19.07MB
29-Apr-2019 19:52:01 [Einstein@Home] [sched_op] Deferring communication for 00:13:53
29-Apr-2019 19:52:01 [Einstein@Home] [sched_op] Reason: Unrecoverable error for task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_100_0
29-Apr-2019 19:52:01 [Einstein@Home] Computation for task p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1647_0 finished
29-Apr-2019 19:52:01 [Einstein@Home] Output file p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1647_0_0 for task p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1647_0 absent
29-Apr-2019 19:52:13 [Einstein@Home] Computation for task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_100_0 finished
29-Apr-2019 19:52:13 [Einstein@Home] Output file p2030.20170414.G44.61-02.33.N.b5s0g0.00000_100_0_0 for task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_100_0 absent
29-Apr-2019 19:52:14 [Einstein@Home] Finished download of p2030.20170414.G44.61-02.33.N.b5s0g0.00000_101.bin4
29-Apr-2019 19:52:14 [Einstein@Home] Finished download of p2030.20170414.G44.61-02.33.N.b5s0g0.00000_102.bin4
29-Apr-2019 19:52:14 [Einstein@Home] Starting task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_101_0
29-Apr-2019 19:52:14 [Einstein@Home] [cpu_sched] Starting task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_101_0 using einsteinbinary_BRP4 version 999 in slot 3
29-Apr-2019 19:52:15 [Einstein@Home] Aborting task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_101_0: exceeded disk limit: 127.11MB > 19.07MB
29-Apr-2019 19:52:15 [Einstein@Home] [sched_op] Deferring communication for 00:19:53
29-Apr-2019 19:52:15 [Einstein@Home] [sched_op] Reason: Unrecoverable error for task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_101_0
29-Apr-2019 19:52:16 [Einstein@Home] Computation for task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_101_0 finished
29-Apr-2019 19:52:16 [Einstein@Home] Output file p2030.20170414.G44.61-02.33.N.b5s0g0.00000_101_0_0 for task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_101_0 absent
29-Apr-2019 19:52:16 [Einstein@Home] Starting task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_102_0
29-Apr-2019 19:52:16 [Einstein@Home] [cpu_sched] Starting task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_102_0 using einsteinbinary_BRP4 version 999 in slot 3
29-Apr-2019 19:52:17 [Einstein@Home] Aborting task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_102_0: exceeded disk limit: 127.11MB > 19.07MB
29-Apr-2019 19:52:17 [Einstein@Home] [sched_op] Deferring communication for 00:48:42
29-Apr-2019 19:52:17 [Einstein@Home] [sched_op] Reason: Unrecoverable error for task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_102_0
29-Apr-2019 19:52:19 [Einstein@Home] Computation for task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_102_0 finished
29-Apr-2019 19:52:19 [Einstein@Home] Output file p2030.20170414.G44.61-02.33.N.b5s0g0.00000_102_0_0 for task p2030.20170414.G44.61-02.33.N.b5s0g0.00000_102_0 absent
^C29-Apr-2019 19:52:44 [---] Received signal 2
29-Apr-2019 19:52:45 [---] Exiting
keith@Nano:~/boinc$ |
Joined: 5 Oct 06 Posts: 5149 |
You mean lines like 29-Apr-2019 19:51:46 [Einstein@Home] Aborting task p2030.20170414.G44.77-01.19.C.b0s0g0.00000_1646_1: exceeded disk limit: 127.11MB > 19.07MB I think that will be an individual limit on the task - you wouldn't have a limit below 20 meg on any global or system disk usage. Looking on my machine, I can see <workunit> <name>p2030.20170414.G44.61-02.33.N.b5s0g0.00000_1438</name> <app_name>einsteinbinary_BRP4</app_name> <version_num>134</version_num> <rsc_fpops_est>17500000000000.000000</rsc_fpops_est> <rsc_fpops_bound>350000000000000.000000</rsc_fpops_bound> <rsc_memory_bound>260000000.000000</rsc_memory_bound> <rsc_disk_bound>20000000.000000</rsc_disk_bound> ... That last line of 20,000,000 is the culprit - it translates to 19.07 MiB. Why such big disk usage? My guess is that you've put a <copy_file/> on that big cuFFT library, so the whole darn thing is copied into the slot folder and counts towards disk usage. Try just allowing BOINC to create a softlink as usual (remove <copy_file/>, leave the rest of app_info unchanged) and run a test task. I think Linux should be able to follow softlinks, but I don't know for sure where libraries are concerned. If it doesn't work, you'll have to find a way of installing the FFT library in such a way that Linux finds it by whatever it uses as a 'PATH' equivalent. My Beta task from ten years ago actually succeeded where I'd expected it to fail, because I'd installed NVidia's developer SDK and the system had a way of finding the library in that package. Drivers don't install FFT support, but the developer tools do - but it has to be exactly the right version, and it changes with every CUDA version release. |
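The 19.07 figure can be reproduced from the `<rsc_disk_bound>` value with a short calculation. The sketch below assumes the client reports sizes in binary megabytes (1 MB = 1024 × 1024 bytes), which is what the numbers in the log imply; the function names are made up for illustration.

```python
BYTES_PER_MIB = 1024 * 1024


def bytes_to_mib(n_bytes: float) -> float:
    """Convert a byte count to binary megabytes (MiB)."""
    return n_bytes / BYTES_PER_MIB


def exceeds_disk_bound(used_mib: float, bound_bytes: float) -> bool:
    """True if slot usage (in MiB) is above the workunit's disk bound (in bytes)."""
    return used_mib > bytes_to_mib(bound_bytes)


# Value taken from <rsc_disk_bound> in the workunit quoted above.
rsc_disk_bound = 20_000_000.0
print(f"{bytes_to_mib(rsc_disk_bound):.2f}")  # 19.07

# Usage reported in the "exceeded disk limit" log lines.
print(exceeds_disk_bound(127.11, rsc_disk_bound))  # True
```

The reported 127.11 MB is more than six times the per-task bound, which is consistent with a large library being copied into the slot directory rather than linked.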
Joined: 5 Oct 06 Posts: 5149 |
The Windows and Linux processes are different. BOINC started by using Linux conventions, so you may be in luck, but I remember having to introduce David Anderson to https://docs.microsoft.com/en-us/windows/desktop/dlls/dynamic-link-library-search-order#search-order-for-desktop-applications when the Linux techniques failed under Windows. You may have to reverse the process. |
Joined: 17 Nov 16 Posts: 906 |
I tried for the first two days to make soft links and to export PATH and LD_LIBRARY_PATH, with no success. That is why all my first tasks had the missing libcufft and libcudart library errors. I finally decided to try just copying the system's CUDA10 libraries into the ones the application needed and referencing them directly in app_info. The system already comes with all the necessary libraries pre-installed. This is a developer kit system image made for developing apps, so CUDA10, cuDNN, GCC, C+ and C++ are already there. I shouldn't have to install another version of CUDA. Since I didn't have any app_info version anymore that used CUDA file references, I just patterned the CUDA references after the references for the dev files in the original app_info. That included file copies. Thanks for explaining that this is the culprit, since I didn't know or understand what that did. I have removed the file copies from app_info. Now I just have to wait out the penalty box again. |
Joined: 17 Nov 16 Posts: 906 |
"The Windows and Linux processes are different. BOINC started by using Linux conventions, so you may be in luck, but I remember having to introduce David Anderson to ..." I think the problem comes from having to use the repo version of BOINC with its stranglehold on group and user ownership. I checked every time to make sure the executable had all its dependencies satisfied, and they were, in every location where the application was loaded. But when it came to actually running the client, it always failed to find the CUDA8 libraries. I should have just compiled a new BOINC for the aarch64 platform on my own and placed it in /home like I do with all my x86_64 hosts. That makes it so much easier to use BOINC and edit and move files. I finally had enough, stripped out the main BOINC files, moved them to a new boinc folder in /home, and removed all the init files and dynamic links scattered all over the system referencing the old repo locations of things. Now I can finally run BOINC from /home, like normal for me. I think after I am finally able to process a task correctly and get rid of my low daily allowance, I will try once more to make soft or symbolic links to the CUDA8 libraries. I think with BOINC in /home now, and with /home being owned by $USER, the symbolic links will probably work as expected. |
Joined: 5 Oct 06 Posts: 5149 |
Problems that require <copy_file/> to solve also arise when multiple different versions exist and have to be stored under different names in the filing system, but programmers are told to build their apps expecting a simple, generic, name. It's such a bloomin' nuisance that it has its own special name under Windows - 'DLL Hell'. NVidia were guilty of it in the earliest days - recycling the generic cudart.dll and cufft.dll over several (incompatible) generations. Under Windows, they've learned their lesson, and now use a strongly versioned name for each new release. It can be necessary to do a <copy_file/> with rename (<file_name> / <open_name>) to unambiguously resolve the confusion, but let's hope not. It's another reason to use Dependency Walker, to find out exactly which version of the filename has been embedded in the binary. Far more robust than using a Hex editor for the same purpose... |
Joined: 17 Nov 16 Posts: 906 |
Well, since this is Linux and not Windows, Dependency Walker is of no use. To find the dependencies of any executable application in Linux, all you need to do is run ldd <application name> in Terminal. keith@Nano:~/boinc/projects/einstein.phys.uwm.edu$ ldd einsteinbinary_cuda64 linux-vdso.so.1 (0x0000007f86d1f000) libcufft.so.8.0 => /usr/local/cuda/lib64/libcufft.so.8.0 (0x0000007f7ede5000) libcuda.so.1 => /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (0x0000007f7de9f000) libcudart.so.8.0 => /usr/local/cuda/lib64/libcudart.so.8.0 (0x0000007f7de2e000) libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007f7de02000) libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007f7dca9000) /lib/ld-linux-aarch64.so.1 (0x0000007f86cf4000) libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000007f7db14000) libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007f7da5a000) libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007f7da45000) librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000007f7da2e000) libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000007f7da0a000) libnvrm_gpu.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so (0x0000007f7d9c7000) libnvrm.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm.so (0x0000007f7d985000) libnvrm_graphics.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so (0x0000007f7d966000) libnvidia-fatbinaryloader.so.32.1.0 => /usr/lib/aarch64-linux-gnu/tegra/libnvidia-fatbinaryloader.so.32.1.0 (0x0000007f7d908000) libnvos.so => /usr/lib/aarch64-linux-gnu/tegra/libnvos.so (0x0000007f7d8ea000) These are the symbolic links for the CUDA libraries the app needs. keith@Nano:/usr/local/cuda/lib64$ ls -l lrwxrwxrwx 1 root root 21 Apr 26 19:15 libcudart.so.8.0 -> libcudart.so.10.0.166 lrwxrwxrwx 1 root root 20 Apr 26 19:15 libcufft.so.8.0 -> libcufft.so.10.0.166 |
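For anyone wanting to automate the `ldd` check, the sketch below scans `ldd` output for entries the loader could not resolve, which `ldd` marks with `=> not found`. The function names and the sample text are made up for illustration. One caveat, in line with what was reported earlier in this thread: `ldd` resolves libraries using the environment of the shell it is run from, so a clean result here does not guarantee the same result when the BOINC client launches the app from a slot directory.

```python
from __future__ import annotations

import subprocess


def missing_libraries(ldd_output: str) -> list[str]:
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # Lines look like: "\tlibcufft.so.8.0 => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str) -> list[str]:
    """Run ldd on a binary and list its unresolved libraries (hypothetical helper)."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)


# Illustrative sample only, not output captured from the host in this thread.
sample = (
    "\tlibcufft.so.8.0 => not found\n"
    "\tlibc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007f7dca9000)\n"
)
print(missing_libraries(sample))  # ['libcufft.so.8.0']
```

On a real host, `check_binary("einsteinbinary_cuda64")` run from the project directory would give the same information as reading the `ldd` listing by eye.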
Joined: 17 Nov 16 Posts: 906 |
This is the app_info I am going to attempt to use. <app_info> <app> <name>einsteinbinary_BRP4</name> </app> <file_info> <name>einsteinbinary_cuda64</name> <executable/> </file_info> <file_info> <name>einsteinbinary_cuda-db.dev</name> </file_info> <file_info> <name>einsteinbinary_cuda-dbhs.dev</name> </file_info> <file_info> <name>libcufft.so.8.0</name> </file_info> <file_info> <name>libcudart.so.8.0</name> </file_info> <app_version> <app_name>einsteinbinary_BRP4</app_name> <version_num>999</version_num> <api_version>7.2.2</api_version> <coproc> <type>CUDA</type> <count>1.0</count> </coproc> <file_ref> <file_name>einsteinbinary_cuda64</file_name> <main_program/> </file_ref> <file_ref> <file_name>einsteinbinary_cuda-db.dev</file_name> <open_name>db.dev</open_name> <copy_file/> </file_ref> <file_ref> <file_name>einsteinbinary_cuda-dbhs.dev</file_name> <open_name>dbhs.dev</open_name> <copy_file/> </file_ref> <file_ref> <file_name>libcufft.so.8.0</file_name> </file_ref> <file_ref> <file_name>libcudart.so.8.0</file_name> </file_ref> </app_version> </app_info> |
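Two of the points raised in this thread can be checked mechanically before a test task is wasted: that every `<file_ref>` has a matching `<file_info>` declaration, and which files carry `<copy_file/>` (and so are copied into the slot directory and counted against the disk bound). The sketch below is a hypothetical helper, not a BOINC tool, and the sample XML is a cut-down illustration rather than the app_info posted above.

```python
import xml.etree.ElementTree as ET


def audit_app_info(xml_text: str) -> dict:
    """List file_refs lacking a file_info declaration, and files marked copy_file."""
    root = ET.fromstring(xml_text)
    declared = {fi.findtext("name") for fi in root.iter("file_info")}
    referenced, copied = [], []
    for ref in root.iter("file_ref"):
        name = ref.findtext("file_name")
        referenced.append(name)
        if ref.find("copy_file") is not None:
            copied.append(name)
    undeclared = [n for n in referenced if n not in declared]
    return {"undeclared": undeclared, "copied": copied}


# Illustrative sample: one library is copied, one is referenced but never declared.
SAMPLE = """
<app_info>
  <file_info><name>einsteinbinary_cuda64</name><executable/></file_info>
  <file_info><name>libcufft.so.8.0</name></file_info>
  <app_version>
    <file_ref><file_name>einsteinbinary_cuda64</file_name><main_program/></file_ref>
    <file_ref><file_name>libcufft.so.8.0</file_name><copy_file/></file_ref>
    <file_ref><file_name>libcudart.so.8.0</file_name></file_ref>
  </app_version>
</app_info>
"""

report = audit_app_info(SAMPLE)
print(report["undeclared"])  # ['libcudart.so.8.0']
print(report["copied"])      # ['libcufft.so.8.0']
```

Reading the real file with `open("app_info.xml").read()` and passing the text to `audit_app_info` would apply the same checks to an actual configuration.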
Joined: 5 Oct 06 Posts: 5149 |
Fingers crossed. It might be wise to mark those library files as <executable/> as well - they certainly contain binary code which is going to be executed. |
Joined: 17 Nov 16 Posts: 906 |
"Fingers crossed. It might be wise to mark those library files as <executable/> ..." I don't think they work that way. At least they weren't marked executable in the CUDA90 special app app_info, which I managed to find lying around on a forgotten disk. I looked at how the CUDA90 libcufft and libcudart libraries were referenced and used that app_info as a pattern. They certainly worked well for that app. Agree, fingers crossed. I think it will work this time when I can get work again. |
Joined: 17 Nov 16 Posts: 906 |
Well my 24 hour delay expired and then BOINC set another 24 hour delay. Still unable to get any work for testing. |
Joined: 5 Oct 06 Posts: 5149 |
You returned the 'disk limit exceeded' errors at 30 Apr 2019 3:58:40 UTC, and requested new work at 1 May 2019 0:44:29 UTC - that's less than 24 hours. I'm not sure exactly when Einstein resets its 'daily' clock in relation to your time zone, but it might be worth a manual update when you wake up in the morning. (In this context, it's the Einstein server which is setting the delays) |
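The gap between the two timestamps quoted above is easy to verify. This is a minimal sketch using only the two UTC times from the post; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Timestamps quoted in the post above, both in UTC.
returned = datetime(2019, 4, 30, 3, 58, 40, tzinfo=timezone.utc)
requested = datetime(2019, 5, 1, 0, 44, 29, tzinfo=timezone.utc)

elapsed = requested - returned
print(elapsed)                        # 20:45:49
print(elapsed < timedelta(hours=24))  # True
```

So the new request arrived a little over three hours before a full day had passed since the errors were reported.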
Joined: 17 Nov 16 Posts: 906 |
"You returned the 'disk limit exceeded' errors at 30 Apr 2019 3:58:40 UTC, and requested new work at 1 May 2019 0:44:29 UTC - that's less than 24 hours." I had a feeling that Einstein bases its delay on calendar day and not UTC time. I still had 9 hours on the delay clock timer this morning, but I went ahead and did an update and got work again. This time I am able to process tasks without errors. None have validated yet, but I expect they will. So I have finally configured the host and the app_info correctly. Thank you, Richard, for pointing out the real cause of the disk exceeded errors and the likely reason. The <copy_file/> was the culprit. |
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.