Message boards : The Lounge : An undoubtedly very fascinating thread about GPU capabilities running multiple tasks
Author | Message |
---|---|
Send message Joined: 18 Oct 14 Posts: 1489 |
Glory is crunching GWs 24/7 on 3 GPUs over at Einstein |
Send message Joined: 25 May 09 Posts: 1308 |
Please define "fancy cards" as doing so would help others see if their cards would be suitable to run Einstein's Gravitational Wave app on their GPUs. |
Send message Joined: 18 Oct 14 Posts: 1489 |
One box has two 3GB GTX1060s, not a fancy card; the other has a GTX1660s, which is not very fancy either. GWs want 3GB of memory and OpenCL. |
Send message Joined: 25 May 09 Posts: 1308 |
I have 3GB and OpenCL 1.2 on my 280X/7970 cards, they run, but never complete, the GPU usage is about 10%, the CPU has to do all the work. I just wonder if the failure of your 280X to complete is in any way related to the behaviour I am seeing on my 1070 Ti when running Einstein gravitational wave tasks. They run to 99% complete in about 10 minutes, with the last 1% taking about 5 minutes with very little activity on either CPU or GPU, apart from a burst in the last few seconds when both are nearly at 100%. I really must visit the Einstein forum and see if anything similar has been posted over there. |
Send message Joined: 10 Mar 20 Posts: 69 |
Moved everything over from the other thread |
Send message Joined: 25 May 09 Posts: 1308 |
Note - I said "related", not "identical". That aside, I assume you are (were) running only one task per GPU, as indeed I am. There is a common item, the actual application: if this has simply been transliterated from a CPU-oriented one to something that is now being run on GPUs of various capabilities (such as our two), then there could (and indeed probably would) be a different set of symptoms displayed. Really frustrating for both of us - you seeing tasks failing after a fair length of time, while mine just sit there in thumb-twiddle mode for minutes. Interesting observation: I've just swapped from running gravitational waves to gamma-ray on the GPU, and the tasks run smoothly to completion in about 10 minutes with no pause and ~95% GPU utilisation. This is much more like the behaviour I would expect of a well-formed application running on a GPU. |
Send message Joined: 25 May 09 Posts: 1308 |
A couple of observations. First, as you've said, at least one of your GPUs is quite old and low on resources. No problems there, but quite a number of folks have found that running multiple tasks on such GPUs actually reduces the overall performance (tasks run to completion per hour) compared with running one task, and that the GPU usage is way below that expected. This effect does vary with GPU type and application, so it may be a red herring in this case. Second, OpenCL versions. This is an interesting one. When an application is developed for OpenCL, the developer decides what version is to be used; let's say they decide on OpenCL version 1.0. Running alongside this, the GPU developers (and indeed just about every processor developer) decide what minimum and maximum OpenCL versions their hardware will support; let's say 0.8 to 1.5. When the application is run, one of the first things that is checked is the APPLICATION version of OpenCL against what the processor supports, and the processor effectively runs in that mode for the duration of that application. Now let's consider a slightly different scenario: the developer decides on OpenCL version 3, but we have the same processor in use. The version check fails and the application will not run; how graceful this failure is, and what error messages are seen, is in part down to the developers. This may explain the problems encountered while trying to run the beta applications. |
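To make the version check described above concrete, here is a minimal sketch in Python. It assumes the device reports a version string in the standard `OpenCL <major>.<minor> <vendor info>` form that OpenCL devices return for `CL_DEVICE_VERSION`; in practice it is usually the host program, not the processor itself, that reads this string and decides whether to carry on. The function names and the sample strings are illustrative only, not taken from any Einstein application. Note also that a plain numeric comparison is a simplification, since OpenCL 3.0 makes many 2.x features optional.

```python
import re


def parse_opencl_version(version_string):
    """Extract (major, minor) from a string such as 'OpenCL 1.2 CUDA'."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"not an OpenCL version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def can_run(app_required, device_version_string):
    """Return True if the device reports at least the version the app needs.

    app_required is a (major, minor) tuple chosen by the app developer.
    Tuples compare element by element, so (1, 2) < (2, 0) < (3, 0).
    """
    return parse_opencl_version(device_version_string) >= app_required


if __name__ == "__main__":
    # Hypothetical sample strings for an older and a newer driver.
    for reported in ("OpenCL 1.2 CUDA", "OpenCL 3.0 CUDA"):
        verdict = "ok" if can_run((2, 0), reported) else "too old"
        print(f"{reported}: app needing 2.0 -> {verdict}")
```

A project server can apply the same kind of test before sending work, which would match the behaviour where a newer application only arrives after a driver update.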
Send message Joined: 17 Nov 16 Posts: 896 |
Both the Einstein GW and GRP apps pause around 90-98% at the end of the run to do work unit toplist processing on the CPU. That is the cause of the pause on the tasks. All that is normal. The reason is that the programmers want better FP64 precision than a GPU can provide for sorting the toplist, so they transfer that last bit of computing from the GPU back to the CPU. To overcome the underutilization of the GPU for that last 10% of the WU crunching, you can run doubles or triples on each GPU if it has enough memory. By staggering the starts and endings of the work units, you can keep the GPU utilization at 98%-100%. |
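The staggering idea can be illustrated with a toy model. This sketch assumes every task spends the first 90% of its run on the GPU and the last 10% on the CPU (the "last 10%" figure from the post), that tasks repeat back to back, and that sharing the GPU does not slow any task down. None of that is measured from the real apps; it only shows why offset start times fill the idle gap.

```python
def gpu_busy_fraction(offsets, period=100, gpu_ticks=90):
    """Fraction of a task period in which at least one task is in its GPU phase.

    offsets:   start offset of each concurrent task, in ticks
    period:    length of one task, in ticks (assumed identical for all tasks)
    gpu_ticks: ticks at the start of each task that use the GPU; the rest
               of the period is the CPU-only tail
    """
    busy = sum(
        1
        for t in range(period)
        # (t - off) % period is where this task is within its own run
        if any((t - off) % period < gpu_ticks for off in offsets)
    )
    return busy / period


if __name__ == "__main__":
    print("one task:           ", gpu_busy_fraction([0]))
    print("two, same start:    ", gpu_busy_fraction([0, 0]))
    print("two, half staggered:", gpu_busy_fraction([0, 50]))
```

In this model two tasks started together leave the same gap as one task, while two tasks started half a period apart cover each other's CPU-only tail.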
Send message Joined: 18 Oct 14 Posts: 1489 |
+1 well said |
Send message Joined: 25 May 09 Posts: 1308 |
Thanks for the explanation Keith. |
Send message Joined: 24 Dec 19 Posts: 229 |
The app you're currently running (v1.22) actually has a substantial code flaw that leaves it poorly optimized for Nvidia cards (way slower than it should be for the level of compute performance in modern Nvidia GPUs). Many people tried to claim "AMD is just better at Einstein", which turned out to be wrong. The way it was coded essentially held back Nvidia GPUs and forced some of the calculations to run serially instead of in parallel as they should. Even though this app reports "95%" core use, I'm sure you'll notice that the power used is less than with other apps that truly load up the GPU. The old code is like a handbrake for Nvidia GPUs. However, if you update your drivers from the 461 version you're currently running (which only supports OpenCL 1.2) to more recent 470+ drivers (which support OpenCL 3.0), you will get the newer v1.28 gamma-ray application from Einstein and your tasks will run about 40-50% faster. Petri and I worked together to improve the code in the Einstein app (mostly him; I did a bunch of testing and had very small code contributions), and I communicated the necessary changes to the Einstein project developers, who incorporated them into this new application. The new app is available to everyone with an Nvidia GPU as long as you have the required drivers. The method used to restore parallelized code for Nvidia is only available in OpenCL 2.0 and up. The timing of Nvidia releasing OpenCL 3.0 capable drivers aligning with Petri's interest in fixing the app couldn't have been better :). Update the drivers, the project will send you the new app, and you process work faster. As simple as that. (The new app is only for gamma-ray at this time.) For the gravitational wave tasks, the app is fairly CPU bound since the devs couldn't put certain calculations onto the GPU, so the GPU is constantly waiting around for work from the CPU, which is why GPU utilization can be rather low with this app. 
Using a faster CPU can help, but not enough to reach full utilization. Most people just run 2-3 tasks concurrently to try to load up the GPU more. It's also normal (for this app) to pause at 99% while it does some final calculations. I can't remember if it's using the CPU or GPU double precision (fp64) to finish this bit, but either would explain the low GPU utilization at this time. |
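For anyone wanting to try 2-3 concurrent tasks, BOINC reads a per-project `app_config.xml` in which `gpu_usage` is the fraction of a GPU each task claims, so 0.5 allows two tasks per GPU. The sketch below only builds that XML text. The app name `example_app_name` is a placeholder, not a real Einstein application name (the real names are listed by the project and in the client's files), and flooring the fraction to three decimals is my own choice, made so that N tasks never add up to more than one GPU.

```python
import math
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpus_per_task=1.0):
    """Return app_config.xml text that lets tasks_per_gpu tasks share one GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    # Round down so that tasks_per_gpu * gpu_usage never exceeds 1.0.
    gpu_usage = math.floor(1000 / tasks_per_gpu) / 1000

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = f"{gpu_usage:.3f}"
    ET.SubElement(gpu_versions, "cpu_usage").text = f"{cpus_per_task:.3f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "example_app_name" is a hypothetical placeholder; substitute the real one.
    print(build_app_config("example_app_name", tasks_per_gpu=2))
```

The output would be saved as `app_config.xml` in the project's folder inside the BOINC data directory and picked up after the client re-reads its config files. Whether two or three tasks actually help depends on GPU memory and the app, as the posts above point out.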
Send message Joined: 24 Dec 19 Posts: 229 |
And even though it seems this thread was spun off from another thread, it seems better suited to the GPU forum than here. |
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.