Troubleshooting host with multiple NVIDIA devices but with different Compute Capabilities (CC)
Author | Message |
---|---|
Joined: 21 Jun 12 Posts: 24 Credit: 19,989,257 RAC: 134 |
https://asteroidsathome.net/boinc/results.php?userid=1690&offset=0&show_names=0&state=6&appid= Error during calculations 0.00 0.00 |
Joined: 22 Nov 17 Posts: 159 Credit: 13,180,518 RAC: 0 |
Almost every week there is a new question along the lines of "Why am I getting errors while computing?" related to one NVIDIA device or another. I'm posting this here because I believe it will clarify the situation, the cause of the problem, and the workaround.
Why am I getting invalid results and errors while computing on one of my cards?
When you have more than one NVIDIA card installed in the same host but with different Compute Capabilities (CC), especially when the CC values are far apart, the cards with the lower CC will keep receiving the same application as the card with the highest CC. There is nothing we can do about this on the server side. It is a BOINC issue, caused by how the BOINC client works and how it reports your configuration to the server. As Ian&Steve C. stated in his post here:
"symptoms are entirely due to limitations in the BOINC client. These kinds of issues happen at every BOINC project, not just here. the client is only setup to transmit the "best" GPU. this is fact. that means the server scheduler MUST be setup to act on this information only. it cannot differentiate between two different nvidia GPUs that require different apps because it only knows about the "best" one. it can only act on different GPUs if they are from different vendors like AMD or Intel."
How to deal with the problem?
For the moment, the only way is to restrict the use of the troublesome cards at a specific level using the client configuration files. The options are described in detail here: Client configuration. It is a workaround rather than a solution, but we all have to work within the capabilities of the BOINC software. Of course, there is always one more option: take the troublesome card(s) out of that host and install them in a separate computer (a new host). This is not always possible, as those hosts have a primary purpose and are used in exactly the configuration they have, while running BOINC projects is only a spare-time task.
“The good thing about science is that it's true whether or not you believe in it.” ― Neil deGrasse Tyson |
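As a rough illustration of the workaround above, here is a minimal Python sketch that writes the `<exclude_gpu>` entries of a cc_config.xml file. The helper name `build_cc_config` and the device number are assumptions for illustration only; check the BOINC event log at startup to see which device number each of your cards actually has.

```python
import xml.etree.ElementTree as ET


def build_cc_config(exclusions):
    """Return cc_config.xml text with one <exclude_gpu> per (url, device_num).

    `exclusions` is a list of (project_url, device_num) pairs. This helper is
    hypothetical; it only assembles the XML layout used by the BOINC client.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for url, device_num in exclusions:
        entry = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(entry, "url").text = url
        ET.SubElement(entry, "device_num").text = str(device_num)
    ET.indent(root)  # pretty-print; requires Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed example: keep Asteroids@home off device 1 on this host.
    print(build_cc_config([("https://asteroidsathome.net/boinc/", 1)]))
```

The generated text would go into cc_config.xml in the BOINC data directory, and the client has to re-read its config files (or be restarted) before the exclusion takes effect.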
Joined: 21 Jun 12 Posts: 24 Credit: 19,989,257 RAC: 134 |
Last modified: 1 Jan 2023, 23:13:23 UTC I have a GTX Titan, a GTX 1070, a GT 1030, and a Quadro K5000, and Milkyway for GPU has no problems with these cards. https://milkyway.cs.rpi.edu/milkyway/results.php?userid=5706&offset=100&show_names=0&state=0&appid= https://milkyway.cs.rpi.edu/milkyway/hosts_user.php And the old version of Asteroids also worked fine. https://asteroidsathome.net/boinc/results.php?userid=1690&offset=0&show_names=0&state=4&appid= |
Joined: 12 Apr 17 Posts: 31 Credit: 5,360,264 RAC: 0 |
Zark: "I have a GTX Titan, a GTX 1070, a GT 1030, and a Quadro K5000, and Milkyway for GPU has no problems with these cards." Are you saying that one host/computer has all 4 GPUs installed and running fine? Because the post title says "... host with multiple NVIDIA devices ...". For the Quadro K5000 the stderr says: CUDA Device: Quadro K5000 4096MB CUDA Device driver: 462.96 Compute capability: 3.0 Shared memory per Block | per SM: 49152 | 49152 Multiprocessors: 8 Unsupported Compute Capability (CC) detected (3.0). Supported Compute Capabilities are between 5.3 and 8.9. So maybe MW has different CC requirements? Check the last post from Georgi again ... |
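The check reported in that stderr output can be reproduced offline. The following is a minimal Python sketch, not the application's own code: the function names are made up for illustration, and the supported range (5.3 to 8.9) is simply copied from the message quoted above.

```python
import re

# Range taken from the stderr message quoted above.
SUPPORTED_MIN = (5, 3)
SUPPORTED_MAX = (8, 9)


def parse_cc(stderr_text):
    """Extract the compute capability as a (major, minor) tuple, or None."""
    match = re.search(r"Compute capability:\s*(\d+)\.(\d+)", stderr_text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def is_supported(cc, lo=SUPPORTED_MIN, hi=SUPPORTED_MAX):
    """Compare as integer tuples so the minor version is not treated as a decimal fraction."""
    return lo <= cc <= hi


sample = (
    "CUDA Device: Quadro K5000 4096MB CUDA Device driver: 462.96 "
    "Compute capability: 3.0 Multiprocessors: 8"
)

if __name__ == "__main__":
    cc = parse_cc(sample)
    print(cc, "supported" if is_supported(cc) else "unsupported")
```

Comparing (major, minor) tuples instead of floats is a deliberate choice here, since a compute capability is a pair of version numbers and not a decimal value.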
Joined: 23 Apr 21 Posts: 85 Credit: 111,886,184 RAC: 202,512 |
Last modified: 12 Jan 2023, 23:27:06 UTC Zark: the problem isn't simply having multiple GPUs; that's no problem. The problem is when you have multiple NVIDIA GPUs and they need different apps. Say you have a Kepler card (CC 3.5) and an Ampere card (CC 8.6) in the same host. The Ampere needs at least a CUDA 11.1 app, so it will use the CUDA 11.8 app available here. But that app doesn't support the Kepler card; it will error if run on that card. Conversely, the Ampere card can't use the CUDA 5.5 or 10.2 apps that the Kepler can use. This is due to the limits placed on the applications when they were compiled: the 11.8 app was compiled with support for CC 5.0-8.9 only. These kinds of restrictions exist only because of how CUDA support is segmented in the toolkits and drivers. Milkyway works fine because it's a legacy OpenCL application that supports most devices, though maybe not as optimized or as fast as it could be if it were coded in CUDA or even later versions of OpenCL. OpenCL does not know or care about anything related to CC and cannot have requirements for it. CC is an NVIDIA/CUDA-only thing. |
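To make the mechanism described above concrete, here is a simplified Python sketch of a scheduler that only sees the "best" card. It is not BOINC's scheduler code. The CC range for the 11.8 app comes from the post above; the range for the 10.2 app and the "newest app first" preference are assumptions for illustration, and a real scheduler also takes the driver version into account.

```python
# (min CC, max CC) per app. Only the cuda118 range is taken from the thread;
# the cuda102 range is an illustrative assumption.
APPS = {
    "cuda118": ((5, 0), (8, 9)),
    "cuda102": ((3, 0), (7, 5)),  # assumed
}


def runs_on(app, cc):
    """True if a card with compute capability `cc` is inside the app's range."""
    lo, hi = APPS[app]
    return lo <= cc <= hi


def app_for_reported_gpu(gpus):
    """Pick an app using only the highest-CC card, as if only it were reported."""
    best = max(gpus.values())
    for app in ("cuda118", "cuda102"):  # assumed preference: newest first
        if runs_on(app, best):
            return app
    return None


def failing_cards(gpus):
    """Cards in the host that cannot run the app chosen for the 'best' card."""
    app = app_for_reported_gpu(gpus)
    return sorted(
        name for name, cc in gpus.items() if app is None or not runs_on(app, cc)
    )


if __name__ == "__main__":
    host = {"Kepler card": (3, 5), "Ampere card": (8, 6)}
    print(app_for_reported_gpu(host), failing_cards(host))
```

With the Kepler/Ampere host from the post, the app is chosen for the Ampere card alone, and the Kepler card ends up outside that app's range.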
Joined: 21 Jun 12 Posts: 24 Credit: 19,989,257 RAC: 134 |
On my HP Xeon Z620 (GTX Titan + GTX 1070), even when reserving one project per card (Asteroids + Milkyway), Asteroids continues to produce errors, regardless of the GPU used. <exclude_gpu> <url>https://asteroidsathome.net/boinc/url> <device_num>1</device_num> </exclude_gpu> <exclude_gpu> <url>http://milkyway.cs.rpi.edu/milkyway/</url> <device_num>0</device_num> </exclude_gpu> |
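One thing worth checking in a fragment like the one posted above is whether it is well-formed XML at all: as posted, the first `<url>` element ends in `url>` and not `</url>`. The Python sketch below runs the fragment through a strict XML parser. The helper name `check_fragment` is made up for illustration, and how the BOINC client's own parser reacts to such a file is not tested here; whether this typo is also in the real cc_config.xml or only in the forum post is unknown.

```python
import xml.etree.ElementTree as ET


def check_fragment(fragment):
    """Return None if the fragment parses as XML, else the parser's error text."""
    try:
        # Wrap in a single root element so several sibling entries can be checked.
        ET.fromstring("<options>" + fragment + "</options>")
    except ET.ParseError as err:
        return str(err)
    return None


# First entry exactly as posted in the thread.
posted = (
    "<exclude_gpu> <url>https://asteroidsathome.net/boinc/url> "
    "<device_num>1</device_num> </exclude_gpu>"
)
# Same entry with the closing tag written out.
fixed = posted.replace("boinc/url>", "boinc/</url>")

if __name__ == "__main__":
    print("posted:", check_fragment(posted))
    print("fixed: ", check_fragment(fixed))
```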
Joined: 23 Apr 21 Posts: 85 Credit: 111,886,184 RAC: 202,512 |
"On my HP Xeon Z620 (GTX Titan + GTX 1070), even when reserving one project per card (Asteroids + Milkyway), Asteroids continues to produce errors, regardless of the GPU used." I think you've excluded the wrong GPU or made some mistake in the cc_config file so that it did not take effect. All of your errors are from tasks trying to use the GTX Titan (Kepler) card, and it's failing for the reason I mentioned: an unsupported CC version for the CUDA 11.8 application. You did process at least one task without issue on your GTX 1070: https://asteroidsathome.net/boinc/result.php?resultid=353155970 In your case, I would recommend reverting to the older 440 branch of drivers. This driver should support both your GTX Titan and GTX 1070. This will prevent the project from sending you the CUDA 11.8 app, and you should instead receive the CUDA 10.2 app, which will work on both of your GPUs. Try this driver: https://www.nvidia.com/Download/driverResults.aspx/155056/en-us/ Not sure if there is any major difference between Win10 and Win11 drivers, as I don't think drivers as old as this were ever available for Win11, but it's worth a shot. If it doesn't work, then you might need to break the Titan out into its own system, or reconfigure (and lock) your coproc_info.xml file to reflect the capabilities of your Titan (CC 3.5) so that the project can see that and send you the CUDA 10.2 app. Right now all it sees is your 1070, and it's sending you a compatible app for that, not knowing that your second GPU is not compatible. |
Joined: 21 Jun 12 Posts: 24 Credit: 19,989,257 RAC: 134 |
I get the following message in the Notices tab of BOINC even though my GPUs are computing fine. Why? "Asteroids@home: Notice from server NVIDIA GPU: Please update your system with the latest drivers to be able to compute with the GPU 01/15/2023 10:57:02" |