Posts by Ian&Steve C.

21) (Message 8011)
Posted 18 Sep 2023 by Ian&Steve C.
Post:
Radeon HD 6900 is 13-year-old trash.
https://www.techpowerup.com/gpu-specs/radeon-hd-6950.c405

on the other hand, AMD Radeon RX 6900 is pretty recent
https://www.techpowerup.com/gpu-specs/radeon-rx-6900-xt.c3481

similar name, but very different cards :) in theory, the old card could run these tasks (it has only 2GB of memory), but don't expect ANY performance. I would estimate 6 hours / WU @ 150W ...

support for old hardware is welcome, but this doesn't really make sense. These days, even the most low-end or integrated GPUs are far more powerful & efficient than this card.


agreed. I'm not sure how much VRAM the AMD app is using here, but it's very possible that 2GB isn't enough. and yes, it will be VERY slow.
22) (Message 7999)
Posted 17 Sep 2023 by Ian&Steve C.
Post:
what's the power draw for those 6 mins?
23) (Message 7995)
Posted 17 Sep 2023 by Ian&Steve C.
Post:
what does the event log say? is it asking for GPU work? in general, projects only give you what the client asks for, so if the client isn't asking, it won't get anything.

my hunch is that something is up with the account manager. remove it and see if you get work; if you do, then you know BAM is the issue.
24) (Message 7974)
Posted 16 Sep 2023 by Ian&Steve C.
Post:
since the app is more serialized, latest-gen cards with high clock speeds seem to excel, namely the AMD 7000 series and Nvidia 4000 series with clocks exceeding 2.5GHz, with still more of an edge to the Nvidia/CUDA cards.

CPU remains more productive overall, though (especially the high-core-count models); maybe this can change in the future. but it's nice for people to have more available devices to crunch on if they wish.
25) (Message 7953)
Posted 15 Sep 2023 by Ian&Steve C.
Post:
Latest version crashes with integrated Vega 11

<core_client_version>7.18.1</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
BOINC client version 7.18.1
BOINC GPU type 'ATI', deviceId=0, slot=13
Application: ../../projects/asteroidsathome.net_boinc/period_search_10217_x86_64-pc-linux-gnu__opencl_101_amd_linux
Version: 102.17.0.0
Platform name: Clover
Platform vendor: Mesa
OpenCL device C version: OpenCL C 1.1 | OpenCL 1.1 Mesa 23.1.7 - kisak-mesa PPA
OpenCL device Id: 0
OpenCL device name: AMD Radeon Vega 11 Graphics (raven, LLVM 15.0.7, DRM 3.49, 6.2.0-32-generic) 5GB
Device driver version: 23.1.7 - kisak-mesa PPA
Multiprocessors: 11
Max Samplers: 32
Max work item dimensions: 3
Resident blocks per multiprocessor: 32
Grid dim: 704 = 2 * 11 * 32
Block dim: 128
Binary build log for AMD Radeon Vega 11 Graphics (raven, LLVM 15.0.7, DRM 3.49, 6.2.0-32-generic):
OK (0)
Program build log for AMD Radeon Vega 11 Graphics (raven, LLVM 15.0.7, DRM 3.49, 6.2.0-32-generic):
OK (0)
Prefered kernel work group size multiple: 64
Setting Grid Dim to 256
amdgpu: The CS has been rejected (-125), but the context isn't robust.
amdgpu: The process will be terminated.


i think your OpenCL version is too old; 1.1 is ancient. Mesa is generally not well suited to real compute loads, and many people have issues using Mesa with BOINC.

try installing the real AMD drivers. you might have to go back to an older OS/kernel, though. or try the ROCm drivers, but i think ROCm 5.6 was the last release to support Vega.
26) (Message 7935)
Posted 13 Sep 2023 by Ian&Steve C.
Post:
Tasks won´t work on my machine.
https://asteroidsathome.net/boinc/result.php?resultid=400860889

GLIBC problem.


Looks like Georgi will have to recompile this again with an older environment. Strange, given this was only put up yesterday; maybe he accidentally put up the same app again.
27) (Message 7933)
Posted 13 Sep 2023 by Ian&Steve C.
Post:
The earlier ones gave a huge number of points, making them equivalent to a CPU per unit of time. I guess that was a bonus for being a tester.


I found only one instance of the high credit reward in your tasks.
https://asteroidsathome.net/boinc/workunit.php?wuid=174901689

It’s an artifact of CreditNew, not due to being a “tester”. On that task you were matched up with another AMD GPU task. Both of your devices reported a really high flops value (relative to the CPUs), so the credit reward got scaled up a lot.

Though I’m not quite sure why the same thing doesn’t happen with the CUDA-app wingmen, who generally have much higher reported flops values (but should be taking your flops value as the baseline). Maybe that only happens when the exact same app is used.

This kind of idiosyncrasy is why CreditNew is not ideal, and why a static reward is better IMO. The “value” of a task shouldn’t change depending on which device runs it: a CPU task is the same as a GPU task, and two hosts shouldn’t receive wildly more credit just because they ran on GPUs and happened to match with each other.
28) (Message 7918)
Posted 10 Sep 2023 by Ian&Steve C.
Post:
sounds great. glad to help :)
29) (Message 7914)
Posted 10 Sep 2023 by Ian&Steve C.
Post:
otherwise people on linux will only be able to run the apps with the very latest OS, which not everyone wants to do.
For what reason would you not keep your OS up to date?


it's not about being up to date. as it stands, the Linux app requires many users to run an OS build that's essentially beta and bleeding edge, which can be less stable than something like an LTS release with slightly older but more vetted software packages.
30) (Message 7912)
Posted 10 Sep 2023 by Ian&Steve C.
Post:
Georgi,

what environment did you build the Linux apps from? GLIBC is a base package that's tightly integrated into the OS and not easy to upgrade; doing apt update and apt upgrade will not change it, only a full OS upgrade will. a 2.38 dependency is very bleeding edge, and it's not even available in Ubuntu yet: the latest release (23.04) is still on 2.37, and 2.38 is still in active development for the upcoming 23.10 release.

since 22.04 is still the primary LTS version, i think it would make sense to use an older build environment (something like the Ubuntu 20.04 era, with an older GLIBC) to avoid these compatibility issues. otherwise people on Linux will only be able to run the apps on the very latest OS, which not everyone wants to do.
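you can see which GLIBC an OS provides with `ldd --version`. the catch is that the check is a plain numeric version comparison against the base package, which only moves with an OS release. a small sketch of that comparison (the function name and the 2.38 floor are just for illustration, not from the app):

```python
def glibc_satisfies(installed: str, required: str = "2.38") -> bool:
    """Compare dotted GLIBC versions numerically, so '2.37' < '2.38' < '2.40'."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(required)

# Ubuntu 23.04 ships 2.37, so an app linked against 2.38 cannot load there:
print(glibc_satisfies("2.37"))  # False
print(glibc_satisfies("2.38"))  # True
```

note the tuple comparison: a plain string compare would wrongly rank "2.4" above "2.38".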
31) (Message 7859)
Posted 18 Aug 2023 by Ian&Steve C.
Post:
Out of work again
32) (Message 7805)
Posted 20 Apr 2023 by Ian&Steve C.
Post:
Why did you copy the previous post almost word for word?
He messed up the quoting. The last sentence was his.


people don't understand satire/sarcasm. i didn't mess up any quoting and was never trying to quote. I merely replied in the same manner for greater impact.
33) (Message 7801)
Posted 20 Apr 2023 by Ian&Steve C.
Post:
Why do people keep stating the obvious? We know the work is not continuous; it comes in regular batches. There is nothing wrong. This is just a friendly heads-up for the admin, if they happen to see it, so they can load up more data, which is a good thing.
34) (Message 7799)
Posted 20 Apr 2023 by Ian&Steve C.
Post:
out of work again
35) (Message 7744)
Posted 18 Feb 2023 by Ian&Steve C.
Post:
https://asteroidsathome.net/boinc/workunit.php?wuid=155519775

3090 vs 3 x 2080 Ti

Or what?

I'm still developing.


:)
36) (Message 7717)
Posted 19 Jan 2023 by Ian&Steve C.
Post:
The error code indicates missing DLLs.

Maybe a driver issue. Try running DDU and doing a clean reinstall of the drivers.
37) (Message 7699)
Posted 13 Jan 2023 by Ian&Steve C.
Post:
On my HP Z620 (Xeon) with a GTX Titan + GTX 1070, even reserving one project per card (Asteroids + Milkyway), Asteroids continues to produce errors, regardless of the GPU used.

<exclude_gpu>
<url>https://asteroidsathome.net/boinc/url>
<device_num>1</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://milkyway.cs.rpi.edu/milkyway/</url>
<device_num>0</device_num>
</exclude_gpu>


i think you've excluded the wrong GPU or made some mistake in the cc_config file such that it did not take effect. all of your errors are from trying to use the GTX Titan (Kepler) card, and it's failing for the same reason I mentioned: an unsupported CC version with the CUDA 11.8 application.

you did process at least one task without issue on your GTX 1070: https://asteroidsathome.net/boinc/result.php?resultid=353155970

In your case, I would recommend reverting to the older 440 branch of drivers, which should support both your GTX Titan and GTX 1070. this will prevent the project from sending you the CUDA 11.8 app; you should instead receive the CUDA 10.2 app, which will work on both of your GPUs.

try this driver: https://www.nvidia.com/Download/driverResults.aspx/155056/en-us/
not sure if there is any major difference between Win10 and Win11 drivers; I don't think drivers this old were ever available for Win11, but it's worth a shot.

if it doesn't work, then you might need to break the Titan out into its own system, or reconfigure (and lock) your coproc_info.xml file to reflect the capabilities of your Titan (CC 3.5) so that the project can see it and send you the CUDA 10.2 app. right now all the project sees is your 1070, so it sends an app compatible with that card, not knowing your second GPU is incompatible.
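for reference, a well-formed exclusion block with the syntax the client expects is sketched below. note that in the snippet quoted above, the Asteroids `<url>` element appears to be missing its closing tag (at least as rendered here), which alone could make the client reject or ignore the file. the device number is a placeholder; match it to whichever index the BOINC event log assigns the Titan:

```xml
<cc_config>
  <options>
    <exclude_gpu>
      <url>https://asteroidsathome.net/boinc/</url>
      <!-- placeholder: set to the device number BOINC reports for the GTX Titan -->
      <device_num>0</device_num>
    </exclude_gpu>
  </options>
</cc_config>
```

after editing, re-read the config file (or restart the client) and check the event log to confirm the exclusion was actually parsed.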
38) (Message 7693)
Posted 12 Jan 2023 by Ian&Steve C.
Post:
Zark:
I have a GTX Titan, a GTX 1070, a GT 1030, and a Quadro K5000, and Milkyway for GPU has no problems with these cards.
...

Are you saying that one host/computer has all 4 GPUs installed and running fine?
Because the post title says "... host with multiple NVIDIA devices ..."


the problem isn't simply having multiple GPUs. that's no problem.

the problem is when you have multiple Nvidia GPUs that need different apps.

say you have a Kepler card (CC 3.5) and an Ampere card (CC 8.6) in the same host.

the Ampere card needs at least a CUDA 11.1 app, so it will use the CUDA 11.8 app available here. but that app doesn't support the Kepler card and will error if run on it. conversely, the Ampere card can't use the CUDA 5.5 or 10.2 apps that the Kepler can. this is due to the limits placed on the applications when they were compiled: the 11.8 app was built with support for CC 5.0-8.9 only.

these kinds of restrictions exist only because of how CUDA support is segmented across toolkits and drivers. Milkyway works fine because it's a legacy OpenCL application that supports most devices, though it's perhaps not as optimized or as fast as it could be if it were coded in CUDA or a later version of OpenCL.

OpenCL does not know or care about anything related to CC and cannot have requirements for it. CC is an Nvidia/CUDA-only thing.
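the CC gate described above boils down to a couple of lines. this is just an illustration of the compiled-in range from the post (CC 5.0-8.9 for the CUDA 11.8 build), not the project's actual scheduler code:

```python
def can_run_cuda118_app(compute_capability: float) -> bool:
    """The CUDA 11.8 app here was compiled for CC 5.0 through 8.9 only."""
    return 5.0 <= compute_capability <= 8.9

print(can_run_cuda118_app(3.5))  # False: Kepler must use the CUDA 5.5 / 10.2 apps
print(can_run_cuda118_app(8.6))  # True: Ampere falls inside the compiled range
```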
39) (Message 7692)
Posted 12 Jan 2023 by Ian&Steve C.
Post:
Hi!

Some wizardry is being done here too. I'm running three at a time and I'm working with the source administrators. It will take time (a lot) to get fully acquainted and gain a deep knowledge and understanding of what is going on in, at, on, with, etc. ... where the computing is being done (the GPU).

I hope We will face a good new year here too!

You can probably find my results and host following my username - account - host - ...

Petri


wizardry is your forte :)

so far, there's heavy reliance on doubles, so cards with strong FP64 perform quite well.

can you get the same precision with float + more wizardry?
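one common form of that wizardry is compensated arithmetic: carry a second float that accumulates the rounding error the first one loses. a sketch of Neumaier's variant of Kahan summation (shown here in Python doubles purely to illustrate the idea; in the app it would be FP32 on the GPU, and whether it reaches FP64-equivalent precision there is exactly the open question):

```python
def compensated_sum(values):
    """Neumaier summation: track lost low-order bits in a correction term."""
    total = 0.0
    correction = 0.0
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            correction += (total - t) + x   # low-order bits of x were lost
        else:
            correction += (x - t) + total   # low-order bits of total were lost
        total = t
    return total + correction

# naive float addition drops the 1.0 entirely; the compensated version keeps it
print(sum([1e16, 1.0, -1e16]))              # 0.0
print(compensated_sum([1e16, 1.0, -1e16]))  # 1.0
```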
40) (Message 7524)
Posted 29 Nov 2022 by Ian&Steve C.
Post:
It's not about being competitive. I just understand how BOINC operates at a deeper level than most people do, so I'm trying to give a big-picture view of the issue as a whole, to highlight that the root cause of this problem lies with the client and not the project server.

nothing I posted is incorrect. his symptoms are entirely due to limitations in the BOINC client, and these kinds of issues happen at every BOINC project, not just here. the client is only set up to transmit the "best" GPU; this is fact. that means the server scheduler MUST be set up to act on this information alone. it cannot differentiate between two different Nvidia GPUs that require different apps, because it only knows about the "best" one; it can only act on different GPUs if they come from different vendors, like AMD or Intel.

I don't see how there's much you can do from the server side, since the client isn't giving you the information you require, unless you plan to diverge your server code from the standard BOINC model and implement something custom, like a feedback loop that feeds stderr output data into your scheduler. if so, more power to you, but your comments in other threads seemed to indicate that you planned to stick with the standard BOINC model. really, the only other thing the project could do here is implement an OpenCL application for Nvidia, to give broader device compatibility.

the things I've proposed really are the best solutions given the BOINC limitations. it's totally fine if Colin doesn't want to move GPUs around; it was just a suggestion. everyone is free to operate however they like if they're satisfied with a workaround.

