Posts by Ian&Steve C.

21) (Message 8788)
Posted 12 Feb 2025 by Ian&Steve C.
Post:
The onboard Intel HD graphics are a bit too slow. You can use them for tasks on Numberfields, but they will take around 9 hours, whereas an Nvidia GTX 1080 graphics card from ten years ago completes the tasks within 20-40 minutes.

I'm unsure of the speeds of the Iris Xe graphics.


The issue with Intel integrated GPUs isn't really the speed. It's that this project relies on FP64 calculations, and most Intel iGPUs do not support them. There's only one generation (Broadwell, I think) where the iGPUs had FP64 support.

Intel Arc Alchemist cards have the same limitation and cannot do FP64 in hardware.

Intel Arc Battlemage cards do have FP64 however. These could be used if there was an application for it.
22) (Message 8767)
Posted 4 Feb 2025 by Ian&Steve C.
Post:

einstein, gpugrid is mainly FP32 with some FP64 (usually favors AMD)


Einstein is sensitive to memory bandwidth, with lots of data flying back and forth between VRAM and the GPU. both the O3AS and BRP7 apps usually see improvements proportional to improvements in memory bandwidth.

a good example is the Titan V vs the V100: the exact same core and architecture, with very similar FP32/FP64/etc. metrics, varying only slightly due to small clock speed differences.

Titan V (3x active HBM2 stacks):
3072-bit bus, 651 GB/s bandwidth

V100 (4x active HBM2 stacks):
4096-bit bus, 897 GB/s bandwidth

the V100 sees a 37% increase in bandwidth, and also sees about a 36% performance improvement over the Titan V under the same conditions on the O3AS application.
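a quick back-of-the-envelope check (pure arithmetic, using only the figures quoted above) shows how closely the speedup tracks the bandwidth gain:

```python
# If O3AS is bandwidth-bound, the performance ratio should track
# the memory-bandwidth ratio almost exactly.
titan_v_bw = 651  # GB/s, 3072-bit bus, 3 active HBM2 stacks
v100_bw = 897     # GB/s, 4096-bit bus, 4 active HBM2 stacks

bw_gain = v100_bw / titan_v_bw - 1
print(f"bandwidth gain: {bw_gain:.0%}")  # ~38%, close to the ~36% observed speedup
```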

I don't think Einstein favors AMD. that used to be the case, but the old OpenCL apps were unoptimized and had flaws in the OpenCL code that artificially limited Nvidia performance (issues with serialization of compute, fixed by Petri). since the advent of CUDA code at Einstein, even the stock apps pretty well favor Nvidia now.

GPUGRID, I'm not sure you can say it favors AMD when their project only has CUDA apps. you have to use ZLUDA to even contribute lol
23) (Message 8766)
Posted 4 Feb 2025 by Ian&Steve C.
Post:
also, you should check ACTUAL power use, not the top TDP spec, because the card does not use all of its TDP when running Asteroids, since performance is so limited by memory access.
24) (Message 8743)
Posted 5 Jan 2025 by Ian&Steve C.
Post:
I think something is wrong or misconfigured. Even though an RX 550 is slow, it’s not THAT slow that it should take 10hrs to complete a unit.

Even my old Nvidia GTX 550Ti completes them faster (about 2hrs). And it’s a much slower and older GPU than the AMD RX 550.
25) (Message 8712)
Posted 28 Dec 2024 by Ian&Steve C.
Post:
There are more than 2.35 million Asteroids tasks ready to send. But I haven't received any for days.

My event log says "Not requesting tasks don't need (CPU; AMD/API;)"

Why is that?


S. Gaber


With BOINC, you only get what you ask for. you didn't ask for any work, so you get sent no work.

the reason BOINC would say "not requesting tasks" could be a few things, but most likely you have more than enough work from another project to satisfy your current work cache and resource share settings.
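a simplified sketch of that work-fetch decision (the real BOINC scheduler also weighs resource share, deadlines, and per-project backoffs; this toy function just illustrates the idea):

```python
# Simplified model of BOINC work fetch: the client only requests work
# when its local cache has fallen below the configured target.
def seconds_to_request(cache_target_s, queued_work_s):
    # ask for enough work to refill the cache, or nothing at all
    return max(0.0, cache_target_s - queued_work_s)

# cache set to 1 day, but 3 days of work already queued from other projects
print(seconds_to_request(86400, 3 * 86400))  # 0.0 -> "not requesting tasks"
print(seconds_to_request(86400, 0))          # 86400 -> asks for a full day of work
```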
26) (Message 8702)
Posted 23 Dec 2024 by Ian&Steve C.
Post:
The project is not utilizing any of my Apple Silicon Mac's 10 GPU cores, only the 10 CPU cores, even though this project has GPU support on Windows. Why does this not work on macOS?


the calculations performed here are mostly FP64.
the Apple Silicon GPU does not support FP64 at all.

therefore, supporting this GPU is impossible. it's not macOS that's the problem. it's your hardware that's the problem.
27) (Message 8640)
Posted 2 Dec 2024 by Ian&Steve C.
Post:
Hello.
There have been no credit updates from Asteroids@home in BOINCstats for 17 days. I don't know if it's an Asteroids or BOINCstats problem.


it's a BOINCstats problem.
28) (Message 8639)
Posted 2 Dec 2024 by Ian&Steve C.
Post:
On my Nvidia RTX 4070 Super GPU, when I am running PrimeGrid tasks, GPU and power usage are nearly 100%.
But with the Asteroids@home app, GPU usage is 100% while power usage is about 50%.
Does anybody know why?

nvidia-smi
Thu Nov 28 14:35:58 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4070 ... On | 00000000:08:00.0 Off | N/A |
| 34% 61C P0 118W / 242W | 5890MiB / 12282MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 3068 G /usr/lib/xorg/Xorg 254MiB |
| 0 N/A N/A 4474 G compton 3MiB |
| 0 N/A N/A 357062 G /usr/lib/firefox/firefox 164MiB |
| 0 N/A N/A 373424 G ...erProcess --variations-seed-version 60MiB |
| 0 N/A N/A 383964 C ..._x86_64-pc-linux-gnu__cuda118_linux 2782MiB |
| 0 N/A N/A 384505 C ..._x86_64-pc-linux-gnu__cuda118_linux 2566MiB |
+-----------------------------------------------------------------------------------------+

note: running 1, 2, or 4 tasks at the same time does not change anything


the Asteroids GPU application is heavily constrained by random memory access, which slows down overall computation A LOT. your GPU is waiting around for memory transfers more than it's actually computing. that's why the power draw is so low: it's not working that hard. it's not a problem, it's just the nature of the application.

Primegrid is just crunching numbers and doesn't have the same reliance on VRAM transfers. primegrid tasks mostly fit within the L2 cache on the 40-series cards from what I remember, and they are more optimized than the apps here. you basically can't compare two apps that are doing two totally different things.
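a toy illustration of the random-access effect (plain numpy on the CPU, not the actual Asteroids kernel, but the same principle): reading the same data through a random index pattern defeats caching and prefetching, so identical arithmetic takes longer:

```python
import time
import numpy as np

# Toy demonstration: sum the same array sequentially vs. through a
# random gather. The arithmetic is identical; only the access pattern
# differs, and the gather spends its time waiting on memory.
n = 10_000_000
data = np.random.rand(n)
idx = np.random.permutation(n)  # random access pattern

t0 = time.perf_counter(); s_seq = data.sum();      t_seq = time.perf_counter() - t0
t0 = time.perf_counter(); s_rnd = data[idx].sum(); t_rnd = time.perf_counter() - t0

print(f"sequential: {t_seq:.4f}s, random gather: {t_rnd:.4f}s")
# same mathematical result either way
print(np.isclose(s_seq, s_rnd))
```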
29) (Message 8575)
Posted 25 Sep 2024 by Ian&Steve C.
Post:
Please do like the BOINC projects "Amicable numbers" and "RakeSearch", and add a "percentage completed" line to the server status page, either for the whole project, or for each application, as well as a completion estimate (year and month), based on the current speed.


probably not possible. there's new data coming in all the time, from many different sources, and this project does not have a defined "end". it will likely keep running as long as there are resources/budget allotted to administration and development.
30) (Message 8569)
Posted 11 Sep 2024 by Ian&Steve C.
Post:
it's a common issue/bug. just have to keep trying and you will eventually get work.
31) (Message 8565)
Posted 6 Sep 2024 by Ian&Steve C.
Post:
The AVX-512 unit in Zen 5 is supposed to be significantly more powerful than in Zen 4. In my test of the 7950X vs. the 9950X, both at 105W ECO, the 9950X was only as fast as the 7950X, often even 1-2 minutes slower. Same result on Windows 11 (with the new AMD patch) as on Linux Mint 22. Does the app require special optimization for Zen 5, or do the improvements simply not work for BOINC projects?

NFS@Home also has an AVX512 app. It behaves the same there.


just having the app compiled to use the AVX-512 pipeline won't automatically mean faster processing. the data needs to be packaged appropriately to truly increase throughput. the avx512 app here is only a little bit faster than the avx/fma app, which tells you that the data isn't structured in a way that takes much advantage of it.

imagine you have a conveyor of buckets moving water from one place to another. the conveyor moves at a constant speed.
at first your buckets are too small, and some water overflows from the buckets, restricting how much water is moved at a time.
then you increase the bucket size: now the buckets are not overflowing, and you move water at the maximum rate, since you are limited by how much you fill each bucket.
then you double the size of the bucket. this does nothing for overall throughput, since you are still only putting the same amount in each bucket.

that's essentially what's happening here.
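the bucket analogy fits in a few lines of toy code (the litre figures are arbitrary, just to make the point):

```python
# Toy model of the bucket conveyor: throughput is limited by how much
# water actually goes into each bucket, not by the bucket's capacity.
def throughput(fill_per_bucket, bucket_size, buckets_per_sec=10):
    # water moved per second; overflow is lost when the bucket is too small
    return min(fill_per_bucket, bucket_size) * buckets_per_sec

fill = 4  # litres the pipeline tries to put in each bucket (data per op)
print(throughput(fill, bucket_size=2))  # bucket too small: 20 L/s, water overflows
print(throughput(fill, bucket_size=4))  # bucket just right: 40 L/s, the max rate
print(throughput(fill, bucket_size=8))  # bucket doubled: still 40 L/s, no gain
```

the wider AVX-512 "bucket" only helps if the app actually fills it with more data per operation.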
32) (Message 8558)
Posted 2 Sep 2024 by Ian&Steve C.
Post:
two things:

the rpi5 can use the NEON SIMD optimizations, which work the CPU harder. your rpi3 tasks are not using this. I'm not sure if that's a limitation of the 32-bit app or the specific implementation of NEON SIMD with the asteroids app with armv7.

the other factor is that the rpi5 runs at like twice the clock speed of a rpi3. which generates more heat.

the asteroids app does generate more heat than other projects in my experience; this is normal due to the SIMD optimizations. my rpi5 sees fairly high temps, around 75C, running asteroids. you said you have a heatsink on it, but do you have a fan too? you'll probably need some airflow over it.
33) (Message 8544)
Posted 21 Aug 2024 by Ian&Steve C.
Post:

This project relies heavily on FP64 calculations. Intel GPU is not viable at this time. That could change if future generations of GPUs have hardware FP64 support, but for right now it’s basically a non-starter.



Intel didn't support FP64?!?!

That's not entirely true; the Intel Arc series can only calculate FP64 in emulated form, but they can do it.

I don't know how the value on Wikipedia came about, but if it is correct, 4-4.9 teraflops would be possible with FP64 emulation.


no, that's not correct at all. any info you find about FP64 TFlops on Arc GPUs was pre-launch speculation that never got corrected. you will never get 4-5 TFlops from emulated FP64; that's faster than a Radeon VII, which has strong hardware FP64 performance.

the Arc Alchemist cards do not support FP64 in hardware at all. the emulation only works on Linux for GPGPU compute (and it's very slow), and where it "works" on Windows is to handle the very few instances in some games that need it (where the slow performance doesn't hinder the overall processing of the game).

sorry, it's not going to happen until Intel includes hardware FP64. maybe they will in the next generation of ARC with Battlemage
34) (Message 8534)
Posted 18 Aug 2024 by Ian&Steve C.
Post:
see https://asteroidsathome.net/boinc/forum_thread.php?id=228

It's possible, but no BOINC projects currently offer apps directly supporting this architecture. The code has to be recompiled for this platform, which shouldn't take much effort, but I don't have any Xeon Phi HW to test it on. If any volunteer is willing to add support, let me know.


I know someone who has actually already ported it for Xeon Phi. let me point them to this thread. I don't know how to get BOINC to detect it and use it properly, but they were able to run the app offline at least.

spoiler: it's pretty slow. i think a more modern CPU will still be a few times more productive.
35) (Message 8525)
Posted 8 Aug 2024 by Ian&Steve C.
Post:
it probably is because of the inherent differences in the GPUs.

you can see that the app custom-tailors the grid size to the GPU architecture, and I don't think it can restart from a checkpoint where the grid size changes. the previously completed work becomes incompatible with the change.

the problem only happens when the task restarts on a different GPU. if it restarts on the same GPU, or never has to pause/restart, then you won't see the problem.

you can increase the length of time for task switching to try to prevent this, and avoid stopping BOINC in the middle of task execution if possible.
36) (Message 8520)
Posted 7 Aug 2024 by Ian&Steve C.
Post:
Any additions are welcome.
Are you talking about VRAM or RAM speed?


for the GPU app I'm talking about the VRAM.

but the same concept applies to the CPU app too, with RAM speed/bandwidth having a big impact on the performance.
37) (Message 8518)
Posted 7 Aug 2024 by Ian&Steve C.
Post:
to add on, even though the app makes heavy use of FP64 on the GPU, that is not the determining factor for performance. i.e., a GPU with twice the FP64 FLOPS will not necessarily be twice as fast at processing Asteroids tasks. 4090s are still faster than Titan Vs, despite the Titan V being many times faster in FP64 compute.

memory speed/bandwidth seems to be the biggest limiting factor.
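this is the classic roofline picture: attainable throughput is the lesser of peak compute and bandwidth times arithmetic intensity. a rough sketch using public spec figures (the intensity value is hypothetical, chosen only to illustrate the memory-bound case):

```python
# Roofline model: a kernel can't run faster than either the compute
# ceiling or the memory ceiling, whichever is lower.
def attainable_gflops(peak_fp64_gflops, bandwidth_gbs, flops_per_byte):
    return min(peak_fp64_gflops, bandwidth_gbs * flops_per_byte)

titan_v  = dict(peak_fp64_gflops=7450, bandwidth_gbs=651)   # rough public specs
rtx_4090 = dict(peak_fp64_gflops=1290, bandwidth_gbs=1008)  # rough public specs

# at low arithmetic intensity (memory-bound, like Asteroids), the 4090's
# higher bandwidth wins despite roughly 6x lower FP64 peak
intensity = 0.5  # FP64 flops per byte moved (hypothetical)
print(attainable_gflops(**titan_v,  flops_per_byte=intensity))  # 325.5
print(attainable_gflops(**rtx_4090, flops_per_byte=intensity))  # 504.0
```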
38) (Message 8509)
Posted 3 Aug 2024 by Ian&Steve C.
Post:
NO tasks for over 24 hours
Application Unsent In progress
Period Search Application 0 149313


read here: https://asteroidsathome.net/boinc/forum_thread.php?id=911&postid=8078
39) (Message 8507)
Posted 2 Aug 2024 by Ian&Steve C.
Post:
as with your post on Einstein, this project does not use neural networks, so an NPU has no use here either.

any project that could potentially use an NPU would have to be doing neural network type work. the only project that I know doing this kind of work is GPUGRID on their ATMML app.
40) (Message 8501)
Posted 30 Jul 2024 by Ian&Steve C.
Post:
OK, so we know the work generation is out of the hands of the admin.
My point is still, why isn't work generation automated in the hands of the scientists who ARE responsible for work generation.


it IS in the hands of the Project Scientist. they made the deliberate decision to not automate it to have more control over the process and which data gets priority in generation.

a few posts up: https://asteroidsathome.net/boinc/forum_thread.php?id=911&postid=8078

