Posts by ahorek's team

1) (Message 9032)
Posted 2 days ago by ahorek's team
Post:
> You need to use the unpublished application that is available from the project codebase and build it yourself and deploy it via the anonymous platform on your host.

I can't recommend that, because your wingman also needs the updated app; otherwise the task will fail and eventually error out or be marked invalid. Those WUs with LC Points >2000 should be cancelled by the server and reprocessed later, once the new app is officially released.
2) (Message 8997)
Posted 12 days ago by ahorek's team
Post:
> Has the AVX512 application been definitively discontinued?

No, it has not. https://asteroidsathome.net/boinc/forum_thread.php?id=1123#8960
3) (Message 8980)
Posted 16 May 2025 by ahorek's team
Post:
It's ultimately about the software. While you can compare the RTX 4060 and RX 9070 XT based on FLOPS in areas like FP32 for peak performance, the actual performance can vary depending on the specific workload.

You can check some compute benchmarks here:
https://www.phoronix.com/review/amd-radeon-rx9070-linux-compute/3

Software and driver optimizations can certainly help, and the Asteroids GPU app currently doesn’t scale well on high-end GPUs due to memory bottlenecks rather than compute power, so there’s definitely room for improvement on the software side for both AMD and NVIDIA cards.
4) (Message 8975)
Posted 10 May 2025 by ahorek's team
Post:
Zen 4 offers some enhancements over Zen 3: higher clock speeds, a larger L2 cache, and support for AVX-512 (limited to 2×256-bit execution). Power efficiency is also better thanks to the move from a 7 nm to a 5 nm process node.

You can see up to 30% improvement in some apps, but in most cases, the gains are only a few percent, primarily due to the increased clock speeds. If you expected more, you'll be disappointed... The days when each new CPU architecture brought a 2× performance leap are long gone.
5) (Message 8969)
Posted 30 Apr 2025 by ahorek's team
Post:
> Can we get an application that makes use of a Nvidia GPU under ARM64 too?
Tegras were popular, but the platform is now outdated. It lacks OpenCL support, and NVIDIA doesn't provide cross-compilation tools for CUDA 10. It's possible to do, but the app would be restricted to just a couple of Tegra device models and would be difficult to maintain, as the framework is already EOL.

Keith tested it and it takes about 4 hours per work unit on a GPU:
https://asteroidsathome.net/boinc/result.php?resultid=566363744

We could add support for NVIDIA Orin devices, but there likely won't be an official app for Tegras. Sorry.
6) (Message 8961)
Posted 24 Apr 2025 by ahorek's team
Post:
No, it hasn’t been dropped. We’ve transitioned to a single universal binary that detects the available instruction sets at runtime and selects the optimal code path for your CPU’s capabilities.

You can verify which version was used by checking the task details, for example:
https://asteroidsathome.net/boinc/result.php?resultid=572657726
Using AVX512 SIMD optimizations.
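
For anyone curious how that kind of runtime dispatch works, here is a minimal, self-contained sketch using GCC/Clang's __builtin_cpu_supports on x86. The function names are made up for illustration; the real selection logic lives in the dev branch.

#include <cstdio>

// Stand-ins for the real per-ISA kernels (illustration only).
static void compute_avx512() { /* AVX-512 code path */ }
static void compute_avx()    { /* AVX code path */ }
static void compute_sse3()   { /* baseline code path */ }

int main() {
    // GCC/Clang builtin that queries CPUID at runtime.
    if (__builtin_cpu_supports("avx512f")) {
        std::puts("Using AVX512 SIMD optimizations.");
        compute_avx512();
    } else if (__builtin_cpu_supports("avx")) {
        std::puts("Using AVX SIMD optimizations.");
        compute_avx();
    } else {
        std::puts("Using SSE3 SIMD optimizations.");
        compute_sse3();
    }
    return 0;
}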
7) (Message 8957)
Posted 18 Apr 2025 by ahorek's team
Post:
Adreno lacks FP64 support, so double-precision calculations need to be emulated, which involves modifying the application. This will likely take considerable effort, and the GPU version will almost certainly be MUCH SLOWER than the CPU app.
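
To give a rough idea of what "emulating FP64" means: the usual trick is double-single (float-float) arithmetic, where each value is stored as a pair of floats. Below is a generic textbook sketch of just the addition step, not the project's code; it has to be compiled without fast-math, it roughly doubles a float's precision (still short of true FP64), and it already costs several times more operations than a native add.

#include <cstdio>

// One emulated "double" = a pair of floats (high + low part).
struct dsfloat { float hi, lo; };

// Knuth's two-sum: the exact sum of two floats, split into hi + lo.
static dsfloat two_sum(float a, float b) {
    float s = a + b;
    float v = s - a;
    float e = (a - (s - v)) + (b - v);
    return {s, e};
}

// Add two double-single numbers, folding the error terms back in.
static dsfloat ds_add(dsfloat a, dsfloat b) {
    dsfloat s = two_sum(a.hi, b.hi);
    return two_sum(s.hi, s.lo + a.lo + b.lo);
}

int main() {
    dsfloat x{1.0f, 1e-9f};   // ~1.000000001, beyond plain float precision
    dsfloat y{2.0f, 3e-9f};
    dsfloat z = ds_add(x, y);
    std::printf("result ~= %.12f\n", (double)z.hi + (double)z.lo);
    return 0;
}

Multiplication, division, and transcendental functions need similar (and more expensive) tricks, which is why the emulated GPU version would almost certainly end up slower than the CPU app.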
8) (Message 8954)
Posted 16 Apr 2025 by ahorek's team
Post:
Only one; in other words, GPU and CPU tasks are the same.

Asteroids tasks run around 10 times faster on a reasonable GPU than on a single CPU core, but since modern CPUs have many cores, using CPUs is currently more efficient overall.
Some applications can be 1000+ times faster on a GPU. The raw processing power is significantly higher, but writing software that fully leverages that power is also much more challenging.

While you can attempt to run more tasks in parallel on a GPU, it usually won't lead to any performance improvement: two tasks running in parallel on the same GPU will each just run about twice as slowly.
9) (Message 8952)
Posted 16 Apr 2025 by ahorek's team
Post:
Is there a "rusticl.icd" file in your OpenCL vendor list?
ls /etc/OpenCL/vendors
ls /usr/share/OpenCL/vendors


1/ If the file is missing, your Mesa version may have been compiled without rusticl support (it depends on the distribution; I'm not familiar with GNOME Linux)
2/ If the GPU isn't supported, you should at least see rusticl listed as a platform with no available devices

Unfortunately, AMD has opted not to support OpenCL on Linux for integrated GPUs like yours, and without functional drivers, OpenCL applications just won’t run.
10) (Message 8950)
Posted 15 Apr 2025 by ahorek's team
Post:
Verify whether your GPU is supported by running clinfo (you might need to set environment variables to activate it). The output should look like this:

Number of platforms                               1
  Platform Name                                   rusticl
  Platform Vendor                                 Mesa/X.org
  Platform Version                                OpenCL 3.0
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd...
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
...
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             MESA
  Platform Host timer resolution                  1ns

  Platform Name                                   rusticl
Number of devices                                 1
  Device Name                                     AMD Radeon Graphics (radeonsi, raphael_mendocino, LLVM 20.1.0, DRM 3.61, 6.14.1-1561.native)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 3.0
  Device UUID                                     00000000-1300-0000-0000-000000000000
  Driver UUID                                     414d442d-4d45-5341-2d44-525600000000
  Valid Device LUID                               No
  Device LUID                                     0000-000000000000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  25.1.0-devel
...


Primegrid and Einstein apps should work. Asteroids should be supported in theory, but there are some issues that need to be resolved first. Simply enabling it isn’t sufficient, as the app was either crashing or producing invalid results.
11) (Message 8948)
Posted 14 Apr 2025 by ahorek's team
Post:
btw, RustiCL may already be supported on your system; you can check:
export RUSTICL_ENABLE=radeonsi
export RUSTICL_FEATURES=fp64
clinfo


However, even if you enable it, the current Asteroids app won’t recognize it. The restriction will be removed in the next release, though there's still a low chance it will work correctly.
12) (Message 8947)
Posted 14 Apr 2025 by ahorek's team
Post:
That's because you're using Clover, which has already been deprecated. Those drivers were never usable for more than "detecting the GPU name"... https://www.phoronix.com/news/RadeonSI-Rusticl-Only

For OpenCL, you'll need a proper AMD driver (ROCm). Unfortunately, it seems that some iGPUs like the AMD Ryzen™ 5 5600G are only supported on Windows https://www.amd.com/en/support/downloads/drivers.html/processors/ryzen/ryzen-5000-series/amd-ryzen-5-5600g.html

So, if you want to use the iGPU, you’ve only got two options: switch to Windows or use RustiCL, but RustiCL still has some issues, and even if it works, it could be slower than proprietary drivers.
The last time I checked, the Asteroids app didn’t run properly on my iGPU, but you’ll still have a much better shot at getting OpenCL things working on Linux with RustiCL than with Clover.
13) (Message 8943)
Posted 11 Apr 2025 by ahorek's team
Post:
Pushing the GPU beyond its limits may lead to GUI unresponsiveness, and a malfunctioning application can freeze the system, leaving a hard reset as the only way to recover. Windows' GPU timeout protection (TDR) monitors your GPU and resets the driver if such issues occur, but resetting the GPU will cause the task to fail.
With very slow GPUs, you're more likely to encounter timeout crashes since the default timeout value is the same across all GPUs, regardless of their performance.

Ideally, the application should assess your GPU's capabilities and process smaller data chunks. This would improve GUI responsiveness and prevent timeouts, although it will also slow down the application. Each GPU model is unique, making it challenging to find the right balance that consistently works across all systems, and splitting work into smaller pieces isn't always as simple as it sounds.
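
As a rough illustration of the "smaller chunks" idea, here is a generic sketch using the plain OpenCL host API, not the project's actual code. It assumes the queue and kernel are already created, the kernel's arguments are set, and the kernel indexes its data with get_global_id(), which already includes the global work offset.

#include <CL/cl.h>
#include <algorithm>

// Launch `total` work-items in slices of `chunk` so that no single
// submission runs long enough to trip the driver's watchdog timeout.
cl_int run_in_chunks(cl_command_queue queue, cl_kernel kernel,
                     size_t total, size_t chunk) {
    for (size_t offset = 0; offset < total; offset += chunk) {
        size_t slice = std::min(chunk, total - offset);
        cl_int err = clEnqueueNDRangeKernel(queue, kernel, 1,
                                            &offset,  // global work offset
                                            &slice,   // slice size
                                            nullptr,  // let the driver pick the local size
                                            0, nullptr, nullptr);
        if (err != CL_SUCCESS) return err;
        clFinish(queue);  // wait here so the GUI gets a chance to breathe
    }
    return CL_SUCCESS;
}

Picking a good chunk size is the hard part: too small and the launch overhead dominates, too large and slow GPUs still hit the timeout.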

Even though your GPU technically has the capability to run those tasks, it still processes them more slowly than most 20-year-old CPUs. If you're still set on using it on Windows, despite the insane inefficiency, increasing the timeout could be a reasonable workaround.
14) (Message 8938)
Posted 9 Apr 2025 by ahorek's team
Post:
hi, the GPU is extremely slow. Old architecture, 3 cores, and shared memory. The problem is that it triggers Windows protection against GPU freezes.

You can probably fix it by increasing the timeout
https://manual.notch.one/0.9.23/en/docs/faq/extending-gpu-timeout-detection/

but I would rather recommend not using the GPU at all. Even the Bulldozer cores are more efficient, and utilizing the iGPU will just slow things down.
15) (Message 8936)
Posted 8 Apr 2025 by ahorek's team
Post:
> Can you use the IA tensor cores of nvidia to program ASIC of asteroids at home?

No. Tensor cores support low-precision data types like FP16 and are specifically built for AI workloads. The Asteroids app depends on FP64, and lower precisions aren't good enough. That’s also why none of the existing BOINC projects make use of tensor cores (or NPUs); they're not suitable for scientific computing. Like ASICs, tensor cores are highly specialized: great for AI, but pretty much useless for most scientific applications.
16) (Message 8931)
Posted 7 Apr 2025 by ahorek's team
Post:
Note that FP32 represents peak performance under ideal (unrealistic) conditions. Real-world applications are more complex than simply multiplying two numbers. As a result, newer architectures can outperform older GPUs in certain applications even when their FP32 performance is lower, because they can utilize the available resources more efficiently.
The numbers can give you a rough performance estimate, but it's always best to test with the specific application you plan to run.
17) (Message 8930)
Posted 7 Apr 2025 by ahorek's team
Post:
PCIe bandwidth is mostly important for SSD performance or when games exceed available GPU memory. In typical scenarios, it's primarily used during data transfers to the GPU, like when loading a game level. Compute tasks, on the other hand, spend minimal time on data loading. Most of their time is spent processing data that's already resident in GPU memory. Faster is always better in theory, but if data transfers only account for 1% of the total workload, speeding them up even 10 times won’t make a noticeable difference.

what doesn't matter:
* pcie bandwidth
* tensor / AI / wmma - no BOINC app utilizes them
* FP16, INT4... - despite great perf numbers, no BOINC app utilizes them
* TMUs (TMU * clock speed = Texture Rate) - only relevant for games
* ROPs (ROP * clock speed = Pixel Rate) - only relevant for games
* VRAM capacity - A larger size won’t improve performance, and most GPUs have more than enough capacity to handle BOINC projects.

what matters:
* GPU vendor / architecture in general
* cores - but meaningful comparisons can only be made within the same architecture and vendor
* FP32
* FP64
* INT32
* cache
* memory bandwidth
* clock speed

Those are attributes you should be looking for, but just like with games, performance can vary. Some projects see greater gains from high clock speeds, while others rely more on memory bandwidth. It depends on the type of computations and how well the application is optimized.
18) (Message 8929)
Posted 7 Apr 2025 by ahorek's team
Post:
You can find the latest code in the "dev" branch.
period_search_optimization_simd (cpu)
period_search_opencl_amd (opencl)
period_search_cuda (cuda)

If you're trying to understand what the code does, it's easier to begin with the simpler version that doesn't include any SIMD or GPU optimizations.

There's no guide or comprehensive documentation, but you can find some resources about the math behind it here:
https://dspace.cuni.cz/bitstream/handle/20.500.11956/124674/130299758.pdf?sequence=1&isAllowed=y (Czech only)
https://www.issfd.org/ISSFD_2014/ISSFD24_Paper_S10-3_Bradley.pdf

The app uses well-known algorithms like the ones below (a rough sketch of the linear-solve step follows the list):
https://en.wikipedia.org/wiki/Einstein_notation
https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm
https://en.wikipedia.org/wiki/Gaussian_elimination
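
For orientation: each Levenberg-Marquardt iteration involves solving a small dense linear system, which is presumably where the Gaussian elimination comes in. Here is a textbook sketch with partial pivoting, a generic illustration rather than the project's optimized routine:

#include <cmath>
#include <cstdio>
#include <utility>
#include <vector>

// Solve A*x = b by Gaussian elimination with partial pivoting.
// A is n x n, stored row-major; both arguments are copied and the
// copies are modified during elimination.
std::vector<double> solve(std::vector<double> A, std::vector<double> b) {
    const size_t n = b.size();
    for (size_t col = 0; col < n; ++col) {
        // Pick the row with the largest pivot to keep the solve stable.
        size_t pivot = col;
        for (size_t r = col + 1; r < n; ++r)
            if (std::fabs(A[r * n + col]) > std::fabs(A[pivot * n + col]))
                pivot = r;
        for (size_t c = 0; c < n; ++c)
            std::swap(A[col * n + c], A[pivot * n + c]);
        std::swap(b[col], b[pivot]);

        // Eliminate everything below the pivot.
        for (size_t r = col + 1; r < n; ++r) {
            double f = A[r * n + col] / A[col * n + col];
            for (size_t c = col; c < n; ++c)
                A[r * n + c] -= f * A[col * n + c];
            b[r] -= f * b[col];
        }
    }
    // Back-substitution.
    std::vector<double> x(n);
    for (size_t i = n; i-- > 0;) {
        double s = b[i];
        for (size_t c = i + 1; c < n; ++c) s -= A[i * n + c] * x[c];
        x[i] = s / A[i * n + i];
    }
    return x;
}

int main() {
    // 2x2 example: x + 2y = 5, 3x + 4y = 11  ->  x = 1, y = 2
    std::vector<double> A{1, 2, 3, 4}, b{5, 11};
    std::vector<double> x = solve(A, b);
    std::printf("x = %g, y = %g\n", x[0], x[1]);
    return 0;
}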

> i think this IA can write the code c... in python?
AI can explain what each part does and write similar code in Python for educational purposes, but you still need some math and dev background to understand it.

> Can i see the code run with asembler or xdbg debugger
Sure, if you can compile it... However, the assembly output won't provide much insight into what the code does.
19) (Message 8928)
Posted 7 Apr 2025 by ahorek's team
Post:
Your GeForce RTX 5090 (Blackwell) is not supported yet, but it's planned for the next release.

NVIDIA RTX A4000 should work. CPU and GPU tasks are the same; you should be able to get tasks on both, as long as CPU/GPU tasks are enabled in the project preferences. Perhaps you already have enough GPU tasks from different projects, so the client isn't requesting new work. Check the event log.
20) (Message 8912)
Posted 4 Apr 2025 by ahorek's team
Post:
Ray tracing improvements aren’t relevant for crunching. The new generation is expected to be slightly more efficient, but I think the 9070 XT and 7900 XT should deliver similar performance.

Both cards should already be supported. Let us know once you have some results to share.

