Something changed


Steve Gaber
Joined: 7 Mar 14
Posts: 77
Credit: 6,499,233
RAC: 2,947
Message 8391 - Posted: 20 May 2024, 7:10:49 UTC
Until recently, my computer
(AMD Ryzen 7 5700G with Radeon Graphics [Family 25 Model 80 Stepping 0] (16 processors), AMD Radeon(TM) Graphics (6227MB), OpenCL 2.0, Microsoft Windows 11 Core x64 Edition (10.00.22631.00))

was able to run 16 or 17 Asteroids tasks simultaneously and not get over 175 degrees F. Now if I run more than four at once it overheats. I have CoreTemp's max temp set at 184 F, after which it will shut down.

Right now it's running 17 Einstein tasks at 171 degrees F.

Asteroids sent me 75 tasks, which I have whittled down to 52.

I didn't make any changes as far as I know. Maybe I did something inadvertently.

So what do you think is happening?

Thanks for any suggestions.

S. Gaber

ahorek's team
Volunteer developer
Volunteer tester
Joined: 1 Jan 13
Posts: 66
Credit: 8,343,053
RAC: 56,760
Message 8393 - Posted: 20 May 2024, 9:34:33 UTC
recent CPUs like your Zen 3 boost up higher and higher until they bump into the power, thermal, or frequency limits. Reaching 80C or 175F under heavy load is normal and by design.
unlike other apps like Einstein, Asteroids app heavily uses optimized CPU features like FMA and better CPU utilization means more heat.

I would play with the Bios / Ryzen Master... lowering power limits / disabling PBO could help significantly with temps without sacrificing too much performance.
Another factor is your integrated GPU. GPU apps are not very effective on this project and you'll likely slow down all your CPU tasks due to shared thermal limits...

Steve Gaber
Joined: 7 Mar 14
Posts: 77
Credit: 6,499,233
RAC: 2,947
Message 8394 - Posted: 21 May 2024, 4:32:58 UTC - in response to Message 8393.  

Last modified: 21 May 2024, 5:19:41 UTC
recent CPUs like your Zen 3 boost up higher and higher until they bump into the power, thermal, or frequency limits. Reaching 80C or 175F under heavy load is normal and by design.
unlike other apps like Einstein, Asteroids app heavily uses optimized CPU features like FMA and better CPU utilization means more heat.

I would play with the Bios / Ryzen Master... lowering power limits / disabling PBO could help significantly with temps without sacrificing too much performance.
Another factor is your integrated GPU. GPU apps are not very effective on this project and you'll likely slow down all your CPU tasks due to shared thermal limits...



Thanks for your reply.

It makes me nervous to play with the BIOS. I am not the most savvy computer user or BOINC cruncher, although I did put this computer together from parts bought online. Mechanical assembly I can deal with. It's the software part I'm not so good at. But I will try it.

I still don't understand why this change occurred suddenly and spontaneously.

S. Gaber

Steve Gaber
Joined: 7 Mar 14
Posts: 77
Credit: 6,499,233
RAC: 2,947
Message 8395 - Posted: 21 May 2024, 8:11:13 UTC - in response to Message 8394.  

Last modified: 21 May 2024, 8:15:12 UTC
Now it's only letting me run three Asteroids tasks at once. If I try four, it goes over the temperature limit.

There are 45 Asteroids tasks, all due by May 24. They probably won't all be completed by then running three at a time.
S. Gaber

tito
Joined: 22 Jul 13
Posts: 5
Credit: 24,685,715
RAC: 51,905
Message 8396 - Posted: 21 May 2024, 9:38:49 UTC - in response to Message 8395.  
So maybe the problem is not with the app, but with the hardware? Problems with fans? Maybe too much dirt on the fins?
Install HWMonitor, HWinfo or similar and check temps vs fan rpm.

ahorek's team
Volunteer developer
Volunteer tester
Joined: 1 Jan 13
Posts: 66
Credit: 8,343,053
RAC: 56,760
Message 8397 - Posted: 21 May 2024, 9:57:58 UTC
note that just a high temperature isn't a problem. Modern CPUs like yours are designed to keep the maximum temperature limit around 80-95°C under load. This is normal and it also shouldn't cause shutdowns or any issues.

if your PC with 8/16 cores can't sustain 3 tasks load, there's something wrong and you should really check your cooling solution. It's not a problem with the app.

you can configure the app to use a less optimized version by putting app_config.xml into your project's folder and then restarting the client:
<app_config>
  <app_version>
    <app_name>period_search</app_name>
    <plan_class></plan_class>
    <cmdline>--optimization 3</cmdline>
  </app_version>
</app_config>

this way the app won't hit the CPU that hard, but it'll also be slower = less effective. It's just a workaround and reducing the maximum TDP as I suggested before is usually a better choice.
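
For anyone who would rather not hand-edit the file, here is a minimal sketch that generates the same app_config.xml. The element names and the `--optimization 3` value are taken from the snippet above. The function names and the idea of passing the project folder on the command line are assumptions made for illustration, and the script does not try to guess where your BOINC project folder is.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def build_app_config(app_name: str = "period_search",
                     cmdline: str = "--optimization 3") -> str:
    """Return app_config.xml text with the same elements as the snippet above."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = ""  # left empty, as in the post
    ET.SubElement(version, "cmdline").text = cmdline
    ET.indent(root, space="  ")  # pretty-print; requires Python 3.9+
    # short_empty_elements=False keeps <plan_class></plan_class> in long form
    return ET.tostring(root, encoding="unicode", short_empty_elements=False) + "\n"


def write_app_config(project_dir: Path) -> Path:
    """Write app_config.xml into project_dir (a folder you supply yourself)."""
    target = project_dir / "app_config.xml"
    target.write_text(build_app_config(), encoding="utf-8")
    return target


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Hypothetical usage: python make_app_config.py <your project folder>
        print(write_app_config(Path(sys.argv[1])))
    else:
        # No folder given: just show what would be written
        print(build_app_config(), end="")
```

Run without arguments it only prints the XML, so you can compare it with the snippet above before writing anything. The client still has to be restarted afterwards, as described in the post.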

Steve Gaber
Joined: 7 Mar 14
Posts: 77
Credit: 6,499,233
RAC: 2,947
Message 8398 - Posted: 21 May 2024, 20:02:46 UTC - in response to Message 8397.  

Last modified: 21 May 2024, 20:31:10 UTC
note that just a high temperature isn't a problem. Modern CPUs like yours are designed to keep the maximum temperature limit around 80-95°C under load. This is normal and it also shouldn't cause shutdowns or any issues.

if your PC with 8/16 cores can't sustain 3 tasks load, there's something wrong and you should really check your cooling solution. It's not a problem with the app.

you can configure the app to use a less optimized version by putting app_config.xml into your project's folder and then restart the client
<app_config>
  <app_version>
    <app_name>period_search</app_name>
    <plan_class></plan_class>
    <cmdline>--optimization 3</cmdline>
  </app_version>
</app_config>

this way the app won't hit the CPU that hard, but it'll also be slower = less effective. It's just a workaround and reducing the maximum TDP as I suggested before is usually a better choice.


But the three-task limit only pertains to Asteroids tasks. It will still run 17 Einstein tasks. And two weeks ago it would also run 17 Asteroids tasks at 171 degrees F. Now it runs three tasks at 174 F. With four tasks it will go to 182 degrees. With five tasks it will increase to 186 degrees and shut down.

Is running over 180 degrees damaging to the CPU?

???
S. Gaber.

ahorek's team
Volunteer developer
Volunteer tester
Joined: 1 Jan 13
Posts: 66
Credit: 8,343,053
RAC: 56,760
Message 8399 - Posted: 21 May 2024, 21:51:05 UTC - in response to Message 8398.  
182°F == ~83°C is normal for a Zen 3 CPU like AMD Ryzen 7 5700G (under load)
here's a good explanation of why https://www.youtube.com/watch?v=h9TjJviotnI

but the temperature should be kept under the limit, shutting down the PC under load indicates a problem
I would start with monitoring software like https://www.hwinfo.com/download/ and see what happens. It could be a PSU or some faulty component, it's hard to say...

updating a bios could also sometimes help with stability issues

ahorek's team
Volunteer developer
Volunteer tester
Joined: 1 Jan 13
Posts: 66
Credit: 8,343,053
RAC: 56,760
Message 8400 - Posted: 21 May 2024, 22:32:04 UTC
Einstein's apps are more memory intensive, they don't stress the CPU that hard. Different apps have different needs, but in general, running BOINC apps pushes a CPU to its limits (on purpose) more than regular apps and CPUs produce excessive heat under heavy load. It could potentially damage your hardware, but if there's something wrong, various protections will shut down the system before that.

for stability testing, try https://en.wikipedia.org/wiki/Prime95 which is one of the most CPU intensive apps

Lamberto Vitali
Joined: 14 Jun 23
Posts: 85
Credit: 5,914
RAC: 0
Message 8401 - Posted: 21 May 2024, 22:35:42 UTC
I thought CPUs didn't shut off anymore? They just throttle. Even my 12 year old ones do this. Even if a fan fails they slow right down and sit at 95C. Something very weird happening if you're getting a shutoff.

ahorek's team
Volunteer developer
Volunteer tester
Joined: 1 Jan 13
Posts: 66
Credit: 8,343,053
RAC: 56,760
Message 8403 - Posted: 21 May 2024, 22:48:26 UTC - in response to Message 8401.  

Last modified: 21 May 2024, 22:49:39 UTC
old CPUs may have a temp limit, my old Bulldozer shuts down with heavy overclocking at around 70°C. Current CPUs scale the frequency to keep the thermal limit. They still shut off if the temperature is critical like 110°C, but that should happen only if you run the CPU without a mounted cooler :)

Lamberto Vitali
Joined: 14 Jun 23
Posts: 85
Credit: 5,914
RAC: 0
Message 8404 - Posted: 21 May 2024, 22:58:39 UTC - in response to Message 8403.  
Surely his Ryzen 7 is new enough? It's not as old as my 12 year old things.

ahorek's team
Volunteer developer
Volunteer tester
Joined: 1 Jan 13
Posts: 66
Credit: 8,343,053
RAC: 56,760
Message 8405 - Posted: 21 May 2024, 23:18:52 UTC - in response to Message 8404.  
yes, it's Zen 3 => Tmax 90°C and 95°C for Zen 4

usually, you can lower the limit in bios
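
Since the thread mixes Fahrenheit readings from CoreTemp with Celsius limits, a small sketch for comparing them may help. The readings are the ones reported earlier in the thread, and the 90°C limit is the Zen 3 figure quoted in this post, not something checked against AMD's spec sheet. The function and variable names are made up for illustration.

```python
ZEN3_TMAX_C = 90.0  # limit quoted in this thread (assumption, not verified)


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def headroom_c(temp_f: float, limit_c: float = ZEN3_TMAX_C) -> float:
    """Degrees Celsius left between a Fahrenheit reading and the limit."""
    return limit_c - fahrenheit_to_celsius(temp_f)


# Readings reported in the thread, in degrees Fahrenheit
readings_f = {
    "17 tasks, two weeks ago": 171,
    "3 Asteroids tasks": 174,
    "4 Asteroids tasks": 182,
    "CoreTemp shutdown setting": 184,
    "5 Asteroids tasks": 186,
}

for label, temp_f in readings_f.items():
    temp_c = fahrenheit_to_celsius(temp_f)
    print(f"{label}: {temp_f} F = {temp_c:.1f} C, "
          f"{headroom_c(temp_f):.1f} C below the limit")
```

Every reading in the list, including the CoreTemp shutdown setting, converts to a value below 90°C, so the software shutdown triggers before the CPU would reach the limit quoted here.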

Steve Gaber
Joined: 7 Mar 14
Posts: 77
Credit: 6,499,233
RAC: 2,947
Message 8406 - Posted: 21 May 2024, 23:20:04 UTC - in response to Message 8404.  
Surely his Ryzen 7 is new enough? It's not as old as my 12 year old things.


It's a little over one year old.

Steve Gaber
Joined: 7 Mar 14
Posts: 77
Credit: 6,499,233
RAC: 2,947
Message 8407 - Posted: 21 May 2024, 23:41:10 UTC - in response to Message 8396.  
So maybe the problem is not with the app, but with the hardware? Problems with fans? Maybe too much dirt on the fins?
Install HWMonitor, HWinfo or similar and check temps vs fan rpm.


I clean the inside, power supply and fans once a month.

I would think that if it were a hardware or dirt problem, it would affect every project's tasks. Right now it only affects Asteroids.

I use CoreTemp to monitor temperatures.

Maybe I should disable the feature that sets the shutdown limit and let it run at 186 degrees.

You guys said running at that temperature would not harm the sensitive parts.

S. Gaber

ahorek's team
Volunteer developer
Volunteer tester
Joined: 1 Jan 13
Posts: 66
Credit: 8,343,053
RAC: 56,760
Message 8408 - Posted: 22 May 2024, 0:03:25 UTC - in response to Message 8407.  
if you have software that deliberately shuts down the PC after reaching a certain CPU temperature, you should disable it, because it makes no sense these days. Your CPU knows what the safety limit is.

Keith Myers
Joined: 16 Nov 22
Posts: 113
Credit: 91,707,192
RAC: 436,805
Message 8410 - Posted: 22 May 2024, 1:46:30 UTC
You should become familiar with your BIOS settings and use them for better control of how YOU want to use the PC, not the way that the motherboard manufacturer sets the BIOS defaults for max power consumption and max clocks for better gaming.

Default gaming setup is completely contrary to the best configuration for 24/7 distributed computing.

You should NOT be afraid of learning your BIOS. There are a multitude of YT videos explaining the basics of modern cpu BIOS setup.

I know that on my Asus desktop motherboards for Zen 2 and Zen 3 processors, unless I disable monitoring of the cpu temp in the BIOS by setting it to IGNORE, the cpu will reboot if the temp hits 90° C. It WON'T throttle back, it just reboots. That may be just with Asus boards though, as I've never used any other brand.

A proud member of the OFA (Old Farts Association)

Lamberto Vitali
Joined: 14 Jun 23
Posts: 85
Credit: 5,914
RAC: 0
Message 8411 - Posted: 22 May 2024, 10:18:47 UTC
True, motherboards have weird default settings. The CPU by itself, no matter what you tell the BIOS, will limit the clock speed to protect itself. It's hard wired and you cannot stop it, Intel and AMD don't want them frying under warranty. That's the only limit you need. If you see it stuck on this limit, you know you should cool more to get more clock speed.

ahorek's team
Volunteer developer
Volunteer tester
Joined: 1 Jan 13
Posts: 66
Credit: 8,343,053
RAC: 56,760
Message 8413 - Posted: 22 May 2024, 11:12:30 UTC
motherboard manufacturer sets the BIOS defaults for max power consumption and max clocks for better gaming.

I second that. There's a difference between loading a game as fast as possible where the maximum performance is needed for a short time and 24/7 computing. Setting up lower TDP limits or enabling ECO mode could significantly help reduce heat without sacrificing too much performance.

Keith Myers
Joined: 16 Nov 22
Posts: 113
Credit: 91,707,192
RAC: 436,805
Message 8418 - Posted: 22 May 2024, 15:59:18 UTC - in response to Message 8411.  
True, motherboards have weird default settings. The CPU by itself, no matter what you tell the BIOS, will limit the clock speed to protect itself. It's hard wired and you cannot stop it, Intel and AMD don't want them frying under warranty. That's the only limit you need. If you see it stuck on this limit, you know you should cool more to get more clock speed.


I use custom loop cooling with a thick 360mm radiator on max fans all the time. It has sufficient cooling. I just have a very high ambient room temperature at all times with 5 multi-gpu hosts running all the time. I can normally keep this host in the middle 80's.

But the idiosyncrasy of my particular motherboard's BIOS cpu monitor setting is well known among users: it doesn't let the cpu control its thermals itself, it just preempts that on its own at 90° C.

Again, it's a matter of motherboard manufacturers thinking they know better instead of just using the board settings the cpu manufacturer recommends to vendors.

A proud member of the OFA (Old Farts Association)