Posts by JStateson

1) (Message 8059)
Posted 21 Sep 2023 by JStateson
Post:
Seems to be working fine for
s9000 and s9050: 3 validated, 8 pending
Radeon VII: 3 validation pending
2) (Message 7730)
Posted 9 Feb 2023 by JStateson
Post:
Hi Keith!

I am making some progress. I found the Windows version of nvidia-smi, and it shows that one of my GPUs is "lost":

C:\Program Files\NVIDIA Corporation>nvidia-smi
Unable to determine the device handle for GPU0000:02:00.0: GPU is lost.  Reboot the system to recover this GPU


I have removed that GPU. Windows Device Manager gave no indication of any problem.
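
For anyone who wants to check for this automatically, here is a minimal sketch that scans nvidia-smi output for the "GPU is lost" line quoted above. The function names are my own, and the pattern is an assumption based only on that one message (it accepts the bus ID with or without a space after "GPU"). Nothing here calls nvidia-smi until you invoke `run_nvidia_smi()` yourself.

```python
import re
import subprocess

# Assumed pattern, modelled on the single error line quoted in the post.
LOST_GPU = re.compile(
    r"GPU\s*([0-9A-Fa-f]{4,8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f])"
    r".*GPU is lost"
)

def find_lost_gpus(text):
    """Return the PCI bus IDs of every GPU reported as lost in the given text."""
    found = []
    for line in text.splitlines():
        match = LOST_GPU.search(line)
        if match:
            found.append(match.group(1))
    return found

def run_nvidia_smi():
    """Run nvidia-smi (must be on PATH) and return stdout and stderr combined."""
    proc = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return proc.stdout + proc.stderr

# The line from the post, used here as sample input.
sample = ("Unable to determine the device handle for GPU0000:02:00.0: "
          "GPU is lost.  Reboot the system to recover this GPU")
print(find_lost_gpus(sample))
```

On a live system, `find_lost_gpus(run_nvidia_smi())` would give the bus IDs to look for when deciding which board to pull.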
3) (Message 7728)
Posted 8 Feb 2023 by JStateson
Post:
On two systems, I have several Nvidia boards (GTX 1060 3 GB and 6 GB, 1660, 1070, P102-100), and occasionally a work unit fails to run with a message like this:

<core_client_version>7.21.0</core_client_version>
<![CDATA[
<message>
The system cannot find the file specified.
 (0x2) - exit code 2 (0x2)</message>
<stderr_txt>

Error: Number of lc points is greater than POINTS_MAX = 1000
</stderr_txt>
]]>


I have no idea which board had the problem.
4) (Message 7191)
Posted 20 Apr 2021 by JStateson
Post:
This is getting worse. I had 3 tasks in a row (win10x64) that showed 3% GPU usage in GPU-Z.

I stopped and restarted BOINC to see if a reset of the GPUs helped. All that did was drop the expected time to complete from 3 months to a few minutes. The GPU, a 1070Ti, continued to run at 3% with the completion time increasing.

I then aborted all tasks that used only 3% of the GPU. The 4th task ran OK. It is using 99% like it should.
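
A stuck task like this could be spotted without watching GPU-Z by polling utilization from the command line. The sketch below parses the output of `nvidia-smi --query-gpu=index,name,utilization.gpu --format=csv,noheader,nounits` and flags any GPU below a threshold. The 50% threshold, the function name, and the sample text are assumptions for illustration, not measurements from my systems.

```python
import csv
import io

def underused_gpus(csv_text, min_percent=50):
    """Return (index, name, utilization) for GPUs running below min_percent.

    csv_text is expected to hold lines such as "0, Some GPU name, 3".
    """
    flagged = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        index, name, util = (field.strip() for field in row)
        # Utilization may be reported as unavailable; only use plain integers.
        if util.isdigit() and int(util) < min_percent:
            flagged.append((int(index), name, int(util)))
    return flagged

# Hypothetical sample output: one GPU nearly idle, one fully loaded.
sample = "0, NVIDIA GeForce GTX 1070 Ti, 3\n1, NVIDIA GeForce GTX 1060 6GB, 99\n"
print(underused_gpus(sample))
```

A low reading only means something while a GPU task is supposed to be running, so a check like this would need to be combined with the task list before aborting anything.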

I checked my wingmen on the 5 tasks that had errors, and their results did not show a problem like mine. However, they were running the Linux CUDA app or the CPU app. No wingmen were running Win10 CUDA.

I have 9 more tasks queued up. If I continue to have problems, I will switch back to Einstein: 500+ work units with no errors.
5) (Message 7190)
Posted 19 Apr 2021 by JStateson
Post:
On a (very) few occasions I end up having to abort a task that has run for over 24 hours and shows about 3 months left, even though the due date is less than a week away.

When looking in the event viewer I once saw a "radar pre-leak" warning for one of the period search apps. Possibly these are related. Only 5 bad tasks out of 1000 is not a problem. What is a problem is a task that can stay stuck for a long time, as those are hard to spot.
6) (Message 6431)
Posted 20 Jan 2020 by JStateson
Post:
The GTX 1660 Ti would be in more people’s reach. I ask that we get the app updated to support the Turing-based cards, please. Hopefully it would just be a matter of recompiling it with the latest CUDA compiler, but I bet it’s not that simple.


My 1080Ti, 1060 and 760 GPUs work just fine here: about 11 minutes per WU for the 1080Ti and about 47 minutes for the 760.


Your RAC here is 0. Consider giving up collatz. It is more likely an asteroid will hit us and stop "collatz" before collatz will find an exception to 3p+1, publish it, and stop.

I am back crunching but will pack up and leave if they get on the GRIDCOIN graylist or blacklist for not having work.

Yeah, they fixed the same problem 3 years ago. Maybe they kept notes on how it was done.
https://www.reddit.com/r/BOINC/comments/4wi1sf/astroidshome_wu_error_gtx_1080/
The following is from a mix of one GTX 1070, one 1070Ti, and three P102-100:
        Run Time      CPU Time       Credit
           (sec)         (sec)
           660.3         658.4        480.0
           264.3         261.1        480.0
           205.1         203.1        480.0
           651.4         647.6        480.0
            23.2          21.1        480.0
           876.6         871.1        480.0
           675.5         672.6        480.0
           647.2         643.2        480.0
           138.7         135.4        480.0
          1007.9        1004.4        480.0
           680.3         675.9        480.0
           675.9         673.5        480.0
           141.6         139.8        480.0
           656.2         654.3        480.0
           794.1         789.6        480.0
           641.1         639.1        480.0
           656.2         654.8        480.0
           907.4         904.2        480.0
           593.9         591.7        480.0
           652.2         650.4        480.0
         ----------------------------------

Avg:       577.4         574.6        480.0
STD:       267.1         266.6          0.0

1.20 seconds per credit for one device (from the averages above)
0.8312 credits per second for one device
Times shown above were divided by the number of concurrent tasks (1)
2,992 credits per hour for this system
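
The summary figures can be recomputed from the run times in the table. This is a minimal sketch, not the script that produced the table; it assumes the STD row is a population standard deviation, since that is what matches the posted value.

```python
from statistics import mean, pstdev

# Run times in seconds, copied from the table above.
run_times = [
    660.3, 264.3, 205.1, 651.4, 23.2, 876.6, 675.5, 647.2, 138.7, 1007.9,
    680.3, 675.9, 141.6, 656.2, 794.1, 641.1, 656.2, 907.4, 593.9, 652.2,
]
CREDIT_PER_TASK = 480.0
CONCURRENT_TASKS = 1  # times are divided by this, as noted in the post

avg_run = mean(run_times) / CONCURRENT_TASKS
std_run = pstdev(run_times) / CONCURRENT_TASKS  # population std (assumption)

seconds_per_credit = avg_run / CREDIT_PER_TASK
credits_per_second = CREDIT_PER_TASK / avg_run
credits_per_hour = credits_per_second * 3600

print(f"Avg run time:       {avg_run:.1f} s")
print(f"STD run time:       {std_run:.1f} s")
print(f"Seconds per credit: {seconds_per_credit:.2f}")
print(f"Credits per second: {credits_per_second:.4f}")
print(f"Credits per hour:   {credits_per_hour:.0f}")
```

The results should land within about a tenth of the posted Avg and STD for run time. The small gap in the average looks like truncation rather than rounding in the original tool.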
7) (Message 6429)
Posted 20 Jan 2020 by JStateson
Post:
I tried substituting libcudart.so.10.2 for the older 5.5 but it did not work even after telling BOINC not to check the file size or do a checksum test.

I had actually tried this approach on the Milkyway project app and managed to get a different library to be used. It didn't make any difference in performance but at least it ran.

Please go over here
https://askubuntu.com/questions/1204434/cuda-backwards-capability
and bump up my question.

Maybe there is a way to fool the Asteroids app into using the newer library. Seems to me the app should work with the newer lib even if it does not take advantage of the newer performance features of the device.
8) (Message 6428)
Posted 19 Jan 2020 by JStateson
Post:
I don't see the CUDA source anywhere. I have had success linking SETI with later (10.2) CUDA libs but need "something" other than that 0.2.1 database whatever.
9) (Message 6427)
Posted 19 Jan 2020 by JStateson
Post:
Yeah, just discovered the problem. My GTX 1660 Ti errored out 50+ tasks within seconds of attaching to the project.
10) (Message 6426)
Posted 19 Jan 2020 by JStateson
Post:
Yeah, just discovered the problem. My GTX 1660 Ti errored out 50+ tasks within seconds of attaching to the project.

Must have double-clicked, sorry. Please delete one of these, assuming any moderators or principals care.
11) (Message 6164)
Posted 31 Jan 2019 by JStateson
Post:
I had this happen on a pair of HD 7950s. Your situation may be different, but what happened was that occasionally there was an ATI kernel reset and one of the HD boards did not recover. It was running at minimum frequency according to GPU-Z. I discovered that I could suspend the task and then immediately resume it, and that would fix the board that was not running at full speed.

At one time I looked into doing this automatically but I came up with a better cooling arrangement and the kernel resets stopped.
12) (Message 6163)
Posted 31 Jan 2019 by JStateson
Post:
Found a way to at least stop long tasks from executing. I set a rule in BoincTasks to suspend any Asteroids GPU task that runs longer than 1 hour and 30 minutes, entered as "0d,01:30:00" (I show 08:00 in the pic below as I did not want to wait that long). The rule, as shown below, would be "After 1:30 hours, wait 10 seconds, then suspend the task for 99 days".

This allows other tasks to use the GPU. Note that the task status is "suspended by user" and the debug log shows the rule was executed. This rule was applied to a remote computer, ms-7593-1060, which is nice.
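
For those not running BoincTasks, the same idea can be sketched in a few lines. This is not how BoincTasks implements its rules. It only parses the "0d,01:30:00" style of duration, picks tasks over the limit from a list you supply, and builds the `boinccmd --task <url> <name> suspend` command without running it. The task dictionaries, their keys, and the example URL are hypothetical.

```python
import re

def parse_rule_time(text):
    """Convert a 'Nd,HH:MM:SS' duration such as '0d,01:30:00' to seconds."""
    match = re.fullmatch(r"(\d+)d,(\d{1,2}):(\d{2}):(\d{2})", text.strip())
    if not match:
        raise ValueError(f"unrecognized duration: {text!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

def tasks_to_suspend(tasks, limit_seconds):
    """Pick tasks that are not already suspended and have run past the limit.

    Each task is a dict with the hypothetical keys
    'name', 'url', 'elapsed' (seconds) and 'suspended' (bool).
    """
    return [t for t in tasks
            if not t["suspended"] and t["elapsed"] > limit_seconds]

def suspend_command(task):
    """Build (but do not run) the boinccmd call that suspends one task."""
    return ["boinccmd", "--task", task["url"], task["name"], "suspend"]

# Hypothetical task list; in practice it would come from the BOINC client.
tasks = [
    {"name": "task_a", "url": "https://example.invalid/project/",
     "elapsed": 75600, "suspended": False},
    {"name": "task_b", "url": "https://example.invalid/project/",
     "elapsed": 1260, "suspended": False},
]
limit = parse_rule_time("0d,01:30:00")
for task in tasks_to_suspend(tasks, limit):
    print(" ".join(suspend_command(task)))
```

Filling the task list from a live client (for example from `boinccmd --get_tasks`) is left out, because the exact field names in that output vary between client versions.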

HTH

13) (Message 6162)
Posted 31 Jan 2019 by JStateson
Post:
Same problem here. I aborted a 21-hour task that should have taken 21 minutes. I looked at my wingman and saw that the same task was automatically terminated with "EXIT_TIME_LIMIT_EXCEEDED" after 10 days. Truly a waste of computer power. I respectfully request that the "EXIT_TIME_LIMIT_EXCEEDED" limit be set to some reasonable value. I could not find that option under preferences, so I assume it is coded into the program. I suggest an hour or two.
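
To put numbers on the request: as far as I understand BOINC, the limit is not a user preference but comes from the work unit, with the client aborting a task once its elapsed time exceeds rsc_fpops_bound divided by the estimated FLOPS of the app version. Treat that formula as my assumption. The sketch below only works out the ratios implied by the times in this post; the FLOPS figure is a made-up placeholder.

```python
EXPECTED_SECONDS = 21 * 60            # normal run time mentioned above
OBSERVED_LIMIT_SECONDS = 10 * 86400   # wingman's task was cut off after 10 days
REQUESTED_LIMIT_SECONDS = 2 * 3600    # the suggested "hour or two", upper end

def limit_ratio(limit_seconds, expected_seconds):
    """How many times longer than a normal run the limit allows."""
    return limit_seconds / expected_seconds

def fpops_bound_for(limit_seconds, app_flops):
    """rsc_fpops_bound giving the limit, assuming limit = bound / flops."""
    return limit_seconds * app_flops

HYPOTHETICAL_APP_FLOPS = 1e12  # placeholder, not a measured value

print(f"Observed limit:  {limit_ratio(OBSERVED_LIMIT_SECONDS, EXPECTED_SECONDS):.1f}x a normal run")
print(f"Requested limit: {limit_ratio(REQUESTED_LIMIT_SECONDS, EXPECTED_SECONDS):.1f}x a normal run")
print(f"Bound for the requested limit: {fpops_bound_for(REQUESTED_LIMIT_SECONDS, HYPOTHETICAL_APP_FLOPS):.2e} FLOPs")
```

So the observed limit allows a task to run several hundred times longer than normal, while a two-hour limit would still leave a margin of more than five times.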
14) (Message 2484)
Posted 1 Feb 2014 by JStateson
Post:
Completed my first pair of Android tasks under nativeboinc, both valid. I first tried the original boinc but the manager was too buggy and I had to uninstall.

Shown below is a screen grab of my Galaxy IIIs and a somewhat comparable EeePC with an N570 CPU. That win7x32 system is significantly slower. It should finish in about 8 more hours, and I will try disabling hyper-threading for the next Asteroids run, which should speed it up. As it is, my Galaxy IIIs Sprint phone is superior to an Intel N570 dual core in an Asus netbook.

15) (Message 2472)
Posted 30 Jan 2014 by JStateson
Post:
Follow-up on my earlier post about problems: I uninstalled the BOINC from Berkeley and put in "native boinc", which seems to be working, and I have 2 Asteroids tasks running. Will see how they turn out. Unfortunately, the BOINC I downloaded earlier was the "official" one, which seems to be buggy.
16) (Message 2468)
Posted 30 Jan 2014 by JStateson
Post:
(Dagorath)
Also, Asteroid tasks use DP (double precision) calculations. DP takes a lot of time. A GTX 630 is very slow on DP calcs. My GTX 670 does an Asteroids task in about 45 minutes, faster than a 630 because it has more "DP power", but still not extremely fast compared to the CPU app. The only NVIDIA cards that will be extremely fast compared to the CPU app will be the cards that have good DP power which means the Titan and certain Teslas. 670 and 680 cards that have been hacked to unleash their full DP capability should perform close to Titan and Tesla but so far nobody has tried the hack and reported it here, unless I missed it.


This discussion interests me because I have both a 570 and a 670 and noticed that the 570 performed better, though with more heat. I was not aware of how badly DP had been crippled until reading about it here. I found a discussion about the mod that changes the 690 (and other NVIDIA cards) into their professional equivalents. Years ago I modded Athlon mobile (and "XP") CPUs into their multiprocessor equivalents, using silver ink and scratching out a trace on the CPU, so this NVIDIA mod interested me. I did read that the author burned out his GTX 690, but not on account of the mod he was making.

Anyway, after reading through most of the 50+ pages, it appears my GTX 670 can be modded into a GRID K2, but there is no performance gain, as shown by some "spec" program that had DP performance as one of its tests. The gain seems to be in virtualization for gaming, which does not interest me. I had a bad experience replacing an "0402" surface-mount resistor and do not want to try that again. However, if it is a larger resistor and on the back side of the card, then I might consider trying it.
17) (Message 2467)
Posted 30 Jan 2014 by JStateson
Post:
After reading this thread, I installed nativeboinc on my Android (Galaxy 3 S) from the Play Store and connected to asteroidsathome. There seem to be some real problems with nativeboinc. An Asteroids task downloaded and started executing and was checkpointing about every 60 seconds, but the status screen gave no indication of anything going on other than
"suspended until battery is over %90: currently %100"
and "waiting project initialization" on the project page even though the task had been running for many hours.
I tried suspending the project to see if the dialog button would change to "resume", but it didn't. BOINC is currently stuck "reading projects", and a full reset failed to fix the problem. The event log shows null pointers when accessing the status page. I will be uninstalling the app, as there seem to be too many problems too soon for my Android, which has 2 cores and 2 GB of memory (the Sprint version of the IIIs).