dont_use_dcf active at SETI main

Started by Barry, June 12, 2012, 11:12:11 PM

Attention all SETI users:

As of the maintenance on Tuesday 12th June 2012, <dont_use_dcf/> is active on SETI main.

That means BOINC 7 clients are instructed to ignore DCF. To be fair since APR works now again, DCF isn't really needed.
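As a rough illustration of what "ignore DCF" means, here is a minimal Python sketch. It assumes a simplified model in which the client's runtime estimate is the task's `rsc_fpops_est` divided by the app version's `flops`, multiplied by the project's DCF unless the project has set `dont_use_dcf`. The function name is hypothetical, and the real client takes more inputs into account.

```python
def estimated_runtime(rsc_fpops_est: float, flops: float,
                      dcf: float, dont_use_dcf: bool) -> float:
    """Simplified client-side runtime estimate in seconds (hypothetical helper)."""
    uncorrected = rsc_fpops_est / flops  # amount of work divided by device speed
    if dont_use_dcf:
        return uncorrected               # BOINC 7 with <dont_use_dcf/>: DCF ignored
    return uncorrected * dcf             # otherwise DCF scales the estimate


# A task sized at 3.6e13 FLOPs on a device rated at 1e10 FLOPS:
print(estimated_runtime(3.6e13, 1e10, dcf=0.5, dont_use_dcf=False))  # 1800.0
print(estimated_runtime(3.6e13, 1e10, dcf=0.5, dont_use_dcf=True))   # 3600.0
```

A 6.x client doesn't know the flag, so in this model it would keep taking the `dont_use_dcf=False` branch.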

The only problem with it is that the 'recommended' 7.0.25 client has a known bug that causes tasks to run in High Priority mode (HP/EDF) all the time.

So if you are on a 7.0.25 client, upgrade to 7.0.28 ASAP.
A person who won't read has no advantage over one who can't read. (Mark Twain)

I still have flops set in my app_infos, but I am using 7.0.28 on the FX4100/GTX460.

The i3/GTX560 is still running 6.12.43.

June 12, 2012, 11:22:01 PM #2 Last Edit: June 12, 2012, 11:54:52 PM by Jason G
This is truly psychotic.


Quote from: Barry on June 12, 2012, 11:12:11 PM... To be fair since APR works now again, DCF isn't really needed. ...

I'd better point out that this assertion isn't strictly true, though it's one I've heard mentioned by Joe Segur and, I think, by David in changesets.

APR controls long-term general estimates, but provides no short-term facility for tracking changes in machine hardware or performance, so relying on APR alone will cause headaches for anyone upgrading (or downgrading).

In engineering control theory, cascade control systems operating in distinct transient domains (as DCF with a single app plus APR were) are perfectly fine, provided they are operational, contain no faulty heuristics or limits that unduly influence or destabilise normal operation, and one loop is intended for quick response and the other for slower response.
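To illustrate the two-timescale idea, here is a minimal sketch, not BOINC code. A slow exponential average of runtimes (standing in for APR) is trimmed by a fast multiplicative correction (standing in for DCF). The weights `slow_w=0.01` and `fast_w=0.3` and the function name are assumptions chosen for illustration.

```python
def cascade_estimates(actual_runtimes, slow_w=0.01, fast_w=0.3):
    """Estimate each task's runtime before it runs, using a slow average
    (APR-like) trimmed by a fast multiplicative correction (DCF-like)."""
    baseline = float(actual_runtimes[0])  # slow loop state, seeded from the first task
    correction = 1.0                      # fast loop state
    estimates = []
    for actual in actual_runtimes:
        estimates.append(baseline * correction)
        # Fast loop: move the trim quickly toward the ratio the baseline misses by.
        correction += fast_w * (actual / baseline - correction)
        # Slow loop: long-term exponential average of observed runtimes.
        baseline += slow_w * (actual - baseline)
    return estimates


# The host becomes 2x faster after task 5 (e.g. a GPU upgrade).
runs = [1000] * 5 + [500] * 20
both = cascade_estimates(runs)
slow_only = cascade_estimates(runs, fast_w=0.0)  # slow loop alone
print(round(both[-1]), round(slow_only[-1]))
```

Twenty tasks after the change, the combined estimate is within a few percent of the new 500 s runtime. The slow average on its own is still above 900 s, which is the "headache" case for up- or downgraders.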

Removing the faulty DCF mechanism is probably good, given that it doesn't cope well with multiple applications, but APR can't gracefully handle most of the things DCF did well. So the net effect is that we lose the 'good parts' of DCF and gain wonky estimates that take even longer to correct when hardware (or software) changes.
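Why a single DCF copes badly with several apps can be seen in a simplified model of the update rule. The factor jumps up at once when a task overruns its estimate, decays slowly otherwise, and is clamped to a range. The thresholds, weights and the [0.01, 100] clamp below are assumptions for illustration, not quoted from a specific BOINC release.

```python
def update_dcf(dcf: float, raw_ratio: float,
               lo: float = 0.01, hi: float = 100.0) -> float:
    """One simplified DCF update after a task finishes.

    raw_ratio = elapsed time / uncorrected estimate for that task.
    All constants here are illustrative assumptions.
    """
    adj_ratio = raw_ratio / dcf              # how far off the corrected estimate was
    if adj_ratio > 1.1:
        dcf = raw_ratio                      # underestimated: jump straight up
    elif adj_ratio < 0.1:
        dcf = 0.99 * dcf + 0.01 * raw_ratio  # finished far too early: barely move
    else:
        dcf = 0.9 * dcf + 0.1 * raw_ratio    # otherwise: creep toward the ratio
    return min(max(dcf, lo), hi)


# One project-wide DCF shared by two apps whose raw estimates are off in
# opposite directions: app A runs in half its estimate, app B in twice it.
dcf = 1.0
history = []
for app, ratio in [("A", 0.5), ("B", 2.0)] * 3:
    dcf = update_dcf(dcf, ratio)
    history.append((app, round(dcf, 3)))
print(history)
```

Every B task drags the shared factor back up to about 2, so A's tasks end up estimated at roughly 3.5 to 4 times their real length. No single value suits both apps.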

Jason
It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change.
Charles Darwin
---
Chaos: When the present determines the future, but the approximate present does not approximately determine the future.
Edward Lorenz

June 12, 2012, 11:25:10 PM #3 Last Edit: June 12, 2012, 11:27:54 PM by Barry
Don't shoot the messenger. But I can amend the 'isn't really needed' later.

edit: amended in the only place that has a time limit on edits ::)

June 12, 2012, 11:44:20 PM #4 Last Edit: June 13, 2012, 12:29:25 AM by Jason G
Thanks! That'll do quite well.

[Edit:] Hmm, I left one out that I can think of now: using the machine for an extended period while crunching could be a problem for some as well.

This particular bug is really only a big issue for people who run multiple BOINC projects on the same hardware.

If you run SETI only, the only difference you'll see is that your tasks run in 'Earliest Deadline First' mode. That is marked as 'high priority' in BOINC Manager's display, but it isn't the sort of raised priority which places any extra thermal stress on your CPU or GPU.
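To make the EDF point concrete: "Earliest Deadline First" changes the order in which BOINC runs its tasks, not the operating-system priority they run at. The sketch below is a deliberately simplified model with hypothetical names (`Task`, `would_miss_deadline`, `choose_order`). It runs tasks one after another on a single device and switches to deadline order only if some deadline looks at risk. The real client simulates round-robin scheduling across projects and devices.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    est_remaining: float  # client's estimate of seconds left
    deadline: float       # seconds from now


def would_miss_deadline(tasks):
    """Run tasks back to back in the given order on one device and report
    whether any is projected to finish after its deadline."""
    clock = 0.0
    for t in tasks:
        clock += t.est_remaining
        if clock > t.deadline:
            return True
    return False


def choose_order(tasks):
    """Keep the normal order unless a deadline looks at risk; then switch to
    Earliest Deadline First (shown as 'high priority' in BOINC Manager)."""
    if would_miss_deadline(tasks):
        return "EDF", sorted(tasks, key=lambda t: t.deadline)
    return "normal", list(tasks)


tasks = [Task("A", 3600.0, 86400.0), Task("B", 3600.0, 5000.0)]
mode, order = choose_order(tasks)
print(mode, [t.name for t in order])  # EDF ['B', 'A']
```

Nothing in this model raises how hard the device works; it only reorders the queue.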

If you run other projects alongside SETI, and you're running v7.0.25, you may find that SETI appears to monopolise the CPU or GPU, and not allow other projects to take their turn at the trough. I think that's a big-enough show-stopper for BOINC as a whole that we should be able to get it turned off within a very few hours.

Sounds reasonable. Given that the current recommended Boinc client is 7.0.25, I had been looking at migrating my mods to something newer than 6.10.58/60... I think I'll put that idea off and revisit it once the 7 series is stable-ish.

v7.0.28 is pretty good, and far more eligible for 'recommended' status than v7.0.25 ever was. There's been very little client development work since v7.0.28 was tagged - they're mainly doing server, API, and git work.

Quote from: Richard Haselgrove on June 13, 2012, 01:22:28 AM
v7.0.28 is pretty good, and far more eligible for 'recommended' status than v7.0.25 ever was. There's been very little client development work since v7.0.28 was tagged - they're mainly doing server, API, and git work.
OK. The API is the part that mostly concerns my end, for the apps themselves, since I've never been able to use it in unmodified form, even for older CPU apps. I'll probably assess the newer client once I've had the chance to examine the requirements for a v6<->v7 data pump.

Fair enough. Your call whether you follow the 'workround' or the 'fix at source' route.

Quote from: Richard Haselgrove on June 13, 2012, 01:40:36 AM
Fair enough. Your call whether you follow the 'workround' or the 'fix at source' route.
For the time being it's 'workaround' as before, since there are still loose threads to yank out of the SETI apps. Once those are completely unravelled, though, gutting the Boinc sources for the root design issues becomes more viable.

Quote from: Jason G on June 12, 2012, 11:22:01 PM
Quote from: Barry on June 12, 2012, 11:12:11 PM... To be fair since APR works now again, DCF isn't really needed. ...

I better point out that this assertion isn't strictly true, and is one I've heard mentioned by Joe Segur, and I think David in changesets.

APR controls long term general estimates, but provides no short term facility for tracking machine hardware or performance changes, therefore relying on APR alone will result in headaches for those up (or down)-grading.
...
Jason

It "works" in the sense that a carpenter's clawhammer can be used to drive tacks or railroad spikes, though not as efficiently as the common nails it's designed for.

For the ~50% of hosts attached to S@H which do 1 task a day or less, APR is truly a long-term average and the faster adaptation of DCF was probably better. For a top 500 host which does over 200 tasks a day with an application, APR isn't what I'd call long-term. If there weren't delays caused by cache and whatever fraction of reported tasks have to wait for a wingmate's return, APR would adapt to changes within 1.5 days for such hosts. For hosts running stock, cache doesn't have any effect since the feedback is in the form of <flops> adjustment.
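The host-rate argument can be checked with a little arithmetic. This sketch assumes that APR behaves like an exponential average updated once per validated task with a weight of 0.01 (an assumed figure), and it counts "adapted" as covering 95% of a step change. It ignores the cache and wingmate delays Joe mentions.

```python
import math


def tasks_to_adapt(weight: float, fraction: float = 0.95) -> float:
    """Completed tasks needed for an exponential average, updated with the given
    per-sample weight, to cover `fraction` of a step change."""
    return math.log(1.0 - fraction) / math.log(1.0 - weight)


def days_to_adapt(tasks_per_day: float, weight: float = 0.01) -> float:
    """Convert the task count into days at a given task throughput."""
    return tasks_to_adapt(weight) / tasks_per_day


for rate in (1, 200):
    print(f"{rate:>3} tasks/day: {days_to_adapt(rate):.1f} days")
# prints:
#   1 tasks/day: 298.1 days
# 200 tasks/day: 1.5 days
```

With that assumed weight, about 300 tasks are needed, which is roughly a day and a half for a 200-task-a-day host and most of a year for a one-task-a-day host.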

There are certainly conditions where it would be better to reset APR and let it do the early quick building of a new average, for instance if a high end GPU dies and an older one is inserted as a temporary replacement. It's awkward that the only way to accomplish that now is to force a new hostID.
Joe

An 'at source' fix is under way - dont_use_dcf is being removed from SETI as we speak.

Don't feel under any pressure to upgrade from v7.0.25 just because of this - although v7.0.28 is undeniably better, and "should have already been able to be promoted to a public release before now" (Rom Walton)  ;) ::) ;D

June 13, 2012, 04:38:08 PM #13 Last Edit: June 13, 2012, 04:41:14 PM by Jason G
Phew! That 7.0.25 thing was looking like a forum-thread nightmare; glad that's averted :)

With DCFs, I guess I'm spoiled by having more or less real-time, per-app tracking while the machine is in use, which 'feels' 'more right' on the faster GPUs. Someday, once the apps are deemed tidy enough, I will transfer that operation to local <flops> adjustment, with a few other extensions (toward mixed devices)... back to v7 multibeam for now :)
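One way to picture "local <flops> adjustment" is to derive a speed figure from recently completed tasks, so that `rsc_fpops_est / flops` reproduces a typical observed runtime. This is a hypothetical helper, not the approach described above, which isn't spelled out in detail. Using the median is my own choice, made so that a few unusually short or long tasks don't skew the result.

```python
from statistics import median


def suggested_flops(samples):
    """samples: (rsc_fpops_est, elapsed_seconds) pairs for recently completed
    tasks of one app on one device. Returns a speed in FLOPS such that
    rsc_fpops_est / flops reproduces a typical observed runtime."""
    rates = [fpops / elapsed for fpops, elapsed in samples if elapsed > 0]
    if not rates:
        raise ValueError("no usable samples")
    return median(rates)  # median resists a few outlier tasks


recent = [(3.6e13, 3600.0), (3.6e13, 1800.0), (7.2e13, 3600.0)]
print(f"{suggested_flops(recent):.3e}")  # 2.000e+10
```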

Jason

Both of my machines are out of high-priority mode, but they have a dcf of 0.01.

Quote from: arkayn on June 14, 2012, 01:45:43 AM
Both of my machines are out of high-priority mode, but they have a dcf of 0.01.

?! the 6.12.34 shouldn't have been affected by the dont_use_dcf? and the other is running 7.0.27 (why not .28?)

June 14, 2012, 02:22:56 AM #16 Last Edit: June 14, 2012, 02:33:01 AM by Claggy
Quote from: Barry on June 14, 2012, 01:52:06 AM
Quote from: arkayn on June 14, 2012, 01:45:43 AM
Both of my machines are out of high-priority mode, but they have a dcf of 0.01.

?! the 6.12.34 shouldn't have been affected by the dont_use_dcf? and the other is running 7.0.27 (why not .28?)
Or even 6.12.43? That host only has ~1600 Seti tasks; does it also have tasks from other projects with short deadlines?

And/or a large 'Maintain enough tasks to keep busy for at least' value in its preferences?

If those two hosts are sharing the same venue, you might want to split them so each uses preferences suited to Boinc 6 or Boinc 7.

Claggy

June 14, 2012, 02:28:22 AM #17 Last Edit: June 14, 2012, 02:31:25 AM by arkayn
Quote from: Barry on June 14, 2012, 01:52:06 AM
Quote from: arkayn on June 14, 2012, 01:45:43 AM
Both of my machines are out of high-priority mode, but they have a dcf of 0.01.

?! the 6.12.34 shouldn't have been affected by the dont_use_dcf? and the other is running 7.0.27 (why not .28?)

It was not the dont_use_dcf that caused it, but the removal of the <flops> from my app_info.

I have just not gotten around to updating to .28 yet, and the other machines are running 6.12.43.

Quote from: Claggy on June 14, 2012, 02:22:56 AM

Or even 6.12.43, that host only has ~1600 Seti tasks, does it also have tasks from other projects with short deadlines?

Claggy

I have NNT set for a little while to let my systems stabilize.

Quote from: arkayn on June 14, 2012, 02:28:22 AM
It was not the dont_use_dcf that caused it, but the removal of the <flops> from my app_info.

I have just not gotten around to updating to .28 yet, and the other machines are running 6.12.43.

you have set nnt? as soon as the cache is empty, edit dcf to 1 before you let new tasks in.
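The reason for resetting DCF before fetching can be sketched with a toy fetch rule. The premise is that the uncorrected estimates are close to real runtimes. In that case a leftover DCF of 0.01 makes the client believe each task takes 1% of its real time, so the same cache setting asks for about 100 times as much work. The function is hypothetical and ignores device count, resource shares and server-side limits.

```python
def tasks_requested(cache_seconds: float, uncorrected_est: float, dcf: float) -> int:
    """How many tasks a naive fetch rule would ask for to fill the cache,
    if each task is believed to take uncorrected_est * dcf seconds."""
    believed = uncorrected_est * dcf
    return int(cache_seconds // believed)


one_day = 86400.0
print(tasks_requested(one_day, 3600.0, dcf=1.0))   # 24
print(tasks_requested(one_day, 3600.0, dcf=0.01))  # 2400
```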

June 14, 2012, 04:04:32 AM #19 Last Edit: June 14, 2012, 05:14:23 AM by arkayn
Quote from: Barry on June 14, 2012, 03:35:53 AM
Quote from: arkayn on June 14, 2012, 02:28:22 AM
It was not the dont_use_dcf that caused it, but the removal of the <flops> from my app_info.

I have just not gotten around to updating to .28 yet, and the other machines are running 6.12.43.

you have set nnt? as soon as the cache is empty, edit dcf to 1 before you let new tasks in.

Yes it is on NNT.

I also just suspended my 9 AP GPU tasks as they would most likely end up with a -177 error with the current dcf.
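For reference, -177 is BOINC's ERR_RSC_LIMIT_EXCEEDED: the task ran past its maximum allowed elapsed time. Below is a simplified model of that limit, taken as the task's `rsc_fpops_bound` divided by the `flops` figure the client holds for the app version. The function names are hypothetical. How a given DCF or app_info setting feeds into these numbers depends on the client version and isn't modelled here.

```python
ERR_RSC_LIMIT_EXCEEDED = -177  # BOINC error code: maximum elapsed time exceeded


def elapsed_limit(rsc_fpops_bound: float, flops: float) -> float:
    """Seconds a task may run before the client aborts it (simplified model)."""
    return rsc_fpops_bound / flops


def check_task(elapsed: float, rsc_fpops_bound: float, flops: float) -> int:
    """Return 0 while the task is within its limit, -177 once it exceeds it."""
    if elapsed > elapsed_limit(rsc_fpops_bound, flops):
        return ERR_RSC_LIMIT_EXCEEDED
    return 0


# Overstating the device speed 10x shrinks the allowed runtime 10x.
bound = 3.6e15
print(elapsed_limit(bound, 1e11))        # 36000.0
print(check_task(20000.0, bound, 1e11))  # 0
print(check_task(20000.0, bound, 1e12))  # -177
```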
