dont_use_dcf active at SETI main

Started by William, June 12, 2012, 11:12:11 PM

June 16, 2012, 02:01:09 AM #20 Last Edit: June 17, 2012, 09:22:12 PM by Barry
Is this righting itself with Claggy's & Barry's recommendations etc.? Or is there still lots of cache to burn through?

Jason
It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change.
Charles Darwin
---
Chaos: When the present determines the future, but the approximate present does not approximately determine the future.
Edward Lorenz

The 460 is still at 356 tasks and the 560 has 1134.

So there is a ways to go yet.

Both machines still have a 0.01 dcf.

Is anybody else running a V7 boinc on SETI beta under anonymous platform?

I'm seeing a bug - it's not picking up APR after the 10th valid here.
Can anybody confirm or dispute (?) this?
I'm doing an initial bug report to David, but I'd like to know if it's just me...
I doubt it, but you never know: the next task that should have been APR-estimated may just have been sent too early after the 10th validation came through.
A person who won't read has no advantage over one who can't read. (Mark Twain)

Quote from: Barry on June 21, 2012, 12:04:29 AM
Is anybody else running a V7 boinc on SETI beta under anonymous platform?

Not at the moment, but it would only take a few moments to set one running - when it would reach the 10th validation would be subject to the availability of wingmates.

Quote from: Richard Haselgrove on June 21, 2012, 12:38:44 AM
Quote from: Barry on June 21, 2012, 12:04:29 AM
Is anybody else running a V7 boinc on SETI beta under anonymous platform?

Not at the moment, but it would only take a few moments to set one running - when it would reach the 10th validation would be subject to the availability of wingmates.

that would be helpful thanks. Maybe it just needs 11 validations and not 10.
Validating on beta is always extremely slow.
I'm doing the report - if it's just a case of >10 instead of the >=10 we have been assuming, David should be telling me...

Had a quick look at the server trunk code (at least up until move to GIT)

You'll need 10 results complete for the app, AND 10 consecutive valids as well, before it'll kick in credit / estimate scaling.

There are some 'interesting' kludges in the same scheduler file (credit.cpp) including one that sticks out as a bit weird... namely use of a random number generator to determine if a host is 'trusted' under certain cases of single replication (which we don't use at this stage) ... might look at this file a bit closer when I get time.

I have ten valid and 13 consecutive valid :P

x41x - that tells you that 'valid' actually only counts the validations that have gone into APR (i.e. without the -9 in this case)

June 21, 2012, 02:41:10 AM #27 Last Edit: June 21, 2012, 02:46:56 AM by Jason G
Correct, it only stores the calculated scaling if it all passes sanity checks & isn't regarded as a runtime outlier:
Quote
if (!r.runtime_outlier && is_pfc_sane(x, wu, app)) {
    avp->pfc_samples.push_back(x);
}

[Edit:] if looking at it with a view to optimisation, you'd probably not bother doing all the scaling factor calculation if you already know it's a runtime outlier, but hey, at least it's avoiding some unnecessary sanity checks for overflows with this.
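The ordering suggested in the edit above (test the outlier flag first, then do the arithmetic) could look roughly like the sketch below. This is not the code in credit.cpp: `ResultSketch`, `is_sane_sketch` and `record_sample` are hypothetical stand-ins, the sample is assumed to be elapsed time multiplied by peak FLOPS, and the sanity bounds are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for the fields used here; not the BOINC RESULT struct.
struct ResultSketch {
    bool runtime_outlier;  // flag set earlier by the validator
    double elapsed_time;   // seconds
    double peak_flops;     // assumed device peak FLOPS
};

// Assumed sanity check: the sample must be finite and strictly positive.
bool is_sane_sketch(double x) {
    return std::isfinite(x) && x > 0.0;
}

// Stores a sample only for non-outlier results. The cheap flag test comes
// first, so outliers skip the arithmetic and the sanity check entirely.
// Returns true when a sample was stored.
bool record_sample(const ResultSketch& r, std::vector<double>& samples) {
    if (r.runtime_outlier) {
        return false;
    }
    double x = r.elapsed_time * r.peak_flops;  // assumed sample definition
    if (!is_sane_sketch(x)) {
        return false;
    }
    samples.push_back(x);
    return true;
}
```

The stored samples end up the same as with the combined condition in the quoted snippet; only the order of the work changes.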

June 21, 2012, 02:55:18 AM #28 Last Edit: June 21, 2012, 03:06:57 AM by Jason G
Walked through again: yeah, you'll need 10 pre-existing, and 10 or more consecutive valid (that went right through to APR).

So the scaling should start with the 11th consecutive valid that isn't a runtime outlier.
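As a rough illustration of that rule, here is a minimal sketch of the two counters and the threshold test. The struct and function names are hypothetical, not BOINC's, and whether the real comparison is `>=` or `>` is exactly the open question in this thread. The sketch assumes `>= 10` evaluated before the next task is estimated, and assumes (as the "ten valid and 13 consecutive valid" report suggests) that a valid runtime outlier extends the consecutive run without adding a sample.

```cpp
#include <cassert>

// Hypothetical per-host, per-app-version counters; names are illustrative.
struct HostAppCounters {
    int samples_stored = 0;     // valid, non-outlier results that reached APR
    int consecutive_valid = 0;  // current run of valid results
};

const int kMinSamples = 10;           // thresholds as described in the thread
const int kMinConsecutiveValid = 10;

// True when host-specific scaling could be applied to the next estimate.
bool scaling_enabled(const HostAppCounters& c) {
    return c.samples_stored >= kMinSamples &&
           c.consecutive_valid >= kMinConsecutiveValid;
}

// Updates the counters after one result has been through validation.
void on_validated(HostAppCounters& c, bool valid, bool runtime_outlier) {
    if (!valid) {
        c.consecutive_valid = 0;  // an invalid result breaks the run
        return;
    }
    ++c.consecutive_valid;
    if (!runtime_outlier) {
        ++c.samples_stored;       // outliers never contribute a sample
    }
}
```

With these assumptions, ten clean valids enable scaling so the 11th result is the first one estimated with it, while a single overflow among the first ten pushes that back by one result.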

[Edit:] Hmmm, I'll have to look at that runtime outlier code too, when I can... My modded client 'lets go' of its own breed of suspected runtime outlier after about 5 consecutive 90% overestimates. If the server code doesn't let go, there could be some stuck hosts that never scale, especially on major hardware + app upgrades. Will have to walk through that logic to make sure it has some sort of escape hatch as well (which can change when scaling kicks in, or inhibit it completely, i.e. it won't necessarily start scaling on the 11th consecutive valid).

having just looked through the validator code,  the only time a runtime outlier is marked, is when "result_overflow" appears in the stderr text ... so watch out for those valid overflow results with truncated or missing stderr ... they'll scale APR.
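To make the failure mode concrete, here is a minimal sketch of a marker-based check of the kind described. It is not the actual validator code; `flagged_as_outlier` is a hypothetical name, and the only thing taken from the thread is that the decision depends on the text "result_overflow" being present in stderr.

```cpp
#include <cassert>
#include <string>

// Sketch of a marker-based check: a result is treated as a runtime outlier
// only if the marker text survives in the reported stderr.
bool flagged_as_outlier(const std::string& stderr_text) {
    return stderr_text.find("result_overflow") != std::string::npos;
}
```

An empty stderr, or one cut off part-way through the marker, makes this return false, so an overflow result with damaged stderr would be treated like a normal result and counted towards APR.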

Quote from: Jason G on June 21, 2012, 04:02:19 AM
having just looked through the validator code,  the only time a runtime outlier is marked, is when "result_overflow" appears in the stderr text ... so watch out for those valid overflow results with truncated or missing stderr ... they'll scale APR.

When the focus shifts to S@H v7 we could suggest Eric implement a check on the result file rather than totally relying on stderr. It wouldn't be particularly difficult, every task which exits at completion puts a best_gaussian in the result (though it may just have initialization values), but a result_overflow does not report any best_xxxxx signals. The best_autocorr, best_pulse or best_spike are not guaranteed to be present for a normal exit, though if not it would be strong evidence of a corrupted WU, and best_triplet is only there when there's a reportable triplet.
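A rough sketch of what such a result-file check might look like, combined with the existing stderr marker. Both function names are hypothetical, the result file is treated as plain text, and the only assumption about its layout is that a normal exit contains the string "best_gaussian" somewhere while an overflow exit does not.

```cpp
#include <cassert>
#include <string>

// Hypothetical check on the result file contents: absence of any
// "best_gaussian" entry is taken as a sign of an overflow exit.
bool looks_like_overflow(const std::string& result_text) {
    return result_text.find("best_gaussian") == std::string::npos;
}

// Combines both signals, so a truncated or missing stderr alone
// cannot hide an overflow from the outlier logic.
bool is_runtime_outlier(const std::string& stderr_text,
                        const std::string& result_text) {
    bool marker_present =
        stderr_text.find("result_overflow") != std::string::npos;
    return marker_present || looks_like_overflow(result_text);
}
```

Using only best_gaussian keeps the check simple, since the other best_xxxxx entries are not guaranteed to be present for a normal exit.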

Another idea would be to get David to think again about adding code to make sure the stderr is properly handled. Last January one of the Milkyway@home devs submitted a patch, but was unable to convince David it was needed.
Joe

What I'll do at some point (pretty busy at this time with work) is ask for permission from Eric, then ask Claggy and Arkayn, or someone skilled like that, to help test some possible vulnerabilities in that mechanism.  That might help prompt a better look down the road as V7 & GBT get closer to coming online.

Jason

It appears that the latest attempts to fix the scheduler on the VLAR to NV issue have resulted in 'dont_use_dcf' getting reactivated.

Quote from: Barry on August 14, 2012, 07:54:54 PM
It appears that the latest attempts to fix the scheduler on the VLAR to NV issue have resulted in 'dont_use_dcf' getting reactivated.

Which will be good for long-term crunchers indeed... but my intuition-based assessment suggests that 90% of users attach, experience problems, then abandon the work. So when I can, I will reassess moving the modified Boinc from aDCFs to adaptive flops, as Joe suggested a while back, embedding it into newer builds in a form that could be suitable for stock Boinc. We'll see; it needs another look after x41z is hammered flat, at least in third-party form. Probably won't be waiting for the Beta project anymore, and will move on to quick XAK & XAP test builds on the weekends.
