dont_use_dcf active at SETI main

Jason G · June 16, 2012, 02:01:09 AM

Is this righting itself with Claggy's & Barry's recommendations etc.? Or is there still lots of cache to burn through?

Jason

arkayn · June 16, 2012, 04:38:16 AM

The 460 is still at 356 tasks and the 560 has 1134.

So there is a ways to go yet.

Both machines still have a DCF of 0.01.
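For context on why a 0.01 DCF leaves so much cache to burn through, here is a minimal sketch. It assumes the client's runtime estimate is the task's rsc_fpops_est divided by the device's flops, then multiplied by the DCF. This is a simplification that ignores the server-side APR scaling discussed later in the thread. The helper names and the one-day buffer are illustrative only.

```cpp
#include <cmath>

// Simplified client-side runtime estimate (hypothetical helper): raw estimate
// from the task's rsc_fpops_est and the device's flops, scaled by the DCF.
double estimated_runtime_s(double rsc_fpops_est, double flops, double dcf) {
    return rsc_fpops_est / flops * dcf;
}

// Number of whole tasks that fit in a work buffer of buffer_s seconds
// (hypothetical helper; ignores resource shares, deadlines, etc.).
long tasks_to_fill(double buffer_s, double est_runtime_s) {
    return static_cast<long>(std::floor(buffer_s / est_runtime_s));
}
```

With a DCF of 0.01, every task looks 100 times shorter than it really is, so the same buffer is filled with roughly 100 times as many tasks. For a 1000 s task in a one-day buffer, that is 86 tasks at DCF 1.0 versus 8640 at DCF 0.01.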

William · June 21, 2012, 12:04:29 AM

Is anybody else running a V7 BOINC client on SETI Beta under anonymous platform?

I'm seeing a bug: it's not picking up APR after the 10th valid here.
Can anybody confirm or dispute this?
I'm doing an initial bug report to David, but I'd like to know if it's just me...
I doubt it, but you never know: the next task that should have been APR-estimated may just have been sent too soon after the 10th validation came through.

Richard Haselgrove · June 21, 2012, 12:38:44 AM

Quote from: Barry on June 21, 2012, 12:04:29 AM
Is anybody else running a V7 BOINC client on SETI Beta under anonymous platform?

Not at the moment, but it would only take a few moments to set one running; when it reached the 10th validation would depend on the availability of wingmates.

William · June 21, 2012, 01:25:38 AM

Quote from: Richard Haselgrove on June 21, 2012, 12:38:44 AM
Quote from: Barry on June 21, 2012, 12:04:29 AM
Is anybody else running a V7 BOINC client on SETI Beta under anonymous platform?

Not at the moment, but it would only take a few moments to set one running; when it reached the 10th validation would depend on the availability of wingmates.

That would be helpful, thanks. Maybe it just needs 11 validations, not 10.
Validation on Beta is always extremely slow.
I'm writing the report; if it's just a case of >10 instead of the >=10 we had been assuming, David should tell me...
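The >10 versus >=10 question is a classic off-by-one. This tiny sketch shows how the comparison changes the number of validations needed before APR estimates kick in. The helper is hypothetical and is not scheduler code.

```cpp
// Smallest number of validations n for which the threshold test passes.
// strict_greater == true models "n > threshold", false models "n >= threshold".
int validations_needed(int threshold, bool strict_greater) {
    int n = 0;
    while (!(strict_greater ? n > threshold : n >= threshold)) {
        ++n;
    }
    return n;
}
```

With >=10, the 10th validation is enough. With >10 it takes an 11th, which would be consistent with what William is seeing.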

Jason G · June 21, 2012, 02:17:20 AM

Had a quick look at the server trunk code (at least up until the move to Git).

You'll need 10 completed results for the app AND 10 consecutive valids before it'll kick in credit/estimate scaling.
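Here is a sketch of the two-condition gate described above, assuming per-host, per-app-version counters. The struct, field and function names are hypothetical, not the ones in credit.cpp. A single invalid result resets the consecutive-valid count, so both thresholds have to be met again before scaling resumes.

```cpp
// Hypothetical per-host, per-app-version counters (names are illustrative).
struct HostAppState {
    int completed = 0;          // results completed for this app version
    int consecutive_valid = 0;  // reset to 0 by any invalid result
};

// Record one completed result and whether it validated.
void record(HostAppState& s, bool valid) {
    ++s.completed;
    s.consecutive_valid = valid ? s.consecutive_valid + 1 : 0;
}

// Both thresholds (10 and 10, per the post) must hold before scaling applies.
bool scaling_enabled(const HostAppState& s, int min_completed = 10,
                     int min_consecutive = 10) {
    return s.completed >= min_completed && s.consecutive_valid >= min_consecutive;
}
```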

There are some 'interesting' kludges in the same scheduler file (credit.cpp), including one that sticks out as a bit weird: the use of a random number generator to determine whether a host is 'trusted' in certain cases of single replication (which we don't use at this stage). I might look at this file a bit more closely when I get time.

William · June 21, 2012, 02:34:15 AM

I have ten valid and 13 consecutive valid.

x41x. That tells you that 'valid' actually only counts the validations that have gone into APR (i.e. excluding the -9 in this case).

Jason G · June 21, 2012, 02:41:10 AM

Correct, it only stores the calculated scaling if it passes all the sanity checks and isn't regarded as a runtime outlier:

Quote:
if (!r.runtime_outlier && is_pfc_sane(x, wu, app)) {
    avp->pfc_samples.push_back(x);
}
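This is a stripped-down, compilable version of the quoted condition. Result, AppVersionStats and pfc_is_sane() are hypothetical stand-ins; the real is_pfc_sane() takes the workunit and app and applies its own checks. The point is only that an outlier or an insane value never reaches the sample list. That is consistent with William's 'valid' count lagging his consecutive-valid count.

```cpp
#include <vector>

// Hypothetical stand-ins for the scheduler's result and app-version records.
struct Result {
    bool runtime_outlier;
    double pfc;  // the computed scaling value for this result
};
struct AppVersionStats {
    std::vector<double> pfc_samples;
};

// Hypothetical sanity test: positive and below an illustrative upper bound.
bool pfc_is_sane(double x, double max_pfc) { return x > 0.0 && x <= max_pfc; }

// Mirrors the quoted condition: only non-outlier, sane values become samples.
void maybe_add_sample(AppVersionStats& avp, const Result& r, double max_pfc) {
    if (!r.runtime_outlier && pfc_is_sane(r.pfc, max_pfc)) {
        avp.pfc_samples.push_back(r.pfc);
    }
}
```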

[Edit:] If you were looking at it with a view to optimisation, you probably wouldn't bother doing the whole scaling factor calculation once you already know it's a runtime outlier. But hey, at least this ordering skips some unnecessary overflow sanity checks.
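The reordering suggested in the edit is sketched below. The scaling calculation is passed in as a callback so its cost can be observed. compute_pfc and is_sane are hypothetical placeholders for the real calculation and sanity check.

```cpp
#include <functional>
#include <vector>

// Test the cheap runtime_outlier flag first, and only run the (assumed
// expensive) scaling calculation when the result could actually be kept.
bool add_if_usable(std::vector<double>& samples, bool runtime_outlier,
                   const std::function<double()>& compute_pfc,
                   const std::function<bool(double)>& is_sane) {
    if (runtime_outlier) {
        return false;  // skip the calculation entirely
    }
    double x = compute_pfc();
    if (!is_sane(x)) {
        return false;
    }
    samples.push_back(x);
    return true;
}
```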

Jason G · June 21, 2012, 02:55:18 AM

Walked through it again.
Yeah, you'll need 10 pre-existing results and 10 or more consecutive valids (that went right through to APR).

So the scaling should start with the 11th consecutive valid that isn't a runtime outlier.
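The following walk-through shows when scaling would start under this reading. It assumes, as William's 10 valid / 13 consecutive valid numbers suggest, that a runtime outlier still counts as a consecutive valid but contributes no sample. The function name and defaults are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Returns the 1-based position of the first result that would be scaled, or -1.
// Each entry is one valid result, in order; true marks a runtime outlier.
// Assumption: outliers count toward consecutive valids but never add a sample.
int first_scaled_position(const std::vector<bool>& is_outlier,
                          int min_samples = 10, int min_consecutive = 10) {
    int samples = 0;
    int consecutive_valid = 0;
    for (std::size_t i = 0; i < is_outlier.size(); ++i) {
        // Thresholds are checked against what existed before this result arrived.
        if (samples >= min_samples && consecutive_valid >= min_consecutive &&
            !is_outlier[i]) {
            return static_cast<int>(i) + 1;
        }
        ++consecutive_valid;
        if (!is_outlier[i]) {
            ++samples;
        }
    }
    return -1;
}
```

With 15 clean valids, the first scaled result is the 11th. With two outliers at positions 3 and 5, it slips to the 13th.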

[Edit:] Hmmm, I'll have to look at that runtime outlier code too, when I can.... my modded client 'let's go' of its own breed of suspected runtime outlier thing after about 5 consecutive 90% overestimates... If the server code doesn't let go there could be some stuck hosts that never scale, especially on major hardware + app upgrades... will have to walk through that logic to make sure it's got some sort of escape hatch as well (which can change the logic as to when scaling kicks in, or inhibit it completely .... i.e. won't necessarily start scaling on the 11th consecutive valid).

Jason G · June 21, 2012, 04:02:19 AM

Having just looked through the validator code: the only time a runtime outlier is marked is when "result_overflow" appears in the stderr text. So watch out for valid overflow results with truncated or missing stderr; they'll scale APR.
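The validator rule described above reduces to a substring test on stderr, which is why a truncated or missing stderr slips through. Here is a sketch with a hypothetical function name:

```cpp
#include <string>

// True only if the literal text "result_overflow" appears in stderr.
bool marked_runtime_outlier(const std::string& stderr_txt) {
    return stderr_txt.find("result_overflow") != std::string::npos;
}
```

An overflow whose stderr was never written, or was cut off before that text, is therefore treated as a normal result, and its short runtime goes into APR.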

Josef W. Segur · June 22, 2012, 10:51:25 AM

Quote from: Jason G on June 21, 2012, 04:02:19 AM
Having just looked through the validator code: the only time a runtime outlier is marked is when "result_overflow" appears in the stderr text. So watch out for valid overflow results with truncated or missing stderr; they'll scale APR.

When the focus shifts to S@H v7, we could suggest that Eric implement a check on the result file rather than relying entirely on stderr. It wouldn't be particularly difficult. Every task that exits at completion puts a best_gaussian in the result (though it may just hold initialization values), whereas a result_overflow does not report any best_xxxxx signals. The best_autocorr, best_pulse and best_spike signals are not guaranteed to be present on a normal exit, though their absence would be strong evidence of a corrupted WU. best_triplet is only there when there's a reportable triplet.
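This is a sketch of the result-file check proposed here. It assumes the result file is text in which each best signal appears as an XML-style element such as <best_gaussian>. The enum and function names are hypothetical. Treating a missing best_spike, best_pulse or best_autocorr as merely 'suspicious' follows the description above rather than any existing validator code.

```cpp
#include <initializer_list>
#include <string>

enum class ResultKind { Completed, Overflow, Suspicious };

// Assumed tag format: "<best_xxx>" somewhere in the result text.
static bool has_tag(const std::string& s, const std::string& name) {
    return s.find("<" + name + ">") != std::string::npos;
}

ResultKind classify_result_file(const std::string& result) {
    // An overflow reports no best_* signals at all.
    if (result.find("<best_") == std::string::npos) return ResultKind::Overflow;
    // A completed task always reports best_gaussian.
    if (!has_tag(result, "best_gaussian")) return ResultKind::Suspicious;
    // Missing spike/pulse/autocorr on a normal exit hints at a corrupted WU.
    for (const char* tag : {"best_spike", "best_pulse", "best_autocorr"}) {
        if (!has_tag(result, tag)) return ResultKind::Suspicious;
    }
    return ResultKind::Completed;
}
```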

Another idea would be to get David to think again about adding code to make sure the stderr is properly handled. Last January one of the Milkyway@home devs submitted a patch, but was unable to convince David it was needed.
Joe

Jason G · June 22, 2012, 06:39:37 PM

What I'll do at some point (pretty busy at this time with work) is ask for permission from Eric, then ask Claggy and Arkayn, or someone skilled like that, to help test some possible vulnerabilities in that mechanism. That might help prompt a better look down the road as V7 & GBT get closer to coming online.

Jason

William · August 14, 2012, 07:54:54 PM

It appears that the latest attempts to fix the scheduler on the VLAR-to-NV issue have resulted in 'dont_use_dcf' being reactivated.

Jason G · August 14, 2012, 09:06:05 PM

Quote from: Barry on August 14, 2012, 07:54:54 PM
It appears that the latest attempts to fix the scheduler on the VLAR-to-NV issue have resulted in 'dont_use_dcf' being reactivated.

Which will be good for long-term crunchers indeed... but my intuition-based assessment suggests that 90% of users attach, experience problems, then abandon the work. So when I can, I will reassess moving modified BOINC from aDCFs to adaptive flops, as Joe suggested a while back, embedding that into newer builds in a form that could be suitable for stock BOINC. We'll see; it needs another look after x41z is hammered flat, at least in third-party form. I probably won't be waiting for the Beta project anymore, and will move on to quick XAK & XAP test builds on the weekends.
