Quote: Josef W. Segur (http://lunatics.kwsn.net/index.php?action=profile;u=27) wrote:
I did a rough survey of Beta hosts running stock S@H v7 on Bulldozer and Sandy Bridge CPUs to see which chirp variants were chosen most often. Bulldozer hosts chose about 8% a, 34% b, 10% c, and 49% d; Sandy Bridge hosts chose about 13% a, 9% b, 9% c, and 69% d. That covered 277 results on Bulldozer and 296 on Sandy Bridge, so the figures may be at least roughly meaningful.
I'll attach a J46 version of the test, which has two added chirp variants that might be even better than d, clearly the previous best. I left out the other kinds of functions this time, since I haven't figured out any significant improvements for those. Each run of the program now does the chirp tests 3 times.
What I'm aiming at, short term, is one best AVX chirp function which can be put into the existing Lunatics CPU code for a targeted AVX build. Hopefully we'll be able to use some dispatch functionality in future to keep the number of different builds down, but that's not ready yet.
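A minimal sketch of what such runtime dispatch could look like, assuming a table of variants ordered by preference and a feature bitmask filled in elsewhere (for example from CPUID, or via GCC/Clang's __builtin_cpu_supports on x86). The FEAT_* flags, chirp_table, chirp_placeholder and select_chirp names are hypothetical, and every entry shares one placeholder body to keep it short:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits; a real build would fill these from CPUID. */
enum {
    FEAT_SSE2 = 1u << 0,
    FEAT_SSE3 = 1u << 1,
    FEAT_AVX  = 1u << 2,
    FEAT_FMA4 = 1u << 3
};

/* Interleaved complex input/output (re, im), n complex samples. */
typedef void (*chirp_fn)(const float *in, float *out, size_t n);

typedef struct {
    const char *name;
    unsigned    required; /* feature bits this variant needs */
    chirp_fn    fn;
} chirp_variant;

/* Placeholder body (plain copy) shared by all entries in this sketch. */
static void chirp_placeholder(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < 2 * n; i++)
        out[i] = in[i];
}

/* Ordered from most to least preferred; the last entry needs nothing. */
static const chirp_variant chirp_table[] = {
    { "avx_fma4", FEAT_AVX | FEAT_FMA4, chirp_placeholder },
    { "avx",      FEAT_AVX,             chirp_placeholder },
    { "sse3",     FEAT_SSE3,            chirp_placeholder },
    { "sse2",     FEAT_SSE2,            chirp_placeholder },
    { "fpu",      0u,                   chirp_placeholder },
};

/* Return the first variant whose required features are all available. */
static const chirp_variant *select_chirp(unsigned available)
{
    size_t count = sizeof chirp_table / sizeof chirp_table[0];
    for (size_t i = 0; i < count; i++) {
        if ((chirp_table[i].required & ~available) == 0u)
            return &chirp_table[i];
    }
    assert(0 && "fallback entry must have no requirements");
    return NULL;
}
```

The Ftst_v7 runs later in this thread take a different route: they time every supported variant and keep the fastest (the "choice" line). A fixed preference order like this one skips the benchmark, but the order may need tuning per CPU family.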
Joe
Nice! Cheers for relaying... I'll give it a go on the i5 when I can. I'll be sticking the results of that work in AKv8c for sure ;) See, teamwork is possible without having to strangle one another :)
Since I know you don't want to go back there and couldn't get to the test even if you did, I figure I should bring it over here for you.
Quote from: arkayn on May 06, 2012, 09:35:54 AM
Since I know you don't want to go back there and couldn't get to the test even if you did, I figure I should bring it over here for you.
Second part is only a Flick of the Switch (http://www.youtube.com/watch?v=9tqIOE-rPJU).
Quote: Josef W. Segur (http://lunatics.kwsn.net/index.php?action=profile;u=27) wrote:
Hmm, my hopes for a single best variant for both Bulldozer and Sandy Bridge are fading.
I'm still hoping to improve things further for Bulldozer. The attached J47 test has several subvariants of f with the prefetch distance varied from 2 to 6 cache lines (it was 4 in J46). Possibly one will get the input data to L1 at just the right time; at least there may be some observable differences.
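For readers following along, here is a rough scalar illustration of a software prefetch distance measured in cache lines. It assumes 64-byte lines, interleaved single-precision complex data (8 samples per line) and precomputed phasors; chirp_prefetch is a hypothetical name, and the real f subvariants are AVX code not shown here. It relies on the GCC/Clang __builtin_prefetch builtin:

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64
/* One complex sample = two floats = 8 bytes, so 8 samples per line. */
#define SAMPLES_PER_LINE (CACHE_LINE_BYTES / (2 * sizeof(float)))

/* out[k] = in[k] * phasor[k] (complex multiply, data interleaved re,im).
   'lines_ahead' is the software prefetch distance in cache lines. */
static void chirp_prefetch(const float *in, const float *phasor, float *out,
                           size_t n, size_t lines_ahead)
{
    assert(in != NULL && phasor != NULL && out != NULL);
    const size_t ahead = lines_ahead * SAMPLES_PER_LINE; /* in samples */
    for (size_t k = 0; k < n; k++) {
        /* One prefetch per cache line, never past the end of the data. */
        if (k % SAMPLES_PER_LINE == 0 && k + ahead < n) {
            __builtin_prefetch(&in[2 * (k + ahead)], 0 /* read */, 3);
            __builtin_prefetch(&phasor[2 * (k + ahead)], 0, 3);
        }
        float re = in[2 * k], im = in[2 * k + 1];
        float c = phasor[2 * k], s = phasor[2 * k + 1];
        out[2 * k]     = re * c - im * s;
        out[2 * k + 1] = re * s + im * c;
    }
}
```

Calling it with lines_ahead from 2 to 6 mirrors the J47 sweep: the output is identical for every distance, and only the timing should change.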
Joe
Quote: Josef W. Segur (http://lunatics.kwsn.net/index.php?action=profile;u=27) wrote:
Maybe that's a measure of how well the hardware prefetching is matched to the memory system. In the attached J48 I've added an fn variant with no software prefetching; perhaps your system will prefer that over f6.
I've also slightly modified the way the test time is calculated. Each test consists of ten runs, and the average of all ten used to be reported; now I'm dropping the slowest of the ten runs to reduce the effect of transient conditions. I expect it to still vary more with BOINC running than without, though.
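The revised timing rule (ten runs, slowest discarded, the rest averaged) can be sketched as below. mean_without_slowest is a hypothetical helper, not the actual Ftst_v7 code:

```c
#include <assert.h>
#include <stddef.h>

/* Average of all runs except the single slowest one.
   If several runs tie for slowest, only one of them is dropped. */
static double mean_without_slowest(const double *t, size_t runs)
{
    assert(runs >= 2);
    double sum = 0.0, slowest = t[0];
    for (size_t i = 0; i < runs; i++) {
        sum += t[i];
        if (t[i] > slowest)
            slowest = t[i];
    }
    return (sum - slowest) / (double)(runs - 1);
}
```

A single run stalled by, say, a BOINC task switch then no longer pulls the average up, while the other nine still contribute.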
Joe
Will take a look at this on the i5 later tonight, hopefully. Nearly ready, after a rest, to switch to the AKv8c rapidfire implementation, so it's great that Joe's making leaps right now :D
Jason
Quote: Josef W. Segur (http://lunatics.kwsn.net/index.php?action=profile;u=27) wrote:
Well, it's clear that the software prefetching is doing some good, though it's not clear why the more distant prefetch works slightly better on the 8-core Bulldozers even when BOINC is active. Pinning down those details can wait until final tuning.
With AVX chirp times at ~80% of SSE3 chirp times on Bulldozer but ~50% on Sandy Bridge, I'm looking for something with larger effects. One faint possibility is that the way the input and output test buffers are allocated in J48 and earlier could cause L1 cache thrashing. I don't think that's likely, but I'm attaching J48a anyway. The allocations are revised, but the functions being tested are unchanged.
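To make the thrashing concern concrete: in a set-associative L1, two buffers whose start addresses differ by a multiple of (line size × number of sets) map to the same set at every step when walked together. The sketch below assumes Sandy Bridge's 32 KB, 8-way L1D with 64-byte lines (64 sets, so a 4 KB alias stride); the macro and function names are illustrative only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed L1D geometry (Sandy Bridge: 32 KB, 8-way, 64-byte lines). */
#define L1_LINE 64u
#define L1_WAYS 8u
#define L1_SIZE (32u * 1024u)
#define L1_SETS (L1_SIZE / (L1_LINE * L1_WAYS)) /* 64 sets */

/* Set index that an address maps to. */
static unsigned l1_set(uintptr_t addr)
{
    return (unsigned)((addr / L1_LINE) % L1_SETS);
}

/* Nonzero if buffers starting at a and b, walked in lockstep, hit the
   same set on every step: their distance is a multiple of 4 KB here.
   Unsigned wraparound keeps the test correct whichever is larger. */
static int alias_in_lockstep(uintptr_t a, uintptr_t b)
{
    return ((a - b) % ((uintptr_t)L1_LINE * L1_SETS)) == 0;
}
```

Bulldozer's 16 KB, 4-way L1D also works out to 64 sets and the same 4 KB stride. Two aliasing streams alone still fit in the available ways; thrashing becomes a risk only when more streams share a set than there are ways, which is consistent with doubting this is the cause.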
Joe
Good god. He'll have the thing finished before I can get to the i5 ;)
I'll try to get a bulk run of the test pieces in soon :D
Quote: Josef W. Segur (http://lunatics.kwsn.net/index.php?action=profile;u=27) wrote:
Thanks, it's good to be sure the test allocations weren't causing the problem.
For J49 I've collapsed the f subvariants back to a single variant with the same prefetch distance as a through e (4 cache lines ahead). Even though that may not be the best, it makes comparison easier.
Added test g, which loads data in two 128-bit chunks rather than full 256-bit chunks. That's a technique some Intel documents recommend, though it's not expected to make a large difference.
Added test h, which does TLB priming to eliminate delays when crossing page boundaries, and prefetches a whole page-sized block at once, like the Astropulse TWINDECHIRP. I have hopes that might make a significant difference.
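A rough sketch of the two ideas in test h, assuming 4 KB pages and 64-byte lines: touch one element per page so the TLB entries are loaded up front, then issue a prefetch for every line of a page-sized block before working on it. prime_tlb and prefetch_block are hypothetical names; this is not the TWINDECHIRP code, and __builtin_prefetch is GCC/Clang specific:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096u /* assumed 4 KB pages */
#define LINE_BYTES 64u   /* assumed 64-byte cache lines */

/* TLB priming: read one float per page of p[0..count) so the page
   translations are cached before the main loop needs them. The volatile
   sum keeps the compiler from discarding the loads. */
static float prime_tlb(const float *p, size_t count)
{
    const size_t step = PAGE_BYTES / sizeof(float); /* floats per page */
    volatile float sink = 0.0f;
    for (size_t i = 0; i < count; i += step)
        sink += p[i];
    /* The buffer may end on a page the stride above never reached. */
    if (count > 0 && (count - 1) % step != 0)
        sink += p[count - 1];
    return sink;
}

/* Block prefetch: request every cache line of a block (e.g. one page). */
static void prefetch_block(const float *p, size_t bytes)
{
    assert(p != NULL);
    const char *base = (const char *)p;
    for (size_t off = 0; off < bytes; off += LINE_BYTES)
        __builtin_prefetch(base + off, 0 /* read */, 3);
}
```

In a chirp loop, prime_tlb would run once over the input, and prefetch_block would be called on each page-sized block just before it is processed.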
The sse3_ChirpData_ak8 variant didn't have prefetch, so was often slower than sse2_ChirpData_ak8. I've put the prefetch in.
Although I've reviewed the changes to the AVX routines several times, they're significant enough that there's some risk of a crash if I missed something. I hope not.
Joe
Ah! The old TLB priming trick from the Intel docs, eh? Should be good. Hopefully the i5 won't fight me tonight...
Quote: Josef W. Segur (http://lunatics.kwsn.net/index.php?action=profile;u=27) wrote:
Trying to find the strengths of Bulldozer, I've added a chirp variant using both AVX and FMA4 in J50. It reduces the number of instructions in the loop by 8 or more, so it should have some measurable effect, though it still has to load and save just as much data. Other than the FMA4 changes, it's like the g AVX version.
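As a generic illustration (not a count of Joe's actual loop), here is why FMA trims arithmetic instructions in a chirp. Multiplying a sample x by the phasor c + is takes four multiplies and two adds; folding each add into a fused multiply-add, fma(a, b, d) = ab + d, or its negated form fnma(a, b, d) = -ab + d (FMA4 includes negated multiply-add forms), leaves two multiplies and two fused operations:

```latex
\begin{aligned}
y_{re} &= x_{re}\,c - x_{im}\,s = \operatorname{fnma}(x_{im},\, s,\; x_{re}\,c) \\
y_{im} &= x_{re}\,s + x_{im}\,c = \operatorname{fma}(x_{re},\, s,\; x_{im}\,c)
\end{aligned}
\qquad
\text{6 ops (4 mul, 2 add)} \;\to\; \text{4 ops (2 mul, 2 fused)}
```

The same saving applies in every SIMD lane, but the loads and stores are unchanged, which is why memory traffic still limits the gain.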
If I have everything right, it ought to show as unsupported on Sandy Bridge and run on Bulldozer. If not, anything might happen. :P
Joe
FX-4100
BOINC running on GTX460
=========================================================
Ftst_v7_J50_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.009627 0.00000 test
fpu_ChirpData 0.019200 0.00000 test
fpu_opt_ChirpData 0.008951 0.00000 test
sse1_ChirpData_ak8e 0.007910 0.00000 test
sse2_ChirpData_ak8 0.005040 0.00000 test
sse3_ChirpData_ak8 0.004927 0.00000 test
avx_ChirpData_a 0.004119 0.00000 test
avx_ChirpData_b 0.004149 0.00000 test
avx_ChirpData_c 0.004650 0.00000 test
avx_ChirpData_d 0.004221 0.00000 test
avx_ChirpData_e 0.004187 0.00000 test
avx_ChirpData_f 0.004013 0.00000 test
avx_ChirpData_g 0.004171 0.00000 test
avx_ChirpData_h 0.005179 0.00000 test
avx_fma4_ChirpData_a 0.003669 0.00000 test
avx_fma4_ChirpData_a 0.003669 0.00000 choice
Second run
v_ChirpData 0.009635 0.00000 test
fpu_ChirpData 0.018249 0.00000 test
fpu_opt_ChirpData 0.009154 0.00000 test
sse1_ChirpData_ak8e 0.007586 0.00000 test
sse2_ChirpData_ak8 0.004708 0.00000 test
sse3_ChirpData_ak8 0.004546 0.00000 test
avx_ChirpData_a 0.004097 0.00000 test
avx_ChirpData_b 0.004024 0.00000 test
avx_ChirpData_c 0.004339 0.00000 test
avx_ChirpData_d 0.004329 0.00000 test
avx_ChirpData_e 0.004205 0.00000 test
avx_ChirpData_f 0.003973 0.00000 test
avx_ChirpData_g 0.003893 0.00000 test
avx_ChirpData_h 0.004708 0.00000 test
avx_fma4_ChirpData_a 0.003704 0.00000 test
avx_fma4_ChirpData_a 0.003704 0.00000 choice
Third run
v_ChirpData 0.009304 0.00000 test
fpu_ChirpData 0.019267 0.00000 test
fpu_opt_ChirpData 0.008838 0.00000 test
sse1_ChirpData_ak8e 0.007273 0.00000 test
sse2_ChirpData_ak8 0.004618 0.00000 test
sse3_ChirpData_ak8 0.004530 0.00000 test
avx_ChirpData_a 0.004216 0.00000 test
avx_ChirpData_b 0.004080 0.00000 test
avx_ChirpData_c 0.004223 0.00000 test
avx_ChirpData_d 0.004374 0.00000 test
avx_ChirpData_e 0.004065 0.00000 test
avx_ChirpData_f 0.003829 0.00000 test
avx_ChirpData_g 0.004143 0.00000 test
avx_ChirpData_h 0.004819 0.00000 test
avx_fma4_ChirpData_a 0.003452 0.00000 test
avx_fma4_ChirpData_a 0.003452 0.00000 choice
Test duration 9.33 seconds
Ftst_v7 completed successfully.
i3-2120
BOINC running on GTX560
=========================================================
Ftst_v7_J50_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.004834 0.00000 test
fpu_ChirpData 0.012480 0.00000 test
fpu_opt_ChirpData 0.004621 0.00000 test
sse1_ChirpData_ak8e 0.005919 0.00000 test
sse2_ChirpData_ak8 0.004306 0.00000 test
sse3_ChirpData_ak8 0.004118 0.00000 test
avx_ChirpData_a 0.002149 0.00000 test
avx_ChirpData_b 0.002130 0.00000 test
avx_ChirpData_c 0.002282 0.00000 test
avx_ChirpData_d 0.002143 0.00000 test
avx_ChirpData_e 0.002010 0.00000 test
avx_ChirpData_f 0.002116 0.00000 test
avx_ChirpData_g 0.002156 0.00000 test
avx_ChirpData_h 0.002744 0.00000 test
avx_fma4_ChirpData_a not supported by system
avx_ChirpData_e 0.002010 0.00000 choice
Second run
v_ChirpData 0.004538 0.00000 test
fpu_ChirpData 0.012324 0.00000 test
fpu_opt_ChirpData 0.004330 0.00000 test
sse1_ChirpData_ak8e 0.005758 0.00000 test
sse2_ChirpData_ak8 0.004179 0.00000 test
sse3_ChirpData_ak8 0.004003 0.00000 test
avx_ChirpData_a 0.002143 0.00000 test
avx_ChirpData_b 0.002585 0.00000 test
avx_ChirpData_c 0.002312 0.00000 test
avx_ChirpData_d 0.001930 0.00000 test
avx_ChirpData_e 0.002107 0.00000 test
avx_ChirpData_f 0.002309 0.00000 test
avx_ChirpData_g 0.002067 0.00000 test
avx_ChirpData_h 0.002657 0.00000 test
avx_fma4_ChirpData_a not supported by system
avx_ChirpData_d 0.001930 0.00000 choice
Third run
v_ChirpData 0.005770 0.00000 test
fpu_ChirpData 0.012279 0.00000 test
fpu_opt_ChirpData 0.004469 0.00000 test
sse1_ChirpData_ak8e 0.006027 0.00000 test
sse2_ChirpData_ak8 0.004288 0.00000 test
sse3_ChirpData_ak8 0.004026 0.00000 test
avx_ChirpData_a 0.002081 0.00000 test
avx_ChirpData_b 0.002052 0.00000 test
avx_ChirpData_c 0.002536 0.00000 test
avx_ChirpData_d 0.001946 0.00000 test
avx_ChirpData_e 0.001987 0.00000 test
avx_ChirpData_f 0.002057 0.00000 test
avx_ChirpData_g 0.002298 0.00000 test
avx_ChirpData_h 0.002790 0.00000 test
avx_fma4_ChirpData_a not supported by system
avx_ChirpData_d 0.001946 0.00000 choice
Test duration 7.68 seconds
Ftst_v7 completed successfully.
Mmmm, tasty :). The i3's still putting the FX to shame, but it's a nice, definite improvement. With full dispatch going in, I can probably compile in a special FFTW build with FMAs enabled only for the AVX-capable AMDs, so at least it gives them a fighting chance with FFTs as well.
Jason
Quote: Josef W. Segur (http://lunatics.kwsn.net/index.php?action=profile;u=27) wrote:
The FMA4 a variant produced about a 5% speedup by reducing the number of floating-point instructions in the inner loop by ~11%. That's good, but it confirms that getting the data transferred still needs improvement. For J51 I'm trying TLB priming again, but without block prefetching. The i variant for AVX is modified from the h variant, and the same changes were merged into the b variant for AVX+FMA4.
Joe
FX-4100
BOINC running on 460
=========================================================
Ftst_v7_J51_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.009009 0.00000 test
fpu_ChirpData 0.018200 0.00000 test
fpu_opt_ChirpData 0.008752 0.00000 test
sse1_ChirpData_ak8e 0.007503 0.00000 test
sse2_ChirpData_ak8 0.004782 0.00000 test
sse3_ChirpData_ak8 0.004801 0.00000 test
avx_ChirpData_a 0.003903 0.00000 test
avx_ChirpData_b 0.003902 0.00000 test
avx_ChirpData_c 0.004298 0.00000 test
avx_ChirpData_d 0.004136 0.00000 test
avx_ChirpData_e 0.003988 0.00000 test
avx_ChirpData_f 0.003865 0.00000 test
avx_ChirpData_g 0.003858 0.00000 test
avx_ChirpData_h 0.004558 0.00000 test
avx_ChirpData_i 0.004006 0.00000 test
avx_fma4_ChirpData_a 0.003524 0.00000 test
avx_fma4_ChirpData_b 0.060127 0.50095 test
avx_fma4_ChirpData_a 0.003524 0.00000 choice
Second run
v_ChirpData 0.009023 0.00000 test
fpu_ChirpData 0.018034 0.00000 test
fpu_opt_ChirpData 0.008862 0.00000 test
sse1_ChirpData_ak8e 0.007292 0.00000 test
sse2_ChirpData_ak8 0.004615 0.00000 test
sse3_ChirpData_ak8 0.004532 0.00000 test
avx_ChirpData_a 0.003917 0.00000 test
avx_ChirpData_b 0.003865 0.00000 test
avx_ChirpData_c 0.004167 0.00000 test
avx_ChirpData_d 0.004040 0.00000 test
avx_ChirpData_e 0.004026 0.00000 test
avx_ChirpData_f 0.003821 0.00000 test
avx_ChirpData_g 0.003666 0.00000 test
avx_ChirpData_h 0.004601 0.00000 test
avx_ChirpData_i 0.003980 0.00000 test
avx_fma4_ChirpData_a 0.003389 0.00000 test
avx_fma4_ChirpData_b 0.058483 0.50095 test
avx_fma4_ChirpData_a 0.003389 0.00000 choice
Third run
v_ChirpData 0.008824 0.00000 test
fpu_ChirpData 0.017494 0.00000 test
fpu_opt_ChirpData 0.008599 0.00000 test
sse1_ChirpData_ak8e 0.007149 0.00000 test
sse2_ChirpData_ak8 0.004593 0.00000 test
sse3_ChirpData_ak8 0.004453 0.00000 test
avx_ChirpData_a 0.003842 0.00000 test
avx_ChirpData_b 0.003825 0.00000 test
avx_ChirpData_c 0.004122 0.00000 test
avx_ChirpData_d 0.004023 0.00000 test
avx_ChirpData_e 0.003950 0.00000 test
avx_ChirpData_f 0.003855 0.00000 test
avx_ChirpData_g 0.003928 0.00000 test
avx_ChirpData_h 0.004565 0.00000 test
avx_ChirpData_i 0.004058 0.00000 test
avx_fma4_ChirpData_a 0.003531 0.00000 test
avx_fma4_ChirpData_b 0.059600 0.50095 test
avx_fma4_ChirpData_a 0.003531 0.00000 choice
Test duration 11.53 seconds
Ftst_v7 completed successfully.
i3-2120
BOINC running on 560
=========================================================
Ftst_v7_J51_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.004599 0.00000 test
fpu_ChirpData 0.012435 0.00000 test
fpu_opt_ChirpData 0.004366 0.00000 test
sse1_ChirpData_ak8e 0.006014 0.00000 test
sse2_ChirpData_ak8 0.004207 0.00000 test
sse3_ChirpData_ak8 0.004177 0.00000 test
avx_ChirpData_a 0.002153 0.00000 test
avx_ChirpData_b 0.002141 0.00000 test
avx_ChirpData_c 0.002217 0.00000 test
avx_ChirpData_d 0.002032 0.00000 test
avx_ChirpData_e 0.002002 0.00000 test
avx_ChirpData_f 0.002125 0.00000 test
avx_ChirpData_g 0.002081 0.00000 test
avx_ChirpData_h 0.002745 0.00000 test
avx_ChirpData_i 0.002329 0.00000 test
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_b not supported by system
avx_ChirpData_e 0.002002 0.00000 choice
Second run
v_ChirpData 0.004888 0.00000 test
fpu_ChirpData 0.012563 0.00000 test
fpu_opt_ChirpData 0.004551 0.00000 test
sse1_ChirpData_ak8e 0.005902 0.00000 test
sse2_ChirpData_ak8 0.004339 0.00000 test
sse3_ChirpData_ak8 0.004017 0.00000 test
avx_ChirpData_a 0.002142 0.00000 test
avx_ChirpData_b 0.002153 0.00000 test
avx_ChirpData_c 0.002186 0.00000 test
avx_ChirpData_d 0.002007 0.00000 test
avx_ChirpData_e 0.001946 0.00000 test
avx_ChirpData_f 0.002063 0.00000 test
avx_ChirpData_g 0.002174 0.00000 test
avx_ChirpData_h 0.002790 0.00000 test
avx_ChirpData_i 0.002347 0.00000 test
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_b not supported by system
avx_ChirpData_e 0.001946 0.00000 choice
Third run
v_ChirpData 0.004868 0.00000 test
fpu_ChirpData 0.012536 0.00000 test
fpu_opt_ChirpData 0.004565 0.00000 test
sse1_ChirpData_ak8e 0.005728 0.00000 test
sse2_ChirpData_ak8 0.004225 0.00000 test
sse3_ChirpData_ak8 0.004123 0.00000 test
avx_ChirpData_a 0.002121 0.00000 test
avx_ChirpData_b 0.002155 0.00000 test
avx_ChirpData_c 0.002184 0.00000 test
avx_ChirpData_d 0.002048 0.00000 test
avx_ChirpData_e 0.002039 0.00000 test
avx_ChirpData_f 0.002137 0.00000 test
avx_ChirpData_g 0.002188 0.00000 test
avx_ChirpData_h 0.002760 0.00000 test
avx_ChirpData_i 0.002335 0.00000 test
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_b not supported by system
avx_ChirpData_e 0.002039 0.00000 choice
Test duration 8.08 seconds
Ftst_v7 completed successfully.
Have you seen that FFTW 3.3.2 has been released?
Claggy
Quote: Josef W. Segur (http://lunatics.kwsn.net/index.php?action=profile;u=27) wrote:
Another new Chirponly test, J52, is attached.
Fixed (I hope) the problem that made the avx_fma4 b variant so slow and inaccurate, and added a c variant with a different approach to TLB priming.
Joe
FX-4100
BOINC running on 460
=========================================================
Ftst_v7_J52_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.008840 0.00000 test
fpu_ChirpData 0.018138 0.00000 test
fpu_opt_ChirpData 0.009096 0.00000 test
sse1_ChirpData_ak8e 0.007134 0.00000 test
sse2_ChirpData_ak8 0.004616 0.00000 test
sse3_ChirpData_ak8 0.004581 0.00000 test
avx_ChirpData_a 0.004071 0.00000 test
avx_ChirpData_b 0.003964 0.00000 test
avx_ChirpData_c 0.004295 0.00000 test
avx_ChirpData_d 0.004130 0.00000 test
avx_ChirpData_e 0.003982 0.00000 test
avx_ChirpData_f 0.003781 0.00000 test
avx_ChirpData_g 0.003714 0.00000 test
avx_ChirpData_h 0.004528 0.00000 test
avx_ChirpData_i 0.003994 0.00000 test
avx_fma4_ChirpData_a 0.003473 0.00000 test
avx_fma4_ChirpData_b 0.003617 0.00000 test
avx_fma4_ChirpData_c 0.003739 0.00000 test
avx_fma4_ChirpData_a 0.003473 0.00000 choice
Second run
v_ChirpData 0.009005 0.00000 test
fpu_ChirpData 0.017681 0.00000 test
fpu_opt_ChirpData 0.008559 0.00000 test
sse1_ChirpData_ak8e 0.007305 0.00000 test
sse2_ChirpData_ak8 0.004635 0.00000 test
sse3_ChirpData_ak8 0.004459 0.00000 test
avx_ChirpData_a 0.003960 0.00000 test
avx_ChirpData_b 0.003880 0.00000 test
avx_ChirpData_c 0.004260 0.00000 test
avx_ChirpData_d 0.004184 0.00000 test
avx_ChirpData_e 0.004021 0.00000 test
avx_ChirpData_f 0.003816 0.00000 test
avx_ChirpData_g 0.003791 0.00000 test
avx_ChirpData_h 0.004508 0.00000 test
avx_ChirpData_i 0.003953 0.00000 test
avx_fma4_ChirpData_a 0.003404 0.00000 test
avx_fma4_ChirpData_b 0.003597 0.00000 test
avx_fma4_ChirpData_c 0.003738 0.00000 test
avx_fma4_ChirpData_a 0.003404 0.00000 choice
Third run
v_ChirpData 0.008951 0.00000 test
fpu_ChirpData 0.017233 0.00000 test
fpu_opt_ChirpData 0.008535 0.00000 test
sse1_ChirpData_ak8e 0.007110 0.00000 test
sse2_ChirpData_ak8 0.004573 0.00000 test
sse3_ChirpData_ak8 0.004376 0.00000 test
avx_ChirpData_a 0.003833 0.00000 test
avx_ChirpData_b 0.003780 0.00000 test
avx_ChirpData_c 0.004112 0.00000 test
avx_ChirpData_d 0.004140 0.00000 test
avx_ChirpData_e 0.003956 0.00000 test
avx_ChirpData_f 0.003741 0.00000 test
avx_ChirpData_g 0.003686 0.00000 test
avx_ChirpData_h 0.004516 0.00000 test
avx_ChirpData_i 0.003902 0.00000 test
avx_fma4_ChirpData_a 0.003376 0.00000 test
avx_fma4_ChirpData_b 0.003804 0.00000 test
avx_fma4_ChirpData_c 0.003685 0.00000 test
avx_fma4_ChirpData_a 0.003376 0.00000 choice
Test duration 10.54 seconds
Ftst_v7 completed successfully.
i3-2120
BOINC running on 560
=========================================================
Ftst_v7_J52_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.005332 0.00000 test
fpu_ChirpData 0.012461 0.00000 test
fpu_opt_ChirpData 0.004724 0.00000 test
sse1_ChirpData_ak8e 0.005928 0.00000 test
sse2_ChirpData_ak8 0.004362 0.00000 test
sse3_ChirpData_ak8 0.004210 0.00000 test
avx_ChirpData_a 0.002198 0.00000 test
avx_ChirpData_b 0.002080 0.00000 test
avx_ChirpData_c 0.002259 0.00000 test
avx_ChirpData_d 0.002050 0.00000 test
avx_ChirpData_e 0.002061 0.00000 test
avx_ChirpData_f 0.002186 0.00000 test
avx_ChirpData_g 0.002199 0.00000 test
avx_ChirpData_h 0.002787 0.00000 test
avx_ChirpData_i 0.002355 0.00000 test
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_b not supported by system
avx_fma4_ChirpData_c not supported by system
avx_ChirpData_d 0.002050 0.00000 choice
Second run
v_ChirpData 0.004999 0.00000 test
fpu_ChirpData 0.012899 0.00000 test
fpu_opt_ChirpData 0.004722 0.00000 test
sse1_ChirpData_ak8e 0.005912 0.00000 test
sse2_ChirpData_ak8 0.004414 0.00000 test
sse3_ChirpData_ak8 0.004065 0.00000 test
avx_ChirpData_a 0.002204 0.00000 test
avx_ChirpData_b 0.002195 0.00000 test
avx_ChirpData_c 0.002226 0.00000 test
avx_ChirpData_d 0.002059 0.00000 test
avx_ChirpData_e 0.002055 0.00000 test
avx_ChirpData_f 0.002176 0.00000 test
avx_ChirpData_g 0.002093 0.00000 test
avx_ChirpData_h 0.002694 0.00000 test
avx_ChirpData_i 0.002245 0.00000 test
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_b not supported by system
avx_fma4_ChirpData_c not supported by system
avx_ChirpData_e 0.002055 0.00000 choice
Third run
v_ChirpData 0.004695 0.00000 test
fpu_ChirpData 0.012390 0.00000 test
fpu_opt_ChirpData 0.004516 0.00000 test
sse1_ChirpData_ak8e 0.005742 0.00000 test
sse2_ChirpData_ak8 0.004219 0.00000 test
sse3_ChirpData_ak8 0.004038 0.00000 test
avx_ChirpData_a 0.002096 0.00000 test
avx_ChirpData_b 0.002074 0.00000 test
avx_ChirpData_c 0.002121 0.00000 test
avx_ChirpData_d 0.001955 0.00000 test
avx_ChirpData_e 0.001953 0.00000 test
avx_ChirpData_f 0.002074 0.00000 test
avx_ChirpData_g 0.002091 0.00000 test
avx_ChirpData_h 0.002691 0.00000 test
avx_ChirpData_i 0.002248 0.00000 test
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_b not supported by system
avx_fma4_ChirpData_c not supported by system
avx_ChirpData_e 0.001953 0.00000 choice
Test duration 8.12 seconds
Ftst_v7 completed successfully.
Quote from: Claggy on May 17, 2012, 10:36:04 PM
Have you seen that FFTW 3.3.2 has been released?
Claggy
Won't be a big player from our viewpoint. Some others and I had forwarded some improvements to Matteo Frigo (author of FFTW) to make AVX work with other compilers. For most of our purposes IPP will perform better (i.e. as with Heinz' ATOM experiments, but in a more formalised and general way), though the option of including a polished FFTW will tend to take some of the strain out of older AMD and pre-SSE2 chips, where Intel's Integrated Performance Primitives are lacking. It's a small market, but important for completeness, which cuts down user support, and user support is the biggest cost.
Jason
Quote: Josef W. Segur (http://lunatics.kwsn.net/index.php?action=profile;u=27) wrote:
Here's a J53 Chirponly version. I've dropped testing the avx_fma4 b and c variants since TLB priming seems ineffective so far.
The added d4, d6, and d8 avx_fma4 tests are like the a variant, but with some further conversions from AVX to FMA4, reducing the instruction count in the loop from 67 to 65. I think that's all that can be converted. The three subvariants prefetch 4, 6, and 8 cache lines ahead.
Joe
FX-4100
BOINC running on 460
=========================================================
Ftst_v7_J53_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.009027 0.00000 test
fpu_ChirpData 0.017784 0.00000 test
fpu_opt_ChirpData 0.008842 0.00000 test
sse1_ChirpData_ak8e 0.007343 0.00000 test
sse2_ChirpData_ak8 0.004565 0.00000 test
sse3_ChirpData_ak8 0.004510 0.00000 test
avx_ChirpData_a 0.003855 0.00000 test
avx_ChirpData_b 0.003915 0.00000 test
avx_ChirpData_c 0.004248 0.00000 test
avx_ChirpData_d 0.004088 0.00000 test
avx_ChirpData_e 0.003987 0.00000 test
avx_ChirpData_f 0.003833 0.00000 test
avx_ChirpData_g 0.003815 0.00000 test
avx_ChirpData_h 0.004630 0.00000 test
avx_ChirpData_i 0.004029 0.00000 test
avx_fma4_ChirpData_a 0.003409 0.00000 test
avx_fma4_ChirpData_d4 0.003450 0.00000 test
avx_fma4_ChirpData_d6 0.003494 0.00000 test
avx_fma4_ChirpData_d8 0.003531 0.00000 test
avx_fma4_ChirpData_a 0.003409 0.00000 choice
Second run
v_ChirpData 0.009120 0.00000 test
fpu_ChirpData 0.017763 0.00000 test
fpu_opt_ChirpData 0.008955 0.00000 test
sse1_ChirpData_ak8e 0.007134 0.00000 test
sse2_ChirpData_ak8 0.004597 0.00000 test
sse3_ChirpData_ak8 0.004496 0.00000 test
avx_ChirpData_a 0.003862 0.00000 test
avx_ChirpData_b 0.003970 0.00000 test
avx_ChirpData_c 0.004118 0.00000 test
avx_ChirpData_d 0.003992 0.00000 test
avx_ChirpData_e 0.003903 0.00000 test
avx_ChirpData_f 0.003729 0.00000 test
avx_ChirpData_g 0.003698 0.00000 test
avx_ChirpData_h 0.004578 0.00000 test
avx_ChirpData_i 0.004050 0.00000 test
avx_fma4_ChirpData_a 0.003464 0.00000 test
avx_fma4_ChirpData_d4 0.003434 0.00000 test
avx_fma4_ChirpData_d6 0.003415 0.00000 test
avx_fma4_ChirpData_d8 0.003430 0.00000 test
avx_fma4_ChirpData_d6 0.003415 0.00000 choice
Third run
v_ChirpData 0.009029 0.00000 test
fpu_ChirpData 0.017711 0.00000 test
fpu_opt_ChirpData 0.008843 0.00000 test
sse1_ChirpData_ak8e 0.007283 0.00000 test
sse2_ChirpData_ak8 0.004629 0.00000 test
sse3_ChirpData_ak8 0.004534 0.00000 test
avx_ChirpData_a 0.003926 0.00000 test
avx_ChirpData_b 0.003819 0.00000 test
avx_ChirpData_c 0.004124 0.00000 test
avx_ChirpData_d 0.003992 0.00000 test
avx_ChirpData_e 0.003893 0.00000 test
avx_ChirpData_f 0.003694 0.00000 test
avx_ChirpData_g 0.003721 0.00000 test
avx_ChirpData_h 0.004432 0.00000 test
avx_ChirpData_i 0.003909 0.00000 test
avx_fma4_ChirpData_a 0.003482 0.00000 test
avx_fma4_ChirpData_d4 0.003391 0.00000 test
avx_fma4_ChirpData_d6 0.003398 0.00000 test
avx_fma4_ChirpData_d8 0.003476 0.00000 test
avx_fma4_ChirpData_d4 0.003391 0.00000 choice
Test duration 10.75 seconds
Ftst_v7 completed successfully.
i3-2120
BOINC running on 560
=========================================================
Ftst_v7_J53_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.004919 0.00000 test
fpu_ChirpData 0.012437 0.00000 test
fpu_opt_ChirpData 0.004521 0.00000 test
sse1_ChirpData_ak8e 0.005737 0.00000 test
sse2_ChirpData_ak8 0.004192 0.00000 test
sse3_ChirpData_ak8 0.004024 0.00000 test
avx_ChirpData_a 0.002183 0.00000 test
avx_ChirpData_b 0.002301 0.00000 test
avx_ChirpData_c 0.002154 0.00000 test
avx_ChirpData_d 0.002204 0.00000 test
avx_ChirpData_e 0.002826 0.00000 test
avx_ChirpData_f 0.002604 0.00000 test
avx_ChirpData_g 0.002915 0.00000 test
avx_ChirpData_h 0.002857 0.00000 test
avx_ChirpData_i 0.002517 0.00000 test
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_ChirpData_c 0.002154 0.00000 choice
Second run
v_ChirpData 0.004754 0.00000 test
fpu_ChirpData 0.013919 0.00000 test
fpu_opt_ChirpData 0.005436 0.00000 test
sse1_ChirpData_ak8e 0.005733 0.00000 test
sse2_ChirpData_ak8 0.004207 0.00000 test
sse3_ChirpData_ak8 0.004042 0.00000 test
avx_ChirpData_a 0.002121 0.00000 test
avx_ChirpData_b 0.002084 0.00000 test
avx_ChirpData_c 0.002124 0.00000 test
avx_ChirpData_d 0.001957 0.00000 test
avx_ChirpData_e 0.001959 0.00000 test
avx_ChirpData_f 0.002082 0.00000 test
avx_ChirpData_g 0.002103 0.00000 test
avx_ChirpData_h 0.002690 0.00000 test
avx_ChirpData_i 0.002242 0.00000 test
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_ChirpData_d 0.001957 0.00000 choice
Third run
v_ChirpData 0.004720 0.00000 test
fpu_ChirpData 0.012409 0.00000 test
fpu_opt_ChirpData 0.004466 0.00000 test
sse1_ChirpData_ak8e 0.005780 0.00000 test
sse2_ChirpData_ak8 0.004227 0.00000 test
sse3_ChirpData_ak8 0.004003 0.00000 test
avx_ChirpData_a 0.002105 0.00000 test
avx_ChirpData_b 0.002092 0.00000 test
avx_ChirpData_c 0.002123 0.00000 test
avx_ChirpData_d 0.001955 0.00000 test
avx_ChirpData_e 0.001967 0.00000 test
avx_ChirpData_f 0.002079 0.00000 test
avx_ChirpData_g 0.002094 0.00000 test
avx_ChirpData_h 0.002690 0.00000 test
avx_ChirpData_i 0.002268 0.00000 test
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_ChirpData_d 0.001955 0.00000 choice
Test duration 8.17 seconds
Quote: Josef W. Segur (http://lunatics.kwsn.net/index.php?action=profile;u=27) wrote:
That difference in how the FX-4100 and FX-8150 react to prefetch distance is still fascinating.
For J54 I've modified the test framework again: each test will also show the minimum time taken by one iteration. That will give some indication of how much variance there is.
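A small sketch of reporting a minimum alongside an average, assuming the per-iteration times are already collected in an array. timing_summary and summarize are hypothetical names, and the plain mean here does not apply J48's rule of dropping the slowest run:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double mean; /* average over all runs */
    double min;  /* fastest single run */
} timing_summary;

static timing_summary summarize(const double *t, size_t runs)
{
    assert(runs >= 1);
    timing_summary s = { 0.0, t[0] };
    for (size_t i = 0; i < runs; i++) {
        s.mean += t[i];
        if (t[i] < s.min)
            s.min = t[i];
    }
    s.mean /= (double)runs;
    return s;
}
```

A minimum close to the mean suggests a steady test; a large gap suggests that some iterations were slowed by interference or warm-up effects.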
I've also added an e variant for avx_fma4, which explores doing 128-bit operations rather than 256-bit ones like all the other AVX tests. I expect it to be faster than the existing SSE3 test, both because it uses FMA4 and because with AVX enabled there are 3-operand forms of the instructions. With old-style SSE3, an operation like a = b + c actually had to copy b to a and then add c; the 3-operand form does it in a single instruction. I doubt the e variant will challenge the 256-bit versions, but it's possible. An AMD engineer chose to have the GCC autovectorizer produce 128-bit AVX and FMA4 code for Bulldozer v1 because that outperformed 256-bit code on some of the SPEC benchmarks.
Joe
FX-4100 @ 3.6 GHz
First run, BOINC running on GTX460
=========================================================
Ftst_v7_J54_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.009313 0.00000 test mintime= 0.004975
fpu_ChirpData 0.017663 0.00000 test mintime= 0.017547
fpu_opt_ChirpData 0.009177 0.00000 test mintime= 0.004658
sse1_ChirpData_ak8e 0.007268 0.00000 test mintime= 0.007189
sse2_ChirpData_ak8 0.004597 0.00000 test mintime= 0.004549
sse3_ChirpData_ak8 0.004520 0.00000 test mintime= 0.004419
avx_ChirpData_a 0.003807 0.00000 test mintime= 0.003769
avx_ChirpData_b 0.003873 0.00000 test mintime= 0.003792
avx_ChirpData_c 0.004119 0.00000 test mintime= 0.004081
avx_ChirpData_d 0.004026 0.00000 test mintime= 0.003959
avx_ChirpData_e 0.003916 0.00000 test mintime= 0.003878
avx_ChirpData_f 0.003722 0.00000 test mintime= 0.003698
avx_ChirpData_g 0.003716 0.00000 test mintime= 0.003637
avx_ChirpData_h 0.004431 0.00000 test mintime= 0.004382
avx_ChirpData_i 0.003890 0.00000 test mintime= 0.003846
avx_fma4_ChirpData_a 0.003380 0.00000 test mintime= 0.003322
avx_fma4_ChirpData_d4 0.003431 0.00000 test mintime= 0.003379
avx_fma4_ChirpData_d6 0.003521 0.00000 test mintime= 0.003345
avx_fma4_ChirpData_d8 0.003383 0.00000 test mintime= 0.003338
avx_fma4_ChirpData_e 0.003917 0.00000 test mintime= 0.003905
avx_fma4_ChirpData_a 0.003380 0.00000 choice
Second run
v_ChirpData 0.009529 0.00000 test mintime= 0.004951
fpu_ChirpData 0.017635 0.00000 test mintime= 0.017457
fpu_opt_ChirpData 0.009079 0.00000 test mintime= 0.004666
sse1_ChirpData_ak8e 0.007233 0.00000 test mintime= 0.007192
sse2_ChirpData_ak8 0.004588 0.00000 test mintime= 0.004541
sse3_ChirpData_ak8 0.004432 0.00000 test mintime= 0.004417
avx_ChirpData_a 0.003823 0.00000 test mintime= 0.003739
avx_ChirpData_b 0.003827 0.00000 test mintime= 0.003784
avx_ChirpData_c 0.004122 0.00000 test mintime= 0.004076
avx_ChirpData_d 0.004002 0.00000 test mintime= 0.003958
avx_ChirpData_e 0.003933 0.00000 test mintime= 0.003886
avx_ChirpData_f 0.003716 0.00000 test mintime= 0.003666
avx_ChirpData_g 0.003687 0.00000 test mintime= 0.003615
avx_ChirpData_h 0.004483 0.00000 test mintime= 0.004378
avx_ChirpData_i 0.003910 0.00000 test mintime= 0.003850
avx_fma4_ChirpData_a 0.003392 0.00000 test mintime= 0.003324
avx_fma4_ChirpData_d4 0.003453 0.00000 test mintime= 0.003392
avx_fma4_ChirpData_d6 0.003533 0.00000 test mintime= 0.003487
avx_fma4_ChirpData_d8 0.003477 0.00000 test mintime= 0.003394
avx_fma4_ChirpData_e 0.003999 0.00000 test mintime= 0.003937
avx_fma4_ChirpData_a 0.003392 0.00000 choice
Third run
v_ChirpData 0.009590 0.00000 test mintime= 0.005087
fpu_ChirpData 0.018358 0.00000 test mintime= 0.017907
fpu_opt_ChirpData 0.009407 0.00000 test mintime= 0.004685
sse1_ChirpData_ak8e 0.007488 0.00000 test mintime= 0.007304
sse2_ChirpData_ak8 0.004673 0.00000 test mintime= 0.004614
sse3_ChirpData_ak8 0.004549 0.00000 test mintime= 0.004473
avx_ChirpData_a 0.004010 0.00000 test mintime= 0.003766
avx_ChirpData_b 0.003849 0.00000 test mintime= 0.003803
avx_ChirpData_c 0.004126 0.00000 test mintime= 0.004085
avx_ChirpData_d 0.004000 0.00000 test mintime= 0.003981
avx_ChirpData_e 0.003917 0.00000 test mintime= 0.003881
avx_ChirpData_f 0.003818 0.00000 test mintime= 0.003664
avx_ChirpData_g 0.003710 0.00000 test mintime= 0.003597
avx_ChirpData_h 0.004417 0.00000 test mintime= 0.004379
avx_ChirpData_i 0.003895 0.00000 test mintime= 0.003867
avx_fma4_ChirpData_a 0.003405 0.00000 test mintime= 0.003341
avx_fma4_ChirpData_d4 0.003448 0.00000 test mintime= 0.003356
avx_fma4_ChirpData_d6 0.003464 0.00000 test mintime= 0.003389
avx_fma4_ChirpData_d8 0.003538 0.00000 test mintime= 0.003346
avx_fma4_ChirpData_e 0.003965 0.00000 test mintime= 0.003922
avx_fma4_ChirpData_a 0.003405 0.00000 choice
Test duration 11.20 seconds
Ftst_v7 completed successfully.
Second test run, BOINC IDLE
=========================================================
Ftst_v7_J54_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.009191 0.00000 test mintime= 0.004928
fpu_ChirpData 0.017562 0.00000 test mintime= 0.017515
fpu_opt_ChirpData 0.008981 0.00000 test mintime= 0.004633
sse1_ChirpData_ak8e 0.007240 0.00000 test mintime= 0.007195
sse2_ChirpData_ak8 0.004578 0.00000 test mintime= 0.004515
sse3_ChirpData_ak8 0.004546 0.00000 test mintime= 0.004421
avx_ChirpData_a 0.003788 0.00000 test mintime= 0.003753
avx_ChirpData_b 0.003842 0.00000 test mintime= 0.003806
avx_ChirpData_c 0.004095 0.00000 test mintime= 0.004072
avx_ChirpData_d 0.003996 0.00000 test mintime= 0.003948
avx_ChirpData_e 0.003908 0.00000 test mintime= 0.003887
avx_ChirpData_f 0.003708 0.00000 test mintime= 0.003665
avx_ChirpData_g 0.003602 0.00000 test mintime= 0.003581
avx_ChirpData_h 0.004397 0.00000 test mintime= 0.004363
avx_ChirpData_i 0.003876 0.00000 test mintime= 0.003844
avx_fma4_ChirpData_a 0.003374 0.00000 test mintime= 0.003328
avx_fma4_ChirpData_d4 0.003371 0.00000 test mintime= 0.003353
avx_fma4_ChirpData_d6 0.003421 0.00000 test mintime= 0.003335
avx_fma4_ChirpData_d8 0.003377 0.00000 test mintime= 0.003348
avx_fma4_ChirpData_e 0.003945 0.00000 test mintime= 0.003914
avx_fma4_ChirpData_d4 0.003371 0.00000 choice
Second run
v_ChirpData 0.009147 0.00000 test mintime= 0.004946
fpu_ChirpData 0.017576 0.00000 test mintime= 0.017502
fpu_opt_ChirpData 0.008935 0.00000 test mintime= 0.004644
sse1_ChirpData_ak8e 0.007233 0.00000 test mintime= 0.007189
sse2_ChirpData_ak8 0.004593 0.00000 test mintime= 0.004523
sse3_ChirpData_ak8 0.004424 0.00000 test mintime= 0.004418
avx_ChirpData_a 0.003805 0.00000 test mintime= 0.003735
avx_ChirpData_b 0.003810 0.00000 test mintime= 0.003774
avx_ChirpData_c 0.004115 0.00000 test mintime= 0.004094
avx_ChirpData_d 0.003971 0.00000 test mintime= 0.003960
avx_ChirpData_e 0.003910 0.00000 test mintime= 0.003864
avx_ChirpData_f 0.003696 0.00000 test mintime= 0.003666
avx_ChirpData_g 0.003619 0.00000 test mintime= 0.003559
avx_ChirpData_h 0.004404 0.00000 test mintime= 0.004376
avx_ChirpData_i 0.003880 0.00000 test mintime= 0.003861
avx_fma4_ChirpData_a 0.003350 0.00000 test mintime= 0.003323
avx_fma4_ChirpData_d4 0.003392 0.00000 test mintime= 0.003354
avx_fma4_ChirpData_d6 0.003353 0.00000 test mintime= 0.003344
avx_fma4_ChirpData_d8 0.003352 0.00000 test mintime= 0.003340
avx_fma4_ChirpData_e 0.003941 0.00000 test mintime= 0.003902
avx_fma4_ChirpData_a 0.003350 0.00000 choice
Third run
v_ChirpData 0.009191 0.00000 test mintime= 0.004914
fpu_ChirpData 0.017564 0.00000 test mintime= 0.017467
fpu_opt_ChirpData 0.008974 0.00000 test mintime= 0.004635
sse1_ChirpData_ak8e 0.007437 0.00000 test mintime= 0.007225
sse2_ChirpData_ak8 0.004660 0.00000 test mintime= 0.004520
sse3_ChirpData_ak8 0.004443 0.00000 test mintime= 0.004420
avx_ChirpData_a 0.003801 0.00000 test mintime= 0.003711
avx_ChirpData_b 0.003829 0.00000 test mintime= 0.003784
avx_ChirpData_c 0.004095 0.00000 test mintime= 0.004075
avx_ChirpData_d 0.004004 0.00000 test mintime= 0.003969
avx_ChirpData_e 0.003909 0.00000 test mintime= 0.003861
avx_ChirpData_f 0.003724 0.00000 test mintime= 0.003667
avx_ChirpData_g 0.003675 0.00000 test mintime= 0.003593
avx_ChirpData_h 0.004403 0.00000 test mintime= 0.004370
avx_ChirpData_i 0.003866 0.00000 test mintime= 0.003849
avx_fma4_ChirpData_a 0.003363 0.00000 test mintime= 0.003351
avx_fma4_ChirpData_d4 0.003387 0.00000 test mintime= 0.003363
avx_fma4_ChirpData_d6 0.003381 0.00000 test mintime= 0.003345
avx_fma4_ChirpData_d8 0.003369 0.00000 test mintime= 0.003340
avx_fma4_ChirpData_e 0.003969 0.00000 test mintime= 0.003923
avx_fma4_ChirpData_a 0.003363 0.00000 choice
Test duration 11.07 seconds
Ftst_v7 completed successfully.
New test with a couple of the CPU "enhancing" features turned off.
FX-4100@3.6
BOINC IDLE
=========================================================
Ftst_v7_J54_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.009824 0.00000 test mintime= 0.005371
fpu_ChirpData 0.017487 0.00000 test mintime= 0.017333
fpu_opt_ChirpData 0.009654 0.00000 test mintime= 0.005122
sse1_ChirpData_ak8e 0.007148 0.00000 test mintime= 0.007111
sse2_ChirpData_ak8 0.004559 0.00000 test mintime= 0.004487
sse3_ChirpData_ak8 0.004439 0.00000 test mintime= 0.004402
avx_ChirpData_a 0.003772 0.00000 test mintime= 0.003741
avx_ChirpData_b 0.003828 0.00000 test mintime= 0.003732
avx_ChirpData_c 0.004082 0.00000 test mintime= 0.004050
avx_ChirpData_d 0.003935 0.00000 test mintime= 0.003932
avx_ChirpData_e 0.003859 0.00000 test mintime= 0.003853
avx_ChirpData_f 0.003646 0.00000 test mintime= 0.003635
avx_ChirpData_g 0.003545 0.00000 test mintime= 0.003533
avx_ChirpData_h 0.004371 0.00000 test mintime= 0.004342
avx_ChirpData_i 0.003838 0.00000 test mintime= 0.003808
avx_fma4_ChirpData_a 0.003330 0.00000 test mintime= 0.003310
avx_fma4_ChirpData_d4 0.003355 0.00000 test mintime= 0.003341
avx_fma4_ChirpData_d6 0.003351 0.00000 test mintime= 0.003328
avx_fma4_ChirpData_d8 0.003342 0.00000 test mintime= 0.003325
avx_fma4_ChirpData_e 0.003921 0.00000 test mintime= 0.003904
avx_fma4_ChirpData_a 0.003330 0.00000 choice
Second run
v_ChirpData 0.009809 0.00000 test mintime= 0.005367
fpu_ChirpData 0.017515 0.00000 test mintime= 0.017334
fpu_opt_ChirpData 0.009602 0.00000 test mintime= 0.005055
sse1_ChirpData_ak8e 0.007170 0.00000 test mintime= 0.007113
sse2_ChirpData_ak8 0.004509 0.00000 test mintime= 0.004488
sse3_ChirpData_ak8 0.004414 0.00000 test mintime= 0.004390
avx_ChirpData_a 0.003774 0.00000 test mintime= 0.003756
avx_ChirpData_b 0.003848 0.00000 test mintime= 0.003806
avx_ChirpData_c 0.004058 0.00000 test mintime= 0.004048
avx_ChirpData_d 0.003937 0.00000 test mintime= 0.003932
avx_ChirpData_e 0.003857 0.00000 test mintime= 0.003853
avx_ChirpData_f 0.003644 0.00000 test mintime= 0.003635
avx_ChirpData_g 0.003543 0.00000 test mintime= 0.003534
avx_ChirpData_h 0.004350 0.00000 test mintime= 0.004335
avx_ChirpData_i 0.003856 0.00000 test mintime= 0.003822
avx_fma4_ChirpData_a 0.003331 0.00000 test mintime= 0.003310
avx_fma4_ChirpData_d4 0.003349 0.00000 test mintime= 0.003341
avx_fma4_ChirpData_d6 0.003335 0.00000 test mintime= 0.003329
avx_fma4_ChirpData_d8 0.003333 0.00000 test mintime= 0.003326
avx_fma4_ChirpData_e 0.003913 0.00000 test mintime= 0.003900
avx_fma4_ChirpData_a 0.003331 0.00000 choice
Third run
v_ChirpData 0.009795 0.00000 test mintime= 0.005379
fpu_ChirpData 0.017380 0.00000 test mintime= 0.017333
fpu_opt_ChirpData 0.009683 0.00000 test mintime= 0.005122
sse1_ChirpData_ak8e 0.007147 0.00000 test mintime= 0.007113
sse2_ChirpData_ak8 0.004544 0.00000 test mintime= 0.004502
sse3_ChirpData_ak8 0.004440 0.00000 test mintime= 0.004403
avx_ChirpData_a 0.003776 0.00000 test mintime= 0.003748
avx_ChirpData_b 0.003836 0.00000 test mintime= 0.003741
avx_ChirpData_c 0.004129 0.00000 test mintime= 0.004049
avx_ChirpData_d 0.003951 0.00000 test mintime= 0.003937
avx_ChirpData_e 0.003870 0.00000 test mintime= 0.003853
avx_ChirpData_f 0.003650 0.00000 test mintime= 0.003638
avx_ChirpData_g 0.003574 0.00000 test mintime= 0.003537
avx_ChirpData_h 0.004354 0.00000 test mintime= 0.004334
avx_ChirpData_i 0.003865 0.00000 test mintime= 0.003821
avx_fma4_ChirpData_a 0.003316 0.00000 test mintime= 0.003311
avx_fma4_ChirpData_d4 0.003347 0.00000 test mintime= 0.003342
avx_fma4_ChirpData_d6 0.003332 0.00000 test mintime= 0.003325
avx_fma4_ChirpData_d8 0.003330 0.00000 test mintime= 0.003326
avx_fma4_ChirpData_e 0.003932 0.00000 test mintime= 0.003908
avx_fma4_ChirpData_a 0.003316 0.00000 choice
Test duration 10.93 seconds
Ftst_v7 completed successfully.
i3-2120@3.3
BOINC IDLE
=========================================================
Ftst_v7_J54_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.004736 0.00000 test mintime= 0.003079
fpu_ChirpData 0.012392 0.00000 test mintime= 0.012347
fpu_opt_ChirpData 0.004540 0.00000 test mintime= 0.002796
sse1_ChirpData_ak8e 0.005779 0.00000 test mintime= 0.005765
sse2_ChirpData_ak8 0.004182 0.00000 test mintime= 0.004173
sse3_ChirpData_ak8 0.004011 0.00000 test mintime= 0.003991
avx_ChirpData_a 0.002091 0.00000 test mintime= 0.002079
avx_ChirpData_b 0.002050 0.00000 test mintime= 0.002033
avx_ChirpData_c 0.002109 0.00000 test mintime= 0.002099
avx_ChirpData_d 0.001937 0.00000 test mintime= 0.001930
avx_ChirpData_e 0.001919 0.00000 test mintime= 0.001915
avx_ChirpData_f 0.002059 0.00000 test mintime= 0.002043
avx_ChirpData_g 0.002114 0.00000 test mintime= 0.002072
avx_ChirpData_h 0.002664 0.00000 test mintime= 0.002657
avx_ChirpData_i 0.002322 0.00000 test mintime= 0.002216
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_e 0.001919 0.00000 choice
Second run
v_ChirpData 0.004711 0.00000 test mintime= 0.003087
fpu_ChirpData 0.012465 0.00000 test mintime= 0.012372
fpu_opt_ChirpData 0.004542 0.00000 test mintime= 0.002788
sse1_ChirpData_ak8e 0.005808 0.00000 test mintime= 0.005765
sse2_ChirpData_ak8 0.004187 0.00000 test mintime= 0.004172
sse3_ChirpData_ak8 0.004033 0.00000 test mintime= 0.003997
avx_ChirpData_a 0.002120 0.00000 test mintime= 0.002079
avx_ChirpData_b 0.002092 0.00000 test mintime= 0.002032
avx_ChirpData_c 0.002111 0.00000 test mintime= 0.002100
avx_ChirpData_d 0.001945 0.00000 test mintime= 0.001933
avx_ChirpData_e 0.001928 0.00000 test mintime= 0.001918
avx_ChirpData_f 0.002057 0.00000 test mintime= 0.002042
avx_ChirpData_g 0.002103 0.00000 test mintime= 0.002072
avx_ChirpData_h 0.002668 0.00000 test mintime= 0.002656
avx_ChirpData_i 0.002222 0.00000 test mintime= 0.002214
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_e 0.001928 0.00000 choice
Third run
v_ChirpData 0.004706 0.00000 test mintime= 0.003076
fpu_ChirpData 0.012670 0.00000 test mintime= 0.012353
fpu_opt_ChirpData 0.004944 0.00000 test mintime= 0.002788
sse1_ChirpData_ak8e 0.005822 0.00000 test mintime= 0.005767
sse2_ChirpData_ak8 0.004212 0.00000 test mintime= 0.004173
sse3_ChirpData_ak8 0.004047 0.00000 test mintime= 0.003995
avx_ChirpData_a 0.002284 0.00000 test mintime= 0.002082
avx_ChirpData_b 0.002036 0.00000 test mintime= 0.002034
avx_ChirpData_c 0.002104 0.00000 test mintime= 0.002100
avx_ChirpData_d 0.001941 0.00000 test mintime= 0.001931
avx_ChirpData_e 0.001917 0.00000 test mintime= 0.001916
avx_ChirpData_f 0.002052 0.00000 test mintime= 0.002042
avx_ChirpData_g 0.002077 0.00000 test mintime= 0.002072
avx_ChirpData_h 0.002668 0.00000 test mintime= 0.002657
avx_ChirpData_i 0.002220 0.00000 test mintime= 0.002213
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_e 0.001917 0.00000 choice
Test duration 8.06 seconds
Ftst_v7 completed successfully.
Quote: Josef W. Segur (http://lunatics.kwsn.net/index.php?action=profile;u=27) wrote:
Although there are still puzzles from the tests so far, with the attached J55 I've added another dimension to the tests. J54 and earlier have been doing full Mebisample chirping as needed before doing Gaussian, Pulse, and Triplet finding. For cases where that's not needed, AK_v8 becomes more cache-friendly by subdividing the work. So I modified all the chirp functions to support that, and J55 also tests at 128K and 32K samples. Those timings ought to be about 1/8 and 1/32 of the full-length (1M-sample) tests.
I do appreciate the testing, and am glad the Ivy Bridge system reacted like other Intel CPUs. Whatever form of dispatch is eventually used, keeping the number of code paths low will be more efficient.
Joe
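One possible form of dispatch is to time every available variant and keep the fastest; the choice lines in these Ftst logs name the variant with the lowest timing. The C sketch below shows that pattern under assumptions: `kernel_fn`, `candidate`, `min_time`, `pick_fastest` and `choose_candidate` are hypothetical names, not the stock application's or the Lunatics code's API, and it uses the C11 `timespec_get` clock. How Ftst itself derives its timing and mintime columns is not stated in the thread.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical candidate signature; real chirp variants take data
   pointers, lengths and chirp parameters instead of an opaque context. */
typedef void (*kernel_fn)(void *ctx);

typedef struct {
    const char *name;   /* e.g. "avx_ChirpData_e" */
    kernel_fn fn;
} candidate;

/* Wall-clock time in seconds via C11 timespec_get. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Fastest of `reps` runs (reps >= 1); taking the minimum discards runs
   disturbed by interrupts or other processes. */
static double min_time(kernel_fn fn, void *ctx, int reps)
{
    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        double t0 = now_seconds();
        fn(ctx);
        double dt = now_seconds() - t0;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}

/* Index of the smallest time; the earlier candidate wins a tie. */
static size_t pick_fastest(const double *times, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (times[i] < times[best])
            best = i;
    return best;
}

/* Time every candidate, store the results in times[], return the winner. */
static size_t choose_candidate(const candidate *c, size_t n, void *ctx,
                               int reps, double *times)
{
    for (size_t i = 0; i < n; i++)
        times[i] = min_time(c[i].fn, ctx, reps);
    return pick_fastest(times, n);
}
```

As a check, the FMA4 timings from the first pass of the BOINC-idle J54 test (0.003374, 0.003371, 0.003421, 0.003377, 0.003945 for a, d4, d6, d8, e) give index 1, d4, the same variant that pass's choice line names. Differences this small still flip between passes, as the a/d4 swaps across the logs show, which is one argument for keeping the number of competing code paths low.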
FX-4100@3.6
BOINC idle
=========================================================
Ftst_v7_J55_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
First run, 1048576 sample testing
v_ChirpData 0.051682 0.00000 test mintime= 0.051531
fpu_ChirpData 0.017529 0.00000 test mintime= 0.017479
sse1_ChirpData_ak8e 0.007230 0.00000 test mintime= 0.007164
sse2_ChirpData_ak8 0.004583 0.00000 test mintime= 0.004521
sse3_ChirpData_ak8 0.004468 0.00000 test mintime= 0.004435
avx_ChirpData_a 0.003825 0.00000 test mintime= 0.003762
avx_ChirpData_b 0.003839 0.00000 test mintime= 0.003779
avx_ChirpData_c 0.004100 0.00000 test mintime= 0.004079
avx_ChirpData_d 0.003990 0.00000 test mintime= 0.003967
avx_ChirpData_e 0.003914 0.00000 test mintime= 0.003844
avx_ChirpData_f 0.003695 0.00000 test mintime= 0.003664
avx_ChirpData_g 0.003653 0.00000 test mintime= 0.003586
avx_ChirpData_h 0.004360 0.00000 test mintime= 0.004313
avx_ChirpData_i 0.003781 0.00000 test mintime= 0.003734
avx_fma4_ChirpData_a 0.003349 0.00000 test mintime= 0.003328
avx_fma4_ChirpData_d4 0.003376 0.00000 test mintime= 0.003356
avx_fma4_ChirpData_d6 0.003417 0.00000 test mintime= 0.003329
avx_fma4_ChirpData_d8 0.003378 0.00000 test mintime= 0.003339
avx_fma4_ChirpData_e 0.003745 0.00000 test mintime= 0.003706
avx_fma4_ChirpData_a 0.003349 0.00000 choice
Second run, 131072 sample testing
v_ChirpData 0.006409 0.00000 test mintime= 0.006369
fpu_ChirpData 0.002194 0.00000 test mintime= 0.002161
sse1_ChirpData_ak8e 0.000900 0.00000 test mintime= 0.000887
sse2_ChirpData_ak8 0.000573 0.00000 test mintime= 0.000564
sse3_ChirpData_ak8 0.000561 0.00000 test mintime= 0.000549
avx_ChirpData_a 0.000477 0.00000 test mintime= 0.000470
avx_ChirpData_b 0.000486 0.00000 test mintime= 0.000478
avx_ChirpData_c 0.000513 0.00000 test mintime= 0.000505
avx_ChirpData_d 0.000502 0.00000 test mintime= 0.000492
avx_ChirpData_e 0.000483 0.00000 test mintime= 0.000456
avx_ChirpData_f 0.000460 0.00000 test mintime= 0.000453
avx_ChirpData_g 0.000450 0.00000 test mintime= 0.000440
avx_ChirpData_h 0.000543 0.00000 test mintime= 0.000531
avx_ChirpData_i 0.000459 0.00000 test mintime= 0.000446
avx_fma4_ChirpData_a 0.000417 0.00000 test mintime= 0.000410
avx_fma4_ChirpData_d4 0.000429 0.00000 test mintime= 0.000415
avx_fma4_ChirpData_d6 0.000419 0.00000 test mintime= 0.000414
avx_fma4_ChirpData_d8 0.000423 0.00000 test mintime= 0.000414
avx_fma4_ChirpData_e 0.000465 0.00000 test mintime= 0.000456
avx_fma4_ChirpData_a 0.000417 0.00000 choice
Third run, 32768 sample testing
v_ChirpData 0.001609 0.00000 test mintime= 0.001590
fpu_ChirpData 0.000548 0.00000 test mintime= 0.000537
sse1_ChirpData_ak8e 0.000225 0.00000 test mintime= 0.000221
sse2_ChirpData_ak8 0.000144 0.00000 test mintime= 0.000140
sse3_ChirpData_ak8 0.000140 0.00000 test mintime= 0.000137
avx_ChirpData_a 0.000120 0.00000 test mintime= 0.000117
avx_ChirpData_b 0.000122 0.00000 test mintime= 0.000120
avx_ChirpData_c 0.000129 0.00000 test mintime= 0.000126
avx_ChirpData_d 0.000125 0.00000 test mintime= 0.000123
avx_ChirpData_e 0.000119 0.00000 test mintime= 0.000114
avx_ChirpData_f 0.000115 0.00000 test mintime= 0.000113
avx_ChirpData_g 0.000112 0.00000 test mintime= 0.000110
avx_ChirpData_h 0.000135 0.00000 test mintime= 0.000132
avx_ChirpData_i 0.000113 0.00000 test mintime= 0.000111
avx_fma4_ChirpData_a 0.000104 0.00000 test mintime= 0.000103
avx_fma4_ChirpData_d4 0.000106 0.00000 test mintime= 0.000104
avx_fma4_ChirpData_d6 0.000105 0.00000 test mintime= 0.000104
avx_fma4_ChirpData_d8 0.000105 0.00000 test mintime= 0.000104
avx_fma4_ChirpData_e 0.000117 0.00000 test mintime= 0.000114
avx_fma4_ChirpData_a 0.000104 0.00000 choice
Test duration 7.34 seconds
Ftst_v7 completed successfully.
i3-2120@3.3
BOINC idle
=========================================================
Ftst_v7_J55_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
First run, 1048576 sample testing
v_ChirpData 0.058924 0.00000 test mintime= 0.058849
fpu_ChirpData 0.012426 0.00000 test mintime= 0.012339
sse1_ChirpData_ak8e 0.005945 0.00000 test mintime= 0.005699
sse2_ChirpData_ak8 0.004193 0.00000 test mintime= 0.004164
sse3_ChirpData_ak8 0.004016 0.00000 test mintime= 0.003993
avx_ChirpData_a 0.002082 0.00000 test mintime= 0.002074
avx_ChirpData_b 0.002039 0.00000 test mintime= 0.002034
avx_ChirpData_c 0.002107 0.00000 test mintime= 0.002098
avx_ChirpData_d 0.001936 0.00000 test mintime= 0.001932
avx_ChirpData_e 0.001928 0.00000 test mintime= 0.001918
avx_ChirpData_f 0.002054 0.00000 test mintime= 0.002044
avx_ChirpData_g 0.002078 0.00000 test mintime= 0.002070
avx_ChirpData_h 0.002735 0.00000 test mintime= 0.002641
avx_ChirpData_i 0.002223 0.00000 test mintime= 0.002212
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_e 0.001928 0.00000 choice
Second run, 131072 sample testing
v_ChirpData 0.007376 0.00000 test mintime= 0.007337
fpu_ChirpData 0.001547 0.00000 test mintime= 0.001540
sse1_ChirpData_ak8e 0.000714 0.00000 test mintime= 0.000712
sse2_ChirpData_ak8 0.000522 0.00000 test mintime= 0.000520
sse3_ChirpData_ak8 0.000500 0.00000 test mintime= 0.000498
avx_ChirpData_a 0.000260 0.00000 test mintime= 0.000258
avx_ChirpData_b 0.000255 0.00000 test mintime= 0.000254
avx_ChirpData_c 0.000264 0.00000 test mintime= 0.000261
avx_ChirpData_d 0.000242 0.00000 test mintime= 0.000241
avx_ChirpData_e 0.000242 0.00000 test mintime= 0.000239
avx_ChirpData_f 0.000257 0.00000 test mintime= 0.000255
avx_ChirpData_g 0.000260 0.00000 test mintime= 0.000257
avx_ChirpData_h 0.000329 0.00000 test mintime= 0.000322
avx_ChirpData_i 0.000271 0.00000 test mintime= 0.000267
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_e 0.000242 0.00000 choice
Third run, 32768 sample testing
v_ChirpData 0.001841 0.00000 test mintime= 0.001834
fpu_ChirpData 0.000387 0.00000 test mintime= 0.000385
sse1_ChirpData_ak8e 0.000179 0.00000 test mintime= 0.000178
sse2_ChirpData_ak8 0.000131 0.00000 test mintime= 0.000130
sse3_ChirpData_ak8 0.000125 0.00000 test mintime= 0.000124
avx_ChirpData_a 0.000065 0.00000 test mintime= 0.000064
avx_ChirpData_b 0.000064 0.00000 test mintime= 0.000063
avx_ChirpData_c 0.000066 0.00000 test mintime= 0.000065
avx_ChirpData_d 0.000064 0.00000 test mintime= 0.000060
avx_ChirpData_e 0.000060 0.00000 test mintime= 0.000059
avx_ChirpData_f 0.000065 0.00000 test mintime= 0.000063
avx_ChirpData_g 0.000065 0.00000 test mintime= 0.000064
avx_ChirpData_h 0.000081 0.00000 test mintime= 0.000079
avx_ChirpData_i 0.000069 0.00000 test mintime= 0.000064
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_e 0.000060 0.00000 choice
Test duration 5.55 seconds
Ftst_v7 completed successfully.
Quote: Josef W. Segur (http://lunatics.kwsn.net/index.php?action=profile;u=27) wrote:
The J55 test was built with GCC 4.5.1; I'm attaching J55b, built with GCC 4.6.1, to see if there's any significant difference. If you have time, run both back to back so the environment is as similar as possible. All functions may be affected, not just those targeting Bulldozer or Sandy Bridge.
One particular puzzle is why avx_fma4_ChirpData_a seems to be faster than avx_fma4_ChirpData_d4. The 4.5.1 build seemed not to optimize the instruction ordering of the d subvariants as well, but the CPU's out-of-order execution capabilities ought to have been sufficient to handle that. These routines necessarily have serious dependency-chain problems, so there isn't much room for reordering anyhow.
Joe
FX-4100
BOINC Idle
=========================================================
Ftst_v7_J55_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
First run, 1048576 sample testing
v_ChirpData 0.051829 0.00000 test mintime= 0.051639
fpu_ChirpData 0.017697 0.00000 test mintime= 0.017562
sse1_ChirpData_ak8e 0.007256 0.00000 test mintime= 0.007218
sse2_ChirpData_ak8 0.004547 0.00000 test mintime= 0.004528
sse3_ChirpData_ak8 0.004486 0.00000 test mintime= 0.004450
avx_ChirpData_a 0.003830 0.00000 test mintime= 0.003812
avx_ChirpData_b 0.003889 0.00000 test mintime= 0.003792
avx_ChirpData_c 0.004161 0.00000 test mintime= 0.004119
avx_ChirpData_d 0.004023 0.00000 test mintime= 0.003978
avx_ChirpData_e 0.003911 0.00000 test mintime= 0.003853
avx_ChirpData_f 0.003730 0.00000 test mintime= 0.003684
avx_ChirpData_g 0.003687 0.00000 test mintime= 0.003626
avx_ChirpData_h 0.004389 0.00000 test mintime= 0.004343
avx_ChirpData_i 0.003824 0.00000 test mintime= 0.003775
avx_fma4_ChirpData_a 0.003376 0.00000 test mintime= 0.003330
avx_fma4_ChirpData_d4 0.003397 0.00000 test mintime= 0.003355
avx_fma4_ChirpData_d6 0.003379 0.00000 test mintime= 0.003348
avx_fma4_ChirpData_d8 0.003397 0.00000 test mintime= 0.003363
avx_fma4_ChirpData_e 0.003773 0.00000 test mintime= 0.003720
avx_fma4_ChirpData_a 0.003376 0.00000 choice
Second run, 131072 sample testing
v_ChirpData 0.006456 0.00000 test mintime= 0.006373
fpu_ChirpData 0.002205 0.00000 test mintime= 0.002167
sse1_ChirpData_ak8e 0.000905 0.00000 test mintime= 0.000891
sse2_ChirpData_ak8 0.000577 0.00000 test mintime= 0.000564
sse3_ChirpData_ak8 0.000561 0.00000 test mintime= 0.000550
avx_ChirpData_a 0.000482 0.00000 test mintime= 0.000470
avx_ChirpData_b 0.000488 0.00000 test mintime= 0.000478
avx_ChirpData_c 0.000515 0.00000 test mintime= 0.000505
avx_ChirpData_d 0.000502 0.00000 test mintime= 0.000493
avx_ChirpData_e 0.000480 0.00000 test mintime= 0.000456
avx_ChirpData_f 0.000461 0.00000 test mintime= 0.000453
avx_ChirpData_g 0.000455 0.00000 test mintime= 0.000441
avx_ChirpData_h 0.000545 0.00000 test mintime= 0.000531
avx_ChirpData_i 0.000462 0.00000 test mintime= 0.000446
avx_fma4_ChirpData_a 0.000419 0.00000 test mintime= 0.000411
avx_fma4_ChirpData_d4 0.000423 0.00000 test mintime= 0.000415
avx_fma4_ChirpData_d6 0.000421 0.00000 test mintime= 0.000415
avx_fma4_ChirpData_d8 0.000422 0.00000 test mintime= 0.000414
avx_fma4_ChirpData_e 0.000468 0.00000 test mintime= 0.000457
avx_fma4_ChirpData_a 0.000419 0.00000 choice
Third run, 32768 sample testing
v_ChirpData 0.001632 0.00000 test mintime= 0.001590
fpu_ChirpData 0.000555 0.00000 test mintime= 0.000539
sse1_ChirpData_ak8e 0.000227 0.00000 test mintime= 0.000222
sse2_ChirpData_ak8 0.000145 0.00000 test mintime= 0.000140
sse3_ChirpData_ak8 0.000141 0.00000 test mintime= 0.000137
avx_ChirpData_a 0.000120 0.00000 test mintime= 0.000117
avx_ChirpData_b 0.000119 0.00000 test mintime= 0.000116
avx_ChirpData_c 0.000129 0.00000 test mintime= 0.000126
avx_ChirpData_d 0.000127 0.00000 test mintime= 0.000123
avx_ChirpData_e 0.000120 0.00000 test mintime= 0.000114
avx_ChirpData_f 0.000117 0.00000 test mintime= 0.000113
avx_ChirpData_g 0.000114 0.00000 test mintime= 0.000110
avx_ChirpData_h 0.000137 0.00000 test mintime= 0.000133
avx_ChirpData_i 0.000115 0.00000 test mintime= 0.000111
avx_fma4_ChirpData_a 0.000105 0.00000 test mintime= 0.000103
avx_fma4_ChirpData_d4 0.000107 0.00000 test mintime= 0.000104
avx_fma4_ChirpData_d6 0.000106 0.00000 test mintime= 0.000104
avx_fma4_ChirpData_d8 0.000106 0.00000 test mintime= 0.000104
avx_fma4_ChirpData_e 0.000117 0.00000 test mintime= 0.000114
avx_fma4_ChirpData_a 0.000105 0.00000 choice
Test duration 7.39 seconds
Ftst_v7 completed successfully.
=========================================================
Ftst_v7_J55b_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
First run, 1048576 sample testing
v_ChirpData 0.052452 0.00000 test mintime= 0.052096
fpu_ChirpData 0.018704 0.00000 test mintime= 0.017854
sse1_ChirpData_ak8e 0.007731 0.00000 test mintime= 0.007256
sse2_ChirpData_ak8 0.004579 0.00000 test mintime= 0.004497
sse3_ChirpData_ak8 0.004591 0.00000 test mintime= 0.004549
avx_ChirpData_a 0.004131 0.00000 test mintime= 0.003764
avx_ChirpData_b 0.004169 0.00000 test mintime= 0.003948
avx_ChirpData_c 0.004434 0.00000 test mintime= 0.003979
avx_ChirpData_d 0.004127 0.00000 test mintime= 0.003956
avx_ChirpData_e 0.004005 0.00000 test mintime= 0.003870
avx_ChirpData_f 0.003865 0.00000 test mintime= 0.003655
avx_ChirpData_g 0.004126 0.00000 test mintime= 0.003680
avx_ChirpData_h 0.004696 0.00000 test mintime= 0.004399
avx_ChirpData_i 0.004318 0.00000 test mintime= 0.003751
avx_fma4_ChirpData_a 0.003619 0.00000 test mintime= 0.003408
avx_fma4_ChirpData_d4 0.003713 0.00000 test mintime= 0.003264
avx_fma4_ChirpData_d6 0.004176 0.00000 test mintime= 0.003271
avx_fma4_ChirpData_d8 0.003497 0.00000 test mintime= 0.003206
avx_fma4_ChirpData_e 0.003928 0.00000 test mintime= 0.003882
avx_fma4_ChirpData_d8 0.003497 0.00000 choice
Second run, 131072 sample testing
v_ChirpData 0.006478 0.00000 test mintime= 0.006380
fpu_ChirpData 0.002202 0.00000 test mintime= 0.002172
sse1_ChirpData_ak8e 0.000925 0.00000 test mintime= 0.000902
sse2_ChirpData_ak8 0.000579 0.00000 test mintime= 0.000565
sse3_ChirpData_ak8 0.000575 0.00000 test mintime= 0.000565
avx_ChirpData_a 0.000478 0.00000 test mintime= 0.000466
avx_ChirpData_b 0.000499 0.00000 test mintime= 0.000487
avx_ChirpData_c 0.000498 0.00000 test mintime= 0.000482
avx_ChirpData_d 0.000501 0.00000 test mintime= 0.000490
avx_ChirpData_e 0.000482 0.00000 test mintime= 0.000458
avx_ChirpData_f 0.000464 0.00000 test mintime= 0.000453
avx_ChirpData_g 0.000452 0.00000 test mintime= 0.000442
avx_ChirpData_h 0.000554 0.00000 test mintime= 0.000542
avx_ChirpData_i 0.000459 0.00000 test mintime= 0.000446
avx_fma4_ChirpData_a 0.000431 0.00000 test mintime= 0.000423
avx_fma4_ChirpData_d4 0.000408 0.00000 test mintime= 0.000399
avx_fma4_ChirpData_d6 0.000406 0.00000 test mintime= 0.000398
avx_fma4_ChirpData_d8 0.000417 0.00000 test mintime= 0.000398
avx_fma4_ChirpData_e 0.000493 0.00000 test mintime= 0.000478
avx_fma4_ChirpData_d6 0.000406 0.00000 choice
Third run, 32768 sample testing
v_ChirpData 0.001623 0.00000 test mintime= 0.001589
fpu_ChirpData 0.000556 0.00000 test mintime= 0.000544
sse1_ChirpData_ak8e 0.000228 0.00000 test mintime= 0.000222
sse2_ChirpData_ak8 0.000146 0.00000 test mintime= 0.000139
sse3_ChirpData_ak8 0.000144 0.00000 test mintime= 0.000141
avx_ChirpData_a 0.000118 0.00000 test mintime= 0.000116
avx_ChirpData_b 0.000126 0.00000 test mintime= 0.000122
avx_ChirpData_c 0.000123 0.00000 test mintime= 0.000121
avx_ChirpData_d 0.000124 0.00000 test mintime= 0.000122
avx_ChirpData_e 0.000117 0.00000 test mintime= 0.000114
avx_ChirpData_f 0.000115 0.00000 test mintime= 0.000113
avx_ChirpData_g 0.000118 0.00000 test mintime= 0.000110
avx_ChirpData_h 0.000136 0.00000 test mintime= 0.000133
avx_ChirpData_i 0.000114 0.00000 test mintime= 0.000111
avx_fma4_ChirpData_a 0.000108 0.00000 test mintime= 0.000106
avx_fma4_ChirpData_d4 0.000102 0.00000 test mintime= 0.000099
avx_fma4_ChirpData_d6 0.000101 0.00000 test mintime= 0.000099
avx_fma4_ChirpData_d8 0.000101 0.00000 test mintime= 0.000099
avx_fma4_ChirpData_e 0.000122 0.00000 test mintime= 0.000119
avx_fma4_ChirpData_d6 0.000101 0.00000 choice
Test duration 7.50 seconds
Ftst_v7 completed successfully.
i3-2120
BOINC Idle
=========================================================
Ftst_v7_J55_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
First run, 1048576 sample testing
v_ChirpData 0.059010 0.00000 test mintime= 0.058863
fpu_ChirpData 0.012374 0.00000 test mintime= 0.012352
sse1_ChirpData_ak8e 0.005708 0.00000 test mintime= 0.005699
sse2_ChirpData_ak8 0.004178 0.00000 test mintime= 0.004165
sse3_ChirpData_ak8 0.004003 0.00000 test mintime= 0.003996
avx_ChirpData_a 0.002079 0.00000 test mintime= 0.002073
avx_ChirpData_b 0.002033 0.00000 test mintime= 0.002031
avx_ChirpData_c 0.002100 0.00000 test mintime= 0.002097
avx_ChirpData_d 0.001937 0.00000 test mintime= 0.001931
avx_ChirpData_e 0.001925 0.00000 test mintime= 0.001917
avx_ChirpData_f 0.002049 0.00000 test mintime= 0.002045
avx_ChirpData_g 0.002070 0.00000 test mintime= 0.002067
avx_ChirpData_h 0.003057 0.00000 test mintime= 0.002754
avx_ChirpData_i 0.002221 0.00000 test mintime= 0.002213
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_e 0.001925 0.00000 choice
Second run, 131072 sample testing
v_ChirpData 0.007356 0.00000 test mintime= 0.007338
fpu_ChirpData 0.001546 0.00000 test mintime= 0.001540
sse1_ChirpData_ak8e 0.000999 0.00000 test mintime= 0.000712
sse2_ChirpData_ak8 0.000790 0.00000 test mintime= 0.000719
sse3_ChirpData_ak8 0.000540 0.00000 test mintime= 0.000498
avx_ChirpData_a 0.000260 0.00000 test mintime= 0.000258
avx_ChirpData_b 0.000257 0.00000 test mintime= 0.000253
avx_ChirpData_c 0.000263 0.00000 test mintime= 0.000262
avx_ChirpData_d 0.000243 0.00000 test mintime= 0.000240
avx_ChirpData_e 0.000272 0.00000 test mintime= 0.000270
avx_ChirpData_f 0.000279 0.00000 test mintime= 0.000270
avx_ChirpData_g 0.000278 0.00000 test mintime= 0.000258
avx_ChirpData_h 0.000329 0.00000 test mintime= 0.000322
avx_ChirpData_i 0.000272 0.00000 test mintime= 0.000267
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_d 0.000243 0.00000 choice
Third run, 32768 sample testing
v_ChirpData 0.001841 0.00000 test mintime= 0.001834
fpu_ChirpData 0.000568 0.00000 test mintime= 0.000385
sse1_ChirpData_ak8e 0.000186 0.00000 test mintime= 0.000184
sse2_ChirpData_ak8 0.000139 0.00000 test mintime= 0.000130
sse3_ChirpData_ak8 0.000125 0.00000 test mintime= 0.000124
avx_ChirpData_a 0.000065 0.00000 test mintime= 0.000064
avx_ChirpData_b 0.000066 0.00000 test mintime= 0.000063
avx_ChirpData_c 0.000066 0.00000 test mintime= 0.000065
avx_ChirpData_d 0.000061 0.00000 test mintime= 0.000060
avx_ChirpData_e 0.000060 0.00000 test mintime= 0.000059
avx_ChirpData_f 0.000064 0.00000 test mintime= 0.000063
avx_ChirpData_g 0.000065 0.00000 test mintime= 0.000064
avx_ChirpData_h 0.000081 0.00000 test mintime= 0.000079
avx_ChirpData_i 0.000065 0.00000 test mintime= 0.000064
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_e 0.000060 0.00000 choice
Test duration 5.71 seconds
Ftst_v7 completed successfully.
=========================================================
Ftst_v7_J55b_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
First run, 1048576 sample testing
v_ChirpData 0.059795 0.00000 test mintime= 0.059657
fpu_ChirpData 0.012305 0.00000 test mintime= 0.012282
sse1_ChirpData_ak8e 0.005647 0.00000 test mintime= 0.005621
sse2_ChirpData_ak8 0.004166 0.00000 test mintime= 0.004149
sse3_ChirpData_ak8 0.003970 0.00000 test mintime= 0.003961
avx_ChirpData_a 0.002058 0.00000 test mintime= 0.002057
avx_ChirpData_b 0.002140 0.00000 test mintime= 0.002136
avx_ChirpData_c 0.002060 0.00000 test mintime= 0.002053
avx_ChirpData_d 0.001930 0.00000 test mintime= 0.001926
avx_ChirpData_e 0.001920 0.00000 test mintime= 0.001914
avx_ChirpData_f 0.002045 0.00000 test mintime= 0.002035
avx_ChirpData_g 0.002084 0.00000 test mintime= 0.002066
avx_ChirpData_h 0.002646 0.00000 test mintime= 0.002640
avx_ChirpData_i 0.002205 0.00000 test mintime= 0.002198
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_e 0.001920 0.00000 choice
Second run, 131072 sample testing
v_ChirpData 0.007463 0.00000 test mintime= 0.007437
fpu_ChirpData 0.001579 0.00000 test mintime= 0.001523
sse1_ChirpData_ak8e 0.000708 0.00000 test mintime= 0.000706
sse2_ChirpData_ak8 0.000546 0.00000 test mintime= 0.000518
sse3_ChirpData_ak8 0.000496 0.00000 test mintime= 0.000494
avx_ChirpData_a 0.000258 0.00000 test mintime= 0.000256
avx_ChirpData_b 0.000269 0.00000 test mintime= 0.000267
avx_ChirpData_c 0.000257 0.00000 test mintime= 0.000256
avx_ChirpData_d 0.000242 0.00000 test mintime= 0.000240
avx_ChirpData_e 0.000241 0.00000 test mintime= 0.000240
avx_ChirpData_f 0.000257 0.00000 test mintime= 0.000255
avx_ChirpData_g 0.000260 0.00000 test mintime= 0.000259
avx_ChirpData_h 0.000329 0.00000 test mintime= 0.000322
avx_ChirpData_i 0.000271 0.00000 test mintime= 0.000267
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_e 0.000241 0.00000 choice
Third run, 32768 sample testing
v_ChirpData 0.001862 0.00000 test mintime= 0.001858
fpu_ChirpData 0.000384 0.00000 test mintime= 0.000383
sse1_ChirpData_ak8e 0.000176 0.00000 test mintime= 0.000175
sse2_ChirpData_ak8 0.000130 0.00000 test mintime= 0.000129
sse3_ChirpData_ak8 0.000124 0.00000 test mintime= 0.000123
avx_ChirpData_a 0.000064 0.00000 test mintime= 0.000064
avx_ChirpData_b 0.000067 0.00000 test mintime= 0.000066
avx_ChirpData_c 0.000064 0.00000 test mintime= 0.000064
avx_ChirpData_d 0.000060 0.00000 test mintime= 0.000060
avx_ChirpData_e 0.000060 0.00000 test mintime= 0.000059
avx_ChirpData_f 0.000064 0.00000 test mintime= 0.000063
avx_ChirpData_g 0.000065 0.00000 test mintime= 0.000064
avx_ChirpData_h 0.000080 0.00000 test mintime= 0.000079
avx_ChirpData_i 0.000065 0.00000 test mintime= 0.000064
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_e 0.000060 0.00000 choice
Test duration 5.54 seconds
Ftst_v7 completed successfully.