AVX Optimized App Development

Started by arkayn, May 05, 2012, 02:36:23 PM

Quote
Josef W. Segur wrote:
I did sort of a survey of Beta hosts running stock S@H v7 with Bulldozer and Sandy Bridge CPUs to see which chirp variants were chosen most. Bulldozer were about 8% a, 34% b, 10% c, and 49% d. Sandy Bridge were about 13% a, 9% b, 9% c, and 69% d. That was for 277 results on Bulldozer and 296 on Sandy Bridge, so may be at least roughly meaningful.

I'll attach a J46 version of the test which has two added chirp variants that might be even better than the d variant, which was obviously the previous best. I left out the other kinds of functions this time; I haven't figured out any significant improvements for those. But each run of the program now does the chirp tests 3 times.

What I'm aiming at, short term, is one best AVX chirp function which can be put into the existing Lunatics CPU code for a targeted AVX build. Hopefully we'll be able to use some dispatch functionality in future to keep the number of different builds down, but that's not ready yet.
                                                               Joe
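
The dispatch Joe hopes for could work roughly like this: probe the CPU once at startup and keep a function pointer to the best variant it supports. This is a minimal sketch, not the Lunatics code. It assumes GCC or Clang on x86 (__builtin_cpu_supports and __attribute__((target)) are compiler extensions); the names chirp_fn, cmul_scalar, cmul_avx and select_chirp are hypothetical, and the kernel is a stand-in complex multiply of interleaved (re, im) samples by precomputed phase factors rather than the real ChirpData routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature shared by every variant: multiply n complex samples
 * x (interleaved re, im) by phase factors w, writing the result to y. */
typedef void (*chirp_fn)(const float *x, const float *w, float *y, size_t n);

static void cmul_scalar(const float *x, const float *w, float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float ar = x[2*i], ai = x[2*i+1], br = w[2*i], bi = w[2*i+1];
        y[2*i]   = ar*br - ai*bi;
        y[2*i+1] = ar*bi + ai*br;
    }
}

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Compiled for AVX regardless of global flags; only called if the CPU has AVX. */
__attribute__((target("avx")))
static void cmul_avx(const float *x, const float *w, float *y, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {                /* 4 complex values per vector */
        __m256 a  = _mm256_loadu_ps(x + 2*i);
        __m256 b  = _mm256_loadu_ps(w + 2*i);
        __m256 ar = _mm256_moveldup_ps(a);      /* ar0 ar0 ar1 ar1 ... */
        __m256 ai = _mm256_movehdup_ps(a);      /* ai0 ai0 ai1 ai1 ... */
        __m256 bs = _mm256_permute_ps(b, 0xB1); /* bi0 br0 bi1 br1 ... */
        /* even lanes: ar*br - ai*bi, odd lanes: ar*bi + ai*br */
        _mm256_storeu_ps(y + 2*i,
            _mm256_addsub_ps(_mm256_mul_ps(ar, b), _mm256_mul_ps(ai, bs)));
    }
    cmul_scalar(x + 2*i, w + 2*i, y + 2*i, n - i);  /* leftover samples */
}
#endif

/* Pick the best available variant once; callers keep the pointer. */
static chirp_fn select_chirp(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return cmul_avx;
#endif
    return cmul_scalar;
}
```

Because cmul_avx gets its instruction set from a per-function target attribute, the rest of the program can stay baseline x86 code and still run on CPUs without AVX, which is what would keep the number of separate builds down.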

Nice! Cheers for relaying... I'll give it a go on the i5 when I can. I'll be sticking the results of that work in AKv8c for sure ;) See, teamwork is possible without having to strangle one another :)
It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change.
Charles Darwin
---
Chaos: When the present determines the future, but the approximate present does not approximately determine the future.
Edward Lorenz

Since I know you don't want to go back there, and could not get to the test even if you did, I figured I should bring it over here for you.

Quote from: arkayn on May 06, 2012, 09:35:54 AM
Since I know you don't want to go back there, and could not get to the test even if you did, I figured I should bring it over here for you.

Second part is only a Flick of the Switch.
A person who won't read has no advantage over one who can't read. (Mark Twain)

May 07, 2012, 04:49:45 AM #4 Last Edit: May 08, 2012, 02:56:51 PM by arkayn
Quote
Josef W. Segur wrote:
Hmm, my hopes for a single best variant for both Bulldozer and Sandy Bridge are fading.

I'm still hoping to improve things further for Bulldozer; the attached J47 test has several subvariants of f with the prefetch distance varied from 2 to 6 cache lines (it was 4 in J46). Possibly one will get the input data to L1 at just the right time; at least there may be some observable differences.
                                                           Joe
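
The f subvariants in J47 differ only in how far ahead the software prefetch reaches. A minimal sketch of that knob, assuming 64-byte cache lines and the GCC/Clang __builtin_prefetch builtin; scale_with_prefetch and dist_lines are hypothetical names, and a simple scaling loop stands in for the chirp arithmetic.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64                               /* assumed line size */
#define FLOATS_PER_LINE (CACHE_LINE_BYTES / sizeof(float)) /* 16 floats */

/* Scale n floats by s, issuing one prefetch per 64-byte step, dist_lines
 * cache lines ahead of the current position (dist_lines = 0: no prefetch). */
static void scale_with_prefetch(const float *in, float *out, size_t n,
                                float s, size_t dist_lines)
{
    size_t ahead = dist_lines * FLOATS_PER_LINE;
    for (size_t i = 0; i < n; i += FLOATS_PER_LINE) {
        /* rw = 0: read; locality = 3: keep in all cache levels */
        if (dist_lines && i + ahead < n)
            __builtin_prefetch(in + i + ahead, 0, 3);
        size_t end = (i + FLOATS_PER_LINE < n) ? i + FLOATS_PER_LINE : n;
        for (size_t j = i; j < end; ++j)
            out[j] = in[j] * s;
    }
}
```

Calling it with dist_lines from 2 to 6 mirrors the J47 sweep; dist_lines = 0 corresponds to relying on hardware prefetching alone.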

May 08, 2012, 02:55:22 PM #5 Last Edit: May 08, 2012, 02:57:34 PM by arkayn
Quote
Josef W. Segur wrote:
Maybe a measure of how well the hardware prefetching is matched to the memory system. In the attached J48 I've added an fn with no software prefetching, perhaps your system will prefer that over f6.

I've also slightly modified the way the test time is calculated. Each test consists of ten runs, and previously the average of all ten was used; now I'm dropping the slowest of the ten runs to reduce the effect of transient conditions. I expect it to still vary more with BOINC running than without, though.
                                                               Joe
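
The revised timing rule (ten runs per test, drop the slowest, average the remaining nine) fits in a few lines. A sketch assuming the per-run times are already collected in an array; mean_drop_slowest is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Average of n run times with the single slowest run excluded (n >= 2). */
static double mean_drop_slowest(const double *t, size_t n)
{
    assert(n >= 2);
    double sum = 0.0, worst = t[0];
    for (size_t i = 0; i < n; ++i) {
        sum += t[i];
        if (t[i] > worst)
            worst = t[i];
    }
    return (sum - worst) / (double)(n - 1);
}
```

Dropping only the one worst run keeps a single interruption (for example a BOINC task being scheduled mid-test) from skewing the result while still using nine of the ten samples.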

Will get a look at this on the i5 later tonight, hopefully. Nearly ready, after a rest, to switch to the AKv8c rapidfire implementation, so it's great that Joe's making leaps right now :D

Jason

Quote
Josef W. Segur wrote:
Well, it's clear that the software prefetching is doing some good, though it's not clear why the more distant prefetch works slightly better on the 8-core Bulldozers even when BOINC is active. Getting those details pinned down can wait for final tuning, though.

With AVX chirping times at ~80% of SSE3 chirping times on Bulldozer but ~50% on Sandy Bridge, I'm looking for something with larger effects. One faint possibility is that the way the input and output test buffers are allocated in J48 and earlier could cause L1 cache thrashing. I don't think that's likely, but am attaching J48a. The allocations are revised, but the functions being tested are unchanged.
                                                            Joe
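
If buffer placement were the problem, one common remedy is to stagger the buffers so that matching elements do not share the same offset within a 4 KiB page: on both Sandy Bridge (32 KiB, 8-way) and Bulldozer (16 KiB, 4-way) the L1 data cache has 64 sets of 64-byte lines, so addresses that are equal modulo 4 KiB land in the same set. This is only a sketch of that idea, not what J48a actually does; it assumes C11 aligned_alloc, and buffer_pair, alloc_staggered and skew_lines are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE 4096u
#define LINE 64u

/* Input and output carved out of one page-aligned block. */
typedef struct { void *block; float *in; float *out; } buffer_pair;

/* Each buffer holds `bytes`; the output start is shifted by skew_lines cache
 * lines so in[i] and out[i] sit at different offsets within a 4 KiB page. */
static int alloc_staggered(buffer_pair *b, size_t bytes, unsigned skew_lines)
{
    size_t in_span = (bytes + PAGE - 1) / PAGE * PAGE;    /* whole pages */
    size_t total = in_span + (size_t)skew_lines * LINE + bytes;
    /* C11 aligned_alloc wants the size to be a multiple of the alignment. */
    b->block = aligned_alloc(PAGE, (total + PAGE - 1) / PAGE * PAGE);
    if (!b->block)
        return -1;
    b->in  = (float *)b->block;
    b->out = (float *)((char *)b->block + in_span + (size_t)skew_lines * LINE);
    return 0;
}
```

Releasing both buffers is a single free(b.block).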

Good god. He'll have the thing finished before I can get to the i5 ;)

I'll try to get a bulk run of the test pieces in soon :D

May 10, 2012, 04:30:22 AM #9 Last Edit: May 10, 2012, 04:32:04 AM by arkayn
Quote
Josef W. Segur wrote:

Thanks, it's good to be sure the test allocations weren't causing the problem.

For J49 I've collapsed the f subvariants back to a single one with the same prefetch distance as a through e (4 cache lines ahead). Even though that may not be the best, it makes comparison easier.

Added test g, which loads data in two 128 bit chunks rather than full 256 bit chunks. That's a technique some Intel documents recommend, though it's not expected to make a large difference.
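
Test g's loading style can be written with AVX intrinsics as two 128-bit unaligned loads merged by vinsertf128, in place of one 256-bit unaligned load. A minimal sketch assuming GCC or Clang on x86; load_split, copy_split and copy_floats are hypothetical, and the copy loop only exists to exercise the load.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Load 8 floats from p (no alignment assumed) as two 128-bit halves. */
__attribute__((target("avx")))
static inline __m256 load_split(const float *p)
{
    __m128 lo = _mm_loadu_ps(p);
    __m128 hi = _mm_loadu_ps(p + 4);
    return _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
}

/* Copy the first n - n % 8 floats using split loads and 256-bit stores. */
__attribute__((target("avx")))
static void copy_split(const float *src, float *dst, size_t n)
{
    for (size_t i = 0; i + 8 <= n; i += 8)
        _mm256_storeu_ps(dst + i, load_split(src + i));
}
#endif

/* Copy n floats, using the split-load AVX path when the CPU supports it. */
static void copy_floats(const float *src, float *dst, size_t n)
{
    size_t i = 0;
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) {
        copy_split(src, dst, n);
        i = n - n % 8;
    }
#endif
    for (; i < n; ++i)          /* scalar tail (or whole copy without AVX) */
        dst[i] = src[i];
}
```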

Added test h, which does TLB priming to eliminate delays crossing page boundaries, and prefetches a whole page sized block at once, like the Astropulse TWINDECHIRP. I have hopes that might make a significant difference.
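
Test h's combination of TLB priming and page-sized block prefetch might look roughly like this: before working on one 4 KiB block, touch a single element of the next block with an ordinary load so its page translation is fetched early, then prefetch every cache line of that next block. A sketch under assumptions: 4 KiB pages, 64-byte lines, a page-aligned input, GCC/Clang __builtin_prefetch, and a hypothetical scale_blocked doing simple scaling in place of the chirp arithmetic. It is not the Astropulse TWINDECHIRP code.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_FLOATS (4096 / sizeof(float))  /* floats per 4 KiB page */
#define LINE_FLOATS (64 / sizeof(float))    /* floats per 64-byte line */

/* out[i] = in[i] * s, processed one page-sized block at a time
 * (each block is exactly one page only if in is page-aligned). */
static void scale_blocked(const float *in, float *out, size_t n, float s)
{
    volatile float sink = 0.0f;  /* keeps the priming load from being optimized away */
    for (size_t start = 0; start < n; start += PAGE_FLOATS) {
        size_t next = start + PAGE_FLOATS;
        if (next < n) {
            /* TLB priming: an ordinary load from the next block's page. */
            sink = in[next];
            /* Block prefetch: every cache line of the next block. */
            size_t next_end = (next + PAGE_FLOATS < n) ? next + PAGE_FLOATS : n;
            for (size_t j = next; j < next_end; j += LINE_FLOATS)
                __builtin_prefetch(in + j, 0, 3);
        }
        size_t end = (next < n) ? next : n;
        for (size_t i = start; i < end; ++i)  /* work on the current block */
            out[i] = in[i] * s;
    }
}
```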

The sse3_ChirpData_ak8 variant didn't have prefetch, so it was often slower than sse2_ChirpData_ak8. I've put the prefetch in.

Although I've reviewed the changes to the AVX routines several times, they're significant enough that there's some risk of crashing if I missed something. I hope not.
                                                                Joe

Ah! The old TLB priming trick from the Intel docs, eh? Should be good. Hopefully the i5 won't fight me tonight...

Quote
Josef W. Segur wrote:

Trying to find the strengths of Bulldozer, I've added a chirp variant using both AVX and FMA4 in J50. It does reduce the number of instructions in the loop by 8 or more, so it should have some measurable effect, though it still has to load and save just as much data. Other than the FMA4 changes, it's like the g AVX version.

If I have everything right, it ought to show as unsupported on Sandy Bridge and run on Bulldozer. If not, anything might happen. :P
                                                                     Joe
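
On a stand-in complex multiply, FMA4 lets the product ar*b and the alternating subtract/add be fused into one vfmaddsubps, the kind of instruction-count saving Joe describes. A sketch assuming GCC or Clang on x86, where <x86intrin.h> exposes FMA4 intrinsics such as _mm256_maddsub_ps; cmul_ref, cmul_fma4 and cmul_try_fma4 are hypothetical, and the runtime check plays the role of the test's "not supported by system" branch on CPUs without FMA4, such as Sandy Bridge.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: y = x * w for n interleaved complex values. */
static void cmul_ref(const float *x, const float *w, float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        y[2*i]   = x[2*i]*w[2*i]   - x[2*i+1]*w[2*i+1];
        y[2*i+1] = x[2*i]*w[2*i+1] + x[2*i+1]*w[2*i];
    }
}

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>

/* AVX + FMA4: vfmaddsubps computes ar*b - t in even lanes and ar*b + t in
 * odd lanes, replacing a separate multiply and addsub. */
__attribute__((target("avx,fma4")))
static void cmul_fma4(const float *x, const float *w, float *y, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256 a  = _mm256_loadu_ps(x + 2*i);
        __m256 b  = _mm256_loadu_ps(w + 2*i);
        __m256 ar = _mm256_moveldup_ps(a);       /* ar0 ar0 ar1 ar1 ... */
        __m256 ai = _mm256_movehdup_ps(a);       /* ai0 ai0 ai1 ai1 ... */
        __m256 bs = _mm256_permute_ps(b, 0xB1);  /* bi0 br0 bi1 br1 ... */
        __m256 t  = _mm256_mul_ps(ai, bs);       /* ai*bi, ai*br */
        _mm256_storeu_ps(y + 2*i, _mm256_maddsub_ps(ar, b, t));
    }
    cmul_ref(x + 2*i, w + 2*i, y + 2*i, n - i);  /* leftover samples */
}
#endif

/* Returns 1 if the FMA4 path ran, 0 if FMA4 is not supported by the system. */
static int cmul_try_fma4(const float *x, const float *w, float *y, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx") && __builtin_cpu_supports("fma4")) {
        cmul_fma4(x, w, y, n);
        return 1;
    }
#endif
    cmul_ref(x, w, y, n);
    return 0;
}
```

Because the fused multiply-add rounds once instead of twice, results can differ from the AVX version in the last bit; the small-integer inputs one would use to check it produce identical outputs either way.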

FX-4100
BOINC running on GTX460

=========================================================
Ftst_v7_J50_Chirponly started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                     v_ChirpData 0.009627 0.00000  test
                   fpu_ChirpData 0.019200 0.00000  test
               fpu_opt_ChirpData 0.008951 0.00000  test
             sse1_ChirpData_ak8e 0.007910 0.00000  test
              sse2_ChirpData_ak8 0.005040 0.00000  test
              sse3_ChirpData_ak8 0.004927 0.00000  test
                 avx_ChirpData_a 0.004119 0.00000  test
                 avx_ChirpData_b 0.004149 0.00000  test
                 avx_ChirpData_c 0.004650 0.00000  test
                 avx_ChirpData_d 0.004221 0.00000  test
                 avx_ChirpData_e 0.004187 0.00000  test
                 avx_ChirpData_f 0.004013 0.00000  test
                 avx_ChirpData_g 0.004171 0.00000  test
                 avx_ChirpData_h 0.005179 0.00000  test
            avx_fma4_ChirpData_a 0.003669 0.00000  test
            avx_fma4_ChirpData_a 0.003669 0.00000  choice

            Second run

                     v_ChirpData 0.009635 0.00000  test
                   fpu_ChirpData 0.018249 0.00000  test
               fpu_opt_ChirpData 0.009154 0.00000  test
             sse1_ChirpData_ak8e 0.007586 0.00000  test
              sse2_ChirpData_ak8 0.004708 0.00000  test
              sse3_ChirpData_ak8 0.004546 0.00000  test
                 avx_ChirpData_a 0.004097 0.00000  test
                 avx_ChirpData_b 0.004024 0.00000  test
                 avx_ChirpData_c 0.004339 0.00000  test
                 avx_ChirpData_d 0.004329 0.00000  test
                 avx_ChirpData_e 0.004205 0.00000  test
                 avx_ChirpData_f 0.003973 0.00000  test
                 avx_ChirpData_g 0.003893 0.00000  test
                 avx_ChirpData_h 0.004708 0.00000  test
            avx_fma4_ChirpData_a 0.003704 0.00000  test
            avx_fma4_ChirpData_a 0.003704 0.00000  choice

            Third run

                     v_ChirpData 0.009304 0.00000  test
                   fpu_ChirpData 0.019267 0.00000  test
               fpu_opt_ChirpData 0.008838 0.00000  test
             sse1_ChirpData_ak8e 0.007273 0.00000  test
              sse2_ChirpData_ak8 0.004618 0.00000  test
              sse3_ChirpData_ak8 0.004530 0.00000  test
                 avx_ChirpData_a 0.004216 0.00000  test
                 avx_ChirpData_b 0.004080 0.00000  test
                 avx_ChirpData_c 0.004223 0.00000  test
                 avx_ChirpData_d 0.004374 0.00000  test
                 avx_ChirpData_e 0.004065 0.00000  test
                 avx_ChirpData_f 0.003829 0.00000  test
                 avx_ChirpData_g 0.004143 0.00000  test
                 avx_ChirpData_h 0.004819 0.00000  test
            avx_fma4_ChirpData_a 0.003452 0.00000  test
            avx_fma4_ChirpData_a 0.003452 0.00000  choice

                   Test duration     9.33 seconds

Ftst_v7 completed successfully.


i3-2120
BOINC running on GTX560

=========================================================
Ftst_v7_J50_Chirponly started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                     v_ChirpData 0.004834 0.00000  test
                   fpu_ChirpData 0.012480 0.00000  test
               fpu_opt_ChirpData 0.004621 0.00000  test
             sse1_ChirpData_ak8e 0.005919 0.00000  test
              sse2_ChirpData_ak8 0.004306 0.00000  test
              sse3_ChirpData_ak8 0.004118 0.00000  test
                 avx_ChirpData_a 0.002149 0.00000  test
                 avx_ChirpData_b 0.002130 0.00000  test
                 avx_ChirpData_c 0.002282 0.00000  test
                 avx_ChirpData_d 0.002143 0.00000  test
                 avx_ChirpData_e 0.002010 0.00000  test
                 avx_ChirpData_f 0.002116 0.00000  test
                 avx_ChirpData_g 0.002156 0.00000  test
                 avx_ChirpData_h 0.002744 0.00000  test
            avx_fma4_ChirpData_a not supported by system
                 avx_ChirpData_e 0.002010 0.00000  choice

            Second run

                     v_ChirpData 0.004538 0.00000  test
                   fpu_ChirpData 0.012324 0.00000  test
               fpu_opt_ChirpData 0.004330 0.00000  test
             sse1_ChirpData_ak8e 0.005758 0.00000  test
              sse2_ChirpData_ak8 0.004179 0.00000  test
              sse3_ChirpData_ak8 0.004003 0.00000  test
                 avx_ChirpData_a 0.002143 0.00000  test
                 avx_ChirpData_b 0.002585 0.00000  test
                 avx_ChirpData_c 0.002312 0.00000  test
                 avx_ChirpData_d 0.001930 0.00000  test
                 avx_ChirpData_e 0.002107 0.00000  test
                 avx_ChirpData_f 0.002309 0.00000  test
                 avx_ChirpData_g 0.002067 0.00000  test
                 avx_ChirpData_h 0.002657 0.00000  test
            avx_fma4_ChirpData_a not supported by system
                 avx_ChirpData_d 0.001930 0.00000  choice

            Third run

                     v_ChirpData 0.005770 0.00000  test
                   fpu_ChirpData 0.012279 0.00000  test
               fpu_opt_ChirpData 0.004469 0.00000  test
             sse1_ChirpData_ak8e 0.006027 0.00000  test
              sse2_ChirpData_ak8 0.004288 0.00000  test
              sse3_ChirpData_ak8 0.004026 0.00000  test
                 avx_ChirpData_a 0.002081 0.00000  test
                 avx_ChirpData_b 0.002052 0.00000  test
                 avx_ChirpData_c 0.002536 0.00000  test
                 avx_ChirpData_d 0.001946 0.00000  test
                 avx_ChirpData_e 0.001987 0.00000  test
                 avx_ChirpData_f 0.002057 0.00000  test
                 avx_ChirpData_g 0.002298 0.00000  test
                 avx_ChirpData_h 0.002790 0.00000  test
            avx_fma4_ChirpData_a not supported by system
                 avx_ChirpData_d 0.001946 0.00000  choice

                   Test duration     7.68 seconds

Ftst_v7 completed successfully.

Mmmm, tasty :). The i3's still putting the FX to shame, but it's a nice, definite improvement. With full dispatch going in, I can probably compile in a special FFTW with FMAs enabled only for the AVX-capable AMDs, so at least it gives them a fighting chance with FFTs as well.

Jason

Quote
Josef W. Segur wrote:
The FMA4 a variant produced about a 5% speedup by reducing the number of floating point instructions in the inner loop by ~11%. That's good, but confirms that getting the data transferred still needs improvement. For J51 I'm trying the TLB priming again, but without block prefetching. The i variant for AVX is modified from the h variant, and the changes were merged to the b variant for AVX+FMA4.
                                                         Joe

FX-4100
BOINC running on 460

=========================================================
Ftst_v7_J51_Chirponly started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                     v_ChirpData 0.009009 0.00000  test
                   fpu_ChirpData 0.018200 0.00000  test
               fpu_opt_ChirpData 0.008752 0.00000  test
             sse1_ChirpData_ak8e 0.007503 0.00000  test
              sse2_ChirpData_ak8 0.004782 0.00000  test
              sse3_ChirpData_ak8 0.004801 0.00000  test
                 avx_ChirpData_a 0.003903 0.00000  test
                 avx_ChirpData_b 0.003902 0.00000  test
                 avx_ChirpData_c 0.004298 0.00000  test
                 avx_ChirpData_d 0.004136 0.00000  test
                 avx_ChirpData_e 0.003988 0.00000  test
                 avx_ChirpData_f 0.003865 0.00000  test
                 avx_ChirpData_g 0.003858 0.00000  test
                 avx_ChirpData_h 0.004558 0.00000  test
                 avx_ChirpData_i 0.004006 0.00000  test
            avx_fma4_ChirpData_a 0.003524 0.00000  test
            avx_fma4_ChirpData_b 0.060127 0.50095  test
            avx_fma4_ChirpData_a 0.003524 0.00000  choice

            Second run

                     v_ChirpData 0.009023 0.00000  test
                   fpu_ChirpData 0.018034 0.00000  test
               fpu_opt_ChirpData 0.008862 0.00000  test
             sse1_ChirpData_ak8e 0.007292 0.00000  test
              sse2_ChirpData_ak8 0.004615 0.00000  test
              sse3_ChirpData_ak8 0.004532 0.00000  test
                 avx_ChirpData_a 0.003917 0.00000  test
                 avx_ChirpData_b 0.003865 0.00000  test
                 avx_ChirpData_c 0.004167 0.00000  test
                 avx_ChirpData_d 0.004040 0.00000  test
                 avx_ChirpData_e 0.004026 0.00000  test
                 avx_ChirpData_f 0.003821 0.00000  test
                 avx_ChirpData_g 0.003666 0.00000  test
                 avx_ChirpData_h 0.004601 0.00000  test
                 avx_ChirpData_i 0.003980 0.00000  test
            avx_fma4_ChirpData_a 0.003389 0.00000  test
            avx_fma4_ChirpData_b 0.058483 0.50095  test
            avx_fma4_ChirpData_a 0.003389 0.00000  choice

            Third run

                     v_ChirpData 0.008824 0.00000  test
                   fpu_ChirpData 0.017494 0.00000  test
               fpu_opt_ChirpData 0.008599 0.00000  test
             sse1_ChirpData_ak8e 0.007149 0.00000  test
              sse2_ChirpData_ak8 0.004593 0.00000  test
              sse3_ChirpData_ak8 0.004453 0.00000  test
                 avx_ChirpData_a 0.003842 0.00000  test
                 avx_ChirpData_b 0.003825 0.00000  test
                 avx_ChirpData_c 0.004122 0.00000  test
                 avx_ChirpData_d 0.004023 0.00000  test
                 avx_ChirpData_e 0.003950 0.00000  test
                 avx_ChirpData_f 0.003855 0.00000  test
                 avx_ChirpData_g 0.003928 0.00000  test
                 avx_ChirpData_h 0.004565 0.00000  test
                 avx_ChirpData_i 0.004058 0.00000  test
            avx_fma4_ChirpData_a 0.003531 0.00000  test
            avx_fma4_ChirpData_b 0.059600 0.50095  test
            avx_fma4_ChirpData_a 0.003531 0.00000  choice

                   Test duration    11.53 seconds

Ftst_v7 completed successfully.


i3-2120
BOINC running on 560

=========================================================
Ftst_v7_J51_Chirponly started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                     v_ChirpData 0.004599 0.00000  test
                   fpu_ChirpData 0.012435 0.00000  test
               fpu_opt_ChirpData 0.004366 0.00000  test
             sse1_ChirpData_ak8e 0.006014 0.00000  test
              sse2_ChirpData_ak8 0.004207 0.00000  test
              sse3_ChirpData_ak8 0.004177 0.00000  test
                 avx_ChirpData_a 0.002153 0.00000  test
                 avx_ChirpData_b 0.002141 0.00000  test
                 avx_ChirpData_c 0.002217 0.00000  test
                 avx_ChirpData_d 0.002032 0.00000  test
                 avx_ChirpData_e 0.002002 0.00000  test
                 avx_ChirpData_f 0.002125 0.00000  test
                 avx_ChirpData_g 0.002081 0.00000  test
                 avx_ChirpData_h 0.002745 0.00000  test
                 avx_ChirpData_i 0.002329 0.00000  test
            avx_fma4_ChirpData_a not supported by system
            avx_fma4_ChirpData_b not supported by system
                 avx_ChirpData_e 0.002002 0.00000  choice

            Second run

                     v_ChirpData 0.004888 0.00000  test
                   fpu_ChirpData 0.012563 0.00000  test
               fpu_opt_ChirpData 0.004551 0.00000  test
             sse1_ChirpData_ak8e 0.005902 0.00000  test
              sse2_ChirpData_ak8 0.004339 0.00000  test
              sse3_ChirpData_ak8 0.004017 0.00000  test
                 avx_ChirpData_a 0.002142 0.00000  test
                 avx_ChirpData_b 0.002153 0.00000  test
                 avx_ChirpData_c 0.002186 0.00000  test
                 avx_ChirpData_d 0.002007 0.00000  test
                 avx_ChirpData_e 0.001946 0.00000  test
                 avx_ChirpData_f 0.002063 0.00000  test
                 avx_ChirpData_g 0.002174 0.00000  test
                 avx_ChirpData_h 0.002790 0.00000  test
                 avx_ChirpData_i 0.002347 0.00000  test
            avx_fma4_ChirpData_a not supported by system
            avx_fma4_ChirpData_b not supported by system
                 avx_ChirpData_e 0.001946 0.00000  choice

            Third run

                     v_ChirpData 0.004868 0.00000  test
                   fpu_ChirpData 0.012536 0.00000  test
               fpu_opt_ChirpData 0.004565 0.00000  test
             sse1_ChirpData_ak8e 0.005728 0.00000  test
              sse2_ChirpData_ak8 0.004225 0.00000  test
              sse3_ChirpData_ak8 0.004123 0.00000  test
                 avx_ChirpData_a 0.002121 0.00000  test
                 avx_ChirpData_b 0.002155 0.00000  test
                 avx_ChirpData_c 0.002184 0.00000  test
                 avx_ChirpData_d 0.002048 0.00000  test
                 avx_ChirpData_e 0.002039 0.00000  test
                 avx_ChirpData_f 0.002137 0.00000  test
                 avx_ChirpData_g 0.002188 0.00000  test
                 avx_ChirpData_h 0.002760 0.00000  test
                 avx_ChirpData_i 0.002335 0.00000  test
            avx_fma4_ChirpData_a not supported by system
            avx_fma4_ChirpData_b not supported by system
                 avx_ChirpData_e 0.002039 0.00000  choice

                   Test duration     8.08 seconds

Ftst_v7 completed successfully.

Have you seen that FFTW 3.3.2 has been released?

Claggy

Quote
Josef W. Segur wrote:
Another new Chirponly test, J52, is attached.

Fixed (I hope) the problem which made the avx_fma4 b variant so slow and inaccurate, added a c variant with a different approach to TLB priming.
                                   Joe

May 18, 2012, 10:18:15 AM #18 Last Edit: May 18, 2012, 10:20:07 AM by arkayn
FX-4100
BOINC Running on 460

=========================================================
Ftst_v7_J52_Chirponly started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                     v_ChirpData 0.008840 0.00000  test
                   fpu_ChirpData 0.018138 0.00000  test
               fpu_opt_ChirpData 0.009096 0.00000  test
             sse1_ChirpData_ak8e 0.007134 0.00000  test
              sse2_ChirpData_ak8 0.004616 0.00000  test
              sse3_ChirpData_ak8 0.004581 0.00000  test
                 avx_ChirpData_a 0.004071 0.00000  test
                 avx_ChirpData_b 0.003964 0.00000  test
                 avx_ChirpData_c 0.004295 0.00000  test
                 avx_ChirpData_d 0.004130 0.00000  test
                 avx_ChirpData_e 0.003982 0.00000  test
                 avx_ChirpData_f 0.003781 0.00000  test
                 avx_ChirpData_g 0.003714 0.00000  test
                 avx_ChirpData_h 0.004528 0.00000  test
                 avx_ChirpData_i 0.003994 0.00000  test
            avx_fma4_ChirpData_a 0.003473 0.00000  test
            avx_fma4_ChirpData_b 0.003617 0.00000  test
            avx_fma4_ChirpData_c 0.003739 0.00000  test
            avx_fma4_ChirpData_a 0.003473 0.00000  choice

            Second run

                     v_ChirpData 0.009005 0.00000  test
                   fpu_ChirpData 0.017681 0.00000  test
               fpu_opt_ChirpData 0.008559 0.00000  test
             sse1_ChirpData_ak8e 0.007305 0.00000  test
              sse2_ChirpData_ak8 0.004635 0.00000  test
              sse3_ChirpData_ak8 0.004459 0.00000  test
                 avx_ChirpData_a 0.003960 0.00000  test
                 avx_ChirpData_b 0.003880 0.00000  test
                 avx_ChirpData_c 0.004260 0.00000  test
                 avx_ChirpData_d 0.004184 0.00000  test
                 avx_ChirpData_e 0.004021 0.00000  test
                 avx_ChirpData_f 0.003816 0.00000  test
                 avx_ChirpData_g 0.003791 0.00000  test
                 avx_ChirpData_h 0.004508 0.00000  test
                 avx_ChirpData_i 0.003953 0.00000  test
            avx_fma4_ChirpData_a 0.003404 0.00000  test
            avx_fma4_ChirpData_b 0.003597 0.00000  test
            avx_fma4_ChirpData_c 0.003738 0.00000  test
            avx_fma4_ChirpData_a 0.003404 0.00000  choice

            Third run

                     v_ChirpData 0.008951 0.00000  test
                   fpu_ChirpData 0.017233 0.00000  test
               fpu_opt_ChirpData 0.008535 0.00000  test
             sse1_ChirpData_ak8e 0.007110 0.00000  test
              sse2_ChirpData_ak8 0.004573 0.00000  test
              sse3_ChirpData_ak8 0.004376 0.00000  test
                 avx_ChirpData_a 0.003833 0.00000  test
                 avx_ChirpData_b 0.003780 0.00000  test
                 avx_ChirpData_c 0.004112 0.00000  test
                 avx_ChirpData_d 0.004140 0.00000  test
                 avx_ChirpData_e 0.003956 0.00000  test
                 avx_ChirpData_f 0.003741 0.00000  test
                 avx_ChirpData_g 0.003686 0.00000  test
                 avx_ChirpData_h 0.004516 0.00000  test
                 avx_ChirpData_i 0.003902 0.00000  test
            avx_fma4_ChirpData_a 0.003376 0.00000  test
            avx_fma4_ChirpData_b 0.003804 0.00000  test
            avx_fma4_ChirpData_c 0.003685 0.00000  test
            avx_fma4_ChirpData_a 0.003376 0.00000  choice

                   Test duration    10.54 seconds

Ftst_v7 completed successfully.


i3-2120
BOINC Running on 560

=========================================================
Ftst_v7_J52_Chirponly started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                     v_ChirpData 0.005332 0.00000  test
                   fpu_ChirpData 0.012461 0.00000  test
               fpu_opt_ChirpData 0.004724 0.00000  test
             sse1_ChirpData_ak8e 0.005928 0.00000  test
              sse2_ChirpData_ak8 0.004362 0.00000  test
              sse3_ChirpData_ak8 0.004210 0.00000  test
                 avx_ChirpData_a 0.002198 0.00000  test
                 avx_ChirpData_b 0.002080 0.00000  test
                 avx_ChirpData_c 0.002259 0.00000  test
                 avx_ChirpData_d 0.002050 0.00000  test
                 avx_ChirpData_e 0.002061 0.00000  test
                 avx_ChirpData_f 0.002186 0.00000  test
                 avx_ChirpData_g 0.002199 0.00000  test
                 avx_ChirpData_h 0.002787 0.00000  test
                 avx_ChirpData_i 0.002355 0.00000  test
            avx_fma4_ChirpData_a not supported by system
            avx_fma4_ChirpData_b not supported by system
            avx_fma4_ChirpData_c not supported by system
                 avx_ChirpData_d 0.002050 0.00000  choice

            Second run

                     v_ChirpData 0.004999 0.00000  test
                   fpu_ChirpData 0.012899 0.00000  test
               fpu_opt_ChirpData 0.004722 0.00000  test
             sse1_ChirpData_ak8e 0.005912 0.00000  test
              sse2_ChirpData_ak8 0.004414 0.00000  test
              sse3_ChirpData_ak8 0.004065 0.00000  test
                 avx_ChirpData_a 0.002204 0.00000  test
                 avx_ChirpData_b 0.002195 0.00000  test
                 avx_ChirpData_c 0.002226 0.00000  test
                 avx_ChirpData_d 0.002059 0.00000  test
                 avx_ChirpData_e 0.002055 0.00000  test
                 avx_ChirpData_f 0.002176 0.00000  test
                 avx_ChirpData_g 0.002093 0.00000  test
                 avx_ChirpData_h 0.002694 0.00000  test
                 avx_ChirpData_i 0.002245 0.00000  test
            avx_fma4_ChirpData_a not supported by system
            avx_fma4_ChirpData_b not supported by system
            avx_fma4_ChirpData_c not supported by system
                 avx_ChirpData_e 0.002055 0.00000  choice

            Third run

                     v_ChirpData 0.004695 0.00000  test
                   fpu_ChirpData 0.012390 0.00000  test
               fpu_opt_ChirpData 0.004516 0.00000  test
             sse1_ChirpData_ak8e 0.005742 0.00000  test
              sse2_ChirpData_ak8 0.004219 0.00000  test
              sse3_ChirpData_ak8 0.004038 0.00000  test
                 avx_ChirpData_a 0.002096 0.00000  test
                 avx_ChirpData_b 0.002074 0.00000  test
                 avx_ChirpData_c 0.002121 0.00000  test
                 avx_ChirpData_d 0.001955 0.00000  test
                 avx_ChirpData_e 0.001953 0.00000  test
                 avx_ChirpData_f 0.002074 0.00000  test
                 avx_ChirpData_g 0.002091 0.00000  test
                 avx_ChirpData_h 0.002691 0.00000  test
                 avx_ChirpData_i 0.002248 0.00000  test
            avx_fma4_ChirpData_a not supported by system
            avx_fma4_ChirpData_b not supported by system
            avx_fma4_ChirpData_c not supported by system
                 avx_ChirpData_e 0.001953 0.00000  choice

                   Test duration     8.12 seconds

Ftst_v7 completed successfully.

Quote from: Claggy on May 17, 2012, 10:36:04 PM
Have you seen that FFTW 3.3.2 has been released?

Claggy

Won't be a big player from our viewpoint. Some others and I had forwarded some improvements to Matteo Frigo (author of FFTW) to make AVX work with other compilers. For most of our purposes IPP will perform better (i.e. as with Heinz' ATOM experiments, in a more formalised and general way), though the option of including a polished FFTW will tend to take some of the strain out of older AMD and pre-SSE2 chips, where Intel's Integrated Performance Primitives are lacking. Small market, but important for completeness, which cuts down user support, which is the biggest cost.

Jason
