Recent posts


General Discussion / Re: Parallella: A Supercompute...

Last post by William - May 05, 2014, 09:48:43 PM

Quote from: Claggy on May 05, 2014, 09:34:13 PM
Mine have arrived.

Claggy

I was wondering if they might be taking the scenic route.

General Discussion / Re: Parallella: A Supercompute...

Last post by Claggy - May 05, 2014, 09:34:13 PM

Mine have arrived.

Claggy

General Discussion / Re: Parallella: A Supercompute...

Last post by Jason G - May 05, 2014, 02:28:11 PM

Quote from: Richard Haselgrove on May 03, 2014, 01:19:46 AM
Parallella manufacturing and distribution has reached backer number 3662 - moi.

After paying a hefty VAT and customs clearance bill, it was delivered late Saturday afternoon - on a holiday weekend. Now, anybody making any progress on apps? (Mike Hewson is having a real struggle to fit any meaningful FFT into the limited on-chip local memory, and paging to off-chip memory is so slow that it negates the speed of the parallel processing)

Basically I'm engineering the regression bench part to precede x42 as cross-platform. Progress so far is a rough GUI (for single device tests) under wxWidgets, and initial probing at suitable device kernels. An online submission and auto-update facility will probably follow during alpha tests (various platforms). Building the bench will be made easy.

The bench part is meant to start with basic chirp, memory, FFT and other functions, accelerating refinement. That's because I'll be replacing the FFTs with a 'cascade freaky power spectrum' to strip redundant processing specifically around the FFTs. That may well help matters, as only small simple strided FFTs will be needed here.

Probably as that gets a bit further, it will become clearer whether some basic large FFT optimisation strategies would help, though memory bandwidth constraints always tend to be some barrier in plain FFT because it isn't very arithmetically dense. That's why I'm looking toward compounding/combining/tiling algorithms instead... to maximise locality and reduce the overall number of passes. Make more computations per byte fetched.
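
A rough sketch of the "computations per byte fetched" argument, under stated assumptions that are not from the post: a radix-2 FFT costed at the conventional 5·N·log2(N) flops, single-precision complex points (8 bytes each), and every pass reading and writing the whole array once because it does not fit in local memory. The function names and the `stages_per_pass` parameter are hypothetical, chosen only for illustration.

```python
import math

def fft_flops(n):
    # Conventional flop estimate for a radix-2 complex FFT (an assumption).
    return 5.0 * n * math.log2(n)

def fft_intensity(n, stages_per_pass=1, bytes_per_point=8):
    """Flops per byte moved for an n-point FFT.

    stages_per_pass models tiling: how many butterfly stages are fused
    into one sweep over memory (hypothetical parameter).
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two")
    stages = n.bit_length() - 1              # log2(n) for a power of two
    passes = math.ceil(stages / stages_per_pass)
    # Each pass reads and writes every point once.
    bytes_moved = 2 * bytes_per_point * n * passes
    return fft_flops(n) / bytes_moved

if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(k, fft_intensity(1024, stages_per_pass=k))
```

In this simple model the intensity does not depend on N at all when each stage is a separate pass; it only rises when several stages are fused into one sweep, which is the point of tiling for locality.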

Thankfully FFT is rarely used in isolation, and breaking the black-box mentality will probably help a lot with this sort of device, as with others.

General Discussion / Re: Parallella: A Supercompute...

Last post by Claggy - May 03, 2014, 08:11:41 AM

Mine haven't reached me yet; according to DHL they are only 40 miles away now.

Claggy

General Discussion / Re: Parallella: A Supercompute...

Last post by Richard Haselgrove - May 03, 2014, 01:19:46 AM

Parallella manufacturing and distribution has reached backer number 3662 - moi.

After paying a hefty VAT and customs clearance bill, it was delivered late Saturday afternoon - on a holiday weekend. Now, anybody making any progress on apps? (Mike Hewson is having a real struggle to fit any meaningful FFT into the limited on-chip local memory, and paging to off-chip memory is so slow that it negates the speed of the parallel processing)

Questions / Re: Page faults

Last post by Darrell - April 17, 2014, 07:14:43 PM

OK, since it is known I won't worry about it anymore.

Thanks for the quick reply.

Questions / Page faults

Last post by Josef W. Segur - April 15, 2014, 02:57:59 PM

There's a discussion of those page faults from early in the Beta testing of SaH v7; http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=1831&postid=40514 is the first post on that issue. But the short version is that it is FFTW's implementation of the type 2 DCT which is doing that. It seems to be about 128 per DCT, suggesting they have to allocate an extra 512 KB of RAM as scratch memory for each DCT. Because we're doing 519336 of those in each WU, it would certainly be nice if we could allocate a permanent array for the purpose and tell FFTW about it, but that feature isn't available.
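
The arithmetic behind that estimate can be sketched as follows. The 4 KiB page size is an assumption (the usual x86 default; the post does not state it), and the constant and function names are hypothetical. The fault count per DCT and the DCT count per WU are the figures quoted above.

```python
PAGE_SIZE_BYTES = 4096   # assumption: standard 4 KiB x86 pages
FAULTS_PER_DCT = 128     # approximate figure from the post
DCTS_PER_WU = 519336     # figure from the post

def scratch_kib_per_dct(faults=FAULTS_PER_DCT, page_size=PAGE_SIZE_BYTES):
    # Each first touch of a freshly allocated page costs one page fault,
    # so faults * page size approximates the scratch allocation.
    return faults * page_size // 1024

def faults_per_wu(faults=FAULTS_PER_DCT, dcts=DCTS_PER_WU):
    # Total page faults attributable to the DCT scratch memory in one WU.
    return faults * dcts

if __name__ == "__main__":
    print(scratch_kib_per_dct(), "KiB of scratch per DCT")
    print(faults_per_wu(), "page faults per WU")
```

Under those assumptions, 128 faults correspond to 512 KiB per DCT, and the total comes to roughly 66 million page faults per WU.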

I've been watching for an optimized open source DCT we could substitute, but haven't yet found one.
Joe

Released / Re: SSE2 vs. SSE3

Last post by Jimbocous - March 08, 2014, 06:19:54 PM

Awesome. Thanks for the quick feedback on this, Joe.
Much appreciated! That explains a lot.
For now, I think I'll just leave it as-is, at least until I have a better understanding of how it all fits together. If the performance improvement would be slight, it doesn't sound like it would make sense to move away from the stock install. If, on the other hand, I could contribute to the effort by trying it out, I'd be glad to do so.
Thanks again!
Jim ..

Released / Re: SSE2 vs. SSE3

Last post by Josef W. Segur - March 08, 2014, 05:59:35 PM

The only thing you're missing is that the 0.41 installer doesn't contain an SSE3 build. The Akv8c builds included are SSE2, SSSE3 (requires Core 2 or better), and AVX. Other builds were made, but Alpha testing at Lunatics didn't show the SSE3 build significantly better than the SSE2 build on any equipment the active testers have.

Those other builds are available at Lunatics for download, however, and it is certainly possible the SSE3 build would be better on Prescott or Pentium-D. I wouldn't expect a large improvement, and modifying the app_info.xml can be a little tricky. If you decide to try it and are unsure about anything, ask!
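
Before editing, it can help to check which executables an app_info.xml currently references. The following is a minimal sketch, assuming the usual BOINC anonymous-platform layout in which each `<file_info>` element holds a `<name>` and executables carry an `<executable/>` flag. The sample fragment and the app name `example_app` are hypothetical, not the contents of the 0.41 installer's file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed fragment for illustration only.
SAMPLE = """
<app_info>
  <app>
    <name>example_app</name>
  </app>
  <file_info>
    <name>AKv8c_Bb_r1846_winx86_SSE2x.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>some_support_file.dll</name>
  </file_info>
</app_info>
"""

def list_executables(xml_text):
    """Return the names of all file_info entries flagged as executable."""
    root = ET.fromstring(xml_text)
    names = []
    for file_info in root.iter("file_info"):
        if file_info.find("executable") is not None:
            name = file_info.findtext("name", default="").strip()
            if name:
                names.append(name)
    return names

if __name__ == "__main__":
    for name in list_executables(SAMPLE):
        print(name)
```

To inspect a real file, read its text and pass it to `list_executables`; keep a backup copy before changing anything.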
Joe

Released / Re: SSE2 vs. SSE3

Last post by Jimbocous - March 08, 2014, 03:31:30 PM

Gotcha. So BOINC and presumably 0.41 installer did pick it up correctly.
Reason I'm asking is that my appInfo file shows no reference to AKv8c_Bb_r1846_winx86_SSE3x.exe
but rather contains
AKv8c_Bb_r1846_winx86_SSE2x.exe

Am I missing something?
Thanks!
