Crunchers Anonymous

Cafe => General Discussion => Topic started by: Richard Haselgrove on August 16, 2013, 02:07:29 AM

Title: Hardware question on cooling
Post by: Richard Haselgrove on August 16, 2013, 02:07:29 AM
Anyone mind if I ask for suggestions on this problem, please?

I mentioned recently that I was suffering from 'desktop freezing' on my big machine (about a year old this weekend): I see BOINC failing to update on my BoincView remote, and when I look at the machine locally, I see the Windows desktop (intact without artefacts or error messages), but the system clock not updated in half an hour or whatever.

I built the machine with a single GPU (GTX 670), but added a second identical one. I think I've narrowed the problem down to the first GPU overheating.

The cards are "Gainward Phantom" models. They have a full-size heatsink with three heat pipes, and two enclosed fans which suck in air from underneath, and exhaust it through the rear of the case through a slotted mounting bracket.

The GPUs are about 2.5 slots thick (with a double mounting bracket), and the motherboard has the PCIe connectors three case-slots apart. So the upper GPU only has about half a slot's worth of space to get cool air to its fan intakes - and that air passes over the exposed PCB surface of the lower GPU - so it'll heat up a lot on the way.

I have plenty of space in a big HAF case, and plenty of fans oriented correctly to cool the CPU - but how can I best get extra cool air to the upper GPU?

I'll try to take a photograph and mark the existing airflows, while I test the theory by running with a single GPU for a while.

Any ideas welcome, but preferably not ones which require advanced machining/fabrication skills!
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on August 16, 2013, 02:38:23 AM
OK, here's the beast.

(http://i1148.photobucket.com/albums/o562/R_Haselgrove/DSC00824.jpg)

Main fans and airflow (I did the 'tissue test'):

Huge intake fan at bottom right, sucks room air in through the front panel.
CPU fans both blow towards the left, as does a rear case fan level with the CPU.
There's an upper case fan above the CPU, blowing CPU heat up through the roof.
The PSU has its own intake grille underneath, and exhausts to the left through the back panel. Shouldn't be any heat escape from the PSU into the case.

The current GPU is in position 2 - this is the one I've removed from the upper slot, next to the CPU.

(http://i1148.photobucket.com/albums/o562/R_Haselgrove/DSC00825.jpg)

(http://i1148.photobucket.com/albums/o562/R_Haselgrove/DSC00826.jpg)
Title: Re: Hardware question on cooling
Post by: arkayn on August 16, 2013, 02:40:38 AM
I have 2 fans located in my side panel that put additional air directly onto the GPUs and CPU.
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on August 16, 2013, 02:52:31 AM
Quote from: arkayn on August 16, 2013, 02:40:38 AM
I have 2 fans located in my side panel that put additional air directly onto the GPUs and CPU.

Ah - good point. My side panel is drilled and grilled, too - 2x150 mm, or 1x180mm, if I've got my measurements right.

Another possibility is a second fan in the roof, at the front blowing down to set up a clockwise circulation. Just have to remember not to put a pile of papers down on the top of the box...
Title: Re: Hardware question on cooling
Post by: Jason G on August 16, 2013, 03:16:27 AM
My mother's machine had some similar freezes when I first replaced the dead motherboard (Both Gigabyte brand, midrange, Z77 replacing Z68 chipset).  The replacement would hard freeze as described about once every 3-4 days.  I had temperature monitors in the tray too, so could see it wasn't temps in my case.

What got rid of it completely ( been no freezes for a couple of months) was some combination of the following:
- Motherboard BIOS update & reset BIOS to defaults.
- Force update Intel Chipset drivers, checking the dates & versions to confirm each Intel(R) device really does update, because installing the inf isn't enough.
Get the zip form, unzip it & point the driver updates to the All folder
- Updated video driver for the sake of it. (unlikely culprit though)

probably a few more minor things in there, like updating some other drivers.
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on August 16, 2013, 03:30:10 AM
Thanks - mine's a Gigabyte Z77 too, as you probably saw. I got the latest BIOS when I built the machine, but there might have been an update since then - I'll check.

The thing that led me to question an overheated GPU was a succession of freezes, and finally this morning discovering that GPU 0 was running, but had downclocked (running 2x x41zc/cuda50). I cleared that, but went on with my planned experiment of re-trying GPUGrid, which puts a much more continuous strain on the GPUs - something like 8 hours continuous, without even a break between tasks. It froze again within five minutes.

When I only had one card in the machine (and again this afternoon while I only have one for testing), it seems to run fine. I've also tried moving it into a potentially cooler location, but I think the weather has conspired against that.
Title: Re: Hardware question on cooling
Post by: Jason G on August 16, 2013, 03:36:43 AM
Yeah, temps are still definitely a first likely suspect, though they turned out a red herring in my case. Setting eVGA Precision to show the GPU temp in the tray confirmed it was cool running.

If the temps issue doesn't turn out to be the culprit, then next most likely is specifically the PCI express drivers component of the Intel Chipset inf update utility. Forcing that to update is a bit of a dance, but ultimately dropped  DPC latencies as well, implying improved driver quality.

All my 'system devices' marked Intel(R) now show:
Driver Date: 9/07/2013
Driver version:  9.1.9.1004

except for one legacy PCI bridge, that uses an MS driver.
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on August 16, 2013, 03:42:20 AM
I was running TThrottle - for monitoring, rather than control. Usually showed GPU 0 was 15 - 20*C warmer than GPU 1, though I thought still within tolerance.
Title: Re: Hardware question on cooling
Post by: Jason G on August 16, 2013, 03:44:46 AM
Quote from: Richard Haselgrove on August 16, 2013, 03:42:20 AM
I was running TThrottle - for monitoring, rather than control. Usually showed GPU 0 was 15 - 20*C warmer than GPU 1, though I thought still within tolerance.

< 70 degrees C? Or substantially warmer?
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on August 16, 2013, 04:23:07 AM
The single GPU running GPUGrid now (with the case side panel off, so air flow not so well guided) is showing 63.0*C on TThrottle. From memory, the previous readings were well into the 70s for GPU 0, high 50s for GPU 1.

What fans would you suggest for the side panel? Manual confirms ready for 2x 140mm, or 1x 200mm (I think I'd go for 2x140). I think I'm out of motherboard power points, so they would need a Molex connector or adapter - do most manufacturers include those? (some mention them, most don't). I'd want to preserve the current - very quiet - running: that was another reason for the move, I literally couldn't hear it running in the workroom with other computers. And I can still hardly hear it in a room by itself!
Title: Re: Hardware question on cooling
Post by: Jason G on August 16, 2013, 07:04:27 AM
Yeah, 70 degrees C is where the turbo boost mechanism takes over, fiddling with clocks, so for max output you want to stay under that. (working with it instead of against it)

I personally like Noctua fans, because they are quiet (terrible colour though), but any decent 120-140mm choice should really be pretty quiet unless going for ridiculous CFM / rpm models.

For numbers of big fans I'd avoid using the mobo power altogether, just a tad easier on the mobo voltage regulators, which tend to be the first thing to fail (if anything) these days.  Depending on fan models/brands they come with some sort of Molex adaptors... at least my last 120mm & 140mm Noctua ones did.   Find a good picture of the box contents & it should show clearly the screws/adaptors & optional silicone shock mount alternatives.

Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on August 16, 2013, 06:45:30 PM
It's a cooler day today, so the solo card is showing 57*C. Once the coffee's kicked in, I'll put the second card back in, and - assuming I haven't broken it - take some more readings, mess about with the airflow, try an external room fan - you know the sort of thing.

Local supplier carries Noctua, but doesn't have any true 140mm in stock - and I don't like the colour, either. I'll look around, following your tips - ta.

Later - put the original card back in, and the temps are around 78*C / 58*C (open case). I tried sliding an air baffle (sheet of corrugated cardboard) between the two cards, to prevent the heat from the lower one reaching the upper one - but the upper temp went to 82*C and rising, so I pulled it out quickly. I think the obstructed air flow made things worse instead of better.

So, extra fans it is, then. Supplier has a brace of http://www.corsair.com/cpu-cooling-kits/air-series-fans/air-series-af140-quiet-edition-high-airflow-140mm-fan.html in stock, and a cable adapter - fetching now. I'll let you know how I get on.

(oh, and the motherboard does have another full-length PCIe slot, and the case would take a double-width card there - but I'd be obstructing audio and USB connectors, and the second card would be drawing in air from very close to the PSU - not worth it just to get a better airflow gap for the upper card. And the slot is only wired x4, anyway)
Title: Re: Hardware question on cooling
Post by: Jason G on August 17, 2013, 12:54:31 AM
Yeah, I've heard also a rubber stick-on foot or similar increasing the separation a tiny bit might help too, depending how much you can stretch things with those massive custom heatsinks.

[Later:] looking again at the pics, I'm inclined to believe you might be under a vacuum as well, as opposed to the desirable positive pressure situation.  There are the GPUs, rear fan & PSU all exhausting, and one intake at the front (?).  Adding more intake to the point where it overtakes exhaust should help quite a bit, since less air mass moves when rarefied, and a denser mass of air will carry more heat.
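[Editor's note: the pressure argument can be put in rough numbers. The heat a fan stream carries out of a case scales as Q = rho * V-dot * cp * dT, so for the same volume flow, denser air removes more heat. A minimal sketch - all the figures here are assumptions for illustration, not measurements from this machine, and the in-case density difference is exaggerated to make the effect visible:]

```python
# Illustrative only: heat carried away by case airflow, Q = rho * Vdot * cp * dT.
# All numbers below are assumed for the sketch, not measured on this host.

def heat_removal_watts(rho_kg_m3, vdot_m3_s, cp_j_kg_k, delta_t_k):
    """Rate of heat carried out of the case by the air stream, in watts."""
    return rho_kg_m3 * vdot_m3_s * cp_j_kg_k * delta_t_k

CP_AIR = 1005.0   # J/(kg*K), specific heat of air at constant pressure
VDOT = 0.05       # m^3/s (~106 CFM), assumed net case throughput
DELTA_T = 10.0    # K, assumed exhaust-minus-intake temperature rise

# Same volume flow, different densities: slight positive pressure vs slight vacuum.
q_dense = heat_removal_watts(1.20, VDOT, CP_AIR, DELTA_T)
q_thin = heat_removal_watts(1.15, VDOT, CP_AIR, DELTA_T)

print(f"positive pressure: {q_dense:.0f} W, vacuum: {q_thin:.0f} W")
```

Same fans, same ΔT: the denser stream carries more heat out, which is the point about preferring intake to overtake exhaust.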
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on August 17, 2013, 02:05:11 AM
Well, she's been running for a couple of hours now with those two extra side-panel fans, both blowing inwards to create overpressure. The key temperature has dropped a couple of degrees - the GPU cards themselves have active fan control, and the fan speeds have dropped too, so there's probably more headroom than those two degrees imply.

I doubt there was ever anything approaching vacuum - the case leaks like a sieve (literally - the only solid wall is the side panel behind the motherboard). There's a 200mm fan in the lower front panel (bottom right of photograph), blowing air over the hard disk cage and into the case. We can leave the PSU out of the pressure equation, because it takes inlet air direct from the room (underneath - outside the case), and exhausts to the rear (again outside the case).

I think my next task is to tie back those four 6-pin GPU power cables, so they can't obstruct the air flow.
Title: Re: Hardware question on cooling
Post by: Jason G on August 17, 2013, 05:14:21 AM
Sounds good :) yeah the fans dropping's a good sign.  Personally I'd make a custom fan profile (in evgaprecision or equivalent) slightly more aggressive than the default, due to knowing there are two stacked cards. Depends on how much air rushing sound you can tolerate though.  Sustaining an overclock on my (single) 480 sometimes gave the feeling of sitting in a learjet taking off, while the 680 is near ambient noise levels at all times.  With evga's custom cooler on the 780, at stock frequencies, I do hear some slight white noise, but really only if I stay still enough to focus on it.
Title: Re: Hardware question on cooling
Post by: arkayn on August 17, 2013, 08:38:57 AM
I think I have at least 2 slots between my 2 cards, the 660 runs at 58 and the 670 is around 63.
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on September 13, 2013, 03:36:31 AM
OK, so it's started doing it again - lockup with static system clock, no response to mouse/keyboard, no BOINC activity on remote BoincView or Manager. I don't think it's heat this time - weather is getting cooler, and the 'frozen' desktop display this time showed TThrottle's display of the heat-prone GPU at 75*, the cooler one 55*-ish.

Been doing some error-recovery app testing at GPUGrid recently - no lockup problems during that. Tried switching back to production-mode, but one of those failed so horribly this morning - multiple driver restarts on reboot, ending up with a black screen and repeated system boot beeps - that I suspended that task in safe mode, and aborted it when I got the machine back up. Since then, been running SETI only (x41zc) on the GPUs - but it still happened again. Before that, I got three of those false 'bad workunit header' job errors - http://setiathome.berkeley.edu/results.php?hostid=7070962&state=6 - though that didn't kill either BOINC or Windows.

Host has SSD boot drive, and two sata HDs as Raid 1 mirrored data drive, using the Intel motherboard Raid controller and drivers. Every time the host crashes, Intel schedules a disk consistency check, and I have a *slight* suspicion the next crash is correlated with Intel reporting that the consistency check was successful, and no errors were found. Nevertheless, usually one or two tasks rewind to zero %age progress (but significant elapsed time), suggesting that the checkpoint files couldn't be read after the crash.

The other significant factor may be that I installed the September Windows security updates last night, and the problems (re-)started after that. These are the ones I installed:

(http://i1148.photobucket.com/albums/o562/R_Haselgrove/MSSeptember2013updates.png)

Note that Office 2007 updates 2760411 and 2760588 were (successfully) installed twice each - Microsoft have acknowledged this to be a known problem, and they're working on it. But is anything else in that list known (or likely) to have started causing stability problems?
Title: Re: Hardware question on cooling
Post by: Claggy on September 13, 2013, 04:05:59 AM
As it happens my i7-2600K has been having stability problems (again) too: my system either locks up totally, mouse doesn't move, etc.,
then Blue Screens with a hardware problem, or an 'A clock interrupt was not received on a secondary processor within the allocated time interval' Blue Screen.
It normally happens if the CPU load is near or above 100 Watts; cleaning the Corsair cooler generally returns it to stability (I've just done that).
All summer I've been limiting it to only 50% of cores too; at the moment it's doing CPU Astropulse, which nudges the CPU power usage up to 100 Watts though.
(Possible solutions for me: a BIOS update (but I'm locked into this BIOS if I want to keep my overclock), better PSU (I doubt it), better M/B VRM cooling (I doubt it),
already tried a better Corsair cooler, not made any real difference)

Claggy
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on September 13, 2013, 04:17:45 AM
I'm running 6 cores out of 8, and TThrottle temps are 75 - 78 - 75 - 68 as I type. Don't have an easy input power monitor to hand for CPU only - she's drawing 440W from the wall, on a 1200AX PSU. Where did you see 100W?

CPU-Z thinks my max should be 77W:

(http://i1148.photobucket.com/albums/o562/R_Haselgrove/i7CPU-Z.png)
Title: Re: Hardware question on cooling
Post by: Claggy on September 13, 2013, 04:24:36 AM
Quote from: Richard Haselgrove on September 13, 2013, 04:17:45 AM
I'm running 6 cores out of 8, and TThrottle temps are 75 - 78 - 75 - 68 as I type. Don't have an easy input power monitor to hand for CPU only - she's drawing 440W from the wall, on a 1200AX PSU. Where did you see 100W?
CPUID Hardware Monitor has Power Monitoring on my M/B, with two AP and one Seti v7, Package is at ~91Watts, IA Cores at ~85Watts, GT at 0.17Watts, UnCore at ~5.5Watts.

CPU-Z reports Max TDP as 95Watts here.

SIV also displays the same values on the bottom right as a pop up.

Claggy
Title: Re: Hardware question on cooling
Post by: Jason G on September 13, 2013, 04:29:15 AM
Hmmm, in both cases sounds familiar. Maybe try reseating the power connectors on the motherboard repeatedly (to knock off any oxidation) and applying some contact lubricant.
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on September 13, 2013, 04:31:48 AM
Quote from: Claggy on September 13, 2013, 04:24:36 AM
Quote from: Richard Haselgrove on September 13, 2013, 04:17:45 AM
I'm running 6 cores out of 8, and TThrottle temps are 75 - 78 - 75 - 68 as I type. Don't have an easy input power monitor to hand for CPU only - she's drawing 440W from the wall, on a 1200AX PSU. Where did you see 100W?
CPUID Hardware Monitor has Power Monitoring on my M/B, with two AP and one Seti v7, Package is at ~91Watts, IA Cores at ~85Watts, GT at 0.17Watts, UnCore at ~5.5Watts.

SIV also displays the same values on the bottom right as a pop up.

Claggy
OK, it works on mine too. Min 62W, max 68W, with a two-fan Noctua heatsink on top. I reckon the shop knew what they were doing when they assembled that bit for me.
Title: Re: Hardware question on cooling
Post by: Jason G on September 13, 2013, 04:34:22 AM
Off the wall.  How long now since either of you redid the thermal paste ? & what kind was it ?
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on September 13, 2013, 04:36:08 AM
Quote from: Jason G on September 13, 2013, 04:34:22 AM
Off the wall.  How long now since either of you redid the thermal paste ? & what kind was it ?

Shop-applied
15 August 2012
Not itemised on invoice  :P
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on September 13, 2013, 04:39:46 AM
RAID verification completed during my previous post (~10 minutes ago) - so it took about 1 hour 20 minutes since last reboot. She takes much longer than that to freeze, so we can rule it out - just that bloody balloon pop-up stays onscreen until the mouse moves, and if the mouse has frozen by the time I see the screen...

OK, thanks all - off out for dinner now. Will check in again when I get back, if you have any more off-the-wall ideas.
Title: Re: Hardware question on cooling
Post by: Jason G on September 13, 2013, 04:42:22 AM
Raid rebuild is a pain I know well.  Not 1 verification error in 3 years on my dev host despite several things conspiring to kill it.

Good show with the heatsink.  Probably Noctua NT-H1 then, & dried out.  I'd get some isopropanol, tissues & Arctic Cooling MX-3.  Unseating/reseating the processor might be an idea too, since several mobo brands decided to triple-plate the CPU socket pins this year, so moving things around might shake off some skungy stuff (indicated by inference from design changes).
Title: Re: Hardware question on cooling
Post by: Claggy on September 13, 2013, 04:46:24 AM
Quote from: Jason G on September 13, 2013, 04:34:22 AM
Off the wall.  How long now since either of you redid the thermal paste ? & what kind was it ?
I do mine fairly frequently, did it today (Arctic Silver 5); my temps are ~60°C to 65°C at the moment, H100i temperature is 31°C (I take that to mean basically the coolant temp).
My problems are probably because my system builder used the original 0501 BIOS; at least three BIOSes afterward say 'Improve system stability' as well as fixed Intel Rapid Storage Technology,

Claggy
Title: Re: Hardware question on cooling
Post by: Jason G on September 13, 2013, 04:49:56 AM
Quote from: Claggy on September 13, 2013, 04:46:24 AM
Quote from: Jason G on September 13, 2013, 04:34:22 AM
Off the wall.  How long now since either of you redid the thermal paste ? & what kind was it ?
I do mine fairly frequently, did it today; my temps are ~60°C to 65°C at the moment,
My problems are probably because my system builder used the original 0501 BIOS; at least three BIOSes afterward say 'Improve system stability' as well as fixed Intel Rapid Storage Technology,

Claggy

Rofl.  Gotta love modern rush-to-market methodology.
Title: Re: Hardware question on cooling
Post by: arkayn on September 13, 2013, 05:02:00 AM
I have been mostly lucky with my current builds, no instability with the CPUs, just dying PSUs and GPUs.
Title: Re: Hardware question on cooling
Post by: Jason G on September 13, 2013, 06:06:52 AM
Quote from: arkayn on September 13, 2013, 05:02:00 AM
I have been mostly lucky with my current builds, no instability with the CPUs, just dying PSUs and GPUs.
Sounds bad enough to me  ;D.  While my main dev host since the first Windows AKv8 ports has been ultra awesome, everything else seems to require constant attention  ::)
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on September 13, 2013, 08:54:57 PM
Good news - Microsoft have solved their problem with the everlasting Office critical update. So I poked around the optional list...

(http://i1148.photobucket.com/albums/o562/R_Haselgrove/MSSeptember2013optionalupdates.png)

Maybe I need to summon up the courage and energy to do that Intel WDDM graphics driver for the i7 - from Intel, I suspect, rather than from Microsoft.

The platform update http://support.microsoft.com/kb/2670838 also sounds significant, but I suspect more so for developers, not end users like me. Observations?
Title: Re: Hardware question on cooling
Post by: Jason G on September 14, 2013, 02:10:04 AM
Yeah it's a pretty substantial update underneath. It brings DirectX up to spec along the path of those WDDM developments, which involves applications sharing GPUs in a more balanced & secure way.  Works well (here anyway).  Cuda uses DirectX calls underneath extensively for its blocking synchronisation (which OpenCL 1.1 doesn't have directly, on purpose, as we explored).

Ironically it turns out the OpenGL equivalent we looked at is more efficient (according to nVidia engineers), so I'll eventually (within the x42 series) be using Cuda in non-blocking form & using custom synchronisation via either DirectX (Windows only) or adapted OpenGL calls (multiplatform with Cuda and OpenCL).

Since so much is still fluid and subject to ongoing change, I'd like to have all the variations handy, in library form, as things settle down. i.e. native Cuda using directX underneath, Cuda async with custom directX sync, Cuda async with custom OpenGL sync, OpenCL with DirectX sync and OpenCL with OpenGL sync

Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on September 14, 2013, 06:57:17 PM
Quote from: Jason G on September 14, 2013, 02:10:04 AM
Yeah it's a pretty substantial update underneath....
Which one was that about - the Intel WDDM or the Platform update?

I think I'm going to have to bite the bullet and apply at least one. She froze again at 2 am, in a room with the window open and no heating anywhere in the house. I'm not buying overheating as the primary cause in this case - she ran flawlessly throughout the WOW! event in August, when it was much warmer.
Title: Re: Hardware question on cooling
Post by: Jason G on September 14, 2013, 07:03:38 PM
Quote from: Richard Haselgrove on September 14, 2013, 06:57:17 PM
Quote from: Jason G on September 14, 2013, 02:10:04 AM
Yeah it's a pretty substantial update underneath....
Which one was that about - the Intel WDDM or the Platform update?

I think I'm going to have to bite the bullet and apply at least one. She froze again at 2 am, in a room with the window open and no heating anywhere in the house. I'm not buying overheating as the primary cause in this case - she ran flawlessly throughout the WOW! event in August, when it was much warmer.

They go sort of hand in hand.  The platform update is the DirectX one; then the recent jostling of nVidia drivers fixed a few problems related to the changes (most visible with 700 series TDRs); then the Intel GPU one would be updates & fixes for that at driver & user level components.   ... note that I see integrated Radeons mentioned as having driver-related issues (with MS's platform update, no surprise)

I'd go for the platform update first, then the one from Intel's site if you're using an internal or motherboard-integrated Intel GPU. If it's present but completely disabled in Device Manager, I expect you wouldn't need the Intel one, though it's worth checking whether that package updates any other chipset components as well.
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on September 14, 2013, 07:48:32 PM
That sounds like a plan.

1) Platform update
2) Intel WDDM - the hd4000 component is enabled, though I've never got it to work. That's another on the Round Tuit list - confirm Claggy's bugrep of the server handling of the Intel_GPU preference selection
3) Clean install of NVidia driver over the top of the previous two - there's a newer Beta available now.

I may be some time...
Title: Re: Hardware question on cooling
Post by: Jason G on September 14, 2013, 09:12:53 PM
Looks like the right order to me, and no additional caveats that I know of.  Hardware level upwards.
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on September 15, 2013, 02:37:41 AM
Well, that was fun - not.

1) Platform update - straightforward
2) Intel driver etc. - Win64_153117
Downloaded .exe file - it ran once, and threw an error part way through.
Removed the droppings via the Programs control panel, then tried again (after a reboot). The .exe file - before it got to the setup stage - declared that the operating system was no longer compatible.
Downloaded the .zip file, and ran setup from there - I think that's OK
3) NVidia driver - relatively routine, as I've done it previously. When the HDMI-connected screen goes completely black, nip downstairs and run the rest of the installation over the handy VNC link...
Got lumbered with .NET 4 again, and the 150 MB of updates or whatever it was (11 of them...)

Connected up a DVI cable in parallel with the HDMI, but to the Intel port. BOINC now sees that I have an HD 4000 - but only if I extend the Windows desktop as well, which is unhelpful. More work to be done on that one.

Anyway, take a peek at http://setiathome.berkeley.edu/show_host_detail.php?hostid=7070962 - I think that looks OK, even with OpenCL: 1.01 for NVidia, and OpenCL: 1.02 for Intel. I'll let her run for a while in normal configuration to check for crashes, then maybe open her up to some Intel work from Albert or somewhere.
Title: Re: Hardware question on cooling
Post by: Jason G on September 15, 2013, 04:14:39 AM
Haha.  It'll be interesting to see how we end up making best use of mixes like that.  Some prettified regression tests will be on the agenda soon, mixing multithread & multi-GPU. I'd be very surprised if the optimal ended up being one heavyweight process per device, or single multithread processes either; I suspect instead somewhere in between, dependent on cache architectures & specific device characteristics. Working out how to model that might call for some sophisticated experimentation.
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on September 16, 2013, 08:37:18 PM
Just to finish off - since I got the driver updates installed on Sunday afternoon, I've had no more lock-ups. I ran pure SETI for the first 24 hours - that counts as 'light duties' for a big Kepler. I restarted the 'short' GPUGrid Beta tasks - about 150 minutes full-power - yesterday evening, and they have all run without even triggering Harvey's new 'error recovery via temporary exit' code. I'll switch back to the high-stress 'long' tasks - about 10 hours - when the last Beta finishes (due within the hour).
Title: Re: Hardware question on cooling
Post by: Jason G on September 16, 2013, 08:42:02 PM
Good news  ;).  Yeah, it validates Microsoft's forward-looking WDDM work, which I found impressive.  I guess billions of dollars might help solve some problems after all...
Title: Re: Hardware question on cooling
Post by: Richard Haselgrove on September 17, 2013, 04:16:59 AM
And now I've taken my life in my hands, and opened up the Intel GPU to crunch for Einstein. Doing the cut-down tasks they issue for this hardware in ~12 minutes - that's equivalent to doing the normal GPU tasks (pack of 16) in something over three hours. My 9800GT takes about two hours, so the HD 4000 is about 60% - 65% of a 9800GT.
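[Editor's note: the throughput comparison checks out. A quick script redoing the conversion - the task times are the ones quoted in the post above, nothing else is assumed:]

```python
# Re-checking the HD 4000 vs 9800GT comparison above.
cutdown_minutes = 12   # one Einstein cut-down task on the HD 4000
pack_size = 16         # cut-down tasks equivalent to one normal GPU task
gt9800_hours = 2.0     # 9800GT time for a normal GPU task

# Time for the HD 4000 to do a full normal task's worth of work.
hd4000_hours = cutdown_minutes * pack_size / 60

# HD 4000 speed relative to the 9800GT (longer time => slower).
ratio = gt9800_hours / hd4000_hours

print(f"HD 4000: {hd4000_hours:.1f} h per normal task; "
      f"that's {ratio:.1%} the speed of the 9800GT")
```

12 minutes x 16 = 3.2 hours, i.e. "something over three hours", and 2 h / 3.2 h = 62.5%, squarely inside the quoted 60% - 65% band.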

HWMonitor is saying that the 'GT' component of the i7 is drawing about 11W, which has taken me a tad over rated TDP - I hope the Noctua can keep up. But not an output/power ratio to be sneezed at.
Title: Re: Hardware question on cooling
Post by: Jason G on September 17, 2013, 04:28:15 AM
mmm, sounds not too bad efficiency-wise