Hardware question on cooling

Started by Richard Haselgrove, August 16, 2013, 02:07:29 AM

Hmmm, both cases sound familiar. Maybe try reseating the power connectors on the motherboard repeatedly (to knock off any oxidation) and applying some contact lubricant.
It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change.
Charles Darwin
---
Chaos: When the present determines the future, but the approximate present does not approximately determine the future.
Edward Lorenz

Quote from: Claggy on September 13, 2013, 04:24:36 AM
Quote from: Richard Haselgrove on September 13, 2013, 04:17:45 AM
I'm running 6 cores out of 8, and TThrotle temps are 75 - 78 - 75 - 68 as I type. Don't have an easy input power monitor to hand for CPU only - she's drawing 440W from the wall, on a 1200AX PSU. Where did you see 100W?
CPUID Hardware Monitor has power monitoring on my M/B. With two AP tasks and one SETI v7 task running, Package is at ~91W, IA Cores at ~85W, GT at 0.17W, UnCore at ~5.5W.

SIV also displays the same values as a pop-up on the bottom right.

Claggy
OK, it works on mine too. Min 62W, max 68W, with a two-fan Noctua heatsink on top. I reckon the shop knew what they were doing when they assembled that bit for me.

Off the wall. How long now since either of you redid the thermal paste? And what kind was it?

Quote from: Jason G on September 13, 2013, 04:34:22 AM
Off the wall. How long now since either of you redid the thermal paste? And what kind was it?

Shop-applied
15 August 2012
Not itemised on invoice  :P

September 13, 2013, 04:39:46 AM #24 Last Edit: September 13, 2013, 04:41:39 AM by Richard Haselgrove
RAID verification completed during my previous post (~10 minutes ago), so it took about 1 hour 20 minutes from the last reboot. She takes much longer than that to freeze, so we can rule it out - it's just that the bloody balloon pop-up stays onscreen until the mouse moves, and if the mouse has frozen by the time I see the screen...

OK, thanks all - off out for dinner now. Will check in again when I get back, if you have any more off-the-wall ideas.

September 13, 2013, 04:42:22 AM #25 Last Edit: September 13, 2013, 04:46:51 AM by Jason G
RAID rebuilds are a pain I know well. Not one verification error in 3 years on my dev host, despite several things conspiring to kill it.

Good show with the heatsink. Probably Noctua NT-H1 then, and dried out. I'd get some isopropanol, tissues & Arctic Cooling MX-3. Unseating/reseating the processor might be an idea too, since several mobo brands decided to triple-plate the CPU socket pins this year, so moving things around might shake off some skungy stuff (inferred from the design changes).

September 13, 2013, 04:46:24 AM #26 Last Edit: September 13, 2013, 04:56:30 AM by Claggy
Quote from: Jason G on September 13, 2013, 04:34:22 AM
Off the wall. How long now since either of you redid the thermal paste? And what kind was it?
I do mine fairly frequently; did it today (Arctic Silver 5). My temps are ~60°C to 65°C at the moment, and the H100i temperature is 31°C (I take that to mean basically the coolant temp).
My problems are probably because my system builder used the original 0501 BIOS; at least three BIOS releases since then say 'Improve System stability', as well as fixing Intel Rapid Storage Technology.

Claggy

Quote from: Claggy on September 13, 2013, 04:46:24 AM
Quote from: Jason G on September 13, 2013, 04:34:22 AM
Off the wall. How long now since either of you redid the thermal paste? And what kind was it?
I do mine fairly frequently; did it today. My temps are ~60°C to 65°C at the moment.
My problems are probably because my system builder used the original 0501 BIOS; at least three BIOS releases since then say 'Improve System stability', as well as fixing Intel Rapid Storage Technology.

Claggy

Rofl. Gotta love modern rush-to-market methodology.

I have been mostly lucky with my current builds: no instability with the CPUs, just dying PSUs and GPUs.

Quote from: arkayn on September 13, 2013, 05:02:00 AM
I have been mostly lucky with my current builds: no instability with the CPUs, just dying PSUs and GPUs.
Sounds bad enough to me  ;D. While my main dev host has been ultra awesome since the first Windows AKv8 ports, everything else seems to require constant attention  ::)

Good news - Microsoft have solved their problem with the everlasting Office critical update. So I poked around the optional list...



Maybe I need to summon up the courage and energy to do that Intel WDDM graphics driver for the i7 - from Intel, I suspect, rather than from Microsoft.

The platform update http://support.microsoft.com/kb/2670838 also sounds significant, but I suspect more so for developers, not end users like me. Observations?

September 14, 2013, 02:10:04 AM #31 Last Edit: September 14, 2013, 02:11:39 AM by Jason G
Yeah, it's a pretty substantial update underneath. It brings DirectX up to spec along the path of those WDDM developments, which involve applications sharing GPUs in a more balanced & secure way. Works well (here anyway). CUDA uses DirectX calls extensively underneath for its blocking synchronisation (which OpenCL 1.1 deliberately doesn't provide directly, as we explored).

Ironically, it turns out the OpenGL equivalent we looked at is more efficient (according to nVidia engineers), so eventually (within the x42 series) I'll be using CUDA in non-blocking form, with custom synchronisation via either DirectX (Windows only) or adapted OpenGL calls (multiplatform, with CUDA and OpenCL).

Since so much is still fluid and subject to ongoing change, I'd like to have all the variations handy, in library form, as things settle down, i.e. native CUDA using DirectX underneath, CUDA async with custom DirectX sync, CUDA async with custom OpenGL sync, OpenCL with DirectX sync, and OpenCL with OpenGL sync.

Quote from: Jason G on September 14, 2013, 02:10:04 AM
Yeah it's a pretty substantial update underneath....
Which one was that about - the Intel WDDM or the Platform update?

I think I'm going to have to bite the bullet and apply at least one. She froze again at 2 am, in a room with the window open and no heating anywhere in the house. I'm not buying overheating as the primary cause in this case - she ran flawlessly throughout the WOW! event in August, when it was much warmer.

September 14, 2013, 07:03:38 PM #33 Last Edit: September 14, 2013, 07:11:51 PM by Jason G
Quote from: Richard Haselgrove on September 14, 2013, 06:57:17 PM
Quote from: Jason G on September 14, 2013, 02:10:04 AM
Yeah it's a pretty substantial update underneath....
Which one was that about - the Intel WDDM or the Platform update?

I think I'm going to have to bite the bullet and apply at least one. She froze again at 2 am, in a room with the window open and no heating anywhere in the house. I'm not buying overheating as the primary cause in this case - she ran flawlessly throughout the WOW! event in August, when it was much warmer.

They go sort of hand in hand. The platform update is the DirectX one; then the recent jostling of nVidia drivers fixed a few problems related to the changes (most visible as 700-series TDRs); then the Intel GPU one would be updates & fixes for those changes in the driver & user-level components. ... Note that I see integrated Radeons mentioned as having driver-related issues (with MS's platform update, no surprise).

I'd go for the platform update first, then the Intel driver from Intel's site if you have an internal or motherboard-integrated Intel GPU present, whether or not you're using it, even if it's disabled in Device Manager. I expect you wouldn't need the Intel one if the GPU is completely disabled, though it's worth checking whether that package updates any other chipset components as well.

That sounds like a plan.

1) Platform update
2) Intel WDDM - the HD 4000 component is enabled, though I've never got it to work. That's another on the Round Tuit list - confirm Claggy's bugrep of the server handling of the Intel_GPU preference selection.
3) Clean install of NVidia driver over the top of the previous two - there's a newer Beta available now.

I may be some time...

Looks like the right order to me, and no additional caveats that I know of. Hardware level upwards.

Well, that was fun - not.

1) Platform update - straightforward
2) Intel driver etc. - Win64_153117
Downloaded .exe file - it ran once, and threw an error part way through.
Removed the droppings via the Programs control panel, then tried again (after a reboot). The .exe file - before it got to the setup stage - declared that the operating system was no longer compatible.
Downloaded the .zip file, and ran setup from there - I think that's OK
3) NVidia driver - relatively routine, as I've done it previously. When the HDMI-connected screen goes completely black, nip downstairs and run the rest of the installation over the handy VNC link...
Got lumbered with dot net four again, and the 150 MB of updates or whatever it was (11 of them...)

Connected up a DVI cable in parallel with the HDMI, but to the Intel port. BOINC now sees that I have an HD 4000 - but only if I extend the Windows desktop as well, which is unhelpful. More work to be done on that one.

Anyway, take a peek at http://setiathome.berkeley.edu/show_host_detail.php?hostid=7070962 - I think that looks OK, even with OpenCL: 1.01 for NVidia, and OpenCL: 1.02 for Intel. I'll let her run for a while in normal configuration to check for crashes, then maybe open her up to some Intel work from Albert or somewhere.

Haha. It'll be interesting to see how we end up making best use of mixes like that. Some prettified regression tests will be on the agenda soon, mixing multithread & multi-GPU. I'd be very surprised if the optimum ended up being either one heavyweight process per device or a single multithreaded process; I suspect instead it will be somewhere in between, depending on cache architectures & specific device characteristics. Working out how to model that might call for some sophisticated experimentation.

Just to finish off - since I got the driver updates installed on Sunday afternoon, I've had no more lock-ups. I ran pure SETI for the first 24 hours - that counts as 'light duties' for a big Kepler. I restarted the 'short' GPUGrid Beta tasks - about 150 minutes full-power - yesterday evening, and they have all run without even triggering Harvey's new 'error recovery via temporary exit' code. I'll switch back to the high-stress 'long' tasks - about 10 hours - when the last Beta finishes (due within the hour).

Good news  ;). Yeah, I found validating Microsoft's forward-looking WDDM impressive. I guess billions of dollars might help solve some problems after all...
