Thermal Management and Reliability: Heat Sinks
Determining the Effects on a Reliability Prediction
Introduction
It is no secret that the operating temperature of a device significantly affects its
reliability. Active electronic components without proper thermal management can see immediate damage or
reduced reliability due to phenomena such as electromigration. A device's ability to dissipate heat is
affected by package design, proximity to other devices in the system, the surrounding airflow, and circuit
board and trace properties. Design methods can manage these factors to augment heat transfer away from the
device, but many cases require additional conductive, convective, or radiative paths to cool the device to
acceptable levels.
Heat sinks are commonly used to provide this additional cooling path. They have a direct impact
on system reliability, and they should be accounted for in reliability predictions. However, simply assigning
a failure rate to a heat sink is not generally the best solution for determining its impact. A better approach
is to consider its effects on the thermal properties of the device to which it is mounted. This article will
explore how to determine the effects of a heat sink on a reliability prediction. It also provides an example
to examine the extent to which heat sinks affect the reliability performance of a device and thus your overall
return on investment.
Temperature Calculations of Power-Dissipating Devices
As an electronic device dissipates power, it heats up. The heat must be expelled to the
environment; otherwise, the device will eventually be subjected to temperatures over its maximum temperature
rating. When this occurs, the device may either begin to operate out of tolerance limits or cease operating
altogether. Heat escapes from the device in all directions, mainly through the mechanisms of convection to
the air above it and conduction to the circuit board to which it is attached.
A first-order model used to assess the cooling efficiency of a device assumes one-dimensional
convective cooling to the ambient air above the device. The accuracy of this model is beyond the scope of
this article, but its application can prove very useful for heat sink sizing as well as reliability
calculations. For this one-dimensional approximation, the complete thermal path can be characterized by the
junction-ambient thermal resistance, θJA, which is the thermal resistance
between the die and the surrounding air. In this case, the junction temperature (temperature at the
die), TJ, can be computed by:
$$T_J = T_A + P_D\,\theta_{JA} \tag{1}$$

where:

| Symbol | Definition |
|---|---|
| TA | Ambient temperature |
| PD | Power dissipated in the device |
θJA is usually provided in the device's data sheet for a stated set of ambient
conditions. For the case of a device with a heat sink, θJA becomes:
$$\theta_{JA} = \theta_{JC} + \theta_{CS} + \theta_{SA} \tag{2}$$

where:

| Symbol | Definition |
|---|---|
| θJC | Thermal resistance between the die and the package case |
| θCS | Thermal resistance between the case and the heat sink |
| θSA | Thermal resistance between the heat sink and ambient |
θJC and θSA are given in the device and heat
sink data sheets, respectively. θCS is typically small and is usually approximated by the thermal
resistance of the heat sink adhesive.
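The two relationships above, T_J = T_A + P_D·θJA and θJA = θJC + θCS + θSA, can be sketched in a few lines of Python. The function names and the numeric inputs are hypothetical and are there only to exercise the formulas. The one-dimensional model is taken as given.

```python
def junction_temperature(t_ambient_c, power_w, theta_ja):
    """Equation (1): T_J = T_A + P_D * theta_JA, result in degC."""
    return t_ambient_c + power_w * theta_ja


def theta_ja_with_heat_sink(theta_jc, theta_cs, theta_sa):
    """Equation (2): series sum of the three thermal resistances, degC/W."""
    return theta_jc + theta_cs + theta_sa


# Hypothetical values, not taken from any data sheet.
theta_ja = theta_ja_with_heat_sink(theta_jc=5.0, theta_cs=0.5, theta_sa=10.0)
t_j = junction_temperature(t_ambient_c=25.0, power_w=2.0, theta_ja=theta_ja)
print(f"theta_JA = {theta_ja:.1f} degC/W, T_J = {t_j:.1f} degC")
```

Because the resistances add in series, lowering any one of them lowers the junction temperature by the same amount per watt dissipated.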
Reliability in Thermal Design
Most reliability prediction models account for thermal stress in failure rate calculations
using a multiplicative factor, πT, which is a function of the device junction
temperature. The Relex Reliability Prediction module provides a
versatile interface for specifying πT in the different prediction models.
One option, available for several models including MIL-HDBK-217, calculates the junction temperature
automatically. In this case, the ambient operating temperature would be set at the assembly level on
the Calculation Data tab, and the operating power and calculated junction-ambient
thermal resistance would be entered for each part on the Prediction Data tab. This approach is
preferred for assessing thermal properties of many components in an assembly. Another option is to simply
enter the calculated junction temperature in the "Junction Temp Override" field on the Prediction
Data tab for the device.
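Temperature factors of this kind are commonly written in an Arrhenius form. The sketch below is a generic illustration of that form, not the exact expression of any particular prediction model. The function name, the 25 °C reference temperature, the unit scale factor, and the 0.4 eV activation energy are all assumptions. The real constants depend on the part type and must be taken from the model in use.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def pi_t_arrhenius(t_junction_c, activation_energy_ev,
                   t_reference_c=25.0, scale=1.0):
    """Arrhenius-type temperature factor relative to a reference temperature.

    Returns `scale` when the junction is at the reference temperature and
    grows exponentially as the junction temperature rises.
    """
    t_j_k = t_junction_c + 273.15
    t_ref_k = t_reference_c + 273.15
    exponent = -(activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_j_k - 1.0 / t_ref_k
    )
    return scale * math.exp(exponent)


# Hypothetical activation energy; compare two junction temperatures.
EA_EV = 0.4
hot = pi_t_arrhenius(140.0, EA_EV)
cooler = pi_t_arrhenius(120.0, EA_EV)
print(f"pi_T ratio (140 degC vs 120 degC): {hot / cooler:.2f}")
```

The exponential dependence is why a junction temperature drop of a few tens of degrees can change a predicted failure rate by a large factor.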
As an example, suppose you want to analyze an integrated circuit in a 20-pin SOIC package.
The IC is to be placed in a system that will operate between 20 °C and 80 °C with
natural convective cooling. The IC's specifications in the system are as follows:
| Nomenclature | Description | Value | Unit |
|---|---|---|---|
| TJ,MAX | Maximum rated junction temperature | 150 | °C |
| θJC | Junction-to-case thermal resistance | 18 | °C/W |
| θJA | Junction-to-ambient thermal resistance (natural convection) | 76 | °C/W |
| PD,MAX | Maximum power dissipation during operation | 1.3 | W |
To determine if the IC is sufficiently cooled without a heat sink, you would calculate the
junction temperature of the device under worst-case conditions using equation (1):

$$T_J = T_A + P_D\,\theta_{JA} = 80 + (1.3)(76) = 178.8\ ^{\circ}\mathrm{C}$$
For this scenario, the device will operate with a junction temperature that is well beyond
its maximum rated value. Additional cooling, such as a heat sink, must be considered. To determine the
maximum allowable thermal resistance of the heat sink, you would substitute
equation (2) into equation (1) and rearrange to obtain the inequality:

$$\theta_{SA} \le \frac{T_{J,MAX} - T_A}{P_{D,MAX}} - \theta_{JC} - \theta_{CS}$$
Assuming the thermal resistance of the heat sink adhesive is 0.7 °C/W, you
obtain the following:

$$\theta_{SA} \le \frac{150 - 80}{1.3} - 18 - 0.7 \approx 35.1\ ^{\circ}\mathrm{C/W}$$
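Device manufacturers recommend placing a 10 to 15% guard band on the thermal resistance
requirement to ensure adequate cooling. Applying a 15% margin lowers the allowable value, so the
required heat sink thermal resistance becomes:

$$\theta_{SA} \le (1 - 0.15)(35.1) \approx 29.8\ ^{\circ}\mathrm{C/W}$$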
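By adding a heat sink with this thermal resistance rating to the design, the final
junction-ambient thermal resistance of the device becomes:

$$\theta_{JA} = \theta_{JC} + \theta_{CS} + \theta_{SA} = 18 + 0.7 + 29.8 = 48.5\ ^{\circ}\mathrm{C/W}$$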
With the heat sink, the junction temperature of the device under worst-case conditions is
now below its maximum rated value:

$$T_J = 80 + (1.3)(48.5) \approx 143\ ^{\circ}\mathrm{C}$$
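The sizing steps for the example IC can be retraced with a short script. This is a sketch under the same one-dimensional model. The function and variable names are hypothetical. The script carries full precision through every step, so its derated heat sink resistance comes out near 29.9 °C/W, whereas rounding the intermediate limit to 35.1 °C/W before applying the margin gives 29.8 °C/W.

```python
T_J_MAX = 150.0    # maximum rated junction temperature, degC
T_A_MAX = 80.0     # worst-case ambient temperature, degC
P_D_MAX = 1.3      # maximum power dissipation, W
THETA_JC = 18.0    # junction-to-case thermal resistance, degC/W
THETA_CS = 0.7     # heat sink adhesive thermal resistance, degC/W
GUARD_BAND = 0.15  # 15% margin on the heat sink requirement


def max_heat_sink_resistance(t_j_max, t_a, p_d, theta_jc, theta_cs):
    """Largest theta_SA that keeps T_J at or below t_j_max."""
    return (t_j_max - t_a) / p_d - theta_jc - theta_cs


theta_sa_limit = max_heat_sink_resistance(
    T_J_MAX, T_A_MAX, P_D_MAX, THETA_JC, THETA_CS
)
theta_sa_required = (1.0 - GUARD_BAND) * theta_sa_limit

# Resulting junction temperature with a heat sink at the derated value.
theta_ja = THETA_JC + THETA_CS + theta_sa_required
t_j = T_A_MAX + P_D_MAX * theta_ja

print(f"theta_SA limit:    {theta_sa_limit:.1f} degC/W")
print(f"theta_SA required: {theta_sa_required:.1f} degC/W")
print(f"T_J with sink:     {t_j:.0f} degC (rated max {T_J_MAX:.0f} degC)")
```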
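Next, suppose you want to determine the effects of using an enhanced heat sink with a
thermal resistance rating of 15 °C/W in order to conduct a cost vs. benefit trade-off. For the
enhanced configuration, you have:

$$\theta_{JA} = 18 + 0.7 + 15 = 33.7\ ^{\circ}\mathrm{C/W}$$

$$T_J = 80 + (1.3)(33.7) \approx 124\ ^{\circ}\mathrm{C}$$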
Using Relex Reliability Prediction, you can examine the reliability effects of such a design
change according to MIL-HDBK-217. You would set the top-level assembly temperature to 80 °C. The
device parameters entered on the Prediction Data tab for the IC are shown in the following figure.
Prediction Data Tab with Pi Factors Shown
You can then perform calculations using the three different scenarios. The following table
illustrates the differences between the three cases:
| Configuration | θJA (°C/W) | Worst-case TJ (°C) | Failure Rate (FPMH) | Reliability (100 hours) |
|---|---|---|---|---|
| No heat sink | 76 | 179 | 55.288 | 0.9945 |
| Normal heat sink | 48.5 | 143 | 13.176 | 0.9987 |
| Enhanced heat sink | 33.7 | 124 | 5.506 | 0.9995 |
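The temperature and reliability columns of the table can be cross-checked from the other two. The sketch below assumes a constant failure rate, so that reliability over a period t is exp(−λt) with λ converted from failures per million hours (FPMH). That exponential model is an assumption made here, and the failure rates themselves are copied from the table rather than computed.

```python
import math

T_A_MAX = 80.0         # worst-case ambient temperature, degC
P_D_MAX = 1.3          # maximum power dissipation, W
MISSION_HOURS = 100.0  # period used for the reliability column

# configuration -> (theta_JA in degC/W, failure rate in FPMH), from the table
configs = {
    "No heat sink": (76.0, 55.288),
    "Normal heat sink": (48.5, 13.176),
    "Enhanced heat sink": (33.7, 5.506),
}


def reliability(failure_rate_fpmh, hours):
    """Constant-failure-rate reliability: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate_fpmh * 1e-6 * hours)


results = {}
for name, (theta_ja, fpmh) in configs.items():
    t_j = T_A_MAX + P_D_MAX * theta_ja
    results[name] = (t_j, reliability(fpmh, MISSION_HOURS))
    print(f"{name:20s} T_J = {t_j:6.1f} degC  R = {results[name][1]:.5f}")
```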
Including the heat sink reduces the device's failure rate by more than two-thirds. Including
the enhanced heat sink decreases the failure rate further by more than one half. The reliability gains can
be converted to a return on investment (ROI) by examining the costs associated with each configuration.
Considering only material costs of initial investments and subsequent failures, a hypothetical example
follows. Assume that the device and heat sinks have the following per unit cost:
| Item | Cost Per Unit |
|---|---|
| Device | $1.50 |
| Normal heat sink | $0.50 |
| Enhanced heat sink | $0.75 |
You can then calculate the total cost for each configuration as a function of time, t, as:

$$C(t) = n\left[(C_D + C_S) + \lambda\,t\,C_D\right]$$

where:

| Symbol | Definition |
|---|---|
| n | Number of units |
| λ | Failure rate |
| CD | Device cost |
| CS | Heat sink cost |
The following graph compares the ROI for the three cases. Note that although the case without
a heat sink is included for the sake of comparison, it would most likely not be considered because the
device will not function properly in the first place due to the elevated junction temperature. The ROI of
the enhanced heat sink will surpass the normal configuration after about 2½ years of operation. After
5 years of operation, the ROI will be $250 per 1000 units. It is important to note that these costs only
account for the material costs of the devices and heat sinks. Considering other underlying costs involved
with device failures, such as the costs due to system downtime, logistical costs, and repair personnel costs,
will increase your ROI significantly.
ROI for Heat Sink Configurations
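The break-even point and the five-year figure can be approximated with a short script. The cost model here is an assumption: each unit incurs the purchase price of the device plus heat sink once, and each expected failure costs one replacement device. That form appears consistent with the break-even of about 2½ years and the roughly $250 per 1000 units quoted above. Continuous operation (8760 hours per year) is also assumed, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760.0  # assumes continuous operation
DEVICE_COST = 1.50       # dollars per device


def total_cost(n_units, failure_rate_fpmh, hours, device_cost, sink_cost):
    """Assumed model: initial purchase plus expected device replacements."""
    purchase = n_units * (device_cost + sink_cost)
    replacements = n_units * failure_rate_fpmh * 1e-6 * hours * device_cost
    return purchase + replacements


def break_even_hours(rate_a_fpmh, sink_a, rate_b_fpmh, sink_b, device_cost):
    """Time at which configuration B (costlier sink, lower rate) matches A."""
    return (sink_b - sink_a) / ((rate_a_fpmh - rate_b_fpmh) * 1e-6 * device_cost)


# Normal heat sink: 13.176 FPMH, $0.50. Enhanced: 5.506 FPMH, $0.75.
t_even = break_even_hours(13.176, 0.50, 5.506, 0.75, DEVICE_COST)
break_even_years = t_even / HOURS_PER_YEAR

five_years = 5 * HOURS_PER_YEAR
saving_5y = (
    total_cost(1000, 13.176, five_years, DEVICE_COST, 0.50)
    - total_cost(1000, 5.506, five_years, DEVICE_COST, 0.75)
)

print(f"Break-even after {break_even_years:.2f} years")
print(f"Saving after 5 years: ${saving_5y:.0f} per 1000 units")
```

Adding downtime, logistics, or labor costs to the per-failure term would move the break-even point earlier, since the enhanced heat sink has the lower failure rate.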
Conclusion
This article has touched upon modeling methods for thermal design in reliability. Although it
specifically examined the heat sink, the same concepts can be applied to active convective cooling, conductive
cooling, and radiative cooling methods. Aside from assuring adequate cooling for device operation, thermal
design also has a significant impact on component reliability, and therefore overall system reliability. As
shown in the above example, the failure rate for a component can be dramatically reduced by managing the
component's thermal properties. ROI trade-offs should be conducted to determine the optimal cooling
mechanism.
Many of these trade-offs can be easily evaluated using Relex
Reliability Prediction. When performing reliability predictions, this module enables you to also perform
derating analyses at the same time. Any parts that are found to be enduring conditions beyond their
acceptable tolerances are highlighted. Information about the particular aspect or characteristics of the part
that is overstressed (temperature, voltage, etc.) is then made available in the Pi Factors dialog box. For
additional information about this module and the many others in the Relex Reliability Software Suite, please
visit our web site at www.relex.com.