
thermal-health-check

Syntax (Junos OS)

Syntax (Junos OS Evolved)

Hierarchy Level

Description

Enable the thermal health check and configure the action to take when a thermal health event, such as power leakage, is detected on PTX5K, MX10K, PTX10K, and QFX10K devices. The thermal health check monitors the PSM power output and the FRU power consumption every minute. When the PSM power output exceeds the FRU power consumption by the default or configured threshold for three consecutive checks, the system declares a thermal health event and takes the configured action.

Note:

The default threshold for QFX10002 devices is 100 W and for other devices is 600 W.
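The detection rule described above can be modeled as a small counter of consecutive failed checks. The following Python sketch is illustrative only, not Junos source code: `ThermalHealthDetector`, `default_threshold_w`, and the model string "QFX10002" used as a lookup key are hypothetical names, readings are supplied by the caller rather than read from hardware, and it assumes that a reading within the threshold resets the count of consecutive failures and that "exceeds by the threshold" is a strict comparison.

```python
from dataclasses import dataclass

# Defaults from this page: 100 W on QFX10002 devices, 600 W on other devices.
QFX10002_DEFAULT_W = 100.0
OTHER_DEFAULT_W = 600.0
REQUIRED_CONSECUTIVE_FAILURES = 3  # the check runs once per minute


def default_threshold_w(model: str) -> float:
    """Return the default power threshold, in watts, for a (hypothetical) model string."""
    return QFX10002_DEFAULT_W if model == "QFX10002" else OTHER_DEFAULT_W


@dataclass
class ThermalHealthDetector:
    """Hypothetical model of the once-per-minute check; not Junos source code."""

    threshold_w: float
    consecutive_failures: int = 0

    def sample(self, psm_output_w: float, fru_consumption_w: float) -> bool:
        """Record one reading; return True once a thermal health event is declared."""
        # Assumption: "exceeds by the threshold" is treated as a strict comparison.
        if psm_output_w - fru_consumption_w > self.threshold_w:
            self.consecutive_failures += 1
        else:
            # Assumption: a passing check breaks the run of consecutive failures.
            self.consecutive_failures = 0
        return self.consecutive_failures >= REQUIRED_CONSECUTIVE_FAILURES
```

For example, with the 600 W default, three consecutive readings of 10,000 W PSM output against 9,300 W FRU consumption (a 700 W difference) return False, False, and then True.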

The default action is none. To shut down all PSMs when the thermal health check fails, set action-onfail to auto-shutdown.

Note:

On Junos OS devices, power-threshold and power-threshold-percent cannot coexist; you can configure only one of them at a time.

You can configure power-threshold-percent on systems connected to either second- or third-generation power supplies.

To ensure accurate operation, the thermal health check allows the system to shut down PSMs whose load is below 20%. Expect some margin of error between the total FRU input and PSM output power readings and the actual hardware values.

To shut down the system if a thermal health event causes Junos to go down, you must enable the PSM watchdog feature along with thermal-health-check. Both the thermal health check and the PSM watchdog feature require a PEM firmware upgrade.

Enable the fet-failure-check option to monitor for a power supply that is failing because of a field-effect transistor (FET) failure and to take corrective action. When the system determines that there is a risk of a thermal event, you can choose to shut down the reporting PSM if a redundant power supply is available, raise an alarm, and log the events.

Note:

The fet-failure-check option is supported on MX10K and PTX10K devices running Junos OS.

Options

fet-failure-check Enable FET failure detection and configure the action to take when a FET failure is detected.
action-onfail Specify the action to take when a thermal health event is detected. The following options are available:
  • auto-shutdown—The software shuts down all PSMs when a thermal health event is detected. It turns off power to the router to prevent further damage or fire.

  • none—The software raises a major alarm when a thermal health event is detected.

shutdown-timer value Set the time, in seconds, after which the PSMs are shut down (Range: 10 through 900 seconds, that is, 10 seconds to 15 minutes; Default: 900 seconds). This timer is needed in case the automatic shutdown does not occur during a thermal health event because of a software freeze. If you configure a value shorter than the default, deactivate the shutdown timer before you reboot the system; otherwise, the reboot delay could cause the timer to expire. The shutdown-timer option is not available on Junos OS Evolved; instead, the timeout value configured for the PSM watchdog feature serves as the shutdown timer.
power-threshold value Set the power threshold, in watts (Default: 600 W; 100 W on QFX10002 devices). Avoid using the default value on systems connected to second- or third-generation power supplies.
power-threshold-percent value Set the power threshold as a percentage of Total System Power Output (Range: 4 through 10 percent). The threshold applies to the difference between Total System Power Output and Total System FRU Input Power. On Junos OS, there is no default, so you must specify a value when you configure power-threshold-percent. On Junos OS Evolved, the default is 8 percent.
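The rules for these options (the shutdown-timer range and default, the mutual exclusion of power-threshold and power-threshold-percent on Junos OS, and the power-threshold-percent range and defaults) can be summarized as validation helpers. This Python sketch is a hedged interpretation, not Juniper's implementation: every function name and the "junos" / "junos-evolved" labels are hypothetical, nothing is read from a device, the PSM watchdog timeout on Junos OS Evolved is simply passed in by the caller, and `exceeds_percent` assumes the percentage is taken of Total System Power Output with a strict comparison.

```python
from typing import Optional

# shutdown-timer rules from this page (Junos OS only).
MIN_TIMER_S = 10       # 10 seconds
MAX_TIMER_S = 15 * 60  # 15 minutes
DEFAULT_TIMER_S = 900

# power-threshold-percent rules from this page.
PERCENT_MIN, PERCENT_MAX = 4, 10
EVOLVED_DEFAULT_PERCENT = 8


def effective_shutdown_timer(os_family: str,
                             configured_s: Optional[int] = None,
                             watchdog_timeout_s: Optional[int] = None) -> int:
    """Return the shutdown timer, in seconds, under this page's rules (hypothetical model)."""
    if os_family == "junos-evolved":
        # Junos OS Evolved has no shutdown-timer option; the PSM watchdog timeout applies.
        if configured_s is not None:
            raise ValueError("shutdown-timer is not available on Junos OS Evolved")
        if watchdog_timeout_s is None:
            raise ValueError("a PSM watchdog timeout is required on Junos OS Evolved")
        return watchdog_timeout_s
    if configured_s is None:
        return DEFAULT_TIMER_S
    if not MIN_TIMER_S <= configured_s <= MAX_TIMER_S:
        raise ValueError(
            f"shutdown-timer must be {MIN_TIMER_S}-{MAX_TIMER_S} s, got {configured_s}")
    return configured_s


def must_deactivate_before_reboot(timer_s: int) -> bool:
    """A timer shorter than the default could expire during a reboot."""
    return timer_s < DEFAULT_TIMER_S


def check_junos_thresholds(power_threshold_w: Optional[float],
                           percent: Optional[float]) -> None:
    """On Junos OS, power-threshold and power-threshold-percent cannot coexist."""
    if power_threshold_w is not None and percent is not None:
        raise ValueError("configure power-threshold or power-threshold-percent, not both")


def percent_threshold(os_family: str, percent: Optional[float] = None) -> Optional[float]:
    """Return the percentage threshold in effect, or None when none applies."""
    if percent is None:
        # Junos OS has no default; Junos OS Evolved defaults to 8 percent.
        return EVOLVED_DEFAULT_PERCENT if os_family == "junos-evolved" else None
    if not PERCENT_MIN <= percent <= PERCENT_MAX:
        raise ValueError(f"power-threshold-percent must be {PERCENT_MIN}-{PERCENT_MAX}")
    return percent


def exceeds_percent(total_output_w: float, total_fru_input_w: float,
                    percent: float) -> bool:
    """True if output exceeds FRU input by more than `percent` of total output (assumed)."""
    difference_w = total_output_w - total_fru_input_w
    return difference_w > total_output_w * percent / 100
```

For example, with an 8 percent threshold and 20,000 W of total output, the limit is 1,600 W: an FRU input of 18,500 W (a 1,500 W difference) passes, while 18,000 W (a 2,000 W difference) counts as a failed check.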
Table 1: Thermal health check alarms

Alarm: Thermal Check Failed: Exceeded threshold value
Description: Appears when the thermal health check fails because the threshold value is exceeded. If action-onfail is set to auto-shutdown, the system shuts down.
Remedy: If the action is set to none, the alarm clears once the failure is resolved. If the action is set to auto-shutdown, you must power-cycle the chassis to recover.
Severity: Major

Alarm: Thermal Check did not meet conditions
Description: Appears when the load of any active PSM is below 20%. The system shuts down these PSMs if the N+2 redundancy requirement is met. If the requirement is not met, the system cannot shut down these PSMs, so the thermal health check raises this alarm instead. On Junos OS Evolved, the PSM state becomes Spare under this condition.
Remedy: Shut down the less loaded PSMs to increase the load on the remaining active PSMs.
Severity: Minor

Alarm (Junos OS): PEM %d shutdown - Insufficient system load
Alarm (Junos OS Evolved): PSM %d shutdown - Insufficient system load
Description: Appears at the PSM level when the load of any active PSM is below 20%. The system shuts down these PSMs if the N+2 redundancy requirement is met.
Remedy: No remedy is required for this alarm. Maintaining a load above 20% on all PSMs is recommended to ensure that the thermal health check works accurately and that unnecessary PSMs are shut down.
Severity: Minor
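The low-load rules in the last two alarms can be sketched as a single planning step. This Python model rests on labeled assumptions: `Psm`, `plan_low_load_shutdown`, and `n_required` are hypothetical; load is taken as a fraction of each PSM's rated capacity; N+2 is interpreted as "the PSMs left active after shutdown must number at least n_required + 2"; and the low-load PSMs are shut down together or not at all. It does not model the Spare state on Junos OS Evolved.

```python
from dataclasses import dataclass
from typing import List, Tuple

LOW_LOAD_FRACTION = 0.20  # PSMs below 20% load are candidates for shutdown


@dataclass
class Psm:
    slot: int
    load_fraction: float  # assumed: current output divided by rated capacity


def plan_low_load_shutdown(active: List[Psm], n_required: int,
                           evolved: bool = False) -> Tuple[List[int], List[str]]:
    """Return (slots to shut down, alarm strings) under the assumptions stated above."""
    low = [p for p in active if p.load_fraction < LOW_LOAD_FRACTION]
    if not low:
        return [], []
    remaining = len(active) - len(low)
    # Assumed reading of N+2: enough PSMs stay active to carry the load plus two spares.
    if remaining >= n_required + 2:
        prefix = "PSM" if evolved else "PEM"  # Junos OS Evolved vs. Junos OS wording
        alarms = [f"{prefix} {p.slot} shutdown - Insufficient system load" for p in low]
        return [p.slot for p in low], alarms
    # Redundancy not met: the PSMs stay up and a single minor alarm is raised.
    return [], ["Thermal Check did not meet conditions"]
```

With six PSMs at loads of 50, 50, 50, 50, 10, and 15 percent, the two lightly loaded PSMs are shut down when `n_required` is 2 (four PSMs remain, which meets 2 + 2), but with `n_required` set to 3 they stay up and only "Thermal Check did not meet conditions" is raised.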

Required Privilege Level

interface—To view this statement in the configuration.

interface-control—To add this statement to the configuration.

Release Information

Statement introduced in Junos OS Release 20.1R1.

fet-failure-check option introduced in Junos OS Release 21.2R1.