Deep Dive: Juniper Memory Leak

Don’t you hate it when critical network devices reboot spontaneously? Then this article “Juniper Memory Leak” is for you!

Diana Stucki
+41 58 510 13 54
diana.stucki@umb.ch

Dear readers, please note that this blog is a bit older and therefore the content, insights and statements may have changed over time as products, services and technologies evolve.

A recent Juniper Technical Bulletin points out that on MX, EX, or SRX platforms using the Trio ASIC family, the MPCs or line cards may reset and reboot because of memory exhaustion. Junos OS versions 18.3 and newer are not affected. If you are running an affected release, Juniper Networks provides more details and solutions. So if you still want to finish your day positively, keep reading!

 

Root cause

A memory leak is a type of resource leak that occurs when a computer program manages memory allocations incorrectly, so that memory which is no longer needed is not released. Two Juniper Problem Reports (PR1241973 and PR1216300) introduced a logging functionality to track memory events. This helps to track down excessive memory usage or memory that is not properly freed. However, there is no limit on the amount of information that is logged. Part of this information relates to routes’ next-hop activities, so in a network with constant route changes, line cards might run out of memory and reset.
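The failure mode described above can be illustrated with a small sketch. This is not Junos code: the class names and the event format are hypothetical and only serve to contrast an event log that grows with every route change against one that is capped with a ring buffer.

```python
from collections import deque


class UnboundedEventLog:
    """Keeps every event forever, so memory grows with route churn."""

    def __init__(self):
        self.events = []

    def record(self, event):
        self.events.append(event)


class BoundedEventLog:
    """Keeps only the most recent `limit` events (ring buffer)."""

    def __init__(self, limit):
        self.events = deque(maxlen=limit)

    def record(self, event):
        # When the deque is full, appending discards the oldest entry.
        self.events.append(event)


def simulate_route_churn(log, changes):
    """Record one hypothetical next-hop event per route change."""
    for i in range(changes):
        log.record(("nexthop-change", i))
    return len(log.events)
```

With 10,000 simulated route changes, the unbounded log holds 10,000 entries, while a bounded log with a limit of 100 never holds more than 100. A cap of this kind trades older history for a predictable memory footprint.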

 

Affected Systems

  • MX240, MX480, MX960, and MX2020 Trio-based MPCs – all part numbers start with the “MPC” prefix
  • MX80, MX104, MX204, MX10003, MX10008, and MX10016
  • EX9200 line cards
  • SRX4600 and SRX5000 series

 

Signature, workaround, and solution

The following syslog entries are some examples of memory exhaustion:

 

Sep 4 03:56:24 re0-mx fpc14 IFRT: 'IFD get B chip stats' (opcode 53) failed
Sep 4 03:56:24 re0-mx kernel: if_pfe: Error 6 (No Memory) on IF command 53 (IFD bchip stats)
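If you collect syslog centrally, a short script can flag these signatures. The following is a minimal sketch: the patterns are derived only from the two sample messages above, so the wording may differ on other releases, and the function name is made up for illustration.

```python
import re

# Patterns built from the two sample messages above; other Junos
# releases may word these messages differently.
NO_MEMORY = re.compile(r"Error 6 \(No Memory\)")
IFRT_FAILED = re.compile(r"(fpc\d+) IFRT: .* failed")


def find_memory_exhaustion(lines):
    """Return (line, fpc) tuples for lines matching either sample signature.

    `fpc` is the FPC name if the message contains one, otherwise None.
    """
    hits = []
    for line in lines:
        fpc_match = IFRT_FAILED.search(line)
        if fpc_match:
            hits.append((line, fpc_match.group(1)))
        elif NO_MEMORY.search(line):
            hits.append((line, None))
    return hits
```

Feeding the two sample lines into `find_memory_exhaustion` returns both of them, the first one tagged with `fpc14`.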

You can monitor the FPC's heap memory usage with the following command to see whether the amount of memory in use is increasing:

pfe-cli> show heap 0

 

Once heap memory usage reaches 90%, you should prepare to restart the line cards to free up memory before they reset on their own. However, this is only a workaround. Software fixes are available, so if you are affected, you should upgrade the software on your device. In the long run, that is the better solution.
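The monitoring rule above can be expressed as a small check. This is a sketch under assumptions: it takes heap utilization readings that you have already collected as percentages (it does not parse the command output), and the function name and return labels are hypothetical.

```python
RESTART_THRESHOLD = 90.0  # percent, from the guidance above


def assess_heap(samples, threshold=RESTART_THRESHOLD):
    """Classify a series of heap utilization readings (percent, oldest first).

    Returns "restart" if the latest reading is at or above the threshold,
    "growing" if every reading is higher than the one before it,
    and "stable" otherwise.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if samples[-1] >= threshold:
        return "restart"
    if len(samples) > 1 and all(b > a for a, b in zip(samples, samples[1:])):
        return "growing"
    return "stable"
```

For example, `assess_heap([50, 60, 70])` returns `"growing"`, which is a hint to plan a maintenance window, while `assess_heap([85, 91])` returns `"restart"`.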

Below is a list of the affected software releases and the Junos versions that contain a fix:

  • 16.1R3-S4 or later Service Releases
  • 16.1R5, 16.1R6, 16.1R7 – fix available in 16.1R7-S3 or later
  • 16.2R1-S4 or later – no fix available, EOL since May 2020; go to 17.3R3-S2 (EOL Feb. 2022) or later
  • 16.2R2 or later – no fix available, EOL since May 2020; go to 17.3R3-S2 (EOL Feb. 2022) or later
  • 17.1R1, 17.1R2, 17.1R3 – no fix available, EOL since March 2020; go to 17.3R3-S2 (EOL Feb. 2022) or later
  • 17.2R1, 17.2R2, 17.2R3 – fixed in 17.2R3-S3 or later
  • 17.2X75 – fixed in 17.2X75-D92, -D101, -D102, and -D110
  • 17.3R1, 17.3R2, 17.3R3 – fixed in 17.3R3-S2 or later
  • 17.4R1, 17.4R2 – fixed in 17.4R1-S7 or later, 17.4R2-S1 or later, and 17.4R3
  • 18.1R1, 18.1R2, 18.1R3 – fixed in 18.1R3-S1 or later
  • 18.2R1 – fixed in 18.2R1-S4 or later, 18.2R2 or later
  • 18.2X75 or later – fixed in 18.2X75-D11, -D23, and -D30
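To check a fleet against this list, the comparison can be scripted. The sketch below is deliberately limited: it only covers trains from the list that have a single "fixed in X or later" entry (17.2, 17.3, and 18.1), treats 18.3 and newer as not affected as stated earlier, and returns "unknown" for everything else, including X75 releases and trains with several fixed branches. The function name and the assumed version format (for example `17.3R3-S2`) are illustrative.

```python
import re

# Minimum fixed (release, service release) per train, copied from the
# list above. Only trains with a single "or later" entry are included.
MIN_FIXED = {
    (17, 2): (3, 3),  # 17.2R3-S3
    (17, 3): (3, 2),  # 17.3R3-S2
    (18, 1): (3, 1),  # 18.1R3-S1
}

# Matches e.g. "17.3R3" or "17.3R3-S2"; trailing build suffixes are ignored.
VERSION = re.compile(r"^(\d+)\.(\d+)R(\d+)(?:-S(\d+))?")


def check_release(version):
    """Return "not affected", "fixed", "affected", or "unknown"."""
    m = VERSION.match(version)
    if not m:
        return "unknown"
    major, minor, release = int(m.group(1)), int(m.group(2)), int(m.group(3))
    service = int(m.group(4) or 0)
    if (major, minor) >= (18, 3):
        return "not affected"
    minimum = MIN_FIXED.get((major, minor))
    if minimum is None:
        return "unknown"
    return "fixed" if (release, service) >= minimum else "affected"
```

For example, `check_release("17.3R3")` returns `"affected"` and `check_release("17.3R3-S2")` returns `"fixed"`. Anything reported as "unknown" should be checked against the list by hand.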

 

JTAC recommended releases and Junos OS downloads are available on the Juniper Networks support site.

You can, of course, contact JTAC if you don’t see a suitable software version for your situation. If you are one of our customers, simply open a ticket on our support platform and we will be glad to help you.

Information is power, and now you have the power to avoid this issue (Juniper Memory Leak) and enjoy the rest of the day!