Deep Dive: Junos upgrade: filesystem is full part 2

We have some new developments on the issues with upgrading EX2300 and EX3400 devices.

Maciej Filipczak


 

Before you start reading and scrolling, make sure you have already read “Deep Dive: Junos upgrade: filesystem is full part 1”.

We spent additional time on research to be able to address these issues. We would like to emphasize that most of the methods described here are not part of Juniper's best practices for a Junos upgrade, and we are pretty sure there are no KB entries on them (yet?), since they mostly aim to get the job done and finish the upgrade. They should be used as a last resort, when we cannot use the official approach. It is possible to break the current Junos installation in the process. It won’t break your device, but recovery may require your physical presence near it if the procedure fails, since it would require either power cycling the device or doing a clean USB reinstall.

 

Virtual chassis corner case

There is one interesting corner case when upgrading a virtual chassis (VC) on EX2300 and EX3400. When the upgrade is performed on the master RE, the Junos package is duplicated as mchassis-install.tgz and pushed to the other members, so that it can be used to install the appropriate Junos image on them. This file is always placed (and looked for by the install script) in the /var/tmp folder. If we don’t have enough free space there on all members, the upgrade fails. The space needed is approximately twice the size of the target image (the mchassis-install.tgz file plus space to extract its contents).
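The "twice the image size" rule of thumb can be checked before starting. The following is only a minimal sketch: the function names are our own, it assumes a POSIX sh (the Junos root shell is csh by default, so start sh first) and that df accepts the POSIX -P and -k flags on your release.

```shell
# need_kb BYTES: KiB required, i.e. twice the image size, rounded up.
need_kb() {
    echo $(( ($1 * 2 + 1023) / 1024 ))
}

# free_kb DIR: KiB available on the filesystem that holds DIR.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_room BYTES DIR: succeeds when DIR has at least need_kb BYTES free.
has_room() {
    [ "$(free_kb "$2")" -ge "$(need_kb "$1")" ]
}
```

For example, has_room 314572800 /var/tmp tests whether a 300 MiB image would fit on that member; it has to be run on every member, since each one needs the space locally.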

 

Error: not enough space to unpack /var/tmp/mchassis-install.tgz

 

Since the file has already been pushed to /var/tmp/, we evidently have some free space for the upgrade, just not twice the image size. To overcome this minor setback, we can adapt the method described in part 1 and use tmpfs to store the image. The only difference is that we are now working with a VC, so we upgrade each member separately from its local tmpfs, but without the reboot option during the installation! Why? Because otherwise we would micro-segment the VC: as soon as we hit a Junos version mismatch, the offending members are set to an inactive state. In other words, we upgrade the members one by one, verify that all of them have the desired pending release (show version | match "fpc|pending") and then reboot all members at the same time.

This does not sound too dangerous, right? So why the advice to use it only as a last resort? Because it can break your VC, and if any member (or the master) fails to boot with the new version, you will need to invest some time to recover it. Other things can go wrong as well: a member fails to upgrade, a sudden power failure during the upgrade, a solar flare, a criticality incident, and so on.
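The verification step is easy to get wrong by eye on a larger VC, so here is a hedged sketch of how the matched output could be checked. It assumes the output has been saved to a file, that each member starts with an "fpcN:" header line, and that a member with a staged release shows a line containing the word "pending"; the exact wording of that line is an assumption on our side, and the function name is our own.

```shell
# members_without_pending: read the matched "show version" output on stdin
# and print every fpc header that is not followed by a "pending" line.
members_without_pending() {
    awk '
        /^fpc[0-9]+:/ {
            if (member != "" && !seen) print member
            member = $1
            sub(/:.*/, "", member)
            seen = 0
            next
        }
        tolower($0) ~ /pending/ { seen = 1 }
        END { if (member != "" && !seen) print member }
    '
}
```

An empty result means every member reported a pending release; any name that is printed should be looked at before rebooting the VC.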

Sometimes space is not the problem. If we see the error below, the mchassis-install.tgz file has not been pushed properly to all members:

 

/usr/libexec/ui/package: /var/tmp/mchassis-install.tgz: no such file

 

In that case, we can try to manually copy the file from the master to the affected member, into /var/tmp of course.

 

Stale packages

When everything you have tried has failed and you really don’t want to perform a USB stick upgrade, there is one last thing that can be done. We are now in the situation where cleaning /var/tmp, removing everything in /var/log and doing a system cleanup failed to free enough space for the system upgrade. So where is the last place to look for some space? We can try to remove all stale packages in /packages/db. First of all, we should look for all packages left over from a previous installation. As with everything “Junosy”, there is an easy way and a hard way to do it.

 

Easy way

There are some commands to do this from the FreeBSD shell (logged in to the shell as root, of course).

 

root@sw1:RE:0% pkg setop rm previous
root@sw1:RE:0% pkg delete old

 

Not all platforms support this, but it should be available on EX2300/EX3400.

 

Hard way

If we want to fool ourselves into thinking that we have more control over what we are removing, we can do this manually. First of all, we can remove all packages associated with unused Junos versions. This is somewhat dangerous, so do it at your own risk. As we mentioned, there can be stale packages in the /packages/db/ folder. They can be identified quickly by looking at the Junos version of the stored packages. From the output below, we can see that we have packages for two Junos versions here: 17.4R2-S9 and 17.4R2-S8.

 

nootnoot@mx1> file list /packages/db/
 
/packages/db/:
jail-runtime-x86-32-20190825.5ca39af_builder_stable_11/
jail-runtime-x86-32-20191203.fa5e90e_builder_stable_11/
jdocs-x86-32-20191004.131922_builder_junos_174_r2_s8/
jdocs-x86-32-20200130.100427_builder_junos_174_r2_s9/
jfirmware-x86-32-17.4R2-S8/
jfirmware-x86-32-17.4R2-S9.1/
jpfe-X-x86-32-20191004.131922_builder_junos_174_r2_s8/
jpfe-X-x86-32-20200130.100427_builder_junos_174_r2_s9/
jpfe-X960-x86-32-20191004.131922_builder_junos_174_r2_s8/
jpfe-X960-x86-32-20200130.100427_builder_junos_174_r2_s9/
jpfe-common-x86-32-20191004.131922_builder_junos_174_r2_s8/
jpfe-common-x86-32-20200130.100427_builder_junos_174_r2_s9/
jpfe-wrlinux-x86-32-20191004.131922_builder_junos_174_r2_s8/
jpfe-wrlinux-x86-32-20200130.100427_builder_junos_174_r2_s9/
jsd-x86-32-17.4R2-S8-jet-1/
jsd-x86-32-17.4R2-S9.1-jet-1/
jsdn-x86-32-17.4R2-S8/
jsdn-x86-32-17.4R2-S9.1/
<-- output truncated -->

 

Of course, it would be too easy if the Junos version were written consistently in the package names, so we have two formats to look for:

  • 174_r2_s8
  • 17.4R2-S8

So first, we can simply remove all packages that match the formats above:

 

find /packages/db -type d -name "*174_r2_s8*" -prune -exec rm -rf {} \;
find /packages/db -type d -name "*17.4R2-S8*" -prune -exec rm -rf {} \;
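Since rm -rf is unforgiving, it may be worth listing what the patterns match before deleting anything. A minimal sketch, assuming a POSIX sh (start sh from the default csh root shell); list_version_pkgs is our own helper name, not a Junos tool, and it only prints.

```shell
# list_version_pkgs DB_DIR TAG...: print the directories under DB_DIR whose
# name contains any of the given version tags. Nothing is deleted.
list_version_pkgs() {
    db=$1
    shift
    for tag in "$@"; do
        # -prune keeps find from descending into a directory that matched
        find "$db" -type d -name "*${tag}*" -prune -print
    done | sort -u
}
```

Running list_version_pkgs /packages/db 174_r2_s8 17.4R2-S8 shows exactly what the two delete commands would remove, so a stray match on the running version can be caught first.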

 

Of course, this does not cover all packages: OS packages don’t have the Junos version in their name, and we also ran into multiple copies of the same packages for the same Junos version. To decide which packages can stay and which should be evicted, we can consult the ‘show version’ command to see which packages are in use.

 

nootnoot@sw1> show version
fpc0:
--------------------------------------------------------------------------
<...>
JUNOS OS Kernel 32-bit  [20191022.14c2ad5_builder_stable_11]
JUNOS OS libs [20191022.14c2ad5_builder_stable_11]
JUNOS OS runtime [20191022.14c2ad5_builder_stable_11]
JUNOS OS time zone information [20191022.14c2ad5_builder_stable_11]
JUNOS py extensions [20191115.190104_builder_junos_182_r3_s2]
JUNOS py base [20191115.190104_builder_junos_182_r3_s2]
JUNOS OS crypto [20191022.14c2ad5_builder_stable_11]
JUNOS network stack and utilities [20191115.190104_builder_junos_182_r3_s2]
<...>
 
{master:0}
nootnoot@sw1> show version | match 20191022.14c2ad5_builder_stable_11 | count
Count: 6 lines
 
{master:0}
nootnoot@sw1> file list /packages/db/ | match 20191022.14c2ad5_builder_stable_11 | count
Count: 6 lines
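The same comparison can be turned around to list the candidates for removal. This is a heuristic sketch only: it assumes the ‘show version’ output was saved to a file, treats every bracketed string in it as a build tag in use, and assumes a package directory is in use when its name contains one of those tags. The function names are our own, and the result should be reviewed by hand before anything is deleted.

```shell
# in_use_tags: read "show version" output on stdin and print each
# bracketed build tag once.
in_use_tags() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' | sort -u
}

# unused_pkgs TAGS_FILE DB_DIR: print the entries of DB_DIR whose name
# contains none of the tags. Refuses to run with an empty tag list,
# because then every package would look unused.
unused_pkgs() {
    [ -s "$1" ] || { echo "empty tag list, refusing to continue" >&2; return 1; }
    ls "$2" | grep -v -F -f "$1"
}
```

For example, in_use_tags < /var/tmp/show-version.txt > /var/tmp/tags followed by unused_pkgs /var/tmp/tags /packages/db prints the directories that no running package refers to; both file names are placeholders.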

 

Based on that, we can, in a reasonably safe way, remove all packages that are not currently in use. Sometimes this frees around 300 MB of space per package bundle.

 

Unlink option changes

Some people are concerned about the ‘unlink’ option of the upgrade procedure, since in the past it was mainly used on MX routers or SRX firewalls and was not part of the upgrade cycle on EX switches. The option was actually introduced in version 18.1, and we are guessing that since version 18.2 it is also the default on EX2300/EX3400 series switches. This is how we interpret this output:

 

nootnoot@sw1> request system software add /tmp/my-perfect-image.tgz unlink no-copy  
setting unlink by default.

 

This is the only non-dangerous part of this post, but I still want to say, “you are using this at your own risk.”