Deep Dive: Junos upgrade – filesystem is full

Not enough storage during Junos upgrade (EX2300 and EX3400). An extension of Juniper’s article KB31198 mainly addressing issues on the EX series switches:

Noam Suisa
+41 58 510 13 45
noam.suisa@umb.ch

Dear readers, please note that this blog is a bit older and therefore the content, insights and statements may have changed over time as products, services and technologies evolve.

No matter what you do in life or how you earn your money, you have surely come into contact with software upgrades at some point, and if you are a network engineer you may even have developed a certain dislike for the sound of that phrase. We could probably start a lively conversation about shared experiences in that field and about what could go wrong or already has. We would barely scratch the surface with all the cases of devices not coming back up, booting with wrong or corrupted software, hardware failures, power surges and data loss, and we could create a new series of Tales from the Crypt (you get it, some of them will become zombie devices. Please tell us you get it, our bonus depends on that ;)). But what if we cannot even start? What if there is an issue at the most fundamental stage of the process? We recently had a few cases where we couldn’t even upload the image to the target devices.

As a first step, it is always good to look for obvious mistakes. Maybe the switch really has run out of space: maybe you kept one snapshot too many, or you were very liberal with logging and traceoptions. So let’s go and free some space, make some room! Below is a simple three-strike rule for what should be done first on the path to making your engineering life easier.

  1. Try to actually free up some space
    root@juniper> request system storage cleanup
  2. Remove old snapshots
    root@juniper> request system snapshot delete *
  3. Try to use tmpfs to store an image (for example /tmp)
    root@juniper> file copy <source> /tmp/<image>

It would be fair to give you at least a short explanation. Storage cleanup will only remove files from the following directories:

  • /var/tmp
  • /var/log
  • /var/sw
  • /var/crash

So if you are trying to upload your image to a location that does not share disk space with them, the cleanup will not help you much, just sayin’. Snapshots are a generally tricky topic, since different Juniper devices handle them in different ways. Some (the QFX5100, for example) can only write them to an external USB drive, since a snapshot cannot be stored on the same media that was used to boot the device. In general, snapshots are copies of the currently running software and configuration, so yeah, having several of them can quickly consume free space. Besides, one is usually enough.

Of course, before we start anything, there is a viable workaround: do not use local storage at all and run the upgrade over the network. If you decide to go this way, there is actually no point in reading this article any further. The caveat of this approach is that only TFTP and FTP are supported for it, since mgd (the management process) does not support SCP. But when this is not possible or not the desired solution, then you guessed it, the reading continues.

 

request system software add <protocol>://<user>@<host>:<path_to_image> <options>

 

So let’s break it down: you want to download an image to the device’s local drive and you get the output below. Obviously, the first thing a normal person would do is enter the “denial stage” and maybe shake a fist once or twice.

 

/var: write failed, filesystem is full
[...]
error: file-fetch failed
error: could not fetch local copy of file

 

Then our normal person would check if there is REALLY enough space.

 

mzwk@ex42-01> show system storage                                                  
fpc0:

 


Filesystem            Size   Used  Avail  Capacity  Mounted on
/dev/da0s2a           184M   157M    12M       93%  /
devfs                 1.0K   1.0K     0B      100%  /dev
/dev/md0              282M   282M     0B      100%  /packages/mnt/junos
/dev/md1              6.8M   2.1M   4.1M       33%  /packages/mfs-fips-mode-powerpc
/dev/md2              5.4M   5.4M     0B      100%  /packages/mnt/fips-mode-powerpc-15.1R7.9
/dev/md3              8.7M   4.1M   3.9M       51%  /packages/mfs-jdocs-ex
/dev/md4               12M    12M     0B      100%  /packages/mnt/jdocs-ex-15.1R7.9
/dev/md5               45M    40M   1.4M       97%  /packages/mfs-junos-ex-4200
/dev/md6               83M    83M     0B      100%  /packages/mnt/junos-ex-4200-15.1R7.9
/dev/md7               14M   9.1M   3.5M       72%  /packages/mfs-jweb-ex
/dev/md8               26M    26M     0B      100%  /packages/mnt/jweb-ex-15.1R7.9
/dev/da0s3e           123M   5.6M   107M        5%  /var
/dev/md9              252M  10.0K   232M        0%  /tmp
/dev/da0s3d           369M    17M   323M        5%  /var/tmp
/dev/da0s4d            62M   272K    57M        0%  /config
/dev/md10             118M    20M    89M       18%  /var/rundb
procfs                4.0K   4.0K     0B      100%  /proc
/var/jail/etc         123M   5.6M   107M        5%  /packages/mnt/jweb-ex-15.1R7.9/jail/var/etc
/var/jail/run         123M   5.6M   107M        5%  /packages/mnt/jweb-ex-15.1R7.9/jail/var/run
/var/jail/tmp         123M   5.6M   107M        5%  /packages/mnt/jweb-ex-15.1R7.9/jail/var/tmp
/var/tmp              369M    17M   323M        5%  /packages/mnt/jweb-ex-15.1R7.9/jail/var/tmp/uploads
devfs                 1.0K   1.0K     0B      100%  /packages/mnt/jweb-ex-15.1R7.9/jail/dev
/var/jail/jweb-app    123M   5.6M   107M        5%  /packages/mnt/jweb-ex-15.1R7.9/jail/var/jweb-app
/dev/md11             6.8M   2.1M   4.1M       33%  /packages/mfs-fips-mode-powerpc
/dev/md12             8.7M   4.1M   3.9M       51%  /packages/mfs-jdocs-ex
/dev/md13              45M    40M   1.4M       97%  /packages/mfs-junos-ex-4200
/dev/md14              14M   9.1M   3.5M       72%  /packages/mfs-jweb-ex
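
Reading such a listing by eye is error-prone, because the interesting question is which filesystem a given destination path actually lives on. The following Python snippet is a minimal sketch of that check, not an official tool: the function names are our own, the size suffixes are assumed to be powers of 1024, and the sample rows are copied from the output above.

```python
import re

# Assumption: df-style suffixes are powers of 1024.
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(text):
    """Convert a size such as '12M', '10.0K' or '0B' to bytes."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([BKMGT])", text)
    if not m:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2)])


def parse_storage(output):
    """Return {mount_point: available_bytes} from 'show system storage' text."""
    mounts = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have six columns with a percentage in the fifth one;
        # the header row has seven words and is skipped.
        if len(fields) == 6 and fields[4].endswith("%"):
            mounts[fields[5]] = to_bytes(fields[3])
    return mounts


def filesystem_for(path, mounts):
    """Pick the longest mount point that is a prefix of the given path."""
    best = "/"
    for mount in mounts:
        prefix = mount.rstrip("/") + "/"
        if (path == mount or path.startswith(prefix)) and len(mount) > len(best):
            best = mount
    return best


def fits(path, image_bytes, mounts):
    """True if the filesystem holding 'path' has room for the image."""
    return mounts[filesystem_for(path, mounts)] >= image_bytes


SAMPLE = """\
/dev/da0s2a  184M  157M   12M  93%  /
/dev/da0s3e  123M  5.6M  107M   5%  /var
/dev/md9     252M 10.0K  232M   0%  /tmp
/dev/da0s3d  369M   17M  323M   5%  /var/tmp
"""

mounts = parse_storage(SAMPLE)
image = 132 * 1024**2  # roughly the size of the jinstall image
print(fits("/var/tmp/image.tgz", image, mounts))          # True
print(fits("/var/home/remote/image.tgz", image, mounts))  # False
```

Note that /var/tmp and /var are separate filesystems on this switch, so free space in one says nothing about the other.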

If the destination is, in fact, full, you can run a storage cleanup with “request system storage cleanup”, as described in our Junos upgrade three-strike rule. Below you can see an example of the command in use.

 

mzwk@ex42-01> request system storage cleanup   
Please check the list of files to be deleted using the dry-run option. i.e.
request system storage cleanup dry-run
Do you want to proceed ? [yes,no] (no) yes
fpc0:
--------------------------------------------------------------------------
List of files to delete:
  Size Date         Name
   11B Sep 11  2018 /var/jail/tmp/alarmd.ts
  148B May 13 08:28 /var/log/default-log-messages.0.gz
 8667B May  8 13:45 /var/log/default-log-messages.1.gz
 6955B Aug  2  2018 /var/log/default-log-messages.2.gz
 6492B Jul  5  2017 /var/log/default-log-messages.3.gz
 11.7K Jul  5  2017 /var/log/default-log-messages.4.gz
 12.7K Jul  5  2017 /var/log/default-log-messages.5.gz
 12.7K Jul  5  2017 /var/log/default-log-messages.6.gz
 14.0K Jul  5  2017 /var/log/default-log-messages.7.gz
 21.8K Jul  5  2017 /var/log/default-log-messages.8.gz
 21.4K Jul  5  2017 /var/log/default-log-messages.9.gz
225.8K Jul  5  2017 /var/log/erp-default.0.gz
224.9K Jul  5  2017 /var/log/erp-default.1.gz
228.7K Jul  5  2017 /var/log/erp-default.2.gz
  353B Mar 29  2017 /var/log/install.0.gz
  289B Apr 30  2015 /var/log/install.1.gz
 12.4K May 13 08:28 /var/log/interactive-commands.0.gz
 11.2K Apr  2 14:00 /var/log/interactive-commands.1.gz
 13.8K Mar 29 15:45 /var/log/interactive-commands.2.gz
 14.2K Oct 28  2018 /var/log/interactive-commands.3.gz
 11.0K Sep 24  2017 /var/log/interactive-commands.4.gz
 9990B Jul 19  2017 /var/log/interactive-commands.5.gz
 10.2K Jul 18  2017 /var/log/interactive-commands.6.gz
 14.0K Jul 18  2017 /var/log/interactive-commands.7.gz
 12.5K Jul 17  2017 /var/log/interactive-commands.8.gz
 11.2K Jul 14  2017 /var/log/interactive-commands.9.gz
 12.6K May 13 08:28 /var/log/messages.0.gz
 16.8K May  6 16:00 /var/log/messages.1.gz
 5609B Apr  1 09:45 /var/log/messages.2.gz
 5642B Apr  1 05:30 /var/log/messages.3.gz
 5540B Apr  1 01:15 /var/log/messages.4.gz
 5676B Mar 31 21:00 /var/log/messages.5.gz
 5532B Mar 31 16:45 /var/log/messages.6.gz
 5537B Mar 31 12:30 /var/log/messages.7.gz
 5509B Mar 31 08:15 /var/log/messages.8.gz
 5574B Mar 31 04:00 /var/log/messages.9.gz
  559B May 13 08:17 /var/log/wtmp.0.gz
   27B May  3 14:30 /var/log/wtmp.1.gz
   57B Jan  1  2010 /var/log/wtmp.2.gz
  689B Apr  3 14:46 /var/log/wtmp.3.gz
   93B Mar 19 19:13 /var/log/wtmp.4.gz
   57B Sep 11  2018 /var/tmp/krt_rpf_filter.txt
   42B Sep 11  2018 /var/tmp/pfe_debug_commands
    0B Sep 11  2018 /var/tmp/rtsdb/if-rtsdb
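
Before relying on a cleanup, it helps to know how much it would actually free. As a rough sketch (the helper name and the 1024-based units are our assumptions, and the result is only approximate because the CLI rounds sizes), the file list can be summed up in Python like this:

```python
import re

# Assumption: K/M/G in the listing are powers of 1024.
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

# One row: size, date (month, day, then a year or hh:mm), path.
# The whitespace before the path is optional, because some renderings
# of the output fuse the date and the path together.
_ROW = re.compile(
    r"^\s*(\d+(?:\.\d+)?)([BKMG])\s+"
    r"[A-Z][a-z]{2}\s+\d{1,2}\s+(?:\d{4}|\d{2}:\d{2})\s*"
    r"(/\S+)\s*$"
)


def reclaimable_bytes(listing):
    """Return (total_bytes, [(path, bytes), ...]) for a cleanup file list."""
    total = 0
    files = []
    for line in listing.splitlines():
        m = _ROW.match(line)
        if m:  # header and separator lines simply do not match
            size = int(float(m.group(1)) * _UNITS[m.group(2)])
            total += size
            files.append((m.group(3), size))
    return total, files


listing = """\
List of files to delete:
Size Date         Name
11B Sep 11  2018 /var/jail/tmp/alarmd.ts
148B May 13 08:28 /var/log/default-log-messages.0.gz
11.7K Jul  5  2017 /var/log/default-log-messages.4.gz
"""

total, files = reclaimable_bytes(listing)
print(f"{len(files)} files, about {total} bytes")
```

In the example run above, the whole list adds up to roughly one megabyte, which is nowhere near enough room for a 132 MB image.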

 

Furthermore, if there is still not enough space on the device after the storage cleanup, have a look at the user home directories, especially the /root/ folder.

And even when you are certain that there is so much space on your Juniper device that you could actually get lost in there, you can still hit an annoying issue like the one below.

 

Junos CLI – download

 

mzwk@ex42-01> file copy scp://mzwk@10.255.0.4:/home/mzwk/jinstall-ex-4200-15.1R7.9-domestic-signed.tgz /var/tmp
mzwk@10.255.0.4's password:
jinstall-ex-4200-15.1R7.9-domestic-signed.tgz    80%  107MB   1.3MB/s  00:18 ETA
/var: write failed, filesystem is full
jinstall-ex-4200-15.1R7.9-domestic-signed.tgz   100%  132MB   1.3MB/s  01:41
/var/home/remote/...transferring.file.........UapdFg/jinstall-ex-4200-15.1R7.9-domestic-signed.tgz: No space left on device
error: file-fetch failed
error: could not fetch local copy of file

 

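No matter what we try, we are not able to download the image from the CLI of the target system. The error output hints at the reason: the transfer is staged in a temporary directory under /var/home, which lives on the /var filesystem (107M available in the listing above), and the 132 MB image does not fit there, even though /var/tmp itself has plenty of room. What we can do instead is push the image to the target device from a remote server and let mgd do the file handling. We have to be honest: to this day we are amazed that it works and that we somehow got the idea to even try it.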

 

Unix SCP – upload

 

mzwk@tools01:~$ scp jinstall-ex-4200-15.1R7.9-domestic-signed.tgz mzwk@10.255.0.18:/var/tmp
Password:
jinstall-ex-4200-15.1R7.9-domestic-signed.tgz   100%  132MB   1.0MB/s  02:07
mzwk@tools01:~$
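
If you push images to many switches, the upload above is easy to script from the tools server. The snippet below is only a sketch under our own assumptions: the function names are hypothetical, it expects the standard OpenSSH scp client in the PATH, and it leaves authentication (password prompt or SSH keys) to scp itself.

```python
import shlex
import subprocess


def scp_push_argv(image, user, host, dest_dir="/var/tmp"):
    """Build the argument list for pushing a local image to the switch."""
    return ["scp", image, f"{user}@{host}:{dest_dir}"]


def push_image(image, user, host, dest_dir="/var/tmp"):
    """Run scp; raises subprocess.CalledProcessError if the transfer fails."""
    argv = scp_push_argv(image, user, host, dest_dir)
    print("running:", shlex.join(argv))
    subprocess.run(argv, check=True)


# Example call, using the values from the transcript above:
# push_image("jinstall-ex-4200-15.1R7.9-domestic-signed.tgz", "mzwk", "10.255.0.18")
```

Passing the arguments as a list rather than as one shell string avoids quoting problems with unusual file names.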

 

After finally managing to upload our future software image, we would like to point out two options worth adding to the Junos upgrade procedure to conserve disk space and make life easier in the future: “no-copy” and “unlink”.

 

request system software add <software_package> no-copy unlink reboot
  • The no-copy option prevents copies of the new package from being created in /var/sw/pkg.
  • The unlink option removes the package after it has been installed.
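
For anyone generating upgrade commands from a script or template, the option handling can be sketched as follows. The helper is hypothetical and only assembles the CLI string shown above; it does not talk to the device.

```python
def software_add_command(package, no_copy=True, unlink=True, reboot=False):
    """Assemble a 'request system software add' command line.

    'package' is the path to the image on the device, for example
    /var/tmp/<image>. The option names are the ones from this article.
    """
    parts = ["request", "system", "software", "add", package]
    if no_copy:
        parts.append("no-copy")  # do not keep a copy of the package in /var/sw/pkg
    if unlink:
        parts.append("unlink")   # remove the package file after installation
    if reboot:
        parts.append("reboot")   # reboot once the installation has finished
    return " ".join(parts)


print(software_add_command(
    "/var/tmp/jinstall-ex-4200-15.1R7.9-domestic-signed.tgz", reboot=True))
```

Making no-copy and unlink the defaults reflects the advice above; switch them off only when you deliberately want to keep the package on the device.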

As a closing remark: if a switch is running an older release (e.g. 15.1X) and is to be upgraded to a recent release (18), a direct upgrade with no interim releases is normally possible, especially on EX series fixed switches. Please note, however, that if the switch is part of a Virtual Chassis, the VC may malfunction during such an upgrade and may eventually fail, leaving you with a split cluster.

Whenever possible, for such a major “multi-hop” upgrade it is advisable to split the Virtual Chassis into standalone switches and upgrade them one by one. You may preconfigure the inter-switch ports up front (e.g. with a management VLAN and the respective IP addresses) and then simply convert the VC ports to regular ports. Once all devices are upgraded, the VC can be recreated.

 

Creating a recovery snapshot

Once the upgrade to a stable Junos OS release is done, it is very good practice to create a new recovery snapshot. Normally the recovery partition is only created by a manual action. In the factory state, it reflects the software image running on the device. To create a recovery snapshot, simply issue:

 

request system snapshot recovery

 

Unfortunately, it is also common that, after an upgrade affected by insufficient storage, the creation of the recovery snapshot reports the same problem.

To fix this, run the following commands from the shell:

 

root@:RE:0% cd /var/tmp
root@:RE:0% ls -al
root@:RE:0% rm -r rtsdb
root@:RE:0% rm -r sd-upgrade

 

Now, try to create a recovery snapshot once more and this time it should work like a charm.

If you want additional information, I suggest you also check out “Deep Dive: Junos upgrade: filesystem is full part 2”.