Deep Dive: Junos upgrade – filesystem is full
Not enough storage during a Junos upgrade (EX2300 and EX3400). An extension of Juniper’s article KB31198, mainly addressing issues on the EX series switches.
Dear readers, please note that this blog is a bit older and therefore the content, insights and statements may have changed over time as products, services and technologies evolve.
No matter what you do in life or how you earn your money, you have certainly come into contact with software upgrades at some point, and if you are a network engineer you may even have developed a certain dislike for the sound of that phrase. We could probably start a lively conversation about our shared experience in that field and about what could go wrong or already has. We would barely scratch the surface with all the cases of devices not coming back up, booting with wrong or corrupted software, hardware failures, power surges and data loss, and we could create a new series of Tales from the Crypt (you get it, some of them become zombie devices. Please tell us you get it, our bonus depends on it ;)). But what if we cannot even start? What if there is an issue at the most fundamental stage of the process? We recently had a few cases where we couldn’t even upload the image to the target devices.
As a first step, it is always good to look for obvious mistakes. Maybe you have simply run out of space, maybe you have one snapshot too many, or you were very liberal with logging and trace options. So let’s go and free some space, make some room! Below is a simple three-strike rule for what should be done first on the path to making your engineering life easier.
- Try to actually free up some space
root@juniper> request system storage cleanup
- Remove old snapshots
root@juniper> request system snapshot delete *
- Try to use tmpfs to store an image (for example /tmp)
root@juniper> file copy <source> /tmp/<image>
It would be fair to give you at least a short explanation. Storage cleanup will only remove files from the following directories:
- /var/tmp
- /var/log
- /var/sw
- /var/crash
So, if you are trying to upload your image to a location which does not share disk space with them, that will not help you much, just sayin’. Snapshots are generally a tricky topic, since different Juniper devices handle them in different ways. Some (QFX5100) can only write them to an external USB drive, since a snapshot cannot be stored on the same media that was used to boot the device. In general, snapshots are copies of the currently running software and configuration, so yeah, having several of them can quickly consume free space. Besides, one is usually enough.
Of course, before we start anything, there is a viable workaround: do not use local storage at all and run the upgrade over the network. If you decide to go this way, there is no point in reading this article any further. The caveat of this approach is that only TFTP and FTP are supported for it, since mgd (the management process) does not support SCP. But when this is not possible or not the desired solution then, you guessed it, the reading continues.
request system software add <protocol>://<user>@<host>:<path_to_image> <options>
So let’s break it down: you want to download an image to the device’s local drive and you get the error below. Obviously, the first thing a normal person would do is enter the “denial stage” and maybe shake a fist once or twice.
/var: write failed, filesystem is full
[...]
error: file-fetch failed
error: could not fetch local copy of file
Then our normal person would check if there is REALLY enough space.
mzwk@ex42-01> show system storage
fpc0:
Filesystem Size Used Avail Capacity Mounted on
/dev/da0s2a 184M 157M 12M 93% /
devfs 1.0K 1.0K 0B 100% /dev
/dev/md0 282M 282M 0B 100% /packages/mnt/junos
/dev/md1 6.8M 2.1M 4.1M 33% /packages/mfs-fips-mode-powerpc
/dev/md2 5.4M 5.4M 0B 100% /packages/mnt/fips-mode-powerpc-15.1R7.9
/dev/md3 8.7M 4.1M 3.9M 51% /packages/mfs-jdocs-ex
/dev/md4 12M 12M 0B 100% /packages/mnt/jdocs-ex-15.1R7.9
/dev/md5 45M 40M 1.4M 97% /packages/mfs-junos-ex-4200
/dev/md6 83M 83M 0B 100% /packages/mnt/junos-ex-4200-15.1R7.9
/dev/md7 14M 9.1M 3.5M 72% /packages/mfs-jweb-ex
/dev/md8 26M 26M 0B 100% /packages/mnt/jweb-ex-15.1R7.9
/dev/da0s3e 123M 5.6M 107M 5% /var
/dev/md9 252M 10.0K 232M 0% /tmp
/dev/da0s3d 369M 17M 323M 5% /var/tmp
/dev/da0s4d 62M 272K 57M 0% /config
/dev/md10 118M 20M 89M 18% /var/rundb
procfs 4.0K 4.0K 0B 100% /proc
/var/jail/etc 123M 5.6M 107M 5% /packages/mnt/jweb-ex-15.1R7.9/jail/var/etc
/var/jail/run 123M 5.6M 107M 5% /packages/mnt/jweb-ex-15.1R7.9/jail/var/run
/var/jail/tmp 123M 5.6M 107M 5% /packages/mnt/jweb-ex-15.1R7.9/jail/var/tmp
/var/tmp 369M 17M 323M 5% /packages/mnt/jweb-ex-15.1R7.9/jail/var/tmp/uploads
devfs 1.0K 1.0K 0B 100% /packages/mnt/jweb-ex-15.1R7.9/jail/dev
/var/jail/jweb-app 123M 5.6M 107M 5% /packages/mnt/jweb-ex-15.1R7.9/jail/var/jweb-app
/dev/md11 6.8M 2.1M 4.1M 33% /packages/mfs-fips-mode-powerpc
/dev/md12 8.7M 4.1M 3.9M 51% /packages/mfs-jdocs-ex
/dev/md13 45M 40M 1.4M 97% /packages/mfs-junos-ex-4200
/dev/md14 14M 9.1M 3.5M 72% /packages/mfs-jweb-ex
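To make the "is there REALLY enough space" check less of an eyeball exercise, here is a minimal Python sketch that parses output like the above and reports which filesystem a given destination path actually lands on. The function names, the 132 MB image size and the 1024-based unit conversion are assumptions made for illustration; only a few rows of the listing are reproduced. It also illustrates the earlier remark about shared disk space: on this device /var/tmp is its own partition, while /var/log and the home directories sit on the much smaller /var.

```python
import re

# A few rows copied from the 'show system storage' output above.
SAMPLE = """\
Filesystem Size Used Avail Capacity Mounted on
/dev/da0s2a 184M 157M 12M 93% /
/dev/da0s3e 123M 5.6M 107M 5% /var
/dev/md9 252M 10.0K 232M 0% /tmp
/dev/da0s3d 369M 17M 323M 5% /var/tmp
"""

# Assumption: sizes are powers of 1024, as with df -h on FreeBSD.
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def to_bytes(text):
    """Convert a df-style size such as '5.6M' or '0B' to bytes."""
    match = re.fullmatch(r"([\d.]+)([BKMG])", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(float(match.group(1)) * UNITS[match.group(2)])


def parse_storage(output):
    """Return {mount_point: available_bytes} for every data row."""
    mounts = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have six columns with a percentage in the fifth;
        # the header and the 'fpc0:' line do not match and are skipped.
        if len(fields) != 6 or not fields[4].endswith("%"):
            continue
        mounts[fields[5]] = to_bytes(fields[3])
    return mounts


def mount_for(path, mounts):
    """Pick the mount point holding `path` (longest matching prefix)."""
    candidates = [m for m in mounts
                  if path == m or path.startswith(m.rstrip("/") + "/")]
    return max(candidates, key=len)


def fits(path, image_bytes, mounts):
    """True if the filesystem holding `path` has room for the image."""
    return mounts[mount_for(path, mounts)] >= image_bytes


mounts = parse_storage(SAMPLE)
image = 132 * 1024**2  # hypothetical image size, matching the 132MB file used later
for target in ("/var/tmp/image.tgz", "/var/home/remote/image.tgz", "/tmp/image.tgz"):
    print(target, "->", mount_for(target, mounts), "fits:", fits(target, image, mounts))
```

The longest-prefix match is the important design choice here: a path under /var/tmp must be attributed to the /var/tmp partition and not to /var, even though both prefixes match.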
If the destination is in fact full, you can clean up storage with “request system storage cleanup”, as described in our Junos upgrade three-strike rule. Below you can see an example of the command in use.
mzwk@ex42-01> request system storage cleanup
Please check the list of files to be deleted using the dry-run option. i.e.
request system storage cleanup dry-run
Do you want to proceed ? [yes,no] (no) yes
fpc0:
--------------------------------------------------------------------------
List of files to delete:
Size Date Name
11B Sep 11 2018 /var/jail/tmp/alarmd.ts
148B May 13 08:28 /var/log/default-log-messages.0.gz
8667B May 8 13:45 /var/log/default-log-messages.1.gz
6955B Aug 2 2018 /var/log/default-log-messages.2.gz
6492B Jul 5 2017 /var/log/default-log-messages.3.gz
11.7K Jul 5 2017 /var/log/default-log-messages.4.gz
12.7K Jul 5 2017 /var/log/default-log-messages.5.gz
12.7K Jul 5 2017 /var/log/default-log-messages.6.gz
14.0K Jul 5 2017 /var/log/default-log-messages.7.gz
21.8K Jul 5 2017 /var/log/default-log-messages.8.gz
21.4K Jul 5 2017 /var/log/default-log-messages.9.gz
225.8K Jul 5 2017 /var/log/erp-default.0.gz
224.9K Jul 5 2017 /var/log/erp-default.1.gz
228.7K Jul 5 2017 /var/log/erp-default.2.gz
353B Mar 29 2017 /var/log/install.0.gz
289B Apr 30 2015 /var/log/install.1.gz
12.4K May 13 08:28 /var/log/interactive-commands.0.gz
11.2K Apr 2 14:00 /var/log/interactive-commands.1.gz
13.8K Mar 29 15:45 /var/log/interactive-commands.2.gz
14.2K Oct 28 2018 /var/log/interactive-commands.3.gz
11.0K Sep 24 2017 /var/log/interactive-commands.4.gz
9990B Jul 19 2017 /var/log/interactive-commands.5.gz
10.2K Jul 18 2017 /var/log/interactive-commands.6.gz
14.0K Jul 18 2017 /var/log/interactive-commands.7.gz
12.5K Jul 17 2017 /var/log/interactive-commands.8.gz
11.2K Jul 14 2017 /var/log/interactive-commands.9.gz
12.6K May 13 08:28 /var/log/messages.0.gz
16.8K May 6 16:00 /var/log/messages.1.gz
5609B Apr 1 09:45 /var/log/messages.2.gz
5642B Apr 1 05:30 /var/log/messages.3.gz
5540B Apr 1 01:15 /var/log/messages.4.gz
5676B Mar 31 21:00 /var/log/messages.5.gz
5532B Mar 31 16:45 /var/log/messages.6.gz
5537B Mar 31 12:30 /var/log/messages.7.gz
5509B Mar 31 08:15 /var/log/messages.8.gz
5574B Mar 31 04:00 /var/log/messages.9.gz
559B May 13 08:17 /var/log/wtmp.0.gz
27B May 3 14:30 /var/log/wtmp.1.gz
57B Jan 1 2010 /var/log/wtmp.2.gz
689B Apr 3 14:46 /var/log/wtmp.3.gz
93B Mar 19 19:13 /var/log/wtmp.4.gz
57B Sep 11 2018 /var/tmp/krt_rpf_filter.txt
42B Sep 11 2018 /var/tmp/pfe_debug_commands
0B Sep 11 2018 /var/tmp/rtsdb/if-rtsdb
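It is worth checking how much a cleanup would actually give back before counting on it. The Python sketch below adds up the sizes from a cleanup or dry-run listing, grouped by directory. The function name and the grouping by the first two path components are choices made for this illustration, and only a handful of lines from the listing above are embedded as sample data. For the full listing above, the files add up to roughly a megabyte, which is nowhere near enough for a software image.

```python
import re
from collections import defaultdict

# A few lines copied from the cleanup listing above.
LISTING = """\
Size Date Name
11B Sep 11 2018 /var/jail/tmp/alarmd.ts
225.8K Jul 5 2017 /var/log/erp-default.0.gz
224.9K Jul 5 2017 /var/log/erp-default.1.gz
228.7K Jul 5 2017 /var/log/erp-default.2.gz
57B Sep 11 2018 /var/tmp/krt_rpf_filter.txt
0B Sep 11 2018 /var/tmp/rtsdb/if-rtsdb
"""

# Size, unit, then anything up to the first '/' that starts the path.
# The lazy '.*?' also tolerates a missing space between date and path.
LINE = re.compile(r"^([\d.]+)([BKMG])\s+.*?(/\S+)$")

# Assumption: K, M and G are powers of 1024.
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def reclaimable(listing):
    """Sum listed file sizes in bytes per area such as '/var/log'."""
    totals = defaultdict(int)
    for line in listing.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # header or separator line
        size = float(match.group(1)) * UNITS[match.group(2)]
        # '/var/log/x.gz' -> ['', 'var', 'log', 'x.gz'] -> '/var/log'
        area = "/".join(match.group(3).split("/")[:3])
        totals[area] += round(size)
    return dict(totals)


totals = reclaimable(LISTING)
for area, size in sorted(totals.items()):
    print(f"{area}: {size / 1024:.1f} KiB")
print(f"total: {sum(totals.values()) / 1024**2:.2f} MiB")
```

Comparing the total with the size of the image you want to upload tells you quickly whether cleanup alone can solve the problem or whether you need to look elsewhere.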
Furthermore, if there is still not enough space on the device after the storage cleanup, you may want to look into the user home directories, especially the /root/ folder.
Even when you are certain that there is so much space on your Juniper device that you could actually get lost in there, you can still hit an annoying issue, as you can see below.
Junos CLI – download
mzwk@ex42-01> file copy scp://mzwk@10.255.0.4:/home/mzwk/jinstall-ex-4200-15.1R7.9-domestic-signed.tgz /var/tmp
mzwk@10.255.0.4's password:
jinstall-ex-4200-15.1R7.9-domestic-signed.tgz 80% 107MB 1.3MB/s 00:18 ETA
/var: write failed, filesystem is full
jinstall-ex-4200-15.1R7.9-domestic-signed.tgz 100% 132MB 1.3MB/s 01:41
/var/home/remote/...transferring.file.........UapdFg/jinstall-ex-4200-15.1R7.9-domestic-signed.tgz: No space left on device
error: file-fetch failed
error: could not fetch local copy of file
No matter what we tried, we were not able to download the image to the target system. What we can do instead is push the image to the target device from a remote server and let mgd do the file handling. We have to be honest: to this day we are amazed that it works and that we somehow got the idea to even try it.
Unix SCP – upload
mzwk@tools01:~$ scp jinstall-ex-4200-15.1R7.9-domestic-signed.tgz mzwk@10.255.0.18:/var/tmp
Password:
jinstall-ex-4200-15.1R7.9-domestic-signed.tgz 100% 132MB 1.0MB/s 02:07
mzwk@tools01:~$
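One possible reading of why the pull fails while the push works, based only on the outputs shown in this article: the failed file copy reports a temporary file under /var/home/remote/, which sits on the /var partition (107M available), while the requested destination /var/tmp has 323M available. The download also hits "filesystem is full" at 80% of a 132MB image, which is close to those 107M. The short Python sketch below only does that arithmetic. The assumption that the pushed file is written straight to /var/tmp is ours and is not confirmed by Juniper documentation.

```python
# Figures taken from the outputs in this article (treated as MB for simplicity).
IMAGE_MB = 132          # image size reported by scp
VAR_AVAIL_MB = 107      # Avail on /var, where /var/home/... lives
VAR_TMP_AVAIL_MB = 323  # Avail on /var/tmp, the requested destination


def transfer_outcome(image_mb, avail_mb):
    """Return (fits, fraction of the image written before the disk is full)."""
    fits = image_mb <= avail_mb
    fraction = min(1.0, avail_mb / image_mb)
    return fits, fraction


# Assumption: the CLI download is staged on /var before being moved.
pull_fits, pull_fraction = transfer_outcome(IMAGE_MB, VAR_AVAIL_MB)
# Assumption: the pushed file is written directly to /var/tmp.
push_fits, _ = transfer_outcome(IMAGE_MB, VAR_TMP_AVAIL_MB)

print(f"staged via /var: fits={pull_fits}, full after about {pull_fraction:.0%}")
print(f"written to /var/tmp: fits={push_fits}")
```

If this reading is right, the practical rule of thumb is to compare the image size against the free space on /var as well as on the destination before using file copy from the CLI.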
After finally managing to upload our future software image, we would like to point out two options worth adding to the Junos upgrade procedure to make life easier in the future. To conserve disk space, it is good to include the “no-copy” and “unlink” flags in future Junos upgrades.
request system software add <software_package> no-copy unlink reboot
- The no-copy option prevents the creation of copies of the new package in /var/sw/pkg.
- The unlink option removes the package after it has been installed.
As a closing remark, if a switch is running an older release (e.g. 15.1X) and is to be upgraded to a recent release (18), a direct upgrade (with no interim releases) is normally possible, especially on EX series fixed switches. Please note that if the switch is part of a Virtual Chassis, the cluster may malfunction during such an upgrade and the upgrade may eventually fail, causing a split cluster.
Whenever possible, for such a major “multi-hop” upgrade it is advisable to split the chassis into standalone switches and upgrade them one by one. You may preconfigure the inter-switch ports upfront (e.g. with a management VLAN and the respective IP addresses) and then simply convert the VC ports to regular ports. Once all devices are upgraded, the VC can be recreated.
Creating a recovery snapshot
Once the upgrade to a stable Junos OS release is done, it is very good practice to create a new recovery partition. Normally the recovery partition is created by a manual action. In the factory state, it reflects the software image running on the device. To create a recovery snapshot, simply issue:
request system snapshot recovery
Unfortunately, it is also common that after an upgrade affected by insufficient storage, the recovery snapshot creation reports the same problem.
To fix this, execute the following command set:
root@:RE:0% cd /var/tmp
root@:RE:0% ls -al
root@:RE:0% rm -r rtsdb
root@:RE:0% rm -r sd-upgrade
Now, try to create a recovery snapshot once more and this time it should work like a charm.
If you want additional information, I suggest you also check out “Deep Dive: Junos upgrade: filesystem is full part 2”.