Deep Dive: FW assessment and hardening
What to do if your firewall is like Swiss cheese, full of holes …
#Firewall #Network as a Service #Deep Dive
Firewall assessment is one of the major tasks that should be done from time to time to ensure the firewall is actually securing the network and not just acting as an expensive router. I recently helped a customer with such a security assessment, followed by hardening, for a cluster of FortiGate 1800F units running FortiOS 6.2.9.
The scope was to identify shortcomings, document the current state of the device, advise on risks and perform policy changes. In this article, I outline the method used to analyse the state of the firewall and determine what needs attention and how the system can be optimized. You will find quite a few useful CLI commands, GUI tricks, recommendations, tips as well as quick wins.
The approach applies to standalone and clustered firewalls. In this case, I use a VDOM-based FortiGate. The logic works for most of the units and to some extent to other vendors too.
We start the journey with the physical layer and move up to the application layer. It is of course not possible to cover all cases. However, once finished, we should have a fairly objective view of the state of the device, possible actions and ways to improve security.
1. Health checks
The first check is overall system health. It determines all subsequent decisions, such as which features can be enabled. The main dashboard provides a good overview of resource utilization. In case something is not available, check System → Feature Visibility. The dashboard data is limited to 24 hours. On the CLI, the following commands provide basic information about the state. I have truncated the outputs for clarity.
Hardware status:
FGT-VM64 (global) # get hardware status
Model name: FortiGate-1800F
ASIC version: CP9
ASIC SRAM: 64M
CPU: Intel(R) Xeon(R) W-3223 CPU @ 3.50GHz
Number of CPUs: 16
RAM: 24101 MB
Compact Flash: 28738 MB /dev/sda
Hard disk: not available
USB Flash: not available
Network Card chipset: FortiASIC NP7 Adapter (rev.)
Hardware Board ID: 000
FortiGate’s hardware design depends on the model. Keep in mind that different Network Processor (NP) architectures assume different port layouts: to achieve the best utilization of the platform, the way you connect the ports should adhere to the guidelines for your model.
System status:
FGT-VM64 (global) # get system status
Version: FortiGate-1800F v6.2.9,build7197,210809 (GA)
. . .
Serial-Number: xxx
Operation Mode: NAT
. . .
Current HA mode: a-p, master
Cluster uptime: 56 days, 4 hours, 18 minutes, 7 seconds
. . .
System time: Thu Sep 2 14:51:49 2021
The software version tells you a lot, e.g. which features are available and which release notes, known bugs and upgrade information apply. It is always a good idea to check it when performing an assessment.
Performance statistics:
FGT-VM64 (global) # get system performance status
CPU states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
. . .
CPU15 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
Memory: 24680424k total, 3635676k used (14.7%), 20821580k free (84.4%), 223168k freeable (0.9%)
Average network usage: 48 / 32 kbps in 1 minute, 36 / 25 kbps in 10 minutes, 36 / 25 kbps in 30 minutes
Average sessions: 65 sessions in 1 minute, 52 sessions in 10 minutes, 49 sessions in 30 minutes
Average session setup rate: 1 sessions per second in last 1 minute, 0 sessions per second in last 10 minutes, 0 sessions per second in last 30 minutes
Average NPU sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes
Average nTurbo sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes
Virus caught: 0 total in 1 minute
IPS attacks blocked: 0 total in 1 minute
Uptime: 64 days, 23 hours, 10 minutes
From this output we can conclude, for example, what the influx of sessions is (such as a “morning mail download” spike) and what the CPU and memory load are. It also shows uptime and basic IPS stats.
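If you collect this output regularly, a small script can pull out the key figures for trending. The following is a minimal Python sketch, assuming the line format shown in the truncated output above. The function names and the 80% memory threshold (taken from the rule of thumb in this article, not from an official limit) are my own assumptions.

```python
import re

# Sample lines copied from the truncated output above.
SAMPLE = """\
CPU states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
Memory: 24680424k total, 3635676k used (14.7%), 20821580k free (84.4%), 223168k freeable (0.9%)
Average sessions: 65 sessions in 1 minute, 52 sessions in 10 minutes, 49 sessions in 30 minutes
"""


def parse_performance(text):
    """Extract overall CPU load, memory usage and the 1-minute session average."""
    idle = re.search(r"^CPU states:.*?(\d+)% idle", text, re.MULTILINE)
    mem = re.search(r"^Memory:.*?used \(([\d.]+)%\)", text, re.MULTILINE)
    sess = re.search(r"^Average sessions: (\d+) sessions in 1 minute", text, re.MULTILINE)
    if not (idle and mem and sess):
        raise ValueError("unexpected output format")
    return {
        "cpu_busy_pct": 100 - int(idle.group(1)),
        "mem_used_pct": float(mem.group(1)),
        "sessions_1min": int(sess.group(1)),
    }


def memory_warning(stats, threshold_pct=80.0):
    """True when memory usage reaches the (assumed) warning threshold."""
    return stats["mem_used_pct"] >= threshold_pct


stats = parse_performance(SAMPLE)
print(stats, "memory warning:", memory_warning(stats))
```

Only the aggregate "CPU states" line is parsed here. Per-core lines (CPU0 to CPU15) would need their own pattern.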
Tip:
- In case you notice memory going above 80% there is a risk that all traffic will be blocked by a built-in protection mechanism called Conserve Mode. Settings can be adjusted but it is not recommended.
- Troubleshooting memory issues starts with displaying the top 10 daemons:
FGT-VM64 (global) # diag sys top 1 10
Run Time: 65 days, 20 hours and 31 minutes
0U, 0N, 0S, 100I, 0WA, 0HI, 0SI, 0ST; 24101T, 19137F
bcm.user 226 S < 5.5 0.4
hatalk 331 S < 1.0 0.0
forticron 313 S 0.5 0.1
ipshelper 366 S < 0.0 0.6
httpsd 25547 S 0.0 0.3
httpsd 23087 S 0.0 0.3
httpsd 23071 S 0.0 0.3
httpsd 23169 S 0.0 0.3
miglogd 304 S 0.0 0.3
cmdbsvr 275 S 0.0 0.2
- In case you notice a high watermark for sessions, check FortiView → All Sessions to see the cause, and also check the capabilities of your platform.
In case the CPU is running high, we may be able to address this by changing the OS version, disabling features that are not used, optimizing the ruleset, adjusting IPS or UTM features, resetting daemons or the entire firewall. I cover some of those topics later in the article.
Power supplies:
To check power supplies you can use the following command. It is only available on “bigger” models:
FGT-VM64 (global) # execute sensor list
1 P5V_STBY_ADC alarm=0 value=4.9855 threshold_status=0
2 P5V_ADC alarm=0 value=4.9425 threshold_status=0
3 P12V_ADC alarm=0 value=12.07 threshold_status=0
. . .
An alarm flag set to 1 would mean there is an issue. Voltage anomalies are also visible in the logs.
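On models with many sensors the list is long, so filtering for raised alarm flags helps. Below is a minimal Python sketch, assuming the "index name alarm=N value=V" line layout shown above. The function name is hypothetical and the sample is the truncated output from this article.

```python
import re

# Sample lines copied from the truncated output above.
SAMPLE = """\
1 P5V_STBY_ADC alarm=0 value=4.9855 threshold_status=0
2 P5V_ADC alarm=0 value=4.9425 threshold_status=0
3 P12V_ADC alarm=0 value=12.07 threshold_status=0
"""

# index, sensor name, alarm flag, measured value
SENSOR_RE = re.compile(r"^\s*\d+\s+(\S+)\s+alarm=(\d+)\s+value=(\S+)")


def sensors_in_alarm(text):
    """Return (name, value) for every sensor whose alarm flag is not 0."""
    alarms = []
    for line in text.splitlines():
        match = SENSOR_RE.match(line)
        if match and match.group(2) != "0":
            alarms.append((match.group(1), float(match.group(3))))
    return alarms


print(sensors_in_alarm(SAMPLE))
```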
In case you use a smaller model like 100/200E with a redundant power supply you can obtain supplementary information with:
# diag hard deviceinfo rps
Interface check:
Check the status of individual interfaces:
FGT-VM64 (global) # diagnose hardware deviceinfo nic port35
Description :FortiASIC NP7 Adapter
Driver Name :FortiASIC Unified NPU Driver
. . .
mtu :1500
. . .
==== Link Status ===============
Admin :Up
link_status :Up
Speed :10000
Duplex :Full
. . .
==== Host Counters =============
rx_pkts :10463855
. . .
tx_drop :0
tx_pkts :10848812
. . .
==== Netdev Counters ===========
Rx Pkts :10463855
. . .
SFPs:
Check what kind of SFPs are used:
FGT-VM64 (global) # get sys interface transceiver
Interface port17 – Transceiver is not detected.
. . .
Interface port35 – SFP/SFP+ (10.3G)
Vendor Name : FINISAR CORP.
Part No. : FTLX8574D3BCL
Serial No. : xxx
Interface port36 – SFP/SFP+ (10.3G)
Vendor Name : FINISAR CORP.
Part No. : FTLX8574D3BCL
Serial No. : yyy
Interface port37 – Transceiver is not detected.
Interface port38 – Transceiver is not detected.
Interface port39 – Transceiver is not detected.
Interface port40 – Transceiver is not detected.
Interface ha1 – SFP/SFP+ (10.3G)
Vendor Name : FINISAR CORP.
Part No. : FTLX8574D3BCL
Serial No. : aaa
Interface ha2 – SFP/SFP+ (10.3G)
Vendor Name : FINISAR CORP.
Part No. : FTLX8574D3BCL
Serial No. : bbb
                                    Optical   Optical   Optical
SFP/SFP+    Temperature  Voltage    Tx Bias   Tx Power  Rx Power
Interface   (Celsius)    (Volts)    (mA)      (dBm)     (dBm)
----------  -----------  ---------  --------  --------  --------
port25      40.6         3.24       24.46     -2.6      -2.7
port35      29.7         3.29       8.79      -2.4      -3.3
port36      30.0         3.33       8.73      -2.5      -2.2
ha1         32.7         3.28       8.86      -2.4      -2.5
ha2         32.0         3.29       8.84      -2.4      -3.0
++ : high alarm, + : high warning, - : low warning, -- : low alarm, ? : suspect.
HA status:
We can check the cluster state with the following command:
FGT-VM64 (global) # get sys ha status
HA Health Status: OK
. . .
Mode: HA A-P
. . .
Configuration Status:
xxx(updated 2 seconds ago): in-sync
yyy(updated 4 seconds ago): in-sync
. . .
Master: xxx, HA operating index = 0
Slave : yyy, HA operating index = 1
Tip:
In a healthy cluster, the FortiGate firewalls synchronize tables, such as:
- Session
- Routing
- Acceleration
The synchronization is done based on checksums calculated from configuration. In case of synchronization issues you can check the checksums with this command, example for VDOM root:
FGT-VM64 (global) # diagnose sys ha checksum cluster | grep root
is_manage_master()=1, is_root_master()=1
root: 17 7c 14 49 67 8f 1a f6 09 a5 d7 05 7e bd 69 3e
root: 17 7c 14 49 67 8f 1a f6 09 a5 d7 05 7e bd 69 3e
is_manage_master()=0, is_root_master()=0
root: 17 7c 14 49 67 8f 1a f6 09 a5 d7 05 7e bd 69 3e
root: 17 7c 14 49 67 8f 1a f6 09 a5 d7 05 7e bd 69 3e
The output above shows there is a match between nodes for this VDOM.
The same should be done for the other VDOMs, comparing the strings in, for example, Notepad++. In case of a discrepancy, the first trick to try is manual synchronization.
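Comparing the strings by eye gets tedious with many VDOMs. As a minimal Python sketch, the comparison can be automated by collecting every checksum printed per name and reporting names that show more than one distinct value. It assumes the "name: hex bytes" line format shown above. The function names and the altered checksum in the usage line are hypothetical.

```python
import re

# Sample copied from the output above (both nodes, VDOM root).
SAMPLE = """\
is_manage_master()=1, is_root_master()=1
root: 17 7c 14 49 67 8f 1a f6 09 a5 d7 05 7e bd 69 3e
root: 17 7c 14 49 67 8f 1a f6 09 a5 d7 05 7e bd 69 3e
is_manage_master()=0, is_root_master()=0
root: 17 7c 14 49 67 8f 1a f6 09 a5 d7 05 7e bd 69 3e
root: 17 7c 14 49 67 8f 1a f6 09 a5 d7 05 7e bd 69 3e
"""

# "<name>: <space-separated hex bytes>"
CHECKSUM_RE = re.compile(r"^\s*(\S+):\s*((?:[0-9a-f]{2} ?)+)\s*$")


def checksums_by_name(text):
    """Map each VDOM (or other checksum name) to the set of values seen."""
    found = {}
    for line in text.splitlines():
        match = CHECKSUM_RE.match(line)
        if match:
            value = " ".join(match.group(2).split())
            found.setdefault(match.group(1), set()).add(value)
    return found


def out_of_sync(text):
    """Names for which the cluster members reported different checksums."""
    return sorted(name for name, sums in checksums_by_name(text).items() if len(sums) > 1)


print(out_of_sync(SAMPLE))
```

Feeding it the unfiltered output (without the grep) checks all VDOMs in one pass, assuming the other sections use the same line format.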
2. Software version
Once basic checks are documented we can proceed to the logical part of the assessment. The software version is important for various reasons: it determines the ability to obtain vendor support, defines the available features and bug fixes, and points to the release notes from which you can read whether an upgrade makes sense and how to do it. Sometimes, based on release notes, you may consider a downgrade, for example to improve performance or stability.
Checking release notes is always a good idea but do not be surprised if you find out that the bug you discovered and confirmed with Fortinet support was not documented in the official notes. It happened to me once when the Slave node was sending ARP replies and causing delayed packet loss due to a bug. That was not documented in the publicly available release notes nor in the fixed bugs for the newer code.
Fortinet provides a helpful upgrade tool to check the upgrade path. You may find it surprising that for example for 600D (and other models) an upgrade from 6.0.0 to 6.0.9 requires five interim upgrades.
Upgrades are usually a good idea. Sometimes they are a must: Palo Alto, for example, requires the firewall to be running the suggested release, otherwise there may be issues receiving vendor support.
All vendors provide more features in newer releases, which should be evaluated before upgrades. For example, if you have a FortiGate 60D you may not need to upgrade to the latest code as it may impact performance or stability. If you want to run the latest code it is best practice to enable only the required features and disable everything else.
Aside from official claims, in the case of Fortinet the unwritten rule of thumb is not to use the first 3-4 releases of each train. For example, early 6.2 and 6.4 releases have been known to suffer from unexpected slowdowns and daemon crashes that required unnecessary effort or rollbacks to fix.
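The rule of thumb can be turned into a quick check when you assess many devices. This is a minimal Python sketch, assuming the "Version:" line format from get system status shown earlier. Reading "first 3-4 releases" as patch numbers 0 to 3 is my own interpretation, and the function names are hypothetical.

```python
import re


def parse_version(status_line):
    """Return (major, minor, patch) from a 'Version:' line of get system status."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", status_line)
    if match is None:
        raise ValueError("no version found in: " + status_line)
    return tuple(int(part) for part in match.groups())


def is_early_release(version, min_patch=4):
    """True when the patch level is below the (assumed) maturity cut-off."""
    return version[2] < min_patch


line = "Version: FortiGate-1800F v6.2.9,build7197,210809 (GA)"
version = parse_version(line)
print(version, "early release:", is_early_release(version))
```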
I suggest upgrading when:
- A bug currently affecting an organization is fixed in a newer release
- A feature you need is only available in the newer release
- Support contract extension requires an upgrade
3. Security Rating
FortiGate comes with a handy built-in feature that summarizes different elements in relation to the security risk they generate, then gives ratings and recommendations. The feature is located in Security Fabric → Security Rating. Advanced functions of the tool require an additional license, but even the built-in version already helps to get a good overview of what we have to work with. It is intuitive and structured, and you can quickly find for example:
- Inactive rules,
- Vulnerabilities,
- Tips for security hardening,
- License information,
- Ability to apply quick fixes.
Note that not everything listed with a severe rating is actually bad. The tool will, for example, say you should never use any router or NAT device other than FortiGate, or flag the absence of a license you do not need as critical. Read through the report and take it with a grain of salt when making your judgement.
Some of the remediations can be applied automatically, however the fixes are usually limited to only the safest or fairly insignificant changes. The majority of the changes should, rightfully so, be done manually.
At this stage we continue documenting the findings. We may also decide to take action already. My approach is that:
- unused objects and rules should be disabled/deleted first,
- all new rules I create I describe, add comments and keep disabled,
- when I am happy with the structure I discuss everything in detail with the customer and only then start enabling the rules.
This way we can avoid business impact, for example when enabling a rule with App, Web or IPS profile.
4. Firewall hygiene
As mentioned earlier, any unused rules are summarized from the Security Rating menu. The tool provides the date of when the rules were last used. This is helpful when deciding which rules can be disabled and eventually removed.
Always “aim before fire”: analyse the rules before taking any action. There may be cases where a rule has not been hit for months for a valid reason. For example, certificate renewal or revocation may be done only once or twice a year.
Once candidate rules are chosen, the best practice is to disable them first before deleting them. After doing so, set a reminder for yourself for the desired period, e.g. a month, and in case there were no complaints, delete the rule after that month.
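If you export the policy list with its last-used dates, the candidate selection can be sketched in a few lines of Python. The record layout ("id", "last_used"), the sample policies and the 90-day idle window are all hypothetical. As noted above, the result is only a list of candidates to analyse, not a list of rules to delete.

```python
from datetime import date, timedelta

# Hypothetical export: policy ID and the date the rule was last hit (None = never).
POLICIES = [
    {"id": 10, "last_used": date(2021, 8, 30)},
    {"id": 11, "last_used": date(2021, 3, 1)},
    {"id": 12, "last_used": None},
]


def stale_rules(policies, today, max_idle_days=90):
    """IDs of rules never used or not used within the idle window."""
    cutoff = today - timedelta(days=max_idle_days)
    return [
        policy["id"]
        for policy in policies
        if policy["last_used"] is None or policy["last_used"] < cutoff
    ]


print(stale_rules(POLICIES, today=date(2021, 9, 2)))
```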
5. Unused objects
FortiGate lets you quickly identify unused objects in order to perform clean-up. For example, in Policy & Objects → Addresses you can filter the Ref. column and set it to 0 to display unreferenced objects.
In case you use FortiManager this is also easy:
- Ensure you are in the correct ADOM,
- Go to Policy & Objects,
- From the Tools menu, select Unused Objects. The Unused Objects dialogue box is displayed,
- When you are done, click Close.
6. The infamous “any-any” rules
Once you have cleaned up unnecessary objects and have an idea about the logical system state, we can start looking into the security policy. In this case, I will focus on the IPv4 policy. I use the following approach:
- Enable additional columns such as Hit Counts, Sent and Received Packets, Status, Comments, Bytes:
- The Hit Counts show how many times the rule has been used, which is fundamental for the design, e.g. specific and busy rules should be higher in the hierarchy for performance reasons, less specific rules should be lower and so on. Many publications exist on how to handle the policy, some are official, like the one from NIST,
- Sent and Received Packets – apart from the obvious, they can show whether the connection is valid: there can be flows with plenty of sent packets but 0 received, which suggests a mistake in the ruleset,
- Details of the flow can show whether there are, for example, timeouts or other flags – this is accessible from the Logs & Report section,
- I filter the rules to display the busiest ones and those with “any” (Fortinet calls “any” “all”) in either source, destination or service,
- Status shows if the rule is active,
- Comments provide more insight about the rule, can also be useful for auditing or tracking changes,
- Bytes help when applying filters.
I start the policy analysis by checking the source and destination interfaces (in this case there are no zones), services and what UTM features are used. The most dangerous rules are those which allow connections from the untrusted zones to the trusted ones. I check them first. There might be historical rules that are no longer needed. The intention is to understand, secure/eliminate unwanted entry points. Luckily there was no such issue on the firewall I worked on.
We can then check how the connections from the LAN are secured. In the case I investigated, there were quite a few rules with a combination of:
- High amount of hits,
- High amount of data,
- Missing or inconsistent UTM features,
- “any” as source, destination or service.
Some of the reasons for this situation were the lack of a firewall policy strategy, uncontrolled growth, shadow rules, and a lack of change management and knowledge. For example, I found a Cisco switch behind one of the firewall interfaces. Closer inspection revealed 18 networks with SVIs on the switch, with a /20, /21 or /24 subnet mask. All of those networks could communicate with no restriction.
To prepare for the discussion with the customer I prioritized issues and focused on critical examples. I filtered the output to see only the “any-any” rules, which processed more than 100 TB of data, with some or no UTM features, regardless of the direction.
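The same filter can be expressed as a minimal Python sketch over an exported policy list. The record layout, field names and sample policies are hypothetical. "Any-any" is taken here to mean "all" in both source and destination, and TB is counted in decimal units.

```python
TB = 10**12  # decimal terabyte

# Hypothetical export of the IPv4 policy with byte counters and UTM profiles.
POLICIES = [
    {"id": 1, "srcaddr": "all", "dstaddr": "all", "service": "ALL", "bytes": 250 * TB, "utm": []},
    {"id": 2, "srcaddr": "LAN_Users", "dstaddr": "all", "service": "HTTPS", "bytes": 300 * TB, "utm": ["av"]},
    {"id": 3, "srcaddr": "all", "dstaddr": "all", "service": "ALL", "bytes": 5 * TB, "utm": []},
    {"id": 4, "srcaddr": "all", "dstaddr": "all", "service": "DNS", "bytes": 120 * TB, "utm": ["ips"]},
]


def is_any(value):
    """FortiGate shows 'any' as 'all' (or 'ALL' for services)."""
    return value.lower() in {"all", "any"}


def any_any_rules(policies, min_bytes=100 * TB):
    """Any-any rules above the byte threshold, busiest first."""
    matches = [
        policy
        for policy in policies
        if is_any(policy["srcaddr"]) and is_any(policy["dstaddr"]) and policy["bytes"] > min_bytes
    ]
    return sorted(matches, key=lambda policy: policy["bytes"], reverse=True)


for policy in any_any_rules(POLICIES):
    print(policy["id"], policy["service"], policy["utm"] or "no UTM")
```

The UTM list is printed rather than filtered on, since rules with some or with no UTM features were both in scope.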
Tip:
To see the busiest rules quickly you can add Hit Count and Bytes as columns in the IPv4 policy. Then you can filter the rules by the number of bytes.
By default, rules should be specific, but there is always a need for more open rules where it is impossible to micro-manage the flows, for example users accessing the Internet via HTTPS. For those, we can enable the inline or explicit proxy function on the firewall and use UTM features such as SSL decryption, Web, App, IPS, AV, etc. Keep in mind that many attacks originate from inside the network, therefore any excessively open traffic should be protected by the above features.
Think about IPS. It is traditionally applied from DMZ towards LAN, but newer deployments also promote IPS within the LAN, to provide an additional layer for endpoints.
I recommend checking the consistency of the features. For example, in FortiGate the App and Web profiles partially overlap. In the case I investigated, some features were allowed in one profile and blocked in another while both profiles were used in the same rule.
The configuration should be predictable, easy to follow and ensure that, for example, non-business or risky traffic is blocked by the firewall. Before making design changes, double-check internally: perhaps there already are guidelines from the InfoSec team or its equivalent on how to treat specific traffic types.
Tip:
The hit counters in FortiGate are useful to identify the busiest rules. The feature usually works however depending on the release you may discover that the counters are not incrementing. I have seen that mostly in special builds, which, in general, should not be used, but some customers use them to address specific issues anyway.
If your firewall is affected by this issue you may try to clone the rule you want to analyse over the one that is affected. Clearing counters may also help but before you do it, it is good to note the information from the counters. Cloning the rule clears the counters of the new rule.
7. Optimization
Since removing rules or reducing their hits involves the risk of an outage, optimization is usually a gradual process. Most of the activities are performed in collaboration with the customer. This way we agree on details, avoid risk and share the knowledge. In the case I investigated, a full-blown redesign was out of scope, therefore I started the policy hardening with the steps below:
- Create backup and baseline.
- Collect critical infrastructure devices, like AD, DNS, DHCP, file servers, RADIUS, printers, etc.
- Understand LAN, DMZ or other zones, note the purpose of the subnets.
- Collect critical business apps, required ports – if unknown I check the logs.
- Note patterns of critical traffic flow.
- Analyse the logs, check if all assigned subnets are used, based on this we may be able to remove interfaces.
- Analyse UTM features, work with the customer to create new profiles for compliance and consistency.
- Agree on policy changes.
- Suggest naming convention.
- Create missing objects.
- Extract important traffic from the “any” rules, create specific new rules above the catch-all rules, newly created rules are disabled until approval from the customer.
- Where IPS is applicable, at the beginning I apply it in monitor mode with severity 5. I then periodically check system performance and after the agreed time I enable it to actually block this type of risk, we may also agree to include other severity levels.
- Once the more specific rules are enabled, I periodically reset the counters of the unwanted rule to check how quickly its hits decrease.
- If the customer agrees, one of the recent best practices for firewalls is to be more user- and application-centric. To achieve this we can enable user and application awareness. It requires integration with AD, which is a simple process. Benefits include increased security, fewer rules and easier policy enforcement.
- The process is repeated until the catch-all rule can be disabled and deleted.
Caution:
- Enabling IPS and UTM can increase the security dramatically but can also result in outages related to functionality, like VoIP or resource starvation, including memory or CPU. Always apply the changes gradually.
- Nested AD group enumeration, required for user and app awareness, can potentially be a CPU-intensive task.
To summarize:
- Create new rules over the unwanted one,
- Rules should be specific by default,
- If the rule cannot be specific – i.e.: users accessing the Internet – use UTM features for security,
- Consider using user and application awareness,
- Over time you should reduce the hits to the point when you can disable the rule without risking an outage.
8. Logs
Firewall assessment includes analysis of how the logs are managed, what is captured and for how long:
- For general traffic, it is recommended to log sessions at least at their end.
- Special sessions, e.g. for troubleshooting, DNS sinkholing or tunnelling, may require logging at session initiation.
- Super busy rules, e.g. for DNS queries, may be left unlogged. Otherwise, they can overload the firewall or create unnecessary load on the Syslog server.
- Regular deny rules should be logged at the start of the session.
- Implicit deny rule should be logged at least during troubleshooting.
- Storing logs on the local disk will impact the performance of the firewall, set up offloading and data retention.
Data retention depends on the log location, i.e. the local disk or a log server such as a Syslog server or FortiAnalyzer. The minimum recommendation for a Syslog server is 30 days. Test how much disk space and how many CPUs will be required to support your FortiAnalyzer, as storing and parsing large amounts of logs is very resource-intensive.
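A first rough estimate of the disk space is simple arithmetic: log rate × average log size × seconds per day × retention days. The Python sketch below illustrates it. The log rate and the 500-byte average size are purely hypothetical inputs, and the result ignores compression, indexing and database overhead, so measure your own values before sizing anything.

```python
SECONDS_PER_DAY = 86_400


def log_storage_gb(logs_per_second, avg_log_bytes, retention_days):
    """Raw storage need in decimal GB, without compression or indexing overhead."""
    total_bytes = logs_per_second * avg_log_bytes * SECONDS_PER_DAY * retention_days
    return total_bytes / 10**9


# Hypothetical example: 1000 logs/s at 500 bytes each, kept for 30 days.
print(log_storage_gb(1000, 500, 30), "GB")
```

With these assumed inputs the daily volume is 1000 × 500 × 86,400 = 43.2 GB, which gives 1,296 GB for 30 days.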
9. Management access
Once the policy adjustments have been actioned we should ensure that administrators connect to the firewall from a trusted source, usually a jump box. The firewall security policy should restrict that. Additionally, you can define a “Trusted Host” for user “admin”, example:
FGT-VM64 # config system admin
FGT-VM64 (admin) # edit admin
FGT-VM64 (admin) # set trusthost1 {IP} {netmask}
FGT-VM64 (admin) # end
Administrators should connect only over HTTPS. In case HTTP is used, a redirect should be enabled:
FGT-VM64 # config system global
FGT-VM64 (global) # set admin-https-redirect enable
You can lock an administrator out for 5 minutes (300 seconds) after too many failed login attempts by issuing:
FGT-VM64 (global) # set admin-lockout-duration 300
FGT-VM64 (global) # end
The default idle time is set to 5 minutes, max is 480 minutes.
FGT-VM64 # config system global
FGT-VM64 (global) # set admintimeout <value>
FGT-VM64 (global) # end
Additionally you can change the logon attempts to your custom value but it is recommended to keep it at default, which is 3.
10. The maintainer account
The idea behind the maintainer account is to allow access to the firewall to reset the password of an admin account or perform a factory reset even if the admin password is unknown. This raises a concern about a backdoor. To use the feature you need a console cable, a terminal and the serial number of the firewall. This KB illustrates the steps.
In case you are worried such a backdoor is a risk then you can disable the feature:
FGT-VM64 # config system global
FGT-VM64 (global) # set admin-maintainer disable
FGT-VM64 (global) # end
11. Interfaces
An important element that sits between the Internet and the internal network is the firewall's interfaces. What is enabled on them impacts overall security. Check what type of access is allowed on the interfaces. The recommendation is to enable ping on internal interfaces acting as gateways for internal networks, for troubleshooting reasons. External interfaces other than those used for S2S VPN, where IKE is needed, should have no access. This can be quickly achieved by:
FGT-VM64 # config system interface
FGT-VM64 (interface) # edit port1
FGT-VM64 (port1) # unset allowaccess
FGT-VM64 (port1) # end
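To review many interfaces at once, the allowaccess settings can be read from a configuration backup. Below is a minimal Python sketch, assuming the usual "edit ... / set allowaccess ... / next" layout of the config system interface section. The sample configuration, the function names and the choice of port1 as the external interface are hypothetical.

```python
import shlex

# Hypothetical excerpt from a configuration backup.
SAMPLE_CONFIG = """\
config system interface
    edit "port1"
        set vdom "root"
        set allowaccess ping https ssh
    next
    edit "port2"
        set vdom "root"
        set allowaccess ping
    next
    edit "port3"
        set vdom "root"
    next
end
"""


def allowaccess_by_interface(config_text):
    """Map each interface name to its list of allowed management services."""
    access = {}
    current = None
    for raw_line in config_text.splitlines():
        tokens = shlex.split(raw_line)  # handles the quoted interface names
        if not tokens:
            continue
        if tokens[0] == "edit" and len(tokens) > 1:
            current = tokens[1]
            access[current] = []
        elif tokens[0] == "next":
            current = None
        elif current is not None and tokens[:2] == ["set", "allowaccess"]:
            access[current] = tokens[2:]
    return access


def exposed_interfaces(access, external_names):
    """External interfaces that still allow any management access."""
    return {name: services for name, services in access.items()
            if name in external_names and services}


print(exposed_interfaces(allowaccess_by_interface(SAMPLE_CONFIG), {"port1"}))
```

Interfaces terminating S2S VPN would be reviewed separately, since they are the stated exception.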
The management interface should restrict access to only what is needed, e.g. SSH (CLI), HTTPS (GUI, with HTTP redirect), SNMP (management), FMG-Access (FortiManager), etc.
Interfaces towards internal networks are often bundled into Port Channels for resiliency and performance. When performing assessment check if that is the case and what is the state.
12. Summary
In the case I investigated, the assessment was followed by optimization. It not only improved security and eliminated unnecessary noise: we also ran hands-on knowledge sharing sessions, where the customer could ask specific questions and obtain pre- and post-implementation support.
As our world becomes more digitalized than ever, it is important to keep valuable data safe. As you can see, these efforts can be tedious and require a lot of investigation, discipline and time. However, as Dr Ralf Speth, Jaguar’s CEO, once said:
“If you think that good design is expensive, try bad design.”
The cost of a compromised company network is often immeasurable. At ngworx we work with customers on tailored solutions. Consider reaching out to us to find out how we can help you improve your network security and increase your confidence. We specialize in, but are not limited to, vendors such as Juniper, Fortinet, Palo Alto and CheckPoint.