F5 Resources Usage
I'm new to F5 AWAF. Considering WAF protection such as Bot Defense, Dos protection, would use high amount of resources, are there any guidelines on the resources such as CPU or memory to be allocated to these protections? Plus, imagining to have 1000 web applications to be protected, how to allocate resources properly so that F5 can handle all the protections properly? My concern is that F5 will be unable to handle the protection if there are too many application to be protected and the protection itself requires large amount of resources to work properly.Solved74Views0likes1CommentDos Attacks not showing on dashboard
Issue: DoS attacks are not showing on the DoS dashboard. Information: Strange part is this was working but then I made a few changes to split this particular virtual server into internal and external VIPs only apply DoS profile to external VIP. I have a DoS logging profile enabled on this VS I know attacks are happening as I can see them under Security > Event logs > DoS > Application Events However, any of the new attack ids don’t show under the DoS Dashboard located Security > Reporting > DoS > Dashboards Any thoughts on how to fix would be much appreciated? Thanks!455Views0likes2CommentsHow can I alert on an ASM Denial of Service event?
I would like to set an alert when a DoS profile is triggered and I'm asleep or otherwise not logged into the console. We already have alerting similar to this configured in other tools like our SIEM so I was hoping I could just send a SYSLOG alert when the profile is triggered and mitigations are applied. Our SIEM is IBM QRadar and not Splunk or ArcSight so we're unable to use DoS high speed logging, which would be overkill anyways as I'm only looking for something to indicate there is a problem and not forward detailed information about what triggered the event. I've found the IN_DOSL7_ATTACK iRule event but so far I've found two issues I'm not sure how to capture what pool or DoS profile is firing. I need this to determine the criticality of the service. I cannot seem to get it to work, even when logging to local0: Here is what I could not get to work. It was applied to the correct pool and I was able to create a DoS event that showed up in Security>Reporting>DoS. when IN_DOSL7_ATTACK { log local0. "Attacker IP: $DOSL7_ATTACKER_IP, Mitigation: $DOSL7_MITIGATION" } I'm looking at /var/log/ltm which is where I saw my other iRule logging. Is this the right location?Solved1.6KViews0likes13Comments2. SYN Cookie: Operation
Introduction As concluded in the last article, in order to avoid allocating space for TCB, the attacked device needs to reject TCP SYN packets sent by clients. In this article I will explain how a system can do this without causing service disruption, and more specifically I will explain how this work in BIG-IP. Implementation Since attacked device need to alter default TCP 3WHS behaviour the best option is implementing SYN Cookie countermeasure in an intermediate device, so you offload your servers from this task. In this way if connection is not legitimate then it is just never forwarded to the backend server and therefore it will not waste any kind of extra resourcess. Since BIG-IP is already in charge of handling application traffic directed towards servers it is the best place to implement SYN Cookie. What BIG-IP does is adding an extra layer, which we can call SYN Cookie Agent, that basically implement a stateless buffer between client and BIG-IP TCP stack. This agent is in charge of handling client’s TCP SYN packets by modifying slightly the standard behaviour of TCP 3WHS, this modification comes in two flavours depending on what type of routing role BIG-IP is playing between client and server. Symmetric routing This is the typical situation. In this environment BIG-IP is sited in the middle of the TCP conversation and all the traffic from/to server pass through it. The SYN Cookie operation in this case is briefly described below: Client sends a TCP SYN packet to BIG-IP. BIG-IP uses a stateless buffer for answering each client SYN with a SYN/ACK. BIG-IP generates the 32 bits sequence number which will be included in SYN/ACK packet sent to the client. BIG-IPencodes essential and mandatory information of the connection in 24 bits. BIG-IP hashes the previous encoded information. BIG-IP also encodes other values like MSS in the remaining bits. BIG-IP generates the SYN/ACK response packet and includes the hash and other encoded information as sequence number for the packet. This is the so calledSYN Cookie. BIG-IP sends SYN/ACK to client. BIG-IP discards the SYN from the stateless buffer, in other words, BIG-IP removes all information related to this TCP connection. At this point no memory has been allocated for TCP connection (TCB). Client sends correct ACK to BIG-IP acknowledging BIG-IP’s sequence number. BIG-IP validates ACK. BIG-IP subtracts one to this ACK. BIG-IP runs the hash function using connection information as input (see point 3-b above). Then it compares it with the hash provided in the ACK, if they match then it means client sent a correct SYN Cookie response and that client is legitimate. BIG-IP uses connection information in the ACK TCP/IP headers to create an entry in connflow. At this time is when BIG-IP builds TCB entry, so it's the first time BIG-IP uses memory to save connection information. BIG-IP increase related SYN Cookie stats. BIG-IP starts and complete a TCP 3WHS with backend server on behalf of the client since we are sure that client is legitimate. Now traffic for the connection is proxied by BIG-IP as usual attending to configured L4 profile in the virtual server. *If ACK packet received by BIG-IP is a spurious ACK then BIG-IP will discard it and no entry will be created in connection table. Since attackers will never send a correct response to a SYN/ACK then you can be sure that TCB entries are never created for them. Only legitimate clients will use BIG-IP resources as shown in diagram: Fig4. TCP SYN Flood attack with SYN Cookie countermeasure Asymmetric routing In an asymmetric environment, also called nPath or DSR, you face a different problem because Big-IP cannot establish a direct TCP 3WHS with the server (point 11 in last section). As you can see in below diagram SYN/ACK packet sent from server to client would not traverse Big-IP, so method used for symmetric routing cannot be used in this case. You need another way to trust in clients and discard them as attackers, so clients then can complete TCP 3WHS directly with the server. Fig5. TCP state diagram section for asymmetric routing + FastL4 In order to circumvent this problem BIG-IP takes the advantage of the fact that applications will try to start a second TCP connection if a first TCP 3WHS is RST by the server. What BIG-IP does in asymmetric routing environments is completing the TCP three way handshake with client, issuing SYN Cookie as I described for symmetric routing, and once BIG-IP confirms that client is trustworthy it will add its IP to SYN Cookie Whitelist, it will send a RST to the client and it will close the TCP connection. At this point client will try to start a new connection and this time BIG-IP will let the client talk directly to the server as it would do under normal circumstances (see diagram above). The process is briefly described below: Client sends a TCP SYN packet to BIG-IP. BIG-IP generates the sequence number as explained for symmetric routing. BIG-IP generates the SYN/ACK response packet and includes the calculated sequence number for the packet. BIG-IP sends SYN/ACK to client and then remove all information related to this TCP connection. At this point no memory has been allocated for TCP connection (TCB). Client sends correct ACK to BIG-IP acknowledging sequence number. BIG-IP validates ACK. BIG-IP substract one to this ACK so it can decode needed information and check hash. BIG-IP increase related SYN Cookie stats. BIG-IP sends a RST to client and discards TCP connection. Big-IP adds client’s IP to Whitelist. Clients starts a new TCP connection. BIG-IP lets client to start this new TCP connection directly to the server since it knows that server is legitimate. *If ACK packet received by BIG-IP is a spurious ACK then BIG-IP will discard it and no entry will be created in connection table. Fig6. TCP 3WHS flow in asymmetric routing DSR whitelist has some important characteristics you need to take into account: By default IP is added 30 seconds to the whitelist (DB Key tm.flowstate.timeout). Client’s IP is added to whitelist for 30 seconds but only if there is no traffic from this client, if BIG-IP have already seen ACK from this client then its IP is removed from whitelist since connection has been already established between client and server. Whitelist is common to all virtual servers, so if a client is whitelisted it will be for all applications. Whitelist it is not mirrored. Whitelist it is not shared among blades. SYN Cookie Challenge When a TCP connection is initiated the TCP SYN packet sent by the client specify certain values that define the connection, some of these values are mandatory, like source and destination IP and port, otherwise you would not be able to identify the correct connection when packets arrive. Some other values are optional and they are used to improve performance, like TCP options or WAN optimizations. Under normal circumstances, upon TCP SYN reception the system creates a TCB entry where all this information is saved. BIG-IP will use this information to set up the TCP connection with the backend server. The problem we face is that when SYN Cookie is in play device does not create the TCB entry, only a limited piece of connection information is collected, so the information that BIG-IP has about the TCP connection is limited. Remember that SYN Cookie is not handled by TCP stack but by the stateless SYN Cookie Agent, so we cannot save the connection details. What BIG-IP does instead is encoding key information in 24 bits, this left only some bits for the rest of data. While there is no room for values like Window Scale information, however we have space for other values, for example 3 bits reserved for MSS value. This limits MSS possible values to 8, the most commonly used values, and the one chosen will the nearest to the original MSS value. However, note that MSS limitation has a workaround through configuration in FastL4 profile that it can help in some cases. The option SYN Cookie MSS in this profile specifies a value that overrides the SYN cookie maximum segment size (MSS) value in the SYN-ACK packet that is returned to the client. Valid values are 0, and values from 256 through 9162. The default is 0, which means no override. You might use this option if backend servers use a different MSS value for SYN cookies than the BIG-IP system does. If this is not enough for some customers, BIG-IP overcomes this problem by using TCP timestamp space to save extra information about TCP connection, in other words we create a second extra SYN Cookie. This space is used to record the client and server side Window Scale values, or the SACK info which are then made available to the TCP stack via this cookie if the connection is accepted. Note that TCP connection will only be accepted if both SYN Cookies (standard and Timestamp) are correct. So all this means that if the client starts a connection with TCP TS set then Big-IP will have more space to encode information and hence the performance will be improved when under TCP SYN flood attack. NOTE: SYN Cookie Timestamp extension (software or hardware) only work for standard virtual servers currently. FastL4 virtual servers only supports MMS option in SYN Cookie mode. Conclusion Now you know how SYN Cookie works under the hoods. In next article I will describe when SYN Cookie is activated and I will give specific details of BIG-IP SYN Cookie operation. Note that limitations related to SYN Cookie commented in this article are not caused specifically by BIG-IP SYN Cookie implementation but by SYN Cookie standard itself. In fact, F5 Networks implements some improvements on SYN Cookie to get a better performance and provide some extra features.3KViews3likes3CommentsProactive Bot Defense Bypass by Bot Signature
Problem this snippet solves: This code enables you to bypass Proactive Bot Defense for a specific bot signature. Caution: If the signature is simple, it may be easy for an attacker to guess it and craft a response to match the signature and thus bypass Proactive Bot Defense with this in place. For this reason, another bypass solution is recommended where possible. You can bypass Proactive Bot Defense without this iRule by setting a benign category to "Report" and ensuring that the signature has a reverse DNS lookup in place. This will validate the source in addition to other factors such as the User-Agent. How to use this snippet: Add to the virtual server that is protected by Proactive Bot Defense and Bot Signatures. Enter the signature you want to bypass in the code where the example "curl" is placed currently. The signature's category must be set to report or block for this to take effect. Tested on v13.1. Code : when BOTDEFENSE_ACTION { #log local0. "signature: [BOTDEFENSE::bot_signature]" if { [BOTDEFENSE::bot_signature] ends_with "curl"} { BOTDEFENSE::action allow } }753Views1like1CommentExplanation of F5 DDoS threshold modes
Der Reader, In my article “Concept of Device DOS and DOS profile”, I recommended to use the “Fully Automatic” or “Multiplier” based configuration option for some DOS vectors. In this article I would like to explain how these threshold modes work and what is happening behind the scene. When you configure a DOS vector you have the option to choose between different threshold modes: “Fully Automatic”, “Auto Detection / Multiplier Based”, “Manual Detection / Auto Mitigation” and “Fully Manual”. Figure 1: Threshold Modes The two options I normally use on many vectors are “Fully Automatic” and “Auto Detection / Multiplier Based”. But what are these two options do for me? To manually set thresholds is for some vectors not an easy task. I mean who really knows how many PUSH/ACK packets/sec for example are usually hitting the device or a specific service? And when I have an idea about a value, should this be a static value? Or should I better take the maximum value I have seen so far? And how many packets per second should I put on top to make sure the system is not kicking in too early? When should I adjust it? Do I have increasing traffic? Fully Automatic In reality, the rate changes constantly and most likely during the day I will have more PUSH/ACK packets/sec then during the night. What happens when there is a campaign or an event like “Black Friday” and way more users are visiting the webpage then usually? During these high traffic events, my suggested thresholds might be no longer correct which could lead to “good” traffic getting dropped. All this should be taken into consideration when setting a threshold and it ends up being very difficult to do manually. It´s better to make the machine doing it for you and this is what “Fully Automatic” is about. Figure 2: Expected EPS As soon as you use this option, it leverages from the learning it has done since traffic is passing through the BIG-IP, or since you have enforced the relearning, which resets everything learned so far and starts from new. The system continuously calculates the expected rates for all the vectors based on the historic rates of traffic. It takes the information up to one year and calculates them with different weights in order to know which packets rate should be expected at that time and day for that specific vector in the specific context (Device, Virtual Server/Protected Object). The system then calculates a padding on top of this expected rate. This rate is called Detection Rate and is dependent on the “threshold sensitivity” you have configured: Low Sensitivity means 66% padding Medium Sensitivity means 40% padding High Sensitivity means 0% padding Figure 3: Detection EPS As soon as the current rate is above the detection value, the BIG-IP will show the message “Attack detected”, which actually means anomaly detected, because it sees more packets of that specific vector then expected + the padding (detection_rate). But DoS mitigation will not start at that point! Figure 4: Current EPS Keep in mind, when you run the BIG-IP in stateful mode it will drop 'out of state' packets anyway. This has nothing to do with DoS functionalities. But what happens when there is a serious flood and the BIG-IP CPU gets high because of the massive number of packets it has to deal with? This is when the second part of the “Fully Automatic” approach comes into the game. Again, depending on your threshold sensitivity the DOS mitigation starts as soon as a certain level of stress is detected on the CPU of the BIG-IP. Figure 5: Mitigation Threshold Low Sensitivity means 78,3% TMM load Medium Sensitivity means 68,3% TMM load High Sensitivity means 51,6% TMM load Note, that the mitigation is per TMM and therefore the stress and rate per TMM is relevant. When the traffic rate for that vector is above the detection rate and the CPU of the BIG-IP (Device DOS) is “too” busy, the mitigation kicks in and will rate limit on that specific vector. When a DOS vector is hardware supported, FPGAs drop the packets at the switch level of the BIG-IP. If that DOS vector is not hardware supported, then the packet is dropped at a very early stage of the life cycle of a packet inside a BIG-IP. The rate at which are packets dropped is dynamic (mitigation rate), depending on the incoming number of packets for that vector and the CPU (TMM) stress of the BIG-IP. This allows the stress of the CPU to go down as it has to deal with less packets. Once the incoming rate is again below the detection rate, the system declares the attack as ended. Note: When an attack is detected, the packet rate during that time will not go into the calculation of expected rates for the future. This ensures that the BIG-IP will not learn from attack traffic skewing the automatic thresholds. All traffic rates below the detection (or below the floor value, when configured) rate modify the expected rate for the future and the BIG-IP will adjust the detection rate automatically. For most of the vectors you can configure a floor and ceiling value. Floor means that as long as the traffic value is below that threshold, the mitigation for that vector will never kick in. Even when the CPU is at 100%. Ceiling means that mitigation always kicks in at that rate, even when the CPU is idle. With these two values the dynamic and automatic process is done between floor and ceiling. Mitigation only gets executed when the rate is above the rate of the Floor EPS and Detection EPS AND stress on the particular context is measured. Figure 6: Floor and Ceiling EPS What is the difference when you use “Fully Automatic” on Device level compared to VS/PO (DOS profile) level? Everything is the same, except that on VS or Protected Object (PO) level the relevant stress is NOT the BIG-IP device stress, it is the stress of the service you are protecting (web-, DNS, IMAP-server, network, ...). BIG-IP can measure the stress of the service by measuring TCP attributes like retransmission, window size, congestion, etc. This gives a good indication on how busy a service is. This works very well for request/response protocols like TCP, HTTP, DNS. I recommend using this, when the Protected Object is a single service and not a “wildcard” Protected Object covering for example a network or service range. When the Protected Object is a “wildcard” service and/or a UDP service (except DNS), I recommend using “Auto Detection / Multiplier Option”. It works in the same way as the “Fully Automatic” from the learning perspective, but the mitigation condition is not stress, it is the multiplication of the detection rate. For example, the detection rate for a specific vector is calculated to be 100k packets/sec. By default, the multiplication rate is “500”, which means 5x. Therefore, the mitigation rate is calculated to 500k packets/sec. If that particular vector has more than 500k packets/sec those packets would be dropped. The multiplication rate can also be individually configured. Like in the screenshot, where it is set to 8x (800). Figure 7: Auto Detection / Multiplier Based Mitigation The benefit of this mode is that the BIG-IP will automatically learn the baseline for that vector and will only start to mitigate based on a large spike. The mitigation rate is always a multiplication of the detection rate, which is 5x by default but is configurable. When should I use “Fully Manual”? When you want to rate-limit a specific vector to a certain number of packets/sec, then “Fully Manual” is the right choice. Very good examples for that type of vector are the “Bad Header” vector types. These type of packets will never get forwarded by the BIG-IP so dropping them by a DoS vector saves the CPU, which is beneficial under DoS conditions. In the screenshot below is a vector configured as “Fully Manual”. Next I’ll describe what each of the options means. Figure 8: Fully Manual Detection Threshold EPS configures the packet rate/sec (pps) when you will get a log messages (NO mitigation!). Detection Threshold % compares the current pps rate for that vector with the multiplication of the configured percentage (in this example 5 for 500%) with the 1-minute average rate. If the current rate is higher, then you will get a log message. Mitigation Threshold EPS rate limits to that configured value (mitigation). I recommend setting the threshold (Mitigation Threshold EPS) to something relatively low like ‘10’ or ‘100’ on ‘Bad Header’ type of vectors. You can also set it also to ‘0’, which means all packets hitting this vector will get dropped by the DoS function which usually is done in hardware (FPGA). With the ‘Detection Threshold EPS’ you set the rate at which you want to get a log messages for that vector. If you do it this way, then you get a warning message like this one to inform you about the logging behavior: Warning generated: DOS attack data (bad-tcp-flags-all-set): Since drop limit is less than detection limit, packets dropped below the detection limit rate will not be logged. Another use-case for “Fully Manual” is when you know the maximum number of these packets the service can handle. But here my recommendation is to still use “Fully Automatic” and set the maximum rate with the Ceiling threshold, because then the protected service will benefit from both threshold options. Important: Please keep in mind, when you set manual thresholds for Device DoS the thresholds are to protect each TMM. Therefore the value you set is per TMM! An exception to this is the Sweep and Flood vector where the threshold is per BIG-IP/service and not per TMM like on DoS profiles. When using manual thresholds for aDOS profileof a Protected Object thethreshold configuration is per service (all packets targeted to the protected service) NOT per TMM like on the Device level. Here the goal is to set how many packets are allowed to pass the BIG-IP and reach the service. The distribution of these thresholds to the TMMs is done in a dynamic way: Every TMM gets a percentage of the configured threshold, based on the EPS (Events Per Second, which is in this context Packets Per Second) for the specific vector the system has seen in the second before on this TMM. This mechanism protects against hash type of attacks. Ok, I hope this article gives you a better understanding on how ‘Fully Manual’, ‘Fully Automatic’ and ‘Auto Detection / Multiplier Based Mitigation’ works. They are important concepts to understand, especially when they work in conjunction with stress measurement. This means the BIG-IP will only kick in with the DoS mitigation when the protected object (BIG-IP or the service behind the BIG-IP) is under stress. -Why risk false positives, when not necessary? With my next article I will demonstrate you how Device DoS and the DoS profiles work together andhow the stateful concept cooperates with the DoS mitigation. I will show you some DoS commands to test it and also commands to get details from the BIG-IP via CLI. Thank you, sVen Mueller9.7KViews21likes10CommentsDemonstration of Device DoS and Per-Service DoS protection
Dear Reader, This article is intended to show what effect the different threshold modes have on the Device and Per-Service (VS/PO) context. I will be using practical examples to demonstrate those effects. You will get to review a couple of scripts which will help you to do DoS flood tests and “visualizing” results on the CLI. In my first article (https://devcentral.f5.com/s/articles/Concept-of-F5-Device-DoS-and-DoS-profiles), I talked about the concept of F5 Device DoS and Per-Service DoS protection (DoS profiles). I also covered the physical and logical data path, which explains the order of Device DoS and Per-Service DoS using the DoS profiles. In the second article (https://devcentral.f5.com/s/articles/Explanation-of-F5-DDoS-threshold-modes), I explained how the different threshold modes are working. In this third article, I would like to show you what it means when the different modes work together. But, before I start doing some tests to show the behavior, I would like to give you a quick introduction into the toolset I´m using for these tests. Toolset First of all, how to do some floods? Different tools that can be found on the Internet are available for use. Whatever tools you might prefer, just download the tool and run it against your Device Under Test (DUT). If you would like to use my script you can get it from GitHub: https://github.com/sv3n-mu3ll3r/DDoS-Scripts With this script - it uses hping - you can run different type of attacks. Simply start it with: $ ./attack_menu.sh <IP> <PORT> A menu of different attacks will be presented which you can launch against the used IP and port as a parameter. Figure 1:Attack Menu To see what L3/4 DoS events are currently ongoing on your BIG-IP, simply go to the DoS Overview page. Figure 2:DoS Overview page I personally prefer to use the CLI to get the details I´m interested in. This way I don´t need to switch between CLI to launch my attacks and GUI to see the results. For that reason, I created a script which shows me what I am most interested in. Figure 3:DoS stats via CLI You can download that script here: https://github.com/sv3n-mu3ll3r/F5_show_DoS_stats_scripts Simply run it with the “watch” command and the parameter “-c” to get a colored output (-c is only available starting with TMOS version 14.0): What is this script showing you? context_name: This is the context, either PO/VS or the Device in which the vector is running vector_name: This is the name of the DoS vector attack_detected:When it shows “1”, then an attack has been detected, which means the ‘stats_rate’ is above the ‘detection-rate'. stats_rate: Shows you the current incoming pps rate for that vector in this context drops_rate: Shows you the number dropped pps rate in software (not FPGA) for that vector in this context int_drops_rate: Shows you the number dropped pps rate in hardware (FPGA) for that vector in this context ba_stats_rate: Shows you the pps rate for bad actors ba_drops_rate: Shows you the pps rate of dropped ‘bad actors’ in HW and SW bd_stats_rate:Shows you the pps rate for attacked destination bd_drop_rate: Shows you the pps rate for dropped ‘attacked destination’ mitigation_curr: Shows the current mitigation rate (per tmm) for that vector in that context detection: Shows you the current detection rate (per tmm) for that vector in that context wl_count: Shows you the number of whitelist hits for that vector in that context hw_offload: When it shows ‘1’ it means that FPGAs are involved in the mitigation int_dropped_bytes_rate: Gives you the rate of in HW dropped bytes for that vector in that context dropped_bytes_rate: Gives you the rate of in SW dropped bytes for that vector in that context When a line is displayed in green, it means packets hitting that vector. However, no anomaly is detected or anything is mitigated (dropped) via DoS. If a line turns yellow, then an attack - anomaly – has been detected but no packets are dropped via DoS functionalities. When the color turns red, then the system is actually mitigating and dropping packets via DoS functionalities on that vector in that context. Start Testing Before we start doing some tests, let me provide you with a quick overview of my own lab setup. I´m using a DHD (DDoS Hybrid Defender) running on a i5800 box with TMOS version 15.1 My traffic generator sends around 5-6 Gbps legitimate (HTTP and DNS) traffic through the box which is connected in L2 mode (v-wire) to my network. On the “client” side, where my clean traffic generator is located, my attacking clients are located as well by use of my DoS script. On the “server” side, I run a web service and DNS service, which I´m going to attack. Ok, now let’s do some test so see the behavior of the box and double check that we really understand the DDoS mitigation concept of BIG-IP. Y-MAS flood against a protected server Let’s start with a simple Y-MAS (all TCP flags cleared) flood. You can only configure this vector on the device context and only in manual mode. Which is ok, because this TCP packet is not valid and would get drop by the operating system (TMOS) anyway. But, because I want this type of packet get dropped in hardware (FPGA) very early, when they hit the box, mostly without touching the CPU, I set the thresholds to ‘10’ on the Mitigation Threshold EPS and to ‘10’ on Detection Threshold EPS. That means as soon as a TMM sees more then 10 pps for that vector it will give me a log message and also rate-limit this type of packets per TMM to 10 packets per second. That means that everything below that threshold will get to the operating system (TMOS) and get dropped there. Figure 4:Bad TCP Flags vector As soon as I start the attack, which targets the web service (10.103.1.50, port 80) behind the DHD with randomized source IPs. $ /usr/sbin/hping3 --ymas -p 80 10.103.1.50 --flood --rand-source I do get a log messages in /var/log/ltm: Feb5 10:57:52 lon-i5800-1 err tmm3[15367]: 01010252:3: A Enforced Device DOS attack start was detected for vector Bad TCP flags (all cleared), Attack ID 546994598. And, my script shows me the details on that attack in real time (the line is in ‘red’, indicating we are mitigating): Currently 437569 pps are hitting the device. 382 pps are blocked by DDoS in SW (CPU) and 437187 are blocked in HW (FPGA). Figure 5:Mitigating Y-Flood Great, that was easy. :-) Now, let’s do another TCP flood against my web server. RST-flood against a protected server with Fully manual Threshold mode: For this test I´m using the “Fully Manual” mode, which configures the thresholds for the whole service we are protecting with the DoS profile, which is attached to my VS/PO. Figure 6:RST flood with manual configuration My Detection Threshold and my Mitigation Threshold EPS is set to ‘100’. That means as soon as we see more then 100 RST packets hitting the configured VS/PO on my BIG-IP for this web server, the system will start to rate-limit and send a log message. Figure 7:Mitigating RST flood on PO level Perfect. We see the vector in the context of my web server (/Common/PO_10.103.1.50_HTTP) is blocking (rate-limiting) as expected from the configuration. Please ignore the 'IP bad src' which is in detected mode. This is because 'hping' creates randomized IPs and not all of them are valid. RST-flood against a protected server with Fully Automatic Threshold mode: In this test I set the Threshold Mode for the RST vector on the DoS profile which is attached to my web server to ‘Fully Automatic’ and this is what you would most likely do in the real world as well. Figure 8:RST vector configuration with Fully Automatic But, what does this mean for the test now? I run the same flood against the same destination and my script shows me the anomaly on the VS/PO (and on the device context), but it does not mitigate! Why would this happen? Figure 9:RST flood with Fully Automatic configuration When we take a closer look at the screenshot we see that the ‘stats_rate’ shows 730969 pps. The detection rate shows 25. From where is this 25 coming? As we know, when ‘Fully Automatic’ is enabled then the system learns from history. In this case, the history was even lower than 25, but because I set the floor value to 100, the detection rate per TMM is 25 (floor_value / number of TMMs), which in my case is 100/4 = 25 So, we need to recognize, that the ‘stats_rate’ value represents all packets for that vector in that context and the detection value is showing the per TMM value. This value explains us why the system has detected an attack, but why is it not dropping via DoS functionalities? To understand this, we need to remember that the mitigation in ‘Fully Automatic’ mode will ONLY kick in if the incoming rate for that vector is above the detection rate (condition is here now true) AND the stress on the service is too high. But, because BIG-IP is configures as a stateful device, this randomized RST packets will never reach the web service, because they get all dropped latest by the state engine of the BIG-IP. Therefor the service will never have stress caused by this flood.This is one of the nice benefits of having a stateful DoS device. So, the vector on the web server context will not mitigate here, because the web server will not be stressed by this type of TCP attack. This does also explains the Server Stress visualization in the GUI, which didn´t change before and during the attack. Figure 10:DoS Overview in the GUI But, what happens if the attack gets stronger and stronger or the BIG-IP is too busy dealing with all this RST packets? This is when the Device DOS kicks in but only if you have configured it in ‘Fully Automatic’ mode as well. As soon as the BIG-IP receives more RST packets then usually (detection rate) AND the stress (CPU load) on the BIG-IP gets too high, it starts to mitigate on the device context. This is what you can see here: Figure 11:'massive' RST flood with Fully Automatic configuration The flood still goes against the same web server, but the mitigation is done on the device context, because the CPU utilization on the BIG-IP is too high. In the screenshot below you can see that the value for the mitigation (mitigation_curr) is set to 5000 on the device context, which is the same as the detection value. This value resultsfrom the 'floor' value as well. It is the smallest possible value, because the detection and mitigation rate will never be below the 'floor' value. The mitigation rate is calculated dynamically based on the stress of the BIG-IP. In this test I artificially increased the stress on the BIG-IP and therefor the mitigation rate was calculated to the lowest possible number, which is the same as the detection rate. I will provide an explanation of how I did that later. Figure 13:Device context configuration for RST Flood Because this is the device config, the value you enter in the GUI is per TMM and this is reflected on the script output as well. What does this mean for the false-positive rate? First of all, all RST packets not belonging to an existing flow will kicked out by the state engine. At this point we don´t have any false positives. If the attack increases and the CPU can´t handle the number of packets anymore, then the DOS protection on the device context kicks in. With the configuration we have done so far, it will do the rate-limiting on all RST packets hitting the BIG-IP. There is no differentiation anymore between good and bad RST, or if the RST has the destination of server 1 or server 2, and so on. This means that at this point we can face false positives with this configuration. Is this bad? Well, false-positives are always bad, but at this point you can say it´s better to have few false-positives then a service going down or, even more critical, when your DoS device goes down. What can you do to only have false positives on the destination which is under attack? You probably have recognized that you can also enable “Attacked Destination Detection” on the DoS vector, which makes sense on the device context and on a DoS profile which is used on protected object (VS), that covers a whole network. Figure 14:Device context configuration for RST Flood with 'Attacked Destination Detection' enabled If the attack now hits one or multiple IPs in that network, BIG-IP will identify them and will do the rate-limiting only on the destination(s) under attack. Which still means that you could face false positives, but they will be at least only on the IPs under attack. This is what we see here: Figure 15:Device context mitigation on attacked destination(s) The majority of the RST packet drops are done on the “bad destination” (bd_drops_rate), which is the IP under attack. The other option you can also set is “Bad Actor Detection”. When this is enabled the system identifies the source(s) which causes the load and will do the rate limiting for that IP address(es). This usually works very well for amplification attacks, where the attack packets coming from ‘specific’ hosts and are not randomized sources. Figure 16:Device context mitigation on attacked destination(s) and bad actor(s) Here you can see the majority of the mitigation is done on ‘bad actors’. This reduces the chance of false positives massively. Figure 17:Device context configuration for RST Flood with 'Attacked Destination Detection' and 'Bad Actor Detection' enabled You also have multiple additional options to deal with ‘attacked destination’ and ‘bad actors’, but this is something I will cover in another article. Artificial increase BIG-IPs CPU load Before I finish this article, I would like to give you an idea on how you could increase the CPU load of the BIG-IP for your own testing. Because, as we know, with “Fully Automatic” on the device context, the mitigation kicks only in if the packet rate is above the detection rate AND the CPU utilization on the BIG-IP is “too” high. This is sometimes difficult to archive in a lab because you may not be able to send enough packets to stress the CPUs of a HW BIG-IP. In this use-case I use a simple iRule, which I attach to the VS/PO that is under attack. Figure 18:Stress iRule When I start my attack, I send the traffic with a specific TTL for identification. This TTL is configured in my iRule in order to get a CPU intensive string compare function to work on every attack packet. Here is an example for 'hping': $ /usr/sbin/hping3 --rst -p 80 10.103.1.50 --ttl 5 --flood --rand-source This can easily drive the CPU very high and the DDoS rate-limiting kicks in very aggressive. Summary I hope that this article provides you with an even better understanding on what effect the different threshold modes have on the attack mitigation. Of course, keep in mind this are just the ‘static DoS’ vectors. In later articles I will explain also the 'Behavioral DoS' and the Cookie based mitigation, which helps to massively reduce the chance of a false positives. But, also keep in mind, the DoS vectors start to act in microseconds and are very effective for protecting. Thank you, sVen Mueller3.6KViews12likes3Comments3. SYN Cookie: SYN Cache
Introduction In previous articles I have explained why it is so important to implement TCP SYN Cookie in order to protect exposed applications. In this article I will explain when SYN Cookie is activated and different aspects you should take into account when you configure it in LTM. SYN cache You know how SYN Cookie works in broad strokes, so now the pending question about implementation is to know when BIG-IP should activate SYN Cookie countermeasure. This is where security administrators come into play. Commonly SYN Cookie implementation works together with another countermeasure called SYN cache. SYN cache is based in the use of a cache for incomplete TCBs, this allows devices to save some resources comparing with standard TCP connection because full state allocation for TCB is delayed until the TCP 3WHS has been fully finished.I will not describe detailed functioning of SYN cache here but just note that this method could increase time to establish legitimate TCP connections in 15%. BIG-IP does not use a cache as is, instead it uses a TCP embryonic connection counter, if BIG-IP detects that the number of embryonic connections have been exceeded attending to a configured threshold then SYN Cookie is activated. This is more efficient than typical SYN cache implementation since it does not add any delay by creating incomplete TCBs prior to complete TCP 3WHS. Also, it does not need to allocate memory for these incomplete TCBs. The process of SYN Cookie activation can be described as below: Fig7. TCP SYN Flood attack + SYN Cache Our only task as Security Administrators is defining the best threshold for this counter. It is very important that threshold fits perfectly in your environment and there are two main reasons for it: Depending on our BIG-IP device platform SYN Cookie can be implemented in hardware or in software, if our device has not hardware offloading capability for SYN Cookie then the SYN Cookie process described in second article of this series must be run in TMM, that is, in software. Although the process is optimised there will be a slight penalty in CPU usage during an attack since TMM will have to create SYN Cookie challenges and validate client responses. This should not affect normal functioning of the device but it is relevant to take this into account since not always customer has a perfect BIG-IP sizing. So it is important to distinguish between a real attack or a wrong configured threshold,and avoid extra load to your devices by setting a correct SYN Cache value. As you will see with more detail in later articles in this series, activating SYN Cookie can have some consequences due to how SYN Cookie is implemented, not specifically in BIG-IP devices, but SYN Cookie standard in general. One important drawback is the limited space used for the SYN Cookie challenge which, as explained in previous article, it will limit the possible MSS sizes and also will remove some TCP options information (unless TCP Timestamp is used). This could cause some impact by slowing down the traffic between client and backend server when SYN Cookie is active. However, note something very important, SYN Cookie is only activated when under attack if it is correctly configured, so the possible choices during an attack are disruption if no countermeasure is configured or, in the worst cases, having a minimum impact in performance until attack is finished. Also, if SYN Cookie is correctly configured it should be only activated specifically on virtual servers under attack, so usually you will not notice any issue. Although F5 Networks creates a default configuration for SYN cache, that you will read about in the next article, the value that better fits with your environment is the value you define attending to expected traffic patterns, and you have the best knowledge of your network. Operation As a short summary you can check below flow diagram where the most important steps are shown, so you can have at a glance a global idea about what it happens when SYN cache is exceeded in a TMM and hence SYN Cookie is activated in hardware or software for this TMM. There are some important points to note: To prevent oscillation (activating/deactivating SYN Cookie loop), the entry/exit strategy from SYN cookie mode have some hysteresis. This is determined by several factors like the round-trip time, rate of SYN arrival, the 'exit' threshold, the value of the SYN cache,… This will avoid entering and exiting SYN Cookie continuously during an attack if the number of TCP SYN packets per second for this attackis near to the configured SYN Cache threshold. From TMOS v13 you can stop to challenge a source IP during some seconds if the client response with a successfulchallenge response, that is, if client is trustworthy This is detailed in next article where configuration is treated. By checking diagram above you will notice that regardless if Hardware or Software SYN Cookie is enabled we see TMM always rejecting connections. This is because hardware only generates and validates SYN Cookies but it never drops connections. This is important when we check SYN Cookie stats as you will see in future article. Conclusion At this point you have a clear knowledge of SYN Cookie and SYN Cache countermeasures and how they are implemented in BIG-IP. In next articles we will start to talk about configurations and troubleshooting.1.5KViews1like0CommentsWhy does the Local Traffic policy allow Bot profile to be selected but the iRule can't ?
When I attach DOS and BOT profiles with local traffic policy or iRule I always need a default BOT and DOS profile even when I have a default rule that catches all the traffic. That is one thing but the strangest thing is when I decide to attach a Bot profile with iRule it does not work but the Local traffic policies allow this. I will need to test this but is really strange. This is the first time something is only possible with Local Traffic Policies but I will have to test if it works 🙂Solved1.2KViews0likes2CommentsSecuring APIs with BigIP
Introduction API servers respond to requests using the HTTP protocol, much like Web Servers. Therefore, API servers are susceptible to HTTP attacks in ways similar to Web Servers. Previous articles covered how to publish an API using the NGINX platform as an API management gateway. These APIs are still exposed to web attacks and defensive mechanisms are needed to defend the API against web attacks, denial of service, and Bots.The diagram below shows all the layers needed to deliver and defend APIs. BIGIP provides the protection and NGINX Plus provides API management. Picture 1. This article covers Advanced Web Application Firewall (AWAF) to protect against HTTP vulnerabilities Unified Bot Defense to protect against bots Behavioral Anomaly DoS Defense to prevent DoS attacks As shown in Picture 1, above BIG-IP goes in front of the API management gateway as an additional security gateway. The beauty of this approach is that BIG-IP can be initially deployed on the side while the API is being delivered to users through the NGINX Plus gateway directly. Once BIG-IP is configured to forward good requests to NGINX Plus and security policies are in place, BIG-IP can be brought into the traffic flow by simply changing the DNS records for "prod.httpbin.internet.lab" to point to BIG-IP instead of NGINX Plus. From this point on all calls will automatically arrive at BIG-IP for inspection and only those that pass all verifications will be forwarded to the next layer. Configuring Data Path Data path configuration for this use case is pretty common for BIG-IP which is historically a load balancer. It includes: Virtual Server (listens for API calls) SSL profile (defines SSL settings) SSL certificate and key (cryptographically identifies virtual server) Pool (destination for passed calls) Picture below shows how all of the configuration pieces work together. Picture 3. At first upload server certificate and key Setup SSL profile to use a certificate from the previous step Finally, create a virtual server and a pool to accept API calls and forward them to the backend From this point IG-IP accepts all requests which go to "prod.httpbin.secured-internet.lab" hostname and forwards them to API management gateway powered by NGINX Plus. Setting up WAF policy As you may already know every API starts from the OpenAPI file which describes all available endpoints, parameters, authentication methods, etc. This file contains all details related to API definition and it is widely used by most tools including F5's WAF for self-configuration. Imported OpenAPI file automatically configures policy with all API specific parameters as a list of allowed URLs, parameters, methods, and so on. Therefore WAF configuration narrows down to importing OpenAPI file and using policy template for API security. Create a policy Specify policy name, template, swagger file, virtual server and logging profile. API security template pre-configures WAF policy with all necessary violations and signatures to protect API backend. OpenAPI file introduces application-specific configuration to a policy as a list of allowed URLs, parameters, and methods. That is it. WAF policy is configured and assigned to the virtual server. Now we can test that only legitimate requests to allowed resources go through. For example request to URL which does not exist in the policy will be blocked: ubuntu@ip-10-1-1-7:~$ http -v https://prod.httpbin.secured-internet.lab/urldoesntexist GET /urldoesntexist HTTP/1.1 Accept: */* Accept-Encoding: gzip, deflate Connection: keep-alive Host: prod.httpbin.secured-internet.lab User-Agent: HTTPie/0.9.2 HTTP/1.1 403 Forbidden Cache-Control: no-cache Connection: close Content-Length: 38 Content-Type: application/json; charset=utf-8 Pragma: no-cache { "supportID": "1656927099224588298" } Request with SQL injection also blocked: Configuring Bot Defense Starting from BIG-IP release 14.1 proactive bot defense, web scraping, and bot detection features are combined under Bot Defense profile. Therefore current bot defense forms a unified tool to prevent all types of bots from accessing your web asset. Bot detection and mitigation mechanisms heavily rely on signatures and javascript (JS) based challenges. JS challenges run in a client browser and help to identify client type/malicious activities or apply mitigation by injecting a CAPTCHA or slowing down a client by making a browser perform a heavy calculation. Since this article is focused on protecting APIs it is important to note that JS challenges need to be used with caution in this case. Keep in mind that robots might be legitimate users for an API. However, robots similar to bots can not execute javascript. So when a robot receives JS it considers JS as API response. Such response does not align with what a robot expects and the application may break. If you know an API is serving automated processes avoid the use of JS-based challenges or test every JS-based feature in the staging environment first. Configuration to detect and handle many different types of Bots can be simplified by using any of the three pre-configured security modes: Relaxed (Challenge free, mitigates only 100% bad bots based on signatures ) Balanced (Let suspicious clients prove good behavior by executing JS challenges or solving CAPTCHA) Strict (Blocks all kinds of bots, verifies browsers, and collects device id from all clients) It is best to start with the relaxed template and tighten up the configuration as familiarity grows with the traffic that the API endpoints see. Once the profile is created assign it to the same virtual server at "Local Traffic ›› Virtual Servers : Virtual Server List ›› prod.httpbin.secured-internet.lab ›› Security" page. Such configuration performs bot detection based on data that is available in requests such as URL, user-agent, or header order. This mode is safe for all kinds of API users (browser-based or code-based robots) and you can see transaction outcome on "Security ›› Event Logs : Bot Defense : Bot Requests" page. If there are false positives you can adjust bot status or create a new trusted one for your robots through "Security ›› Bot Defense : Bot Signatures : Bot Signatures List". Setting Up DDoS Defense WAF and BOT defenses can detect requests with attack signatures or requests that are generated by malicious clients. However, attackers can send attacks composed of legitimate requests at a high scale, that can bring down an API endpoint. The following features present in the BIG-IP can be used to defeat Denial of Service attacks against API endpoints. Transaction per second (TPS) Based DoS Defense Stress Based DoS Defense Behavioral Anomaly Based DoS Defense Eviction Policies TPS-based DoS defense is the most straightforward protection mechanism. In this mode, BigIP measures requests rate for parameters such as Source IP, URL, Site, etc. In case the per minute rate becomes higher than the configured threshold then the attack gets triggered, and selected mitigation modes are applied to ‘all’ requests identified by the parameter. The stress-based mode works similarly to TPS, but instead of applying mitigation right after the threshold is crossed it only mitigates when the protected asset is under stress. This approach significantly reduces false positives. Behavioral anomaly detection (BADoS) mode offers the most advanced security and accuracy. This mode does not require the administrator to perform any configuration, other than turning the feature on. A machine learning algorithm is used to detect whether the protected asset is under attack or not. Another machine learning algorithm is used to baseline the traffic in peacetime. When the ‘attack detector algorithm’ identifies that the protected asset is under stress and non-responsive, then the second algorithm stops learning and looks for anomalies. Signatures matching these anomalies are automatically created. Anomalies discovered during attack time are likely nefarious and are eliminated from the traffic by application of dynamically discovered signatures. BADoS will automatically build a good traffic baseline, detect anomalies and stop them if the API endpoint is under stress. Conclusion F5 offers a multi-layered solution for protecting APIs, which is easy to configure.Please connect with me via comments and keep an eye on more articles in this series. Good luck!3.6KViews2likes2Comments