VIPRION external monitor

Problem this snippet solves:

This VIPRION-specific external monitor script is written in bash and uses TMSH to extend the built-in monitoring functionality of BIG-IP version 10.2.3. This write-up assumes the reader has a working knowledge of writing BIG-IP LTM external monitors. The following link is a great starting point: LTM External Monitors - The Basics

Logical network diagram:

NOTE: The monitor is written to meet very specific environmental requirements, so your implementation may vary greatly. This post is intended to show some of the requirements for writing external monitors on the VIPRION platform while offering some creative ways to extend the functionality of external monitors using TMSH.

The VIPRION acts as a hop in the default path of traffic destined for the Internet. Specific application flows are vectored to optimization servers and all other traffic is passed to the next-hop router (Router C) toward the Internet. Router A and Router C are BGP neighbors through the VIPRION. Router B is a BGP neighbor with the VIPRION via ZebOS. A virtual address has route health injection enabled. The script monitors a user-defined pool (argument to the script) and transitions into the failed state when the available pool member count drops below a threshold value (also an argument to the script).
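
The pool check that drives the monitor can be reproduced by hand. Below is a minimal sketch of that logic, assuming the example pool name "poolhttp" used later in this post; the count is decremented by one because the pool's own Availability line matches the grep along with the member lines.

# count "available" Availability lines for the pool and its members,
# then subtract the pool's own Availability line
AVAILABLE=`tmsh show /ltm pool poolhttp members all-properties | grep -i "Availability" | awk {'print $NF'} | grep -ic "available"`
let "AVAILABLE-=1"
echo "poolhttp has $AVAILABLE available members"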

In the failed state, the following actions are performed once, effectively stopping client traffic flow through the VIPRION:

  1. Two virtual servers (arguments to the script) are disabled to stop traffic through the VIPRION.
  2. A virtual address (argument to the script) is disabled, which disables route health injection of the address.
  3. All non-self-IP BGP connections are found in the connection table and deleted (see the tmsh sketch after this list).
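
The BGP cleanup in step 3 reduces to two tmsh operations, which the script assembles dynamically from the configured self IPs. A sketch with hypothetical self IPs 10.0.0.1 and 10.0.0.2 and a hypothetical peer 192.168.1.1:

# show connections to TCP port 179 (BGP), excluding the self IPs
tmsh show sys conn cs-server-port 179 | sed -e 's/\:/\ /g' | egrep -v '10.0.0.1|10.0.0.2'
# delete one of the resulting peer connections from the connection table
tmsh delete sys connection cs-client-addr 192.168.1.1 cs-server-port 179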

NOTE: Manual intervention is required to re-enable the virtual servers and virtual address when the monitor transitions from the failed state back to the successful state; normal traffic flow will not resume until this is done.
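
For reference, recovery is a short set of tmsh commands run on the primary slot. A sketch, assuming the example object names from the monitor definition below; if the objects live in an administrative partition, first run "tmsh modify cli admin-partitions update-partition <partition>", just as the script does when disabling them:

tmsh modify /ltm virtual vsforward1 enabled
tmsh modify /ltm virtual vsforward2 enabled
tmsh modify /ltm virtual-address 10.10.10.1 enabled yes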

How to use this snippet:

The monitor definition:

monitor eavbgp_v3 {
   defaults from external
   interval 20
   timeout 61
   args "poolhttp 32 vsforward1 vsforward2 10.10.10.1"
   DEBUG "0"
   run "rhi_v3.bsh"
}

This external monitor is configured to check for available members in the pool "poolhttp". When the available member count falls below 32, the monitor transitions into the failed state, disables the virtual servers "vsforward1" and "vsforward2", and disables the virtual address "10.10.10.1". When the available member count rises back above 32, neither the virtual servers nor the virtual address is re-enabled automatically; this requires manual intervention. The external monitor is assigned to a phantom pool with a single member "1.1.1.1:4353" (f5-iquery). No traffic is sent to the pool member; the pool and pool member exist only so the operator can see the current status of the external monitor.
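
Putting the pieces together, BIG-IP invokes the script with the member's IP address and port first, followed by the configured args, while DEBUG arrives in the environment rather than as a positional argument. An illustrative invocation, assuming the script was imported to /config/monitors and the phantom member below:

# $1 and $2 are supplied by BIG-IP; $3 through $7 come from the monitor's args string
/config/monitors/rhi_v3.bsh ::ffff:1.1.1.1 4353 poolhttp 32 vsforward1 vsforward2 10.10.10.1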

The Pool definition:

pool bgpmonitor {
   monitor all eavbgp_v3
   members 1.1.1.1:f5-iquery {}
}
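
Because no traffic flows to the phantom member, its state simply reflects the last monitor verdict. A quick status check, assuming the names above:

tmsh show /ltm pool bgpmonitor members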

You can download the script here: rhi_v3.bsh

Code :

#!/bin/bash
# (c) Copyright 1996-2007 F5 Networks, Inc.
#
# This software is confidential and may contain trade secrets that are the
# property of F5 Networks, Inc.  No part of the software may be disclosed
# to other parties without the express written consent of F5 Networks, Inc.
# It is against the law to copy the software.  No part of the software may
# be reproduced, transmitted, or distributed in any form or by any means,
# electronic or mechanical, including photocopying, recording, or information
# storage and retrieval systems, for any purpose without the express written
# permission of F5 Networks, Inc.  Our services are only available for legal
# users of the program, for instance in the event that we extend our services
# by offering the updating of files via the Internet.
#
#  author: Paul DeHerrera pauld@f5.com
#
# these arguments supplied automatically for all external monitors:
# $1 = IP (nnn.nnn.nnn.nnn notation or hostname)
# $2 = port (decimal, host byte order) -- not used in this monitor
#
# these arguments must be supplied in the monitor configuration:
# $3 = name of pool to monitor
# $4 = threshold value of the pool.  If the available pool member count drops below this value the monitor will respond in 'failed' state
# $5 = first Virtual server to disable
# $6 = second Virtual server to disable
# $7 = first Virtual address to disable
# $8 = second Virtual address to disable
### Check for the 'DEBUG' variable, set it here if not present.

# is the DEBUG variable passed as a variable?
if [ -z "$DEBUG" ]
then
   # If the monitor config didn't specify debug as a variable then enable/disable it here
   DEBUG=0
fi

### If Debug is on, output the script start time to /var/log/ltm

# capture and log (when debug is on) a timestamp when this eav starts
export ST=`date +%Y%m%d-%H:%M:%S`
if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): started at $ST" | logger -p local0.debug; fi

### Do not execute this script within the first 300 seconds after BIG-IP boot.  This is a customer-specific requirement

# this section is used to introduce a delay of 300 seconds after system boot before executing this eav for the first time
BOOT_DATE=`who -b | grep -i 'system boot' | awk {'print $3 " " $4 " " $5'}`
if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): boot_date: ($BOOT_DATE)" | logger -p local0.debug; fi
EPOCH_DATE=`date -d "$BOOT_DATE" +%s`
if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): epoch_date: ($EPOCH_DATE)" | logger -p local0.debug; fi
EPOCH_DATE=$((${EPOCH_DATE}+300))
if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): epoch_date +300: ($EPOCH_DATE)" | logger -p local0.debug; fi
CUR_DATE=`date +%s`
if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): current_date: ($CUR_DATE)" | logger -p local0.debug; fi

if [ $CUR_DATE -ge $EPOCH_DATE ]
  then

### Assign values to variables.  The VIPRION requires some commands to be executed on the primary slot, as you will see later in this script

# export some variables
if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): exporting variables..." | logger -p local0.debug; fi
export REMOTEUSER="root"
export HOME="/root"
export IP=`echo $1 | sed 's/::ffff://'`
export PORT=$2
export POOL=$3
export MEMBER_THRESHOLD=$4
export VIRTUAL_SERVER1=$5
export VIRTUAL_SERVER2=$6
export VIRTUAL_ADDRESS1=$7
export VIRTUAL_ADDRESS2=$8
export PIDFILE="/var/run/`basename $0`.$IP.$PORT.pid"
export TRACKING_FILENAME=/var/tmp/rhi_bsh_monitor_status
export PRIMARY_SLOT=`tmsh list sys db cluster.primary.slot | grep -i 'value' | sed -e 's/\"//g' | awk {'print $NF'}`

### Output the Primary slot to /var/log/ltm

if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): the primary blade is in slot number: ($PRIMARY_SLOT)..." | logger -p local0.debug; fi

### This section is for debugging only.  Check to see if this script is executing on the Primary blade and output to /var/log/ltm

if [ $DEBUG -eq 1 ]; then export PRIMARY_BLADE=`tmsh list sys db cluster.primary | grep -i "value" | sed -e 's/\"//g' | awk {'print $NF'}`; fi
if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): is this monitor executing on the primary blade: ($PRIMARY_BLADE)" | logger -p local0.debug; fi

### Standard EAV check to see if an instance of this script is already running for the member.  If so, kill the previous instance and output to /var/log/ltm

# is there already an instance of this EAV running for this member?
if [ -f $PIDFILE ]
then
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): pid file is present, killing process..." | logger -p local0.debug; fi
   kill -9 `cat $PIDFILE` > /dev/null 2>&1
   echo "EAV `basename $0` ($$): exceeded monitor interval, needed to kill ${IP}:${PORT} with PID `cat $PIDFILE`" | logger -p local0.error
fi

### Create a new pid file to track this instance of the monitor for the current member

# create a pidfile
if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): creating new pid file..." | logger -p local0.debug; fi
echo "$$" > $PIDFILE

### Export variables for available pool members and total pool members

# export more variables (these require tmsh)
export AVAILABLE=`tmsh show /ltm pool $POOL members all-properties | grep -i "Availability" | awk {'print $NF'} | grep -ic "available"`
export TOTAL_POOL_MEMBERS=`tmsh show /ltm pool $POOL members all-properties | grep -c "Pool Member"`
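# the grep above counts the pool's own Availability line as well as the members,
# so subtract one to leave just the available member count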
let "AVAILABLE-=1"

### If Debug is on, output some variables to /var/log/ltm - helps with troubleshooting

if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): Pool ($POOL) has ($AVAILABLE) available of ($TOTAL_POOL_MEMBERS) total members." | logger -p local0.debug; fi
if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): Pool ($POOL) threshold = ($MEMBER_THRESHOLD) members.  Virtual server1 ($VIRTUAL_SERVER1) and Virtual server2 ($VIRTUAL_SERVER2)" | logger -p local0.debug; fi
if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): Member Threshold ($MEMBER_THRESHOLD)" | logger -p local0.debug; fi

### If the available member count is less than the threshold then we are in a 'failed' state.

# main monitor logic
if [ "$AVAILABLE" -lt "$MEMBER_THRESHOLD" ]
then

### If Debug is on, output status to /var/log/ltm 

   ### notify log - below threshold and disabling virtual server1
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): AVAILABLE < MEMBER_THRESHOLD, disabling the virtual server..." | logger -p local0.debug; fi

   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): disabling Virtual Server 1 ($VIRTUAL_SERVER1)" | logger -p local0.debug; fi

### Disable the first virtual server, which may exist in an administrative partition.  For version 10.2.3 (possibly others) the script is required to change the 'update-partition' before disabling the virtual server.  To accomplish this we first determine the administrative partition name where the virtual is configured, then we build a list construct to execute both commands consecutively.

   ### disable virtual server 1

### obtain the administrative partition for the virtual.  if no administrative partition is found, assume common
   export VS1_PART=`tmsh list ltm virtual $VIRTUAL_SERVER1 | grep 'partition' | awk {'print $NF'}`
   if [ -z "${VS1_PART}" ]; then
### no administrative partition was found so execute a list construct to change the update-partition to Common and disable the virtual server consecutively
      export DISABLE1=`ssh -o StrictHostKeyChecking=no root\@slot$PRIMARY_SLOT "tmsh modify cli admin-partitions update-partition Common && tmsh modify /ltm virtual $VIRTUAL_SERVER1 disabled"`
### If Debug is on, output the command to /var/log/ltm
      if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): disable cmd1: ssh -o StrictHostKeyChecking=no root\@slot$PRIMARY_SLOT 'tmsh modify cli admin-partitions update-partition Common && tmsh modify /ltm virtual $VIRTUAL_SERVER1 disabled'" | logger -p local0.debug; fi
   else
### the administrative partition was found so execute a list construct to change the update-partition  and disable the virtual server consecutively.  The command is sent to the primary slot via SSH
      export DISABLE1=`ssh -o StrictHostKeyChecking=no root\@slot$PRIMARY_SLOT "tmsh modify cli admin-partitions update-partition $VS1_PART && tmsh modify /ltm virtual $VIRTUAL_SERVER1 disabled"`
### If Debug is on, output the command to /var/log/ltm
      if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): disable cmd1: ssh -o StrictHostKeyChecking=no root\@slot$PRIMARY_SLOT 'tmsh modify cli admin-partitions update-partition $VS1_PART && tmsh modify /ltm virtual $VIRTUAL_SERVER1 disabled'" | logger -p local0.debug; fi
   fi

### If Debug is on, output status to /var/log/ltm 

   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): disabling Virtual Server 2 ($VIRTUAL_SERVER2)" | logger -p local0.debug; fi

### Disable the second virtual server.  This section is the same as above, so I will skip the detailed comments here.

   ### disable virtual server 2
   export VS2_PART=`tmsh list ltm virtual $VIRTUAL_SERVER2 | grep 'partition' | awk {'print $NF'}`
   if [ -z "${VS2_PART}" ]; then
      export DISABLE2=`ssh -o StrictHostKeyChecking=no root\@slot$PRIMARY_SLOT "tmsh modify cli admin-partitions update-partition Common && tmsh modify /ltm virtual $VIRTUAL_SERVER2 disabled"`
      if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): disable cmd2: ssh -o StrictHostKeyChecking=no root\@slot$PRIMARY_SLOT 'tmsh modify cli admin-partitions update-partition Common && tmsh modify /ltm virtual $VIRTUAL_SERVER2 disabled'" | logger -p local0.debug; fi
   else
      export DISABLE2=`ssh -o StrictHostKeyChecking=no root\@slot$PRIMARY_SLOT "tmsh modify cli admin-partitions update-partition $VS2_PART && tmsh modify /ltm virtual $VIRTUAL_SERVER2 disabled"`
      if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): disable cmd2: ssh -o StrictHostKeyChecking=no root\@slot$PRIMARY_SLOT 'tmsh modify cli admin-partitions update-partition $VS2_PART && tmsh modify /ltm virtual $VIRTUAL_SERVER2 disabled'" | logger -p local0.debug; fi
   fi

   ### notify log - disconnecting all BGP connections
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): Pool ($POOL) disconnecting all BGP connections..." | logger -p local0.debug; fi

   ### acquire a list of self IPs
   SELF_IPS=(`tmsh list net self | grep 'net self' | sed -e 's/\//\ /g' | awk {'print $3'}`)
   ### start to build our TMSH command excluding self IPs
   BGP_CONNS="tmsh show sys conn cs-server-port 179 | sed -e 's/\:/\ /g' | egrep -v '"
   COUNT=1
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): BGP Step 1 - ${BGP_CONNS}" | logger -p local0.debug; fi

   ### loop through the self IPs
   for ip in "${SELF_IPS[@]}"
   do
      if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): BGP Step 2 - ${ip}" | logger -p local0.debug; fi

      ### continue to build our TMSH command - append self IPs to ignore
      if [ ${COUNT} -gt 1 ]
      then
         BGP_CONNS=${BGP_CONNS}"|${ip}"
      else
         BGP_CONNS=${BGP_CONNS}"${ip}"
      fi
      (( COUNT++ ))
   done

   ### if debug is on log a message with the TMSH command up until this point
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): BGP Step 3 - ${BGP_CONNS}" | logger -p local0.debug; fi
   ### finish the TMSH command to show BGP connections not including self IPs
   BGP_CONNS=${BGP_CONNS}"' | egrep -v 'Sys|Total' | awk {'print \$1'}"
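   ### e.g. with self IPs 10.0.0.1 and 10.0.0.2 the assembled command would be:
   ###   tmsh show sys conn cs-server-port 179 | sed -e 's/\:/\ /g' | egrep -v '10.0.0.1|10.0.0.2' | egrep -v 'Sys|Total' | awk {'print $1'}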
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): BGP Step 4 - ${BGP_CONNS}" | logger -p local0.debug; fi
   ### gather all BGP connection not including those to self IPs
   DISCONNS=(`eval $BGP_CONNS`)
   DISCMD=''
   NEWCOUNT=1
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): BGP Step 5 - ${DISCONNS[@]}" | logger -p local0.debug; fi
   ### loop through the resulting BGP connections and build another TMSH command to delete these connections from the connection table
   for newip in "${DISCONNS[@]}"
   do
      if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): BGP Step 6" | logger -p local0.debug; fi
      if [ ${NEWCOUNT} -gt 1 ]
      then
         DISCMD=${DISCMD}" && tmsh delete sys connection cs-client-addr ${newip} cs-server-port 179"
      else
         DISCMD=${DISCMD}"tmsh delete sys connection cs-client-addr ${newip} cs-server-port 179"
      fi
      (( NEWCOUNT++ ))
   done
   ### if debug is on log the command we just assembled
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): BGP Step 7 - ${DISCMD}" | logger -p local0.debug; fi
   ###  On the primary slot, execute the command to delete the non-self-IP BGP connections.
   export CONNECTIONS=`ssh -o StrictHostKeyChecking=no root\@slot$PRIMARY_SLOT "${DISCMD}"`
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): BGP Step 8 - $CONNECTIONS" | logger -p local0.debug; fi
   ### disable virtual address 1
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): VA1 ($VIRTUAL_ADDRESS1)" | logger -p local0.debug; fi
   if [ ! -z "$VIRTUAL_ADDRESS1" ]; then
      if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): disabling Virtual Address 1 ($VIRTUAL_ADDRESS1)" | logger -p local0.debug; fi
      export VA1_PART=`tmsh list ltm virtual-address $VIRTUAL_ADDRESS1 | grep 'partition' | awk {'print $NF'}`
      if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): cmd: ssh -o StrictHostKeyChecking=no root\@slot$PRIMARY_SLOT tmsh modify cli admin-partitions update-partition $VA1_PART && tmsh modify /ltm virtual-address $VIRTUAL_ADDRESS1 enabled no " | logger -p local0.debug; fi
      export VA1_UPCMD=`ssh -o StrictHostKeyChecking=no root\@slot$PRIMARY_SLOT "tmsh modify cli admin-partitions update-partition $VA1_PART && tmsh modify /ltm virtual-address $VIRTUAL_ADDRESS1 enabled no"`
      if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): virtual address 1 disabled?" | logger -p local0.debug; fi
   fi
   ### disable virtual address 2
   if [ ! -z "$VIRTUAL_ADDRESS2" ]; then
      if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): disabling Virtual Address 2 ($VIRTUAL_ADDRESS2)" | logger -p local0.debug; fi
      export VA2_PART=`tmsh list ltm virtual-address $VIRTUAL_ADDRESS2 | grep 'partition' | awk {'print $NF'}`
      if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): update-partition - $VA2_PART" | logger -p local0.debug; fi
      export VA2_UPCMD=`ssh -o StrictHostKeyChecking=no root\@slot$PRIMARY_SLOT "tmsh modify cli admin-partitions update-partition $VA2_PART && tmsh modify /ltm virtual-address $VIRTUAL_ADDRESS2 enabled no"`
      if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): cmd: virtual address 2 disabled?" | logger -p local0.debug; fi
   fi
   ### track number of times this monitor has failed
   if [ -e "$TRACKING_FILENAME" ]
   then
      export COUNT=`cat $TRACKING_FILENAME`
      export NEW_COUNT=$((${COUNT}+1))
      echo $NEW_COUNT > $TRACKING_FILENAME
   else
      echo 1 > $TRACKING_FILENAME
      export NEW_COUNT=1
   fi
   ### notify log - failure count
   echo "EAV `basename $0` ($$): Pool $POOL only has $AVAILABLE available of $TOTAL_POOL_MEMBERS total members, failing site.  Virtual servers ($VIRTUAL_SERVER1 and $VIRTUAL_SERVER2) will be disabled and all connections with destination port 179 will be terminated.  Virtual servers must be manually enabled after pool $MEMBER_THRESHOLD or more pool members are available.  This monitor has failed $NEW_COUNT times." | logger -p local0.debug

   # remove the pidfile
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): removing the pidfile..." | logger -p local0.debug; fi
   export PIDBGONE=`rm -f $PIDFILE`
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): pidfile has been removed ($PIDBGONE)" | logger -p local0.debug; fi
   export END=`date +%Y%m%d-%H:%M:%S`
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): stopped at $END" | logger -p local0.debug; fi
else
   if [ -e "$TRACKING_FILENAME" ]
      then
      ### log the status
      echo "EAV `basename $0` ($$):  Pool $POOL has $AVAILABLE members of $TOTAL_POOL_MEMBERS total members.  No change to virtual servers ($VIRTUAL_SERVER1 and $VIRTUAL_SERVER2).  No change to port 179 connections.  Virtual servers must be manually enabled to pass traffic if they are disabled." | logger -p local0.debug
      rm -f $TRACKING_FILENAME
   fi
   ### remove the pidfile
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): removing the pidfile..." | logger -p local0.debug; fi
   export PIDBGONE=`rm -f $PIDFILE`
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): pidfile has been removed ($PIDBGONE)" | logger -p local0.debug; fi
   export END=`date +%Y%m%d-%H:%M:%S`
   if [ $DEBUG -eq 1 ]; then echo "EAV `basename $0` ($$): stopped at $END" | logger -p local0.debug; fi
   echo "UP"
fi

fi
Published Mar 12, 2015
Version 1.0
