After some frustrating experiences, I found that you cannot run tcpdump from within the alertd execution context - SELinux gets in the way and prevents access to the network devices.
And yes - it does work in some circumstances, but not reliably across all releases, platforms, and situations.
I had to build a hardware- and version-compatible repro to demonstrate and solve this problem when I first ran into it.
I solved it like this:
Have a startup script that creates a named pipe and blocks on a read from it; when something is written to the pipe, it runs the tcpdump.
This script runs in the root context and so has permission to run tcpdump.
/config/startup/monitor_down_dump.sh
#!/bin/bash
NP=/var/run/monitor_down_tcpdump.pipe
if [ -e "$NP" ]; then
    echo "$NP already exists; is this script already running?"
    exit 1
fi
mkfifo "$NP"
# block here until the trigger script writes to the pipe
read x < "$NP"
/bin/rm "$NP"
logger -p local0.info "$x"
# start a tcpdump, stopping after a fixed packet count
# THIS count VALUE MAY NEED TESTING AND TUNING
tcpdump -nni 0.0:nnn -c 10000 -s0 -w /var/tmp/`uname -n`_`date +%F_%H:%M`.pcap
You also need a trigger script, run from your user_alert, that pushes data into the named pipe.
This runs in the alertd context, which does not have permission to run tcpdump but can write a message down the named pipe.
/shared/monitor_down_trigger.sh
#!/bin/bash
NP=/var/run/monitor_down_tcpdump.pipe
echo "debug_triggered" > "$NP"
And your user_alert.conf snippet:
alert endb_mon_down "01070638:5: Pool /Common/pool_one member /Common/10.1.62.61:0 monitor status down." {
exec command="/shared/monitor_down_trigger.sh";
}
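Before wiring this into alertd, it's worth sanity-checking the named-pipe handshake the two scripts rely on. This is a self-contained sketch using throwaway temp paths rather than the real pipe and scripts, just to show the blocking read being released by a single write:

```shell
# Stand-in pipe path; the real one is /var/run/monitor_down_tcpdump.pipe
NP=$(mktemp -u /tmp/monitor_down_test.XXXXXX)
mkfifo "$NP"

# Listener side: blocks on the read, exactly like monitor_down_dump.sh
( read msg < "$NP"; echo "listener received: $msg" > "$NP.log" ) &

# Trigger side: one write releases the blocked read
echo "debug_triggered" > "$NP"
wait

cat "$NP.log"        # listener received: debug_triggered
rm -f "$NP" "$NP.log"
```

The key property is that `read msg < "$NP"` does not return until something opens the pipe for writing, so the listener can sit idle indefinitely at essentially no cost.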
For my implementation, the customer also had a cron task that ran every 10 minutes to check whether the script was still running, and restarted it if it had triggered or stopped. This may or may not be required in your environment.
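If you do want that watchdog, a minimal sketch could look like the following. The watchdog script name and the 10-minute interval are assumptions here, not part of the original setup; note the listener exits after each trigger by design, so the watchdog is what re-arms it:

```shell
#!/bin/bash
# Hypothetical watchdog, e.g. /config/startup/monitor_down_watchdog.sh:
# re-launch the pipe listener if it has triggered or died.
LISTENER=/config/startup/monitor_down_dump.sh
if ! pgrep -f "$LISTENER" > /dev/null; then
    logger -p local0.info "monitor_down watchdog: listener not running, restarting"
    nohup "$LISTENER" > /dev/null 2>&1 &
fi
```

Installed with a crontab entry along the lines of `*/10 * * * * /config/startup/monitor_down_watchdog.sh`.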