Releases: euro-ix/IXP-Watch

Release 1.17

24 Jul 10:03
  • ixp-watch-tidy: get config options from config file.
  • ixp-watch-tidy: Option to purge reports as well as samples.
  • install.sh: prompt for purge settings.
  • Configurable report file extension. .TXT by default.

Full Changelog: release-1.16a...release-1.17

Release 1.16a

10 May 11:47
  • Bugfix: Do not try to copy the report via slack on arp alert, only email.
  • Bugfix: RRD Graph PNGs not updated.

Full Changelog: 1.16...1.16a

Release 1.16

15 Dec 22:24
  • 1.16 2022-12-14 - robl
    • Move things to functions to avoid repeated code in many places.
    • Error handling / alerting is now more consistent
    • do_alert: Handle all alerts/emails/log output in function
    • log_error: Handle all error output logs in function
    • check_errors: Handle checking for log_error logs
    • die/cleanup/cleanexit: Clean up on exit, trap CTRL-C
    • required_vars: Check all variables required for proper operation are set before continuing
    • create_report/report/report_header/report_section_file - For building reports
    • do_graph: create and update rrd graphs
    • do_alert: Add slack alerter
    • Make path to logger prog config variable.
    • Logging to multiple syslog hosts.
    • Configure syslog options in install script
    • Better error handling if called programs return non-zero
    • Check we can execute tshark before starting capture
    • Log stderr output from tshark capture if it fails to run correctly
    • Replace backticks with newer $(command) syntax

CONFIG CHANGES

  • For existing configs >= 1.14, you will need to change the following in your config.sh
  1. Add a new variable for the logger program, e.g.: LOGGER=/usr/bin/logger
  2. Change LOGHOST to LOGHOSTS. Multiple syslog servers can be configured: LOGHOSTS="192.168.100.10 192.168.200.20"
  3. BGPOPENS_OLD_FORMAT=0 changed to BGPOPENS_OLD_FORMAT=1 by default. (The newer format, which resolves IPs, can be quite intensive on DNS lookups when processing large samples, so a decent DNS cache etc. is required.)
  4. URL_ALERTS has been added but is not yet used. The idea is to add a link to ixp-watch web pages to email/slack alerts for convenience.
  • If you are using a version older than 1.14, then I recommend just moving your existing /usr/local/ixpwatch directory (or wherever you installed it) out of the way, and using the install script to install a fresh copy, then check the config variables etc. in your old ixp-watch script are in the new config.sh
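
As a rough illustration of the config changes above, here is a minimal sketch of the affected config.sh variables, plus a loop over LOGHOSTS. The LOG_FACILITY and URL_ALERTS values and the syslog_all_hosts helper are assumptions made for this example and are not taken from ixp-watch; DRY_RUN is also hypothetical and defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Example values only; adjust for your own installation.
LOGGER=/usr/bin/logger
LOGHOSTS="192.168.100.10 192.168.200.20"
LOG_FACILITY=local5.info                    # assumed example value
BGPOPENS_OLD_FORMAT=1
URL_ALERTS="https://ixpwatch.example.net/"  # hypothetical; not yet used by 1.16

# Hypothetical helper (not part of ixp-watch): send one message to every
# host in LOGHOSTS. With DRY_RUN=1 (the default here) it only prints the
# command it would run.
syslog_all_hosts() {
    msg=$1
    for host in $LOGHOSTS; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$LOGGER -p $LOG_FACILITY -n $host -t ixp-watch: $msg"
        else
            echo "$msg" | "$LOGGER" -p "$LOG_FACILITY" -n "$host" -t ixp-watch
        fi
    done
}

syslog_all_hosts "test message"
```

Leaving LOGHOSTS unquoted in the for loop is deliberate: the shell splits it on whitespace, which is what makes a space-separated list of hosts work.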

NOTES

  • Quite a few changes and improvements in this version, mostly to make ixp-watch alert and report handling easier, as well as improved error checking and sanity checking.
  • The Slack alerter option is still a bit experimental and may change. For this reason, I have not (yet) included the slack_alerter script that we use. (But if you would like it, please let me know!)

DESCRIPTION OF NEW FUNCTIONS

1. Alerting/Error logging:

Previously, every time ixp-watch needed to send an alert (email, pager, local syslog, remote syslog), the relevant commands were written out in full at that point in the script, e.g.:

echo "WARNING: Something bad just happened!: " >> $TEMP_DIR/err.$$
/some/command/output/here >> $TEMP_DIR/err.$$
[...]

if [ -f "$TEMP_DIR/err.$$" ] ; then
     $MAILPROG -s "[$NETWORK] Something bad happened! " $ALARM_EMAIL < $TEMP_DIR/err.$$
     rm $TEMP_DIR/err.$$
fi
  • Similar "create some output file" and then "email and/or page, or maybe syslog something" routines were throughout the script.
  • Adding the Slack alerting only made this worse. Now, in several places, we end up with some or all of:
# Mail alert
$MAILPROG -s "[$NETWORK] some warning here" $ALARM_EMAIL < $TEMP_DIR/err.$$

if [ -n "$ALARM_PAGER" ] ; then
  $MAILPROG -s "[$NETWORK] Spanning Tree Alarm - Please investigate" $ALARM_PAGER < $TEMP_DIR/err.$$
fi

if [ -n "$LOCAL_FACIL" ] ; then
  cat $TEMP_DIR/err.$$ | /usr/bin/logger -p $LOCAL_FACIL -t ixp-watch
fi

if [ -n "$LOGHOST" ] ; then
  cat $TEMP_DIR/alarms.tmp | /usr/bin/logger -p $LOG_FACILITY -n $LOGHOST -t ixp-watch
fi

echo "Some warning here" | $SLACK_ALERTER $SLACK_ALERTER_OPTS

Most alert routines only did one or two of these alert commands (maybe this made sense back in 2002!). But what if we want to send alerts only to syslog, or log everything to both syslog and email? Or send no email, and everything only to Slack? Ugh. Now it starts to get messy!

To solve this, all the alerting has been consolidated into three functions: do_alert, log_error and check_errors. For example:

if [ ! -f $SAMPLEDIR/$FILEDATE ] ; then
 do_alert FATAL "Error: sample file $SAMPLEDIR/$FILEDATE could not be opened."
fi

do_alert then decides what to do with the alert, and sends it out using the configured methods. (In this case, FATAL means "send the alert(s) and then clean up and exit": something bad happened which means we have to bail out.)

Some things generate output to a file which is then sent (via email/syslog etc.) along with the alert:

  do_alert WARNING "Non-IP Traffic Report ($FILEDATE)" $TEMP_DIR/alarms.tmp

Some things generate several lines of output, perhaps adding further command output to the alert. These lines are now passed to log_error; when ready, the script calls check_errors to handle anything collected by log_error (which in turn calls do_alert to send the appropriate alerts).

  if [ $DISK_PERCENT -ge $DISK_PERCENT_MAX ] ; then
    # Delete samples > 1h old bigger than 20M:
    deleted_samples=$(find $SAMPLE_ROOT -size +20M -cmin +60)
    log_error WARNING "Disk space still low! Deleted samples"
    log_error WARNING "$deleted_samples"
    find $SAMPLE_ROOT -size +20M -cmin +60 -exec rm {} \;
  fi

 check_errors
 

check_errors checks for anything captured with log_error at WARNING or FATAL, and then calls do_alert to send an alert along with the output file. If the errors are at WARNING, the script continues. If errors have been logged at FATAL, the script sends all FATAL output and WARNINGs, cleans up and exits.
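
To make the collect-then-send flow more concrete, here is a minimal sketch of how log_error and check_errors could fit together. It is not the real ixp-watch code: ERR_DIR, the per-level file names, the stub do_alert and the sample path are all assumptions for illustration, and the optional arguments that check_errors accepts elsewhere in these notes are ignored.

```shell
#!/bin/sh
# Hypothetical sketch only: NOT the real ixp-watch functions.
ERR_DIR=$(mktemp -d)                 # assumed scratch directory
trap 'rm -rf "$ERR_DIR"' EXIT        # tidy up when the script ends

# Stub standing in for the real do_alert: print what would be sent,
# then delete the output file.
do_alert() {
    echo "ALERT [$1] $2"
    if [ -n "$3" ] && [ -f "$3" ]; then
        cat "$3"
        rm -f "$3"
    fi
    return 0
}

# Append one line to the file that collects errors for this level.
log_error() {
    echo "$2" >> "$ERR_DIR/$1.err"
}

# Send whatever has been collected; stop the script after FATAL errors.
check_errors() {
    if [ -f "$ERR_DIR/WARNING.err" ]; then
        do_alert WARNING "Warnings logged" "$ERR_DIR/WARNING.err"
    fi
    if [ -f "$ERR_DIR/FATAL.err" ]; then
        do_alert FATAL "Fatal errors logged" "$ERR_DIR/FATAL.err"
        exit 1
    fi
}

log_error WARNING "Disk space still low! Deleted samples"
log_error WARNING "/samples/old.pcap"    # made-up path for the demo
check_errors
```

Because the lines are only collected until check_errors runs, several related messages end up in one alert rather than one alert per line.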

Sometimes we need a separate, critical alert about something. Calling do_alert with CRITICAL also sends a pager message:

  /some/output/script >> $LOG_ROOT/alarm_output.$$   
  do_alert CRITICAL "Woah! Something really bad just happened!" $LOG_ROOT/alarm_output.$$

(The output file argument is optional. Note that do_alert deletes the file once the alerts have been sent by all configured methods.)

This makes the alerting much simpler in the rest of the script.
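
For readers who want a picture of the dispatch logic described above, here is a minimal sketch of a do_alert-style function. It is an assumption-laden illustration, not the actual implementation: the send_via stub, the example addresses and the exact ordering of methods are invented here, and the syslog methods are left out for brevity.

```shell
#!/bin/sh
# Hypothetical sketch only: NOT the real ixp-watch do_alert.
ALARM_EMAIL="noc@example.net"   # example value
ALARM_PAGER=""                  # empty: no pager configured
SLACK_ALERTER=""                # empty: Slack alerting disabled

# Stub sender: prints what would be sent instead of calling $MAILPROG etc.
send_via() { echo "$1 <- $2"; }

do_alert() {
    level=$1
    subject=$2
    file=$3                     # optional output file

    if [ -n "$ALARM_EMAIL" ]; then
        send_via "email:$ALARM_EMAIL" "[$level] $subject"
    fi
    if [ -n "$SLACK_ALERTER" ]; then
        send_via "slack" "[$level] $subject"
    fi
    # CRITICAL additionally goes to the pager address, when one is set.
    if [ "$level" = CRITICAL ] && [ -n "$ALARM_PAGER" ]; then
        send_via "pager:$ALARM_PAGER" "[$level] $subject"
    fi

    # Remove the output file once every configured method has run.
    if [ -n "$file" ]; then rm -f "$file"; fi

    # FATAL means: send the alert(s), then bail out.
    if [ "$level" = FATAL ]; then exit 1; fi
    return 0
}

do_alert WARNING "Non-IP Traffic Report"
```

Each method is guarded by its own config variable, so switching a method on or off is a config change rather than an edit at every call site.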

Error trapping: Very occasionally, something would go wrong that the script did not catch. Perhaps the tshark capture exited with an error, or exited silently. Now we try a bit harder: trap any error output and send it in an alert (and, more importantly, do not carry on trying to run). Previously, an alert would be sent if the expected sample file did not exist, but it only contained "sample file $SAMPLEDIR/$FILEDATE could not be opened." This was better than nothing, but no help in working out what went wrong!

$TSHARK -q -i $CAP_INTERFACE -a duration:$SAMPLE_TIME -w $SAMPLEDIR/$FILEDATE -f "not host $MY_IP" 2>${TEMP_DIR}/tshark_output.$$
# Save the exit status straight away: $? is overwritten by the next command.
tshark_rc=$?
if [ $tshark_rc != 0 ] ; then
  errors=$(cat ${TEMP_DIR}/tshark_output.$$)
  log_error FATAL "TSHARK cmd: $TSHARK -q -i $CAP_INTERFACE -a duration:$SAMPLE_TIME -w $SAMPLEDIR/$FILEDATE -f not host $MY_IP"
  log_error FATAL "TSHARK returned: $tshark_rc"
  log_error FATAL "$errors"
  check_errors FATAL "Sample capture returned error(s): $TSHARK"
fi
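
A general shell detail is relevant when reporting a command's exit status in this kind of pattern: $? is overwritten by every command, including the [ test itself, so the status is usually copied into a variable immediately after the command of interest. The small demonstration below uses a made-up fails_with_3 function in place of tshark.

```shell
#!/bin/sh
# fails_with_3 is a stand-in for a failing capture command.
fails_with_3() { echo "capture failed" >&2; return 3; }

fails_with_3 2>/dev/null
if [ $? != 0 ]; then
    late=$?        # status of the [ test itself, which succeeded
fi

fails_with_3 2>/dev/null
rc=$?              # saved before anything else runs
if [ "$rc" != 0 ]; then
    saved=$rc
fi

echo "late=$late saved=$saved"   # prints: late=0 saved=3
```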

2. Reporting

Previously, the report for each sample was generated like:

echo "--------------------------------------------------------------------------------" > $LOGDIR/$FILEDATE.TXT
echo "$NETWORK LAN Traffic Summary Report - $ISODATE" >>$LOGDIR/$FILEDATE.TXT
echo "--------------------------------------------------------------------------------">>$LOGDIR/$FILEDATE.TXT
echo "Analysis based on a sample of $NUM_MINUTES minutes.">>$LOGDIR/$FILEDATE.TXT
echo "Started at $STARTDATE, ended at $STOPDATE">>$LOGDIR/$FILEDATE.TXT
echo "The entire session is saved in:  $SAMPLEDIR/$FILEDATE.gz">>$LOGDIR/$FILEDATE.TXT
echo "--------------------------------------------------------------------------------">>$LOGDIR/$FILEDATE.TXT
echo "ARP Queries......: $NUM_ARP">>$LOGDIR/$FILEDATE.TXT
echo "ARPs per minute..: $NUM_ARPS_MIN">>$LOGDIR/$FILEDATE.TXT
echo "IP Packets.......: $NUM_IP">>$LOGDIR/$FILEDATE.TXT
echo "IP6 Packets......: $NUM_IP6">>$LOGDIR/$FILEDATE.TXT
echo "ICMP Packets.....: $NUM_ICMP">>$LOGDIR/$FILEDATE.TXT
echo "ICMPv6 Packets...: $NUM_ICMP6">>$LOGDIR/$FILEDATE.TXT
echo "NON-IP Packets...: $NUM_NOTIP">>$LOGDIR/$FILEDATE.TXT
echo "ARPs Sponged.....: $NUM_SPONGE_REPLY">>$LOGDIR/$FILEDATE.TXT
echo "Dead BGP Peers...: $NUM_BGP">>$LOGDIR/$FILEDATE.TXT
[...]
echo "--------------------------------------------------------------------------------">>$LOGDIR/$FILEDATE.TXT
echo ":::::: TOP 30 ARPERS">>$LOGDIR/$FILEDATE.TXT
echo "--------------------------------------------------------------------------------">>$LOGDIR/$FILEDATE.TXT
cat $LOG_ROOT/TOP_ARPERS.LOG>>$LOGDIR/$FILEDATE.TXT
[...]

This is a lot of repetition. Now it's just:

# Create report
create_report $LOGDIR/$FILEDATE.TXT "$NETWORK LAN Traffic Summary Report - $ISODATE"
report "Analysis based on a sample of $NUM_MINUTES minutes."
report "Started at $STARTDATE, ended at $STOPDATE"
report "The entire session is saved in:  $SAMPLEDIR/$FILEDATE.gz"
report _LINE_
report "ARP Queries......: $NUM_ARP"
report "ARPs per minute..: $NUM_ARPS_MIN"
report "IP Packets.......: $NUM_IP"
report "IP6 Packets......: $NUM_IP6"
report "ICMP Packets.....: $NUM_ICMP"
report "ICMPv6 Packets...: $NUM_ICMP6"
report "NON-IP Packets...: $NUM_NOTIP"
report "ARPs Sponged.....: $NUM_SPONGE_REPLY"
report "Dead BGP Peers...: $NUM_BGP"
report_section_file "TOP 30 ARPERS" $LOG_ROOT/TOP_ARPERS.LOG

[...]

# If you want to send full reports to e-mail:
if [ -n "$REPORT_EMAIL" ] ; then
 $MAILPROG -s "[$NETWORK] Traffic Summary Report" $REPORT_EMAIL< $LOGDIR/$FILEDATE.TXT
fi

To generate the same output. The idea is to eventually make it more flexible and easier to change the forma...
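
As a rough idea of how such helpers could work, here is a minimal sketch of create_report, report and report_section_file that reproduces the ruled layout of the old echo-based report. It is an illustration under assumptions, not the real ixp-watch code: the REPORT_FILE global, the LINE variable and the demo data are invented here.

```shell
#!/bin/sh
# Hypothetical sketch only: NOT the real ixp-watch report functions.
LINE="--------------------------------------------------------------------------------"
REPORT_FILE=""

# Start a new report file with a ruled title header.
create_report() {
    REPORT_FILE=$1
    { echo "$LINE"; echo "$2"; echo "$LINE"; } > "$REPORT_FILE"
}

# Append one line; the token _LINE_ expands to a horizontal rule.
report() {
    if [ "$1" = "_LINE_" ]; then
        echo "$LINE" >> "$REPORT_FILE"
    else
        echo "$1" >> "$REPORT_FILE"
    fi
}

# Append a ruled section heading followed by the contents of a file.
report_section_file() {
    { echo "$LINE"; echo ":::::: $1"; echo "$LINE"; cat "$2"; } >> "$REPORT_FILE"
}

# Demo with made-up data in a temporary directory.
tmp=$(mktemp -d)
printf 'aa:bb:cc:dd:ee:ff 42\n' > "$tmp/top.log"
create_report "$tmp/report.txt" "Example LAN Traffic Summary Report"
report "ARP Queries......: 123"
report _LINE_
report_section_file "TOP 30 ARPERS" "$tmp/top.log"
cat "$tmp/report.txt"
rm -rf "$tmp"
```

Keeping the target file name in one place means the output format can later be changed inside these functions without touching each call.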

Release 1.15.1

09 Jul 15:36
  • 1.15.1 2021-07-06 - robl
    • Bugfix: Exit after sample file size exceeds MAX_SAMPLE_SIZE.
    • Move more options to config file
    • Make update_ethers tool use CONFIG.
    • Fix auto_sponge/sponge to use -c without breaking existing cmd options

Release 1.15

25 Jun 14:39
  • 1.15 2021-06-24 - robl
    • Move example config file location and default location to avoid clash when installing via automation/git.
    • Set default config location to /etc/ixpwatch
    • Make ALARM_PAGER optional.
    • [Issue #4] Move COUNTSFILE to config, reorder config vars so it works correctly.
    • Simple install script (works with debian/ubuntu to install from repository) See INSTALL.TXT
    • Make sponge utility use CONFIG file.
    • Make ixp-watch-tidy utility use CONFIG file.
    • fixes to install script

Release 1.14

23 Jun 11:15

NOTE: This release moves the user configuration variables into a separate config file. If upgrading from an older version,
you will need to update this file (config.sh) with your settings from the old script. Edit the ixp-watch script CONFIG= variable to
point to the correct file.
This is to make testing and upgrades easier in future, as the script can now be replaced without having
to redo all the config vars.

  • IPv6 is a thing now. Removed option to disable IPv6 processing.
  • Add report and alert for IPv6 Router Advertisements.
  • Fixes [Issue #3]: Use better tests -z and -n instead of == operator
    and "" to test empty values. (This was throwing an error in some
    environments if a strictly POSIX shell, which lacks the == operator, is used)
  • Make the dead BGP sessions output more useful by resolving IPs. To use old format instead, set BGPOPENS_OLD_FORMAT=1
  • Move user config variables to a separate file (config.sh) to make upgrades easier.
  • Config file can now be specified with "-c filename" to make running multiple instances for multiple LANs easier.
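
The "-c filename" option and the -z / -n tests mentioned in the list above can be sketched roughly as follows. This is only an illustration: the option handling in ixp-watch itself may differ, the default CONFIG path is an assumption, and parse_args and describe are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical sketch only: option handling in ixp-watch itself may differ.
CONFIG=/etc/ixpwatch/config.sh     # assumed default path, for illustration

# Accept "-c filename" to select a different config file.
parse_args() {
    OPTIND=1
    while getopts "c:" opt; do
        case $opt in
            c) CONFIG=$OPTARG ;;
            *) echo "usage: $0 [-c filename]" >&2; return 2 ;;
        esac
    done
}

# Portable emptiness test: -z is true for an empty string (-n is the
# opposite), which avoids comparing against "" with the == operator.
describe() {
    if [ -z "$1" ]; then
        echo "unset or empty"
    else
        echo "set to '$1'"
    fi
}

parse_args -c /tmp/lan2-config.sh
echo "CONFIG is $(describe "$CONFIG")"
# A real script would now load the settings with:  . "$CONFIG"
```

Running one instance per LAN then only needs a different -c argument for each, with the script itself shared.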

Release 1.13

02 Mar 10:59
  • Run disk space checks before starting sample capture, not after.
    This means it works if the disk is already full. It will now be run even
    if capture fails or the script exits for some reason. It might also free
    up enough disk space for the capture to succeed, rather than the capture
    failing and the script exiting before the disk space check/purge can run.

  • Make disk space check optional. (Don't do the check if DISK_PERCENT_PROG is
    undefined (commented out) or empty.)

  • Bugfix: [Issue #2]: Correct non-unicast filter to use >= 224.0.0.0 and
    not >= 223.0.0.0
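
The boundary in the filter fix above matters because IPv4 multicast starts at 224.0.0.0, while 223.x.x.x is still ordinary unicast space. The sketch below only illustrates that address-range boundary using the first octet; is_non_unicast is a hypothetical helper and is not how ixp-watch filters traffic.

```shell
#!/bin/sh
# Illustrative helper (not from ixp-watch): treat an IPv4 address as
# non-unicast when its first octet is 224 or higher (multicast, reserved
# and limited-broadcast ranges).
is_non_unicast() {
    first=${1%%.*}
    [ "$first" -ge 224 ]
}

for ip in 223.255.255.255 224.0.0.5 255.255.255.255; do
    if is_non_unicast "$ip"; then
        echo "$ip non-unicast"
    else
        echo "$ip unicast"
    fi
done
```

With the old >= 223.0.0.0 boundary, the first address in the loop would have been counted as non-unicast as well.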

Release 1.12

09 Oct 12:42
  • 1.12 - 2020-10-09 - robl
    • Bugfix: Improve matching on MY_IP for ICMP/sponge.

Release 1.11

23 Sep 17:48
  • 1.11 - 2020-08-24 - robl
    • Roll in some LONAP changes.
    • Better method of managing disk space / automatically remove samples if disk space
      becomes low.
    • Changes to ifconfig / use ip cmd instead.
    • Changes to syslog / loghost
    • Other small changes and bugfixes.
    • Update html examples.
    • Add IXP-Manager sponge automation tool and example cgi (see files in ./IXP-Manager)

Release 1.10

08 Mar 12:14

Initial release of 1.10 on github.

  • Some minor doc/licence changes.
  • Update default ARP_WARNLEVEL in documentation and script
  • 1.10 - 2018-01-24 - robl

    • Roll in some LONAP changes.
    • Update TSHARK commands for updated version of tshark (requires explicit "-Y" filter flag for reading samples.)
    • Newer versions of tshark require "-f" capture filter.
    • Add IPv6 Traffic type summary to report.
    • Fixed IPv4 ICMP output (tshark output format changed)
    • Minor changes to docs and comments.
    • Included html files from LONAP as example for RRD stats page.