Blogposts tagged icinga
f.zz.de | https://f.zz.de/tags/icinga/ | ikiwiki | updated 2020-07-06T11:31:37Z

Pushover and icinga1
https://f.zz.de/posts/202007061324.pushover_and_icinga1/ | Florian Lohoff | updated 2020-07-06T11:31:37Z | published 2020-07-06T11:24:06Z
<p>Yes, I know: icinga1 is old legacy software, but there are still installations
running, and we want modern notifications. I had a look at <a href="https://pushover.net">pushover.net</a>,
which looked awesome, so I created an account and integrated it into
icinga1.</p>
<p>This is my <code>/usr/local/sbin/pushover</code>:</p>
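<pre><code>#!/bin/sh
# Usage: pushover TOKEN USER TITLE  (the message body is read from stdin)
TOKEN=$1
USER=$2
TITLE=$3
# Buffer stdin in a temp file so curl can read the message from it
T=$(mktemp) || exit 1
trap 'rm -f "$T"' EXIT
cat >"$T"
# Send the notification and log the API response to syslog
curl -s \
    --form-string "token=${TOKEN}" \
    --form-string "user=${USER}" \
    --form-string "title=${TITLE}" \
    -F "message=<${T}" \
    https://api.pushover.net/1/messages.json |
    logger -t pushover
</code></pre>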
<p>It reads the message from stdin and takes the API token, the user key, and the
title from the command line. So I define the app token and the user key as
custom variables in the contact definition:</p>
<pre><code>define contact{
contact_name flo-pushover
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,r
service_notification_commands notify-service-by-pushover
host_notification_commands notify-host-by-pushover
_pushovertoken dDiFlqCnOrandomgenerated2lgWlNo9
_pushoveruser jdf3jf6OVrandomgeneratedP1ywAP4z
}
</code></pre>
<p>Here is the service command definition:</p>
<pre><code>define command{
command_name notify-service-by-pushover
command_line /usr/bin/printf <message> | /usr/local/sbin/pushover $_CONTACTPUSHOVERTOKEN$ $_CONTACTPUSHOVERUSER$ "<title>"
}
</code></pre>
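<p>For comparison, the same request can be sketched in Python. This is not the script used above, just an illustration under assumptions: the function names are made up, the token, user key, title, and message are placeholders, and the fields are sent form-encoded rather than as multipart like <code>curl -F</code> does. Only the endpoint URL and the four field names come from the shell script.</p>

```python
import urllib.parse
import urllib.request

PUSHOVER_URL = "https://api.pushover.net/1/messages.json"


def build_pushover_request(token, user, title, message):
    """Build (but do not send) the POST request for one notification."""
    fields = {"token": token, "user": user, "title": title, "message": message}
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(PUSHOVER_URL, data=body, method="POST")


def send(request, timeout=10):
    """Send the request; returns the HTTP status and the response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


# Dry run with placeholder values: nothing is sent over the network here.
req = build_pushover_request(
    "APP_TOKEN", "USER_KEY", "PROBLEM: web01/HTTP", "HTTP CRITICAL - timeout"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

<p>Calling <code>send(req)</code> with real credentials would perform the actual delivery; building and sending are kept separate so the request can be inspected first.</p>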
What's wrong with Icinga2
https://f.zz.de/posts/201809251054.whats_wront_with_icinga2/ | Florian Lohoff | updated 2018-09-25T09:10:02Z | published 2018-09-25T08:54:06Z
<p>I have always had my problems with Icinga2. I kept colliding with the pretty strict
coupling between services and hosts (for me, monitoring, and especially dependencies,
form a graph), and I kept hitting the limits of dependency modelling. Yes, I know,
dependencies are not a common use case in monitoring; 95% of users might get away
without a single explicit dependency.</p>
<p>Some migrations from Icinga1 simply failed because of the missing mod_perl/embperl
support, which made a simple migration on the same hardware impossible. I built
some hacks abusing the Apache on localhost and its mod_perl support for running
icinga2 checks, but those are hacks.</p>
<p>Today I think I found the issue with Icinga2. It began with an EOL/EOS announcement
for Icinga1, which <a href="https://twitter.com/fl0h0ff/status/1044495121365053440">I replied to on Twitter</a>, pointing out that Icinga2
is not a drop-in replacement and that there are technical issues which will make some
Icinga1 installations last possibly forever. I don't buy the argument that it's impossible
to embed Perl, Lua, Python, or PHP into Icinga2. Postgres does it, and so do Apache and Nginx. All of
them allow, one way or another, running code in embedded languages. Yes, these
interpreters may leak memory, and they may not be as easy to embed as Lua, but it's
doable and other projects have been doing it for years. Pointing this out earns me a polite
"Fuck off". I guess I now know what's wrong here.</p>
<blockquote><p>Michael Friedrich <a href="https://twitter.com/dnsmichi">@dnsmichi</a>
14 hours ago</p>
<p>JFYI - #Icinga 1.x is EOL and support ends in 3 months. This includes Classic
UI.</p>
<p>Plan your migration to Icinga 2 & Icinga Web 2.</p>
<p>Florian Lohoff <a href="https://twitter.com/fl0h0ff">@fl0h0ff</a>
Replying to @dnsmichi</p>
<p>I guess some instances are there to stay for another 10 years. I know some of
them. And i can feel their pain. Icinga2 is not a drop in replacement and i
know at least 3 instances who cant simple change because of performance issues
caused by modperl missing.</p>
<p>Michael Friedrich <a href="https://twitter.com/dnsmichi">@dnsmichi</a>
2 hours ago
Replying to @fl0h0ff</p>
<p>Embedded Perl is a technical No-Go, similar to embedded Python. The
implementation in Icinga 1.x for Perl has plenty of bugs and memory leaks - it
may have worked, but there's room for changes.</p>
<p>Icinga 2 is here for four years now, with a helping community on migration
tasks.</p>
<p>Florian Lohoff <a href="https://twitter.com/fl0h0ff">@fl0h0ff</a>
Replying to @dnsmichi</p>
<p>Interestingly the webservers have solved these issues for years ... Either by
separation and limiting the number of executions or factoring out into fpm
helpers. And no ... The community does not help porting thousands of lines of
code over from perl.</p>
<p>Michael Friedrich <a href="https://twitter.com/dnsmichi">@dnsmichi</a>
19 minutes ago
Replying to @fl0h0ff</p>
<p>Sarcasm won‘t help much, goodbye.</p></blockquote>
US Date vs ISO Date
https://f.zz.de/posts/201608171930.us_date_vs_iso_date/ | Florian Lohoff | updated 2016-08-17T17:32:57Z | published 2016-08-17T17:30:24Z
<p>How can a piece of software be so broken? There is an international
standard for date and time which makes filenames sort correctly. It's
called <a href="https://de.wikipedia.org/wiki/ISO_8601">ISO 8601</a>. Why
on earth does ANYONE think that <strong>month-day-year</strong> is a great
thing to do:</p>
<pre><code>/* get the archived filename to use */
asprintf(&log_archive, "%s%sicinga-%02d-%02d-%d-%02d.log", log_archive_path,
(log_archive_path[strlen(log_archive_path)-1] == '/') ? "" : "/",
t->tm_mon + 1, t->tm_mday, t->tm_year + 1900, t->tm_hour);
</code></pre>
<p>And then there is strftime to format time ...</p>
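<p>To make the sorting argument concrete, here is a small Python sketch (not a patch for the C code above). It compares the quoted month-day-year pattern with a year-month-day pattern built by <code>strftime</code>; the three timestamps are made-up examples and the function names are hypothetical. The <code>%Y</code>, <code>%m</code>, <code>%d</code>, and <code>%H</code> specifiers used here also exist in C's <code>strftime</code>.</p>

```python
import time


def archive_name_us(t):
    # Mirrors the quoted format string: month-day-year-hour.
    return "icinga-%02d-%02d-%d-%02d.log" % (t.tm_mon, t.tm_mday, t.tm_year, t.tm_hour)


def archive_name_iso(t):
    # Year-month-day-hour: lexical order equals chronological order.
    return time.strftime("icinga-%Y-%m-%d-%H.log", t)


# Made-up example timestamps, listed in chronological order.
stamps = [
    time.strptime(s, "%Y-%m-%d %H")
    for s in ("2015-12-31 00", "2016-01-15 00", "2016-08-17 00")
]

us_names = [archive_name_us(t) for t in stamps]
iso_names = [archive_name_iso(t) for t in stamps]

print(sorted(us_names))   # the 2015 file sorts last
print(sorted(iso_names))  # same order as the timestamps
```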
Interface monitoring
https://f.zz.de/posts/201602241945.interface_monitoring/ | Florian Lohoff | updated 2016-02-24T18:52:38Z | published 2016-02-24T18:45:33Z
<p>The first step towards alarm suppression: on customer-managed devices
we do not monitor the link status. Twenty minutes of C++ tinkering
and one <a href="https://svn.boost.org/trac/boost/ticket/8009">boost::program_options bug</a> later:</p>
<pre><code>./checkif
Need address, community and ifname
Allowed options:
-h [ --help ] produce help message
--address arg host address
--community arg host snmp v2 community
--ifname arg interface name to monitor
--cachedir arg cache directory for state files
--nolinkstatus ifOperStatus down is not critical
</code></pre>
<p>Translate the <code>peer-device</code> attribute on the interfaces in the FiDB
into an icinga2 variable nolinkstatus:</p>
<pre><code># If we have an unmanaged peer device - dont monitor ifOperStatus
if ($neighbour->attrmatch('peer-device', 'unmanaged', 0)) {
$service->variableadd({ name => "nolinkstatus", value => 1 });
}
</code></pre>
<p>And, correspondingly, a conditional variable in icinga2 for the check:</p>
<pre><code>object CheckCommand "customif" {
import "plugin-check-command"
command = [
"/etc/icinga2/customchecks/checkif/checkif"
]
arguments = {
"--address" = "$address$"
"--community" = "$snmprocommunity$"
"--ifname" = "$ifname$"
"--cachedir" = "/var/cache/nagios3/checkif/"
"--nolinkstatus" = {
set_if = "$nolinkstatus$"
}
}
}
</code></pre>
<p>And with that, the corresponding ports no longer go CRITICAL when the
customer decides to reboot or shut the port down. It only needs to be
documented properly once.</p>
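<p>How the conditional flag ends up on the command line can be sketched like this. It is a deliberately simplified model, not Icinga 2's real macro resolver: the functions <code>resolve</code> and <code>build_argv</code> are hypothetical, only whole-string <code>$name$</code> macros are handled, and the address, community, and interface name are example values.</p>

```python
def resolve(value, variables):
    """Resolve a single "$name$" macro; anything else is a literal."""
    if len(value) > 2 and value.startswith("$") and value.endswith("$"):
        return variables.get(value[1:-1])
    return value


def build_argv(command, arguments, variables):
    """Turn a command plus argument table into an argument vector."""
    argv = list(command)
    for flag, spec in arguments.items():
        if isinstance(spec, dict):
            # Flag without a value: only added when set_if is truthy.
            if resolve(spec["set_if"], variables):
                argv.append(flag)
        else:
            value = resolve(spec, variables)
            if value is not None:
                argv.extend([flag, str(value)])
    return argv


COMMAND = ["/etc/icinga2/customchecks/checkif/checkif"]
ARGUMENTS = {
    "--address": "$address$",
    "--community": "$snmprocommunity$",
    "--ifname": "$ifname$",
    "--cachedir": "/var/cache/nagios3/checkif/",
    "--nolinkstatus": {"set_if": "$nolinkstatus$"},
}

# Example values only; nolinkstatus is set for the unmanaged peer.
managed = {"address": "192.0.2.10", "snmprocommunity": "public", "ifname": "eth0"}
unmanaged = dict(managed, nolinkstatus=1)

print(build_argv(COMMAND, ARGUMENTS, managed))
print(build_argv(COMMAND, ARGUMENTS, unmanaged))
```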
Network management - Progress
https://f.zz.de/posts/201602171522.netzwerkmanagement_-_progress/ | Florian Lohoff | updated 2016-02-17T14:25:35Z | published 2016-02-17T14:22:28Z
<p>We still have a few 10GigE Arista switches that have received little love so far.
That changed today: a bit of TACACS+, NTP, a config backup written,
and a little more monitoring. On top of that, monitoring for the number of session flows
in the SRX firewall clusters. Day's work done; the rest is a bonus ...</p>
3k Services
https://f.zz.de/posts/201602101958.3k_services/ | Florian Lohoff | updated 2016-02-10T19:00:26Z | published 2016-02-10T18:58:30Z
<p>Exporting the config now takes a few minutes, but it runs
fully automatically every hour. Incremental
updates of the icinga2 config are on the agenda ...</p>
<p><a href="https://f.zz.de/media/201602101958.3k_services.icinga-20160210.png"><img src="https://f.zz.de/posts/201602101958.3k_services/640x-201602101958.3k_services.icinga-20160210.png" width="639" height="424" class="img" /></a></p>
4 weeks - 2k Services
https://f.zz.de/posts/201602051339.4_wochen_-_2k_services/ | Florian Lohoff | updated 2016-02-05T12:41:29Z | published 2016-02-05T12:39:28Z
<p>After 4 weeks of designing data models and writing
and rewriting checks, the closely monitored
part of the network is now growing. Just in time for Friday, we broke through the 2k checks mark.</p>
<p><a href="https://f.zz.de/media/201602051339.4_wochen_-_2k_services.icinga2-2000services.png"><img src="https://f.zz.de/posts/201602051339.4_wochen_-_2k_services/640x-201602051339.4_wochen_-_2k_services.icinga2-2000services.png" width="640" height="469" class="img" /></a></p>
SRX and IPSEC/IKE
https://f.zz.de/posts/201602031738.srx_und_ipsec_ike/ | Florian Lohoff | updated 2016-02-03T16:39:28Z | published 2016-02-03T16:38:40Z
<p>A new check was born:</p>
<pre><code>root@icinga2# check_srx_ipsec_tunnel --address 172.34.55.23 --community SIiVOPT19e5L --peeraddr 172.44.61.77
IPSECTUN OK - Peer address 172.44.61.77 IKE Tun State up(1) IPSec SA State active(1)
</code></pre>
Watt to dBm - NaN
https://f.zz.de/posts/201602022213.watt_zu_dbm_-_nan/ | Florian Lohoff | updated 2016-02-02T21:17:14Z | published 2016-02-02T21:13:49Z
<p>Hmm, why the entity sensors on the ASR9k report the SFP values in milliwatts
while switches like the 2960 or 4500X report them directly
in dBm will remain a mystery to me.</p>
<p>Trying to convert dBm to dBm once more naturally goes wrong:</p>
<pre><code>IF OK - in 14.08 MBit/s 687 pkt/s out 8.88 MBit/s 376 pkt/s
Optic Tx: nandBm Rx: nandBm Temp: 41.6°C
</code></pre>
<p>Evaluate entSensorType and that works out fine too ...</p>
<pre><code>IF OK - in 108.39 MBit/s 2315 pkt/s out 44.76 MBit/s 1075 pkt/s
Optic Tx:-2dBm Rx:-2.2dBm Temp: 41.6°C
</code></pre>
<p>Once again, the day's work done right at the end.</p>
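<p>The conversion itself is dBm = 10 * log10(P / 1 mW). In C and C++ the logarithm of a negative number is NaN, which is exactly what appears when that formula is applied to a reading that is already in dBm, such as -2. A minimal Python sketch of the type-aware version follows. The two enumeration values are an assumption from memory of CISCO-ENTITY-SENSOR-MIB and should be verified against the MIB, the function name is hypothetical, and watt-type readings are assumed to be already scaled to milliwatts.</p>

```python
import math

# Assumed enumeration values for the sensor data type; verify against
# the MIB actually used by the device before relying on them.
SENSOR_TYPE_WATTS = 6
SENSOR_TYPE_DBM = 14


def optical_power_dbm(value, sensor_type):
    """Return the optical power in dBm, converting only when needed.

    value:       sensor reading (milliwatts for watt-type sensors,
                 dBm for dBm-type sensors)
    sensor_type: the sensor's reported data type
    """
    if sensor_type == SENSOR_TYPE_DBM:
        # Already in dBm: converting again is what produced the NaN.
        return value
    if sensor_type == SENSOR_TYPE_WATTS:
        if value <= 0:
            # No light, no meaningful dBm value.
            return None
        return 10.0 * math.log10(value)
    raise ValueError("unexpected sensor type: %r" % (sensor_type,))


print(optical_power_dbm(0.631, SENSOR_TYPE_WATTS))  # about -2 dBm
print(optical_power_dbm(-2.2, SENSOR_TYPE_DBM))     # passed through unchanged
```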
ifHCInMulticastPkts & Co
https://f.zz.de/posts/201602011159.ifhcinmulticastpkts__co/ | Florian Lohoff | updated 2016-02-01T11:04:08Z | published 2016-02-01T10:59:36Z
<p>It is astonishing how many interfaces do have HC counters, i.e.
<code>ifHCInOctets</code> and co, but not the corresponding counters
for broadcast and multicast traffic:</p>
<pre><code>Feb 1 11:18:55 nagios checkif[6238]: 172.55.63.1 TenGigE0/0/1/2.343 Invalid value: ifHCInMulticastPkts.20567 =
Feb 1 11:18:55 nagios checkif[6238]: 172.55.63.1 TenGigE0/0/1/2.343 Invalid value: ifHCInBroadcastPkts.20567 =
Feb 1 11:18:55 nagios checkif[6238]: 172.55.63.1 TenGigE0/0/1/2.343 Invalid value: ifHCOutMulticastPkts.20567 =
Feb 1 11:18:55 nagios checkif[6238]: 172.55.63.1 TenGigE0/0/1/2.343 Invalid value: ifHCOutBroadcastPkts.20567 =
</code></pre>
<p>I always polled them as a matter of course
whenever the corresponding HC counters were supported. That was apparently a
fallacy. I also do not understand why, on modern
platforms like ASR9k/IOS-XR of all things, <code>ifInUnknownProtos</code> appears to be
unsupported; it is even displayed in <code>show interface</code>.</p>
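<p>One way to cope with this is to treat each counter on its own instead of assuming the whole HC family is present. The sketch below is a hypothetical helper, not the actual <code>checkif</code> code: it takes polled values as raw strings, keeps the ones that parse as numbers, and reports the rest as unsupported. The octet value is made up; the empty values mirror the log lines above.</p>

```python
def split_counters(polled):
    """Split polled counters into usable values and unsupported names.

    polled: dict mapping counter name -> raw string value (may be
            empty or None when the agent returns nothing useful)
    """
    usable, unsupported = {}, []
    for name, raw in polled.items():
        text = (raw or "").strip()
        if text.isdigit():
            usable[name] = int(text)
        else:
            unsupported.append(name)
    return usable, unsupported


# Example poll result: the octet counter is there, the rest come back empty.
polled = {
    "ifHCInOctets": "123456789",
    "ifHCInMulticastPkts": "",
    "ifHCInBroadcastPkts": "",
    "ifHCOutMulticastPkts": None,
}

usable, unsupported = split_counters(polled)
print(usable)
print(unsupported)
```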
C++ rewrite of the icinga checks
https://f.zz.de/posts/201601291408.c__rewrite_der_icinga_checks/ | Florian Lohoff | updated 2016-01-29T13:17:36Z | published 2016-01-29T13:08:36Z
<p>The C++ rewrite of <code>checkif</code> is almost complete. icinga2 now runs with a C++ checkif that
only lacks the Entity MIB query for the optics. Here is a performance comparison:</p>
<pre><code>flo@p3:~/projects/snmp-plus-plus/test$ time ./checkif.pl --host 127.0.0.1 --community public -i eth0
IF OK - IPv6 UP
real 0m0.155s
user 0m0.140s
sys 0m0.016s
flo@p3:~/projects/snmp-plus-plus/test$ time ./checkif --address 127.0.0.1 --community public --ifname eth0 --cachedir /tmp
IF OK in 2.57MBit/s 241 pkt/s out 86.98KBit/s 139 pkt/s
real 0m0.007s
user 0m0.004s
sys 0m0.000s
</code></pre>
<p>So in real time, i.e. actual wall-clock runtime, we are at a factor of 22, and for the
CPU cycles consumed in userspace at a factor of 35.</p>
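<p>The two factors follow directly from the <code>time</code> output above. A short Python sketch of the arithmetic, using only the numbers shown there (the variable names are arbitrary):</p>

```python
# Timings in seconds, copied from the two `time` runs above.
perl = {"real": 0.155, "user": 0.140}
cpp = {"real": 0.007, "user": 0.004}

factors = {key: perl[key] / cpp[key] for key in ("real", "user")}

for key, factor in factors.items():
    print("%s: %.1fx" % (key, factor))
```

<p>The <code>sys</code> times are left out because the C++ run reports 0.000s, so no ratio can be formed. These are single runs, so the factors are a rough indication rather than a benchmark.</p>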