Monitoring Systems

Synopsis

There are three monitoring services with distinct roles:

Java JMX: JConsole or VisualVM can be used to monitor Java processes that have JMX instrumentation enabled.
Statsd: a metrics collector and rendering service, available at statsd.dataone.org (129.237.201.114), that provides near real-time reporting of arbitrary measurements.
Check_mk: general system monitoring and problem notification, available at https://monitor.dataone.org/check_mk/.

Java JMX

In short, we open a SOCKS proxy to the host using SSH. This is necessary because JMX over RMI makes its initial connection on a defined port but then directs the client to a second, randomly assigned port, which would otherwise need its own firewall opening.

In this example, port 7778 is used for the SOCKS proxy and port 8010 is the JMX instrumentation port.

Start the application with JMX instrumentation enabled:

JAVA_OPTS="-Djava.awt.headless=true -Xmx8192m -XX:+UseParallelGC -Xms1024M -XX:MaxPermSize=512M \
 -Dcom.sun.management.jmxremote \
 -Dcom.sun.management.jmxremote.port=8010 \
 -Dcom.sun.management.jmxremote.authenticate=false \
 -Dcom.sun.management.jmxremote.ssl=false \
 -Djava.rmi.server.hostname=127.0.0.1 \
 -Dhazelcast.jmx=true"

Open a SOCKS proxy with an SSH tunnel to the machine running the Java service:

ssh -D7778 cn-dev-ucsb-1.test.dataone.org

Connect JConsole to the service:

jconsole -J-DsocksProxyHost=localhost \
  -J-DsocksProxyPort=7778 \
  service:jmx:rmi:///jndi/rmi://127.0.0.1:8010/jmxrmi

Or, using VisualVM:

  1. In VisualVM Preferences | Network, configure a SOCKS proxy on localhost port 7778 and remove “127.0.0.1” from No Proxy Hosts.

  2. Add a new Local JMX connection with the JMX parameter:

    service:jmx:rmi:///jndi/rmi://127.0.0.1:8010/jmxrmi
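The same service URL can also be used programmatically through the standard javax.management.remote API, for example in a small script that polls a value instead of watching a GUI. The sketch below is illustrative only: the class JmxHeapProbe and its method are hypothetical names, and it assumes the JVM running it can reach the RMI ports either directly on the monitored host or through the SSH SOCKS proxy (by launching it with -DsocksProxyHost=localhost -DsocksProxyPort=7778, the same properties passed to JConsole above). It reads the used heap from the standard java.lang:type=Memory MBean.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: connects to a remote JMX agent and reads heap usage.
final class JmxHeapProbe {

    // Same service URL used with JConsole and VisualVM above.
    static final String DEFAULT_URL =
        "service:jmx:rmi:///jndi/rmi://127.0.0.1:8010/jmxrmi";

    // Returns the used heap in bytes of the JVM behind serviceUrl.
    static long usedHeapBytes(String serviceUrl) throws Exception {
        JMXServiceURL url = new JMXServiceURL(serviceUrl);
        // No credentials: matches jmxremote.authenticate=false in JAVA_OPTS.
        try (JMXConnector connector = JMXConnectorFactory.connect(url, null)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName memory = new ObjectName(ManagementFactory.MEMORY_MXBEAN_NAME);
            CompositeData heap =
                (CompositeData) mbsc.getAttribute(memory, "HeapMemoryUsage");
            return MemoryUsage.from(heap).getUsed();
        }
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : DEFAULT_URL;
        System.out.println("Used heap: " + usedHeapBytes(url) + " bytes");
    }
}
```

Any other attribute visible in JConsole's MBeans tab can be read the same way by changing the ObjectName and attribute name.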
    

Statsd

StatsD is a statistics collection service able to record arbitrary metrics. It can be likened to a set of instruments to which measurements are sent. The Graphite service stores and renders the measurement values for each instrument.

Measurements are sent to the StatsD service over UDP as plain text, with individual measurements terminated by a newline character. A measurement is composed of at least three components, with an optional fourth:

name:value|modifier[|@sample]
name: The name of the instrument to which the measurement is sent. If an instrument with that name does not exist, a new one is created. Instruments can be grouped by using periods to separate levels, as in parent.child.grandchild. In general, it is best to use only letters, numbers, underscores, dashes, and periods in names. Do not use colon (:), comma (,), pipe (|), or at (@).
value: The value of the measurement reading. Except for gauges, this will be an integer value.
modifier: Indicates the type of measurement being collected. One of “g” (gauge), “c” (counter), “ms” (timer), or “s” (set).
sample: If present, indicates that the measurement is sampled only some proportion of the time. For example, a sample of “@0.1” indicates that the measurement is sampled 1/10 of the time.
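Because the format is plain text, a measurement line can be assembled by hand. The following is a minimal sketch, not part of any StatsD library: the class StatsdLine and its format method are hypothetical. It builds name:value|modifier[|@sample] and rejects names containing characters outside the recommended set (letters, numbers, underscore, dash, period), unknown modifiers, and sample rates outside (0, 1]. The value is passed as a string so that decimal gauge readings are preserved exactly as written.

```java
import java.util.Set;

// Hypothetical helper that formats a single StatsD measurement line.
final class StatsdLine {

    // Modifiers listed above: gauge, counter, timer, set.
    private static final Set<String> MODIFIERS = Set.of("g", "c", "ms", "s");

    // Returns "name:value|modifier", or "name:value|modifier|@sample" when
    // sample < 1.0. The trailing newline is left to the sender.
    static String format(String name, String value, String modifier, double sample) {
        // Letters, digits, underscore, period, and dash only.
        if (!name.matches("[A-Za-z0-9_.-]+")) {
            throw new IllegalArgumentException("invalid instrument name: " + name);
        }
        if (!MODIFIERS.contains(modifier)) {
            throw new IllegalArgumentException("unknown modifier: " + modifier);
        }
        // Also rejects NaN.
        if (!(sample > 0.0 && sample <= 1.0)) {
            throw new IllegalArgumentException("sample must be in (0, 1]: " + sample);
        }
        StringBuilder line = new StringBuilder()
            .append(name).append(':').append(value).append('|').append(modifier);
        if (sample < 1.0) {
            line.append("|@").append(Double.toString(sample));
        }
        return line.toString();
    }
}
```

For example, StatsdLine.format("hz.create.rate", "10.1", "g", 1.0) returns hz.create.rate:10.1|g, and StatsdLine.format("mymetrics.hits", "1", "c", 0.1) returns mymetrics.hits:1|c|@0.1.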

Example Measurements:

hz.create.rate:10.1|g : The current reading of “rate” within the group “hz” and sub-group “create” is 10.1 (in whatever units the instrument uses).

Using bash

Measurements can be sent to StatsD from bash. In this example, the gauge value 29.5 is sent to the instrument “test” in the “mymetrics” group:

$ echo "mymetrics.test:29.5|g" | nc -w 1 -u 129.237.201.114 8125
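The nc command above can be reproduced in Java with only the standard library, which may be useful where adding a client dependency is undesirable. This is a minimal sketch; the class StatsdUdp is a hypothetical name. Like echo piped into nc -u, it terminates each measurement with a newline and sends the datagram without any acknowledgement, so a successful return does not prove the measurement was recorded.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical sender: writes StatsD measurement lines over UDP.
final class StatsdUdp {

    // Sends one or more measurement lines in a single datagram,
    // each terminated by a newline character.
    static void send(String host, int port, String... lines) throws IOException {
        StringBuilder payload = new StringBuilder();
        for (String line : lines) {
            payload.append(line).append('\n');
        }
        byte[] data = payload.toString().getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            InetAddress address = InetAddress.getByName(host);
            socket.send(new DatagramPacket(data, data.length, address, port));
        }
    }

    public static void main(String[] args) throws IOException {
        // Same as: echo "mymetrics.test:29.5|g" | nc -w 1 -u 129.237.201.114 8125
        send("129.237.201.114", 8125, "mymetrics.test:29.5|g");
    }
}
```

Batching several lines into one datagram reduces packet count, but the combined payload should stay well below the network MTU so that it is not fragmented or dropped.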

Using Java

A Java client, java-statsd-client, is available from Maven:

<dependency>
  <groupId>com.timgroup</groupId>
  <artifactId>java-statsd-client</artifactId>
  <version>1.0.1</version>
</dependency>

To use the client:

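import com.timgroup.statsd.StatsDClient;

class SomeClass {

  // Client that sends every measurement under the "mymetrics" prefix.
  private final StatsDClient stats;

  public SomeClass() {
    String prefix = "mymetrics";
    String host = "129.237.201.114";  // statsd.dataone.org
    int port = 8125;                  // StatsD UDP port
    stats = new StatsDClient(prefix, host, port);
  }

  public void someOp() {
    int measurement = calculateSomeValue();
    // Record the value as a gauge on the "test" instrument.
    stats.gauge("test", measurement);
  }

  // Placeholder (hypothetical) for whatever produces the reported value.
  private int calculateSomeValue() {
    return 42;
  }
}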

The measurement will then appear in Graphite under the name “mymetrics/test”.

Note that this Java client appears to support only integer values for gauge measurements.

Using Python

There are several StatsD Python clients available.

One that works well is python-statsd-client. Documentation for the client is available from the project's site.

Check_mk

Check_mk provides a layer of functionality over Nagios that simplifies configuration and monitoring of remote machines. The check_mk installation is located at https://monitor.dataone.org/check_mk/ and uses the central LDAP for authentication.

Adding a Server to Check_mk

To monitor a new server with check_mk, install check-mk-agent, enable it as a service using xinetd, and ensure that firewalls allow requests from the check_mk server (monitor.dataone.org, 129.237.201.155). By default, the check_mk agent listens on TCP port 6556.

On Ubuntu servers, install xinetd and the check-mk-agent package:

sudo apt-get update
sudo apt-get install xinetd check-mk-agent

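Edit the xinetd configuration for the agent (on Ubuntu, typically /etc/xinetd.d/check_mk) so that the service is enabled and only_from includes the check_mk server: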

service check_mk
{
    type           = UNLISTED
    port           = 6556
    socket_type    = stream
    protocol       = tcp
    wait           = no
    user           = root
    server         = /usr/bin/check_mk_agent

    # If you use fully redundant monitoring and poll the client
    # from more than one monitoring server in parallel, you might
    # want to use the agent cache wrapper:
    #server         = /usr/bin/check_mk_caching_agent

    # configure the IP address(es) of your Nagios server here:
    #only_from      = 127.0.0.1 10.0.20.1 10.0.20.2
    only_from    = 127.0.0.1 129.237.201.155

    # Don't be too verbose. Don't log every check. This might be
    # commented out for debugging. If this option is commented out
    # the default options will be used for this service.
    log_on_success =

    disable        = no
}

Then restart xinetd and open port 6556 in the firewall to the check_mk server:

sudo service xinetd restart
sudo ufw allow from 129.237.201.155 to any port 6556

You can check that the agent is running by connecting with telnet from an address listed in the only_from configuration parameter:

telnet MY_HOST 6556

The response should be immediate and verbose.
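Where telnet is not available, the same check can be scripted. The sketch below is illustrative only (the class CheckMkProbe is a hypothetical name): it opens a TCP connection to port 6556, reads everything the agent sends until the agent closes the connection, and reports whether the reply begins with a <<<check_mk>>> section header, which is what the Linux agent normally emits first; treat that header check as an assumption to confirm against your agent version. As with telnet, it must run from an address listed in only_from, otherwise xinetd closes the connection without sending a report.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical probe for a check_mk agent served by xinetd.
final class CheckMkProbe {

    // Connects to the agent and returns its full report as text.
    // The agent writes its report and then closes the connection.
    static String fetch(String host, int port, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);  // limit each blocking read
            byte[] report = socket.getInputStream().readAllBytes();
            return new String(report, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "MY_HOST";
        String report = fetch(host, 6556, 10_000);
        System.out.println(report.startsWith("<<<check_mk>>>") ? "agent OK" : "unexpected reply");
        System.out.println(report.length() + " characters received");
    }
}
```

An empty reply usually means the connection was accepted but refused by the only_from filter, while a connection timeout points at the firewall rule.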

Add the server to the set of monitored servers by logging in to https://monitor.dataone.org/check_mk/, then under WATO | Hosts, add a new host to the appropriate group. Check the services and save the configuration; the host's status should then appear among the monitored servers.