Service Monitoring
==================

.. contents:: Contents
   :local:
   :backlinks: entry

Logs
----

The following logs are created by Coordinating Node services::

  /var/log/dataone/replicate/cn-replication.log
  /var/log/dataone/synchronize/cn-synchronization.log

TODO: Need a complete list of logs on the CNs.

Splunk
------

TBD

LogStash
--------

TBD

Monitoring Java Processes with JMX
----------------------------------

**Step 0.** If ``hostname -i`` does not report the public IP address of the
system, edit ``/etc/hosts`` and set the public IP there. For example, on
*cn-dev*, ``hostname -i`` reported 127.0.1.1. The *hosts* file was updated with
the correct value::

  127.0.0.1 localhost
  #127.0.1.1 cn-dev.dataone.org cn-dev
  128.111.220.50 cn-dev.dataone.org cn-dev
  ...

See: http://docs.oracle.com/javase/1.5.0/docs/guide/management/faq.html#linux1

Watching Hazelcast
~~~~~~~~~~~~~~~~~~

In this example, *d1-processing* is enabled for JMX monitoring.

Create the file ``/etc/dataone/process/jmx.passwd`` with contents::

  monitorRole {PASSWORD}

and the file ``/etc/dataone/process/jmx.access`` with contents::

  monitorRole readonly

Change the owner of these files to user *tomcat6* and make them readable only
by that user (this must be the same user as the process that launches the JMX
service)::

  sudo chown tomcat6:tomcat6 /etc/dataone/process/jmx.*
  sudo chmod 600 /etc/dataone/process/jmx.*

Shut down *d1-processing*::

  sudo /etc/init.d/d1-processing stop

Now start *d1-processing* with the JMX startup flags::

  sudo env JAVA_OPTS="-Djava.awt.headless=true -Xmx4096M -Xms1024M \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=8010 \
  -Dcom.sun.management.jmxremote.authenticate=true \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.password.file=/etc/dataone/process/jmx.passwd \
  -Dcom.sun.management.jmxremote.access.file=/etc/dataone/process/jmx.access \
  -Djava.rmi.server.hostname=128.111.220.50 \
  -Dhazelcast.jmx=true" \
  /etc/init.d/d1-processing start

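
If the process does not come up with the JMX flags, one thing worth checking
is the ownership and mode of the two files created above, since the JVM
typically refuses to enable remote JMX when the password file is readable by
other users. The following is a minimal sketch and not part of the original
procedure: the function name ``check_jmx_files`` is hypothetical, and it
assumes GNU ``stat`` (as shipped with Ubuntu)::

  # Hypothetical helper: report JMX credential files that are missing or
  # whose owner/mode differ from what the steps above set up.
  # Arguments: directory (default /etc/dataone/process), owner (default tomcat6).
  # Assumes GNU stat, which supports the -c format option.
  check_jmx_files() {
      dir="${1:-/etc/dataone/process}"
      owner="${2:-tomcat6}"
      status=0
      for f in "$dir/jmx.passwd" "$dir/jmx.access"; do
          if [ ! -f "$f" ]; then
              echo "missing: $f"
              status=1
              continue
          fi
          # %U is the owning user name, %a the octal permission bits
          actual=$(stat -c '%U %a' "$f")
          if [ "$actual" != "$owner 600" ]; then
              echo "unexpected owner/mode ($actual): $f"
              status=1
          fi
      done
      return "$status"
  }

Calling ``check_jmx_files`` with no arguments checks the paths used in this
example. It prints nothing and returns 0 when both files look as expected.
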
Temporarily disable the firewall. This is necessary because, although the JMX
service listens on the specified port, the RMI service to which the JMX
service directs the client listens on a random port::

  sudo ufw disable

Open jconsole on your desktop and select "Remote process", entering::

  hostname:port

along with the username "monitorRole" and the password specified in
``/etc/dataone/process/jmx.passwd``. After a couple of seconds the JMX client
should connect and start collecting statistics.

Remember to re-enable the firewall when you're done::

  sudo ufw enable

Watching Tomcat
~~~~~~~~~~~~~~~

There is an unresolved issue with Java security: Tomcat probably needs
permission to read the password and access files. As an interim measure,
disable JAVA_SECURITY in ``/etc/init.d/tomcat6``, stop *tomcat6*, and restart
it with the following parameters to enable JMX monitoring of Tomcat::

  sudo env JAVA_OPTS="-Djava.awt.headless=true -Xmx2048M -Xms1024M \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=8020 \
  -Dcom.sun.management.jmxremote.authenticate=true \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.password.file=/etc/dataone/process/jmx.passwd \
  -Dcom.sun.management.jmxremote.access.file=/etc/dataone/process/jmx.access \
  -Djava.rmi.server.hostname=128.111.220.50 \
  -Dhazelcast.jmx=true" \
  /etc/init.d/tomcat6 start

  sudo env JAVA_OPTS="-Djava.awt.headless=true -Xmx2048M -Xms1024M \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=8020 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.password.file=/etc/dataone/monitor/jmx.passwd \
  -Dcom.sun.management.jmxremote.access.file=/etc/dataone/monitor/jmx.access \
  -Djava.rmi.server.hostname=129.24.0.109 \
  -Dhazelcast.jmx=true" \
  /etc/init.d/tomcat6 start

Check jmx tool::

  Usage: check_jmx -U url -O object_name -A attribute [-K compound_key] [-I attribute_info] [-J
      attribute_info_key] -w warn_limit -c crit_limit [-v[vvv]] [-help]

  , where options are:

  -help
      Prints this page

  -U
      JMX URL, for example: "service:jmx:rmi:///jndi/rmi://localhost:1616/jmxrmi"

  -O
      Object name to be checked, for example, "java.lang:type=Memory"

  -A
      Attribute of the object to be checked, for example, "NonHeapMemoryUsage"

  -K
      Attribute key for -A attribute compound data, for example, "used" (optional)

  -I
      Attribute of the object containing information for text output (optional)

  -J
      Attribute key for -I attribute compound data, for example, "used" (optional)

  -v[vvv]
      verbatim level controlled as a number of v (optional)

  -w
      warning integer value

  -c
      critical integer value

  Note that if warning level > critical, system checks object attribute value to be LESS THAN OR EQUAL warning, critical
  If warning level < critical, system checks object attribute value to be MORE THAN OR EQUAL warning, critical

  ./check_jmx -U service:jmx:rmi:///jndi/rmi://localhost:8020/jmxrmi \
  -O java.lang:type=Memory -A HeapMemoryUsage -K used -I HeapMemoryUsage \
  -J used -vvvv -w 4248302272 -c 5498760192

  ./check_jmx -U service:jmx:rmi:///jndi/rmi://localhost:8020/jmxrmi \
  -O java.lang:type=Memory -A LoadedClassCount -K used -I HeapMemoryUsage -J used -vvvv -w 4248302272 -c 5498760192

Listing JMX Beans
~~~~~~~~~~~~~~~~~

Get a JMX console tool.
The one used in the examples here is ``jmxterm``, available from:
http://wiki.cyclopsgroup.org/jmxterm

Start jmxterm with something like ``java -jar jmxterm.jar``, then connect to
the target using the ``open`` command::

  java -jar jmxterm.jar
  $> open 127.0.0.1:8020
  #Connection to 127.0.0.1:8020 is opened

Get a list of domains::

  $>domains
  #following domains are available
  Catalina
  JMImplementation
  Users
  com.sun.management
  java.lang
  java.util.logging
  solr/

Select a domain, in this case Catalina, and see what beans it offers::

  $>domain Catalina
  #domain is set to Catalina
  $>beans
  #domain = Catalina:
  Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/,j2eeType=Servlet,name=default
  Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/,j2eeType=Servlet,name=jsp
  Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/,name=jsp,type=JspMonitor
  Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,j2eeType=Filter,name=SolrRequestFilter
  Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,j2eeType=Servlet,name=Logging
  Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,j2eeType=Servlet,name=SolrServer
  Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,j2eeType=Servlet,name=SolrUpdate
  Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,j2eeType=Servlet,name=default
  Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,j2eeType=Servlet,name=jsp
  Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,j2eeType=Servlet,name=ping
  Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,name=jsp,type=JspMonitor
  Catalina:J2EEApplication=none,J2EEServer=none,WebModule=//localhost/solr,name=ping,type=JspMonitor
  Catalina:J2EEApplication=none,J2EEServer=none,j2eeType=WebModule,name=//localhost/
  Catalina:J2EEApplication=none,J2EEServer=none,j2eeType=WebModule,name=//localhost/solr
  Catalina:class=org.apache.catalina.UserDatabase,name="UserDatabase",resourcetype=Global,type=Resource
  Catalina:host=localhost,name=ErrorReportValve,type=Valve
  Catalina:host=localhost,name=StandardContextValve,path=/,type=Valve
  Catalina:host=localhost,name=StandardContextValve,path=/solr,type=Valve
  Catalina:host=localhost,name=StandardHostValve,type=Valve
  Catalina:host=localhost,name=solr/home,path=/solr,resourcetype=Context,type=Environment
  Catalina:host=localhost,path=/,resourcetype=Context,type=NamingResources
  Catalina:host=localhost,path=/,type=Cache
  Catalina:host=localhost,path=/,type=Loader
  Catalina:host=localhost,path=/,type=Manager
  Catalina:host=localhost,path=/,type=WebappClassLoader
  Catalina:host=localhost,path=/solr,resourcetype=Context,type=NamingResources
  Catalina:host=localhost,path=/solr,type=Cache
  Catalina:host=localhost,path=/solr,type=Loader
  Catalina:host=localhost,path=/solr,type=Manager
  Catalina:host=localhost,path=/solr,type=WebappClassLoader
  Catalina:host=localhost,type=Deployer
  Catalina:host=localhost,type=Host
  Catalina:name=StandardEngineValve,type=Valve
  Catalina:name=common,type=ServerClassLoader
  Catalina:name=http-8080,type=GlobalRequestProcessor
  Catalina:name=http-8080,type=ThreadPool
  Catalina:name=server,type=ServerClassLoader
  Catalina:name=shared,type=ServerClassLoader
  Catalina:port=8080,type=Connector
  Catalina:port=8080,type=Mapper
  Catalina:port=8080,type=ProtocolHandler
  Catalina:realmPath=/realm0,type=Realm
  Catalina:resourcetype=Global,type=NamingResources
  Catalina:serviceName=Catalina,type=Service
  Catalina:type=Engine
  Catalina:type=MBeanFactory
  Catalina:type=Server
  Catalina:type=StringCache

The Host bean looks interesting::

  $>bean Catalina:host=localhost,type=Host
  #bean is set to Catalina:host=localhost,type=Host
  $>info
  #mbean = Catalina:host=localhost,type=Host
  #class name = org.apache.tomcat.util.modeler.BaseModelMBean
  # attributes
    %0 - aliases ([Ljava.lang.String;, rw)
    %1 - appBase (java.lang.String, rw)
    %2 - autoDeploy (boolean,
      rw)
    %3 - children ([Ljavax.management.ObjectName;, rw)
    %4 - configClass (java.lang.String, rw)
    %5 - deployOnStartup (boolean, rw)
    %6 - deployXML (boolean, rw)
    %7 - managedResource (java.lang.Object, rw)
    %8 - modelerType (java.lang.String, r)
    %9 - name (java.lang.String, rw)
    %10 - realm (org.apache.catalina.Realm, rw)
    %11 - unpackWARs (boolean, rw)
    %12 - valveNames ([Ljava.lang.String;, rw)
    %13 - valveObjectNames ([Ljavax.management.ObjectName;, rw)
    %14 - xmlNamespaceAware (boolean, rw)
    %15 - xmlValidation (boolean, rw)
  # operations
    %0 - void addAlias(java.lang.String alias)
    %1 - void addChild(org.apache.catalina.Container child)
    %2 - void destroy()
    %3 - [Ljava.lang.String; findAliases()
    %4 - void init()
    %5 - void removeAlias(java.lang.String alias)
    %6 - void start()
    %7 - void stop()
  #there's no notifications

Now let's get a couple of attribute values::

  $>get appBase
  #mbean = Catalina:host=localhost,type=Host:
  appBase = webapps;

  $>get children
  #mbean = Catalina:host=localhost,type=Host:
  children = [ Catalina:j2eeType=WebModule,name=//localhost/,J2EEApplication=none,J2EEServer=none, Catalina:j2eeType=WebModule,name=//localhost/solr,J2EEApplication=none,J2EEServer=none ];

Check_mk Monitoring
-------------------

Check_mk provides a layer of functionality over Nagios that simplifies
configuration and monitoring of remote machines. The check_mk installation is
located at https://monitor.dataone.org/check_mk/ and uses the central LDAP for
authentication.

Adding a Server to Check_mk
~~~~~~~~~~~~~~~~~~~~~~~~~~~

To monitor a new server with check_mk, install ``check-mk-agent``, enable it
as a service using xinetd, and ensure that firewalls allow requests from the
check_mk server (monitor.dataone.org, 129.237.201.155). By default, the agent
listens on TCP port 6556.

For Ubuntu servers, install the ``check-mk-agent``::

  sudo apt-get update
  sudo apt-get install xinetd check-mk-agent

Edit the xinetd configuration::

  service check_mk
  {
      type           = UNLISTED
      port           = 6556
      socket_type    = stream
      protocol       = tcp
      wait           = no
      user           = root
      server         = /usr/bin/check_mk_agent

      # If you use fully redundant monitoring and poll the client
      # from more than one monitoring servers in parallel you might
      # want to use the agent cache wrapper:
      #server         = /usr/bin/check_mk_caching_agent

      # configure the IP address(es) of your Nagios server here:
      #only_from      = 127.0.0.1 10.0.20.1 10.0.20.2
      only_from      = 127.0.0.1 129.237.201.155

      # Don't be too verbose. Don't log every check. This might be
      # commented out for debugging. If this option is commented out
      # the default options will be used for this service.
      log_on_success =

      disable        = no
  }

Then restart xinetd and poke a hole through the firewall::

  sudo service xinetd restart
  sudo ufw allow from 129.237.201.155 to any port 6556

You can check that the agent is running by connecting with telnet from an
address listed in the ``only_from`` configuration parameter::

  telnet MY_HOST 6556

The response should be immediate and verbose.

Add the server to the monitored set of servers by logging in to
https://monitor.dataone.org/check_mk, then under WATO | Hosts add a new host
to the appropriate group. Check the services, save the configuration, and the
status should appear in the monitored servers.
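
The telnet check described above can also be scripted. The sketch below is an
illustration rather than part of the documented procedure: the function name
``looks_like_agent_output`` is hypothetical, the example call assumes ``nc``
(netcat) is installed, and the test relies on the assumption that the agent
output begins with a ``<<<check_mk>>>`` section header::

  # Hypothetical helper: succeed if stdin looks like check_mk agent output,
  # i.e. its first line is the <<<check_mk>>> section header. That header is
  # an assumption about the agent's output format; adjust it if yours differs.
  looks_like_agent_output() {
      # Fail on empty input (for example, a refused or filtered connection)
      IFS= read -r first_line || return 1
      [ "$first_line" = "<<<check_mk>>>" ]
  }

  # Example use from a host listed in only_from (MY_HOST is a placeholder):
  #   nc -w 5 MY_HOST 6556 | looks_like_agent_output && echo "agent OK"

Only the first line is inspected, so the check returns quickly even though the
full agent output is verbose.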