Ossec, jboss, log4j and syslog are on a plane…

As long as java is concerned you will always face the trouble of processing multi-line logs. It gets worse if you use ossec[1], a powerful log analyzer with real-time alerting. Ossec ships with default rules, especially rule 1002 “Unknown problem(s) in the system” that matches generic error-related words such as “critical|failure|segfault|error” against your logs.

Because of this rule, you’ll be sent tons of alerts when one of your java server/application prints a stacktrace and/or with multi-line entries that prints such words because you just can’t decode these log entries to know its java and create a decoder for it. Of course you can decode the first line in a multi-line entry, it has its own pattern (date with milliseconds etc), but you just can’t handle other line as they aren’t prefixed.

Sadly, using a FILE log4j appender to create something that resemble Syslog default formating using a custom patternLayout doesn’t work for stacktraces…

I think you are left with only one solution, to use a Syslog log4j appender, and use rsyslog templates capabilities to prefix each log line with something that resemble Syslog default formating and can be handled by ossec.

// Generate: 12 Apr 13:37:42 host.example.com jboss: applicative crap
$template jbossFormat,"%timegenerated% host.example.com jboss: %msg%\n"
local2.*    /var/log/app/server.log;jbossFormat

That way, java (developers) can log anything, you can decode it accuratly:

<decoder name="jboss">
    <program_name>jboss</program_name>
</decoder>

and choose to ignore stacktrace and false positive as follow:

<rule id="100002" level="0">
   <if_sid>1002</if_sid>
   <decoded_as>jboss</decoded_as>
   <description>Ignoring unknown problems for Jboss logs</description>
</rule>

[1] — http://www.ossec.net/

Pacemaker: Moving from IPaddr to IPaddr2

For Linux users only, moving from ocf:heartbeat:IPaddr to ocf:heartbeat:IPaddr2 can be a good idea, especially since IPaddr had bug(s) in it (see my other pacemaker post[1]). I said “Linux users” because ocf:heartbeat:IPaddr2 uses /sbin/ip where IPaddr uses /sbin/ifconfig.

One of the main differences between /sbin/ifconfig and /sbin/ip is that the primer creates and manages aliases such as eth0.3:0 while the later creates and manages addresses only visible ip addr show. Read link[2] for comparison. If /sbin/ifconfig can’t handle /sbin/ip addresses, fortunatly the opposite isn’t true, this will help after the migration, you’ll understand #1

Let’s go ! You can move from IPaddr to IPaddr2 without reloading or restarting your resources, i.e: you can do it live (but don’t blame me please :p)

The process is quite simple:

  • Enter maintenance-mode: your resources aren’t managed anymore but keep their current state ;
  • Edit your resources and replace IPaddr with IPaddr2 ;
  • At this state, Pacemaker replaces your old IPaddr resources with new IPaddr2 resources. It doesn’t create orphans. But, Pacemaker has to know the state of your new resources before commiting, it’s very important. Hence the reprobe ;
  • Once you reprobed, Pacemaker is aware your new IPaddr2 resources are up because the monitor operation found the previous IPaddr aliases up. That’s because /sbin/ip is aware of /sbin/ifconfig aliases #1 ;
  • Finally, after validating that your resources won't move with ptest, you can leave maintenance-mode and commit your changes.
crm
options editor vim
configure property maintenance-mode=true
configure edit
:%s/IPaddr/&2/gc
:wq
resource reprobe
configure ptest scores
configure property maintenance-mode=false
exit

At this point, your cluster now handles IPaddr2 resources but the previous aliases are still used and visible in /sbin/ifconfig output, that's because has I said before #1 /sbin/ip handles aliases fine. Do not down them, they are in use ! These aliases will be removed and recreated with the new syntax as soon as the resource restarts.

[1] http://floriancrouzat.net/2011/09/pacemaker-tips/
[2] http://www.tty1.net/blog/2010-04-21-ifconfig-ip-comparison_en.html

Find -exec, actions, fork and speed

Assuming you understand such a command: find . type -f -exec ls -l {} \; this post will explains certain subtleties about find, fork, and speed. First, remember not to pipe find with the very bad and dangerous xargs, as explained in this must read “find guide”[1].

So, if you can’t use xargs, you’ll use -exec. What I wanted to talk about is the difference(s) between \; and \+ at the end of a find -exec command.

find /tmp/find -type f -exec ls -artl {} \;

In this find expression (using \;), each time find finds a matching filename, find clones and ls -artl is triggered. As a result, the files aren’t sorted by modification time since ls -artl is used one file after another.

 #  find /tmp/find -type f -exec ls -artl {} \;
-rw-r--r--  1  root  root  0     Nov  23  11:49  /tmp/find/new/quux
-rw-r--r--  1  root  root  1024  Nov  16  10:22  /tmp/find/veryold/foo
-rw-r--r--  1  root  root  520   Oct  8   2010   /tmp/find/veryold/bar
-rw-r--r--  1  root  root  961   Nov  22  14:39  /tmp/find/old/baz

As you can see, files are not sorted, and strace shows a PID per clone, that is to say, per found file..

# strace  find /tmp/find -type f -exec ls -artl {} \; &>/dev/stdout | grep clone
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7705728) = 7947
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7705728) = 7948
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7705728) = 7949
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7705728) = 7950

find /tmp/find -type f -exec ls -artl {} \+

On the other hand, In this expression (using \+), find first computes a list of filenames then passes it to -exec. It means ls -artl is only triggered once, on the whole file list, which is really really faster and actually sort files by modification time even though they are in different directories.

#  find /tmp/find -type f -exec ls -artl {} \+
-rw-r--r-- 1 root root  520 Oct  8  2010 /tmp/find/veryold/bar
-rw-r--r-- 1 root root 1024 Nov 16 10:22 /tmp/find/veryold/foo
-rw-r--r-- 1 root root  961 Nov 22 14:39 /tmp/find/old/baz
-rw-r--r-- 1 root root    0 Nov 23 11:49 /tmp/find/new/quux

As you can see, files are sorted, and strace shows only a single clone PID.

# strace  find /tmp/find -type f -exec ls -artl {} \+ &>/dev/stdout | grep clone
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7581728) = 7956

To make it short, -exec command {} \+ on 10K files will trigger 10K commands of one argument each and consume 10K PIDs while \; only requires a single PID to run a single command of 10K arguments.

[1] http://mywiki.wooledge.org/UsingFind#Actions_in_bulk:_xargs.2C_-print0_and_-exec_.2B-