
andy.smith
2015-05-31T16:53:19Z
Hi again, a more general performance question for you!

What is the supported usage scenario for SL4NT? I.e. to what level of load does your SL4NT product remain performant (and supported), and what are the bottlenecks I should be aware of?

E.g. I'm measuring the performance as the latency between arrival of the syslog packet and the logging of that packet to disk. I've arbitrarily picked a latency threshold of one second, and a load metric of packets per second.

My guess at this point is that, allowing for scaled-up hardware, your product is supportable below 1000 packets per second before an algorithmic bottleneck (e.g. locking/threading) surfaces.

Sound close?

Thanks. -Andy


Other details:

What I'm seeing on a couple systems I inherited is that under 1000 pps load, latency is 0-3 sec, but above 1000 pps, latency becomes minutes, and eventually approaches an hour (when the service crashes). Working set over this period ranges from ~20M on a warm system to ~1.5G under heaviest load.

My systems are 20G RAM / 2 sockets x 8 cores x 1 GHz / fast disks; I don't think I have a hardware bottleneck.
franzk
2015-06-06T20:25:48Z
Hi Andy,

First, something about how message processing works:

- A high-priority 'receiver thread' reads the syslog messages from the network by using IO-completion ports and puts the raw messages without further message decoding or processing into the 'received message queue'.
- The 'message processor thread' takes received messages from this queue, decodes the raw message contents and evaluates them against the configured rules. Depending on the outcome of the rule evaluation, messages are put into 'action queues'.
- One thread per action type is then responsible for processing messages in the corresponding 'action queue', e.g. write them to file.
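The three-stage pipeline above can be sketched as a producer-consumer chain. This is a minimal illustration, not SL4NT's actual code; the message format, the rule condition, and all names are assumptions:

```python
# Sketch of the pipeline: receiver -> message processor -> per-action threads.
import queue
import threading

received_q = queue.Queue()           # the 'received message queue'
action_qs = {"file": queue.Queue()}  # one 'action queue' per action type

def receiver():
    # Stand-in for the high-priority receiver thread: it enqueues raw
    # messages from the network without decoding or processing them.
    for raw in [b"<13>host app: msg %d" % i for i in range(5)]:
        received_q.put(raw)
    received_q.put(None)  # shutdown sentinel, for this sketch only

def message_processor():
    # Decodes raw message contents, evaluates them against the rules,
    # and routes matches into the appropriate action queue.
    while (raw := received_q.get()) is not None:
        msg = raw.decode()
        if "app" in msg:  # stand-in for a configured rule condition
            action_qs["file"].put(msg)
    action_qs["file"].put(None)

def file_action(results):
    # One thread per action type drains its queue, e.g. writes to file.
    while (msg := action_qs["file"].get()) is not None:
        results.append(msg)  # would be a file write in the real thing

results = []
threads = [threading.Thread(target=receiver),
           threading.Thread(target=message_processor),
           threading.Thread(target=file_action, args=(results,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # prints 5: every message reached the file action
```

The key property of this design is that the receiver never blocks on decoding or rule evaluation, so no packets are dropped at the network layer; the cost is that a slow processor stage shows up as queue growth instead.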

>I don't think I have a hardware bottleneck.

You do, and it's the performance of the CPU cores:

In your case the 'message processor thread', which decodes raw messages and evaluates them against the configured rules, is unable to keep up with the rate of new messages being put into the 'received message queue', so this queue builds up until all available heap memory is used up.
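A back-of-the-envelope sketch of that buildup: whenever the arrival rate exceeds the processing rate, queue depth grows linearly with time. The rates and per-message size below are illustrative assumptions, not measured SL4NT figures:

```python
# Net queue growth when the processor thread can't keep up.
arrival_rate = 1200   # packets/second hitting the receiver thread (assumed)
process_rate = 1000   # messages/second the processor can drain (assumed)
msg_bytes = 1024      # assumed in-memory size of one queued raw message

growth = arrival_rate - process_rate  # net growth, messages/second
for minutes in (1, 10, 60):
    depth = growth * 60 * minutes
    print(f"{minutes:3d} min: {depth:>9,} queued msgs "
          f"(~{depth * msg_bytes / 2**20:,.0f} MiB of heap)")
```

Even a modest 200 msg/s deficit queues 720,000 messages in an hour, which is consistent with the working-set growth and eventual crash Andy describes.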

This means that if you used, for example, a CPU with 2 GHz cores instead of one with 1 GHz cores, the rate of messages processed without queue buildup would double (in reality a little less than double, due to locking overhead).

Please note that the number of CPU cycles the 'message processor thread' spends on a single message depends mainly on the rule configuration; e.g. a rule with a regular expression filter condition can take many times more CPU cycles than a rule without such a filter condition.
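A quick way to see the cost difference is to time a regex filter against a plain substring test on the same message. The message text and pattern below are made up for illustration; absolute numbers vary by machine, but the relative gap is the point:

```python
# Compare per-message filter cost: compiled regex vs. substring match.
import re
import timeit

msg = "<13>May 31 16:53:19 host app[123]: connection from 10.0.0.7"
pattern = re.compile(r"connection from (\d{1,3}\.){3}\d{1,3}")

plain = timeit.timeit(lambda: "connection from" in msg, number=100_000)
regex = timeit.timeit(lambda: pattern.search(msg), number=100_000)
print(f"regex filter costs ~{regex / plain:.0f}x a plain substring test")
```

So at a fixed clock speed, trimming regex filters from hot rules (or ordering rules so cheap conditions reject messages first) raises the sustainable packets-per-second ceiling.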


-Franz


