SL4NT crashing with memory error

rbrundige
Newbie Topic Starter

2014-01-16T23:15:59Z

Hello,

We're trying to get SL4NT to work on a new server but it crashes with an error that says "An error occurred in allocating a memory block of 235 bytes from the heap."

The version of SL4NT is 3.2 SP1. The server's OS version is Windows Server 2008 R2 Service Pack 1.

We've used SL4NT for years on the old server with Windows Server 2003 SP2 with no problems.

Any help you could provide would be appreciated.

Thanks,

Bob

franzk
Administration

2014-01-17T12:05:56Z

#2

Hi Bob,

>We're trying to get SL4NT to work on a new server but it crashes with an error that says "An error occurred in allocating a memory block of 235 bytes from the heap."

this message means that the SL4NT service ran out of available memory.

When does the crash occur? Immediately after service startup or after running for some time? I suspect that for some reason (a failure or timeout while writing the received syslog messages to the action targets, like file, DB, etc.) the service queues the received messages in RAM until all RAM is used up.

- Please monitor the RAM usage of the SL4NT service after a fresh service startup: Is it continuously increasing until the next crash?

- Are received syslog messages written to all the targets (file, DBs) specified in the configured actions?

- Are there any other relevant event log entries indicating a problem with a specific action target?

- Franz

trajohn
Newbie

2014-01-17T16:58:23Z

#3

>When does the crash occur? Immediately after service startup or after running for some time? I suspect that for some reason (a failure or timeout while writing the received syslog messages to the action targets, like file, DB, etc.) the service queues the received messages in RAM until all RAM is used up.

The crash generally occurs within 5-15 minutes after the service starts. Syslog messages are being received and acted on (log file write test) by SL4NT while it is alive.

>- Please monitor the RAM usage of the SL4NT service after a fresh service startup: Is it continuously increasing until the next crash?

This server has 64GB of RAM, SL4NT is not really touching the RAM usage. Less than 1GB of RAM is being consumed by SL4NT and there is nothing else running in addition on the server so plenty of RAM (>50GB) is available.

>- Are received syslog messages written to all the targets (file, DBs) specified in the configured actions?

Yes.

>- Are there any other relevant event log entries indicating a problem with a specific action target?

No.

Thanks for any and all help you can provide!

-travis

franzk
Administration

2014-01-17T23:00:02Z

#4

Hi Travis,

>SL4NT is not really touching the RAM usage. Less than 1GB of RAM is being consumed by SL4NT and there is nothing else running in addition on the server so plenty of RAM (>50GB) is available.

please note that SL4NT is a 32-bit executable and one GB of heap space is all such an application can get!

There can be no doubt that SL4NT queues received messages in RAM till all available RAM (1 GB) is used up.

Let's check out which action queues are responsible (if any):

Create under the registry key HKLM\SYSTEM\CurrentControlSet\Services\SL4NT\Parameters the following registry values of type REG_DWORD:

LogToEventLogMaxQueueSize

LogToFileMaxQueueSize

RunProgramMaxQueueSize

SendAlertMaxQueueSize

SendEMailMaxQueueSize

ForwardSyslogMessageMaxQueueSize

LogToODBCDBMaxQueueSize

ForwardToTCPChannelViewerMaxQueueSize

ExecuteCustomHandlerMaxQueueSize

and set each of them to a value of 10000.

Start the service afterwards.
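For anyone who prefers not to create the nine values by hand, here is a minimal Python sketch that builds the text of a .reg file for them. The key path and value names are the ones listed above; the helper name build_reg_file is made up for illustration, and the script only produces text, it does not touch the registry.

```python
# Sketch only: builds .reg file text for the queue limits named in this post.
# The key path and value names come from the post; build_reg_file is a
# hypothetical helper name.

KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SL4NT\Parameters"

VALUE_NAMES = [
    "LogToEventLogMaxQueueSize",
    "LogToFileMaxQueueSize",
    "RunProgramMaxQueueSize",
    "SendAlertMaxQueueSize",
    "SendEMailMaxQueueSize",
    "ForwardSyslogMessageMaxQueueSize",
    "LogToODBCDBMaxQueueSize",
    "ForwardToTCPChannelViewerMaxQueueSize",
    "ExecuteCustomHandlerMaxQueueSize",
]


def build_reg_file(limit=10000):
    """Return .reg text that sets every queue limit to `limit` (REG_DWORD)."""
    lines = ["Windows Registry Editor Version 5.00", "", "[" + KEY + "]"]
    for name in VALUE_NAMES:
        # REG_DWORD values are written as 8 hexadecimal digits in a .reg file.
        lines.append('"{}"=dword:{:08x}'.format(name, limit))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_reg_file(), end="")
```

The output can be saved to a file with a .reg extension and reviewed before importing it with regedit. Changing the limit later is then a matter of calling build_reg_file with a different number.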

If such an action queue now fills up, a percentage of its entries is purged and a warning event log entry is written.
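To illustrate the purge behaviour just described, here is a rough Python sketch of a size-limited queue that drops its oldest entries on overflow and records a warning. It is not SL4NT's code: the class name PurgingQueue, the purge_fraction parameter and its default of 0.8 are all assumptions made for the example.

```python
from collections import deque


class PurgingQueue:
    """Hypothetical bounded FIFO that drops its oldest entries on overflow."""

    def __init__(self, max_size=10000, purge_fraction=0.8):
        self.max_size = max_size
        self.purge_fraction = purge_fraction  # assumed value, not from SL4NT
        self.warnings = []                    # stands in for the event log
        self.purged_total = 0
        self._items = deque()

    def put(self, item):
        self._items.append(item)
        if len(self._items) > self.max_size:
            # Drop a fraction of the queue, oldest entries first.
            count = int(len(self._items) * self.purge_fraction)
            for _ in range(count):
                self._items.popleft()
            self.purged_total += count
            self.warnings.append(
                "The {} oldest entries have been purged; allowed size is {}.".format(
                    count, self.max_size
                )
            )

    def get(self):
        return self._items.popleft()

    def __len__(self):
        return len(self._items)
```

The point of such a limit is diagnostic: a warning names the queue that overflowed, so a missing warning for a given action type suggests that queue is not the one growing.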

It's also possible that the receiver queue (which cannot be limited) grows too large, but in this case the rate of messages received per second is just too high. Do you have any performance numbers for the incoming network traffic?

-Franz

trajohn
Newbie

2014-01-17T23:51:48Z

#5

Hi Franz,

First, thank you for your help on this. I set the registry keys you requested and the service lasted ~12 minutes before stopping. Errors in the log file include:

The 8189 oldest entries have been purged from the LogToFile action type queue because the actual queue size exceeded the allowed size of 10000 entries.

and then finally, when it died:

An error occurred in allocating a memory block of 265 bytes from the heap.

Our normal average rate is near 8000/second with spikes at busy times going to 13000/second. Is this volume too much?

Thanks again,

-travis

franzk
Administration

2014-01-18T17:18:57Z

#6

Hi Travis,

first, something about how message processing works:

A high-priority 'receiver thread' reads the syslog messages from the network and puts them into the 'received message queue'.

The 'message processor thread' takes received messages from this queue and evaluates them against the configured rules. Depending on the outcome of the rule evaluation, messages are put into 'action queues'. One thread per action type is then responsible for processing messages in the corresponding 'action queue'.
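The three stages described above can be sketched in a few lines of Python. This is only an illustration of the described thread and queue layout, not SL4NT's implementation: run_pipeline, the rule format (predicate, action name) and the STOP sentinel are invented for the example, and the "actions" just collect messages into lists.

```python
import queue
import threading

STOP = object()  # sentinel used to shut the sketch down cleanly


def run_pipeline(messages, rules):
    """rules is a list of (predicate, action_name) pairs (hypothetical format)."""
    received = queue.Queue()  # the 'received message queue'
    action_queues = {action: queue.Queue() for _, action in rules}
    results = {action: [] for action in action_queues}

    def receiver():
        # Stands in for reading syslog messages from the network.
        for message in messages:
            received.put(message)
        received.put(STOP)

    def processor():
        # Evaluates each message against the rules and feeds the action queues.
        while True:
            message = received.get()
            if message is STOP:
                for q in action_queues.values():
                    q.put(STOP)
                return
            for predicate, action in rules:
                if predicate(message):
                    action_queues[action].put(message)

    def action_worker(action):
        # One worker thread per action type drains its own queue.
        q = action_queues[action]
        while True:
            message = q.get()
            if message is STOP:
                return
            results[action].append(message)

    threads = [threading.Thread(target=receiver), threading.Thread(target=processor)]
    threads += [threading.Thread(target=action_worker, args=(a,)) for a in action_queues]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Note that there is a single processor thread between the receiver and all action workers. If it handles messages more slowly than the receiver delivers them, the first queue grows no matter how fast the action workers are.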

>The 8189 oldest entries have been purged from the LogToFile action type queue because the actual queue size exceeded the allowed size of 10000 entries.

this entry is to be expected.

The 'action queues' are not the problem; you should either remove the added registry values or set them to at least 100000.

The problem is obviously that the message processor thread responsible for rule evaluation can't keep up with the number of messages in the 'received message queue', which therefore builds up till all available RAM is used up.

>Our normal average rate is near 8000/second with spikes at busy times going to 13000/second. Is this volume too much?

I don't know your configuration but maybe it's possible to reduce the rule count or simplify rule conditions to reduce the time the 'message processor thread' needs for processing a single message.

Otherwise the average message rate must be reduced.
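The backlog argument can be put into numbers with a small Python sketch. It assumes a constant arrival rate, a constant processing rate and a fixed memory cost per queued message, none of which has been measured here; the function name and all parameters are hypothetical.

```python
def seconds_until_heap_full(arrival_rate, service_rate, bytes_per_message, heap_limit_bytes):
    """Estimate how long a backlog takes to fill the heap.

    arrival_rate and service_rate are in messages per second.
    Returns None if the processor keeps up and no backlog builds.
    """
    backlog_growth = arrival_rate - service_rate  # messages per second
    if backlog_growth <= 0:
        return None
    return heap_limit_bytes / (backlog_growth * bytes_per_message)


if __name__ == "__main__":
    # 8000 msg/s is the average rate reported in this thread and 1 GB is the
    # heap limit mentioned earlier. The service rate (2000 msg/s) and the
    # per-message size (250 bytes) are made-up inputs, not measurements.
    estimate = seconds_until_heap_full(8000, 2000, 250, 2**30)
    print("estimated seconds until the heap is full:", estimate)
```

With real measurements for the processing rate and message size, the same formula shows how far the incoming rate would have to drop before the backlog stops growing.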

-Franz

trajohn
Newbie

2014-01-18T22:44:43Z

#7

Franz,

Thanks, that is great information about how the internals work. I will bump the queue to a higher number (or remove it) however I think it will still only be a longer amount of time until we hit the exception.

Our configuration right now is one rule to catch everything with one action to write it to a file - about as simplified as I think we can go. Thanks again for jumping on this and sharing your thoughts.

-travis

franzk
Administration

2014-01-19T18:17:08Z

#8

Hi Travis,

>I will bump the queue to a higher number (or remove it) however I think it will still only be a longer amount of time until we hit the exception.

it won't make any difference because the 'log-to-file action queue' is not the problem.

I guess you'll have to distribute the load across more than one (virtualized?) computer running SL4NT.

-Franz
