NetObserv Flow Queues 90% Full
NetObserv Flow's log reports that the processor to output writer queue or the UDP Server to Flow Decoder queue is 90% full.
SYMPTOM
NetObserv Flow’s log reports one or both of the following messages:
{"level":"info","ts":"2023-08-07T08:08:14.301Z","logger":"flowcoll","caller":"flowprocessor/metrics.go:118","msg":"flow processor to output writer is 90% full. This is normal when the collector is starting. If it persists for hours, it may indicate that you are at your license threshold or your system is under-resourced."}
{"level":"info","ts":"2023-08-07T08:08:34.264Z","logger":"flowcoll","caller":"server/metrics.go:125","msg":"UDP Server to Flow Decoder is 90% full. This is normal when the collector is starting. If it persists for hours, it may indicate that you are at your license threshold or your system is under-resourced."}
These logs might also be accompanied by throttler logs:
2023-06-28T21:20:21.821Z warn throttle/restricted_throttle.go:105 [throttler]: start burst
2023-06-28T21:20:41.822Z warn throttle/restricted_throttle.go:111 [throttler]: stop burst
2023-06-28T21:20:41.822Z warn throttle/restricted_throttle.go:117 [throttler]: start recovery
2023-06-28T21:50:42.142Z warn throttle/restricted_throttle.go:123 [throttler]: stop recovery
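To check whether the condition is persisting (rather than just occurring at startup), you can search the collector's logs for these messages. A minimal sketch, assuming a Linux host where the collector runs as a systemd service named flowcoll (adjust the unit name or log path to match your deployment):

journalctl -u flowcoll | grep "90% full"
journalctl -u flowcoll | grep "throttler"

If these searches keep returning recent entries hours after startup, continue with the PROBLEM and SOLUTION sections below.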
PROBLEM
It is typical for these messages to occur when the collector first starts, as various internal processes may not yet be fully initialized. However, if the messages persist after the first few minutes, one of the following issues may exist:
- ONLY flow processor to output writer - This indicates that the system to which data is being output lacks sufficient performance to ingest records at the rate being sent by the collector. This may be due to insufficient CPU, memory, or disk space, or excessive disk latency. Insufficient network bandwidth between the collector and the target system might also cause the problem. (also see the NOTE below)
- BOTH UDP Server to Flow Decoder and flow processor to output writer - This is a further progression of the previous condition. The resulting back pressure from the slow downstream system is now likely causing data to be lost.
- ONLY UDP Server to Flow Decoder - The internal decoder/processor workers cannot keep up with the rate of records being received. This can be caused by one of the following conditions:
  - More records are being received than are allowed by the license. If so, throttler messages will also appear in the log.
  - The collector has insufficient resources, primarily CPU cores, to process the rate of records being received.
  - The collector has just been started and the caches (for IPs, interfaces, etc.) have yet to be "warmed up", and the related high-latency enrichment tasks are limiting throughput.

NOTE: 6.x versions prior to 6.3.4 had an issue with automatically scaling the output pool size for OpenSearch and Splunk based on the Licensed Units. Increasing the output pool size manually, via EF_OUTPUT_OPENSEARCH_POOL_SIZE or EF_OUTPUT_SPLUNK_HEC_POOL_SIZE respectively, often solved the issue (see the example below). Upgrading to 6.3.4 or later also fixes the issue.
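If you apply the manual workaround from the NOTE above, the pool size is set through an environment variable on the collector process. A minimal sketch, assuming a systemd deployment with a unit named flowcoll and an OpenSearch output (the drop-in path, unit name, and the value 16 are illustrative only; Docker or Kubernetes deployments would set the same variable in their own configuration):

# /etc/systemd/system/flowcoll.service.d/override.conf (hypothetical drop-in path)
[Service]
# Manually sized output pool; choose a value appropriate to your Licensed Units.
Environment="EF_OUTPUT_OPENSEARCH_POOL_SIZE=16"
# For a Splunk HEC output, set EF_OUTPUT_SPLUNK_HEC_POOL_SIZE instead.

Reload systemd and restart the collector afterwards, e.g. systemctl daemon-reload && systemctl restart flowcoll.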
SOLUTION
The solution varies depending on the indicated issue, as described in the PROBLEM section above.
- ONLY flow processor to output writer - Increase the performance of the system to which records are being sent.
- BOTH UDP Server to Flow Decoder and flow processor to output writer - Increase the performance of the system to which records are being sent.
- ONLY UDP Server to Flow Decoder
  - If throttler messages also appear in the log, contact sales@elastiflow.com to learn about subscription options that will allow you to collect more flow records.
  - Increase the CPU cores available to the collector.
  - If the collector has sufficient CPU resources, try increasing the processor pool size by setting EF_PROCESSOR_POOL_SIZE (see the example at the end of this section). This allows greater concurrency of high-latency enrichment tasks.
  - If
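If you raise the processor pool size as suggested above, the same environment-variable approach applies. A minimal sketch, again assuming the hypothetical flowcoll systemd unit and drop-in path (the value 8 is illustrative only; size it to the CPU resources actually available to the collector):

# /etc/systemd/system/flowcoll.service.d/override.conf (hypothetical drop-in path)
[Service]
# More processor workers allow greater concurrency of high-latency enrichment tasks.
Environment="EF_PROCESSOR_POOL_SIZE=8"

As before, apply the change with systemctl daemon-reload && systemctl restart flowcoll, then watch the log to confirm the "90% full" messages stop.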