# System Requirements & Sizing

Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack, where indexing, search, and analysis happen. The ElastiFlow Unified Collectors can be configured to store the collected, processed, and enriched records in Elasticsearch. Kibana lets you interactively explore, visualize, and share insights into your data, and manage and monitor the stack.

Elasticsearch provides real-time search and analytics for all types of data. It indexes and stores records in a way that supports fast queries. As your data and query volumes grow, the distributed nature of Elasticsearch allows your deployment to grow along with them.

### Sizing

Elasticsearch can be deployed as a single-node server or a multi-node cluster. The latter provides horizontal scaling to handle very high ingest rates and longer retention periods. This section describes multiple deployment scenarios, from a single "lab" server to a multi-node cluster.

#### System Resources

Elasticsearch was engineered to run on "commodity" hardware. This is partially due to the Java Virtual Machine (JVM), which loses efficiency with heap sizes of 32GB and above because it can no longer use compressed object pointers. For this reason, the provided architectures scale by adding Elasticsearch nodes to form larger clusters, rather than by increasing the resources allocated to each node.

{% hint style="success" %}
Hardware has grown more and more powerful. Servers with 128 cores (256 threads) and 512GB-1TB of memory are now common. While engineered to run on "commodity" hardware, Elasticsearch can still be deployed on such systems. To take full advantage of the available resources, multiple instances of Elasticsearch should be deployed on the server. Such a cluster can provide very good performance and reliability as long as:

1. each Elasticsearch node has its own dedicated disks; and
2. rack-awareness features are used to ensure that primary and replica shards are not stored on the same physical server.
{% endhint %}

To understand the provided architectures, the following should be considered.

**CPU**

The provided CPU core counts refer to actual physical CPU cores. CPUs that provide SMT/Hyperthreading have a thread count that is twice the core count. For example, if the architecture refers to 16 cores, this would be a 16-core/32-thread processor.

{% hint style="info" %}
When deploying in virtualized or cloud environments, a _vCPU_ is not the same as a physical core. A vCPU typically maps to a single hardware thread, so 2 vCPUs are the equivalent of 1 physical core with SMT (1 core/2 threads). In such environments, the number of allocated vCPUs should be **double** the number of indicated cores.
{% endhint %}
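The core-to-vCPU conversion above can be sketched as a one-line calculation. This is only an illustration; the function name `vcpus_for_cores` and the `threads_per_core` parameter are hypothetical, and the factor of 2 assumes the hypervisor exposes each SMT thread as one vCPU.

```python
def vcpus_for_cores(physical_cores: int, threads_per_core: int = 2) -> int:
    """Return the vCPU count to allocate for a given number of physical cores.

    Assumes one vCPU per hardware thread (2 threads per core with SMT).
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return physical_cores * threads_per_core


# An architecture that calls for 16 cores needs 32 vCPUs.
print(vcpus_for_cores(16))  # 32
```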

**Memory**

The configured JVM Heap Size for Elasticsearch should be approximately 1/3, and no more than 1/2, of the total memory. However, the heap should **never** be set to more than 31GB. Any additional memory will be used by the operating system as page cache. This allows many queries against recent data to be answered without significant disk I/O. For this reason, more memory will usually result in better query performance.
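As a rough sketch of the heap guidance above, the following helper returns a target heap (about 1/3 of memory) and an upper bound (1/2 of memory), both capped at 31GB. The function name `recommended_heap_gb` is hypothetical and is not part of Elasticsearch or ElastiFlow.

```python
MAX_HEAP_GB = 31  # the heap should never exceed this value


def recommended_heap_gb(total_memory_gb: float) -> tuple:
    """Return (target, ceiling) JVM heap sizes in GB for one node.

    target  = ~1/3 of total memory, capped at 31GB
    ceiling = 1/2 of total memory, capped at 31GB
    """
    target = min(total_memory_gb / 3, MAX_HEAP_GB)
    ceiling = min(total_memory_gb / 2, MAX_HEAP_GB)
    return target, ceiling


print(recommended_heap_gb(48))   # (16.0, 24.0)
print(recommended_heap_gb(128))  # (31, 31)
```

Whatever memory is not assigned to the heap remains available to the operating system for page cache.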

**Storage**

Determining the necessary storage capacity is generally a straightforward math problem. The indexed size of a flow record is usually 450-550 bytes, and a size of 500 bytes is typically used to estimate the required storage capacity. This results in a storage requirement of 43.2GB/day for each 1000 flows/sec (500 bytes × 1000 flows/sec × 86,400 sec/day). Each replica requires the same capacity again.
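The estimate above can be reproduced with a short calculation. This is a minimal sketch using the 500-byte figure from the text; the function and parameter names are hypothetical, and decimal gigabytes (10^9 bytes) are assumed.

```python
BYTES_PER_RECORD = 500    # typical indexed size of one flow record
SECONDS_PER_DAY = 86_400


def daily_storage_gb(flows_per_sec: float, replicas: int = 0,
                     bytes_per_record: int = BYTES_PER_RECORD) -> float:
    """Estimate the storage consumed per day, in GB (10^9 bytes).

    Each replica stores a full additional copy of the primary data.
    """
    primary_bytes = flows_per_sec * bytes_per_record * SECONDS_PER_DAY
    return primary_bytes * (1 + replicas) / 1e9


print(daily_storage_gb(1000))              # 43.2
print(daily_storage_gb(1000, replicas=1))  # 86.4
```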

Elasticsearch will not allocate shards to nodes that have used more than 85% of their storage capacity. This *low watermark* is configurable, but should only be changed in special circumstances. This means that the *effective* storage capacity of a node is only 85% of its actual capacity. For example, only 6.8TB of an 8TB volume should be considered for capacity planning purposes.
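Combining the low watermark with the per-record estimate gives an approximate retention period for a given amount of disk. The sketch below assumes the default 85% watermark, 500 bytes per record, and decimal terabytes; the function names are hypothetical, and the result ignores other overhead on the volume.

```python
LOW_WATERMARK = 0.85  # default fraction of disk usable for shard allocation


def effective_capacity_tb(raw_tb: float,
                          low_watermark: float = LOW_WATERMARK) -> float:
    """Return the capacity to use for planning, in TB."""
    return raw_tb * low_watermark


def retention_days(raw_tb: float, flows_per_sec: float,
                   replicas: int = 0, bytes_per_record: int = 500) -> float:
    """Estimate how many days of records fit in the effective capacity."""
    usable_bytes = effective_capacity_tb(raw_tb) * 1e12
    bytes_per_day = flows_per_sec * bytes_per_record * 86_400 * (1 + replicas)
    return usable_bytes / bytes_per_day


print(round(effective_capacity_tb(8), 2))  # 6.8
print(round(retention_days(8, 2000), 1))   # 78.7
```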

### Deployment Architectures

The following example deployment architectures are provided to help with planning for your own needs and environment.

{% hint style="info" %}
The following retention periods assume that the recommended maximum ingest rate is sustained. If the 24-hour average ingest rate is lower, the retention period will be proportionately longer.
{% endhint %}
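The proportional relationship described in the note above can be sketched as follows. The function name `projected_retention_days` is hypothetical; the example values are taken from the Single Server (small) table below.

```python
def projected_retention_days(retention_at_max_days: float,
                             max_flows_per_sec: float,
                             avg_flows_per_sec: float) -> float:
    """Scale the published retention by the ratio of max to average ingest rate."""
    if avg_flows_per_sec <= 0:
        raise ValueError("avg_flows_per_sec must be positive")
    return retention_at_max_days * max_flows_per_sec / avg_flows_per_sec


# 10 days at 16000 flows/sec becomes 20 days at a 24-hour average of 8000 flows/sec.
print(projected_retention_days(10, 16000, 8000))  # 20.0
```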

#### Single "Lab" Server (x-small)

The [Single "Lab" Server (x-small)](/data_platforms/elastic/elasticsearch/cluster_xsmall.md) deployment is for lab environments and testing with a smaller volume of records.

| Sizing Parameter             |          Value |
| ---------------------------- | -------------: |
| Recommended Max. Ingest Rate | 2000 flows/sec |
| Retention at Max. Rate       |        19 days |
| Redundancy                   |             No |

#### Single Server (small)

The [Single Server (small)](/data_platforms/elastic/elasticsearch/cluster_small.md) deployment is suitable for moderate ingest rates where redundancy is not a requirement, and downtime can be tolerated for activities such as upgrades.

| Sizing Parameter             |           Value |
| ---------------------------- | --------------: |
| Recommended Max. Ingest Rate | 16000 flows/sec |
| Retention at Max. Rate       |         10 days |
| Redundancy                   |              No |

#### Basic Cluster (medium)

The [Basic Cluster (medium)](/data_platforms/elastic/elasticsearch/cluster_medium.md) deployment is suitable for moderate ingest rates where redundancy is a requirement. It also allows for minimal to no downtime for most maintenance tasks.

| Sizing Parameter             |           Value |
| ---------------------------- | --------------: |
| Recommended Max. Ingest Rate | 24000 flows/sec |
| Retention at Max. Rate       |         10 days |
| Redundancy                   |             Yes |

#### Advanced Cluster (large)

The [Advanced Cluster (large)](/data_platforms/elastic/elasticsearch/cluster_large.md) deployment is suitable for high ingest rates and is easily expanded as necessary.

| Sizing Parameter             |           Value |
| ---------------------------- | --------------: |
| Recommended Max. Ingest Rate | 48000 flows/sec |
| Retention at Max. Rate       |         10 days |
| Redundancy                   |             Yes |

#### Multi-Tier Cluster (x-large)

The [Multi-Tier Cluster (x-large)](/data_platforms/elastic/elasticsearch/cluster_xlarge.md) deployment is suitable for high ingest rates, while also supporting longer retention periods.

| Sizing Parameter             |           Value |
| ---------------------------- | --------------: |
| Recommended Max. Ingest Rate | 48000 flows/sec |
| Retention at Max. Rate       |         30 days |
| Redundancy                   |             Yes |

