“Why Is the App Slow?” Troubleshooting Guide

Dashboards: kibana-8.14.x-flow-codex.ndjson

The workflow helps you decide whether slowness comes from:

  • a DoS / DDoS flood (external or internal)

  • an organic spike in legitimate utilization (application or network)

  • mis-marked or re-marked DSCP values starving the traffic of QoS

All filtering uses dropdown input-list controls; each choice becomes a blue filter pill that persists as you pivot between dashboards.


0 Set the Scene

  1. In Kibana, open Analytics → Dashboard.

  2. Set the time picker to include the slow period (e.g. “Last 60 minutes”).

Dashboard rail (left → right): Overview | Top-N | Core Services | Threats | Flows | Graph | Geo IP | AS Traffic | Exporters | Traffic Details | Flow Records


1 Top-N → Top Applications (Verify the slowdown & scope)

Menu: Top-N ▸ ElastiFlow (flow): Top Applications

| # | What to do | Why |
|---|------------|-----|
| 1 | Exporter / Locality / Application – input list ➟ type & tick the app/port, Apply | Focus every board on the app |
| 2 | Throughput / Applications (bits/s) | • Sharp rise = flood or organic surge • **No rise but high latency** = check retransmissions (§ 3) |
| 3 | Top Clients / Top Servers tables | • One client dominating = internal DoS • Many distinct source IPs = external DDoS |
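
The Top Clients heuristic above can be sketched in a few lines of Python if you export the table (for example as CSV) and want a repeatable check. This is only an illustration: the function name, the 50 % dominance threshold, and the 1000-source threshold are assumptions, not values taken from the dashboards.

```python
def classify_sources(bytes_by_client, dominance=0.5, many_sources=1000):
    """Label a {client_ip: bytes} mapping using the Top Clients heuristic.

    `dominance` and `many_sources` are illustrative thresholds (assumptions).
    """
    total = sum(bytes_by_client.values())
    if total == 0:
        return "no traffic"
    # Find the single heaviest client and its share of all bytes.
    top_ip, top_bytes = max(bytes_by_client.items(), key=lambda kv: kv[1])
    if top_bytes / total >= dominance:
        return f"single client dominates ({top_ip})"
    if len(bytes_by_client) >= many_sources:
        return "many sources"
    return "no clear pattern"


print(classify_sources({"10.0.0.5": 900, "10.0.0.6": 50, "10.0.0.7": 50}))
```

A dominant client points toward an internal DoS, while a very large number of small sources points toward an external DDoS; tune both thresholds to your own baseline.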


2 Threats → Threats (TCP / UDP DDoS) (Is it an attack?)

Menu: Threats ▸ ElastiFlow (flow): Threats (TCP) (repeat for UDP if needed)

| Panel | What it means |
|-------|---------------|
| TCP DDoS Events (bar) | Spikes = SYN-, ACK- or RST-flood |
| Top Attack IPs / Ports | • Many IPs = volumetric DDoS • Few internal IPs = rogue job |
| Attack Type donut | Confirms vector (SYN, UDP, ICMP …) |

If charts light up, escalate to SecOps/NOC with vectors & sources. If empty, continue.
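
When you need to confirm a SYN-, ACK- or RST-flood from raw flow records, it helps to decode the TCP flags value yourself. The sketch below assumes the record carries the flags as the usual 8-bit mask from the TCP header; the function names are hypothetical, and how your exporter or collector actually stores the field is not covered here.

```python
# Standard TCP header flag bits, lowest bit first.
TCP_FLAG_BITS = {
    0x01: "FIN",
    0x02: "SYN",
    0x04: "RST",
    0x08: "PSH",
    0x10: "ACK",
    0x20: "URG",
    0x40: "ECE",
    0x80: "CWR",
}


def decode_tcp_flags(mask):
    """Return the names of all flags set in an 8-bit TCP flags mask."""
    return [name for bit, name in TCP_FLAG_BITS.items() if mask & bit]


def is_syn_only(mask):
    """True when SYN is the only one of the six classic flags that is set."""
    return mask & 0x3F == 0x02


print(decode_tcp_flags(0x12))  # SYN and ACK are set
print(is_syn_only(0x02))
```

A large share of flows for which `is_syn_only` is true is the pattern the decision matrix refers to as a SYN-only spike.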


3 Traffic Details → Traffic Details (attributes) (Are sessions retransmitting, stalling, or DSCP-mis-marked?)

Menu: Traffic Details ▸ ElastiFlow (flow): Traffic Details (attributes) (blue filter pill from § 1 already applied)

| Panel / Control | What to do | What it tells you |
|-----------------|------------|-------------------|
| VLAN / DSCP / TCP Flags – input list | ► Tick the expected DSCP value for the app (e.g. EF, AF21) ➟ Apply | Adds a DSCP pill so all charts show only that code-point |
| DSCP Values (flow records) donut; **Throughput / DSCP (bits/s)** | Inspect | • Expected DSCP absent / tiny = app not marking correctly • Large slice, then drops to `CS0` downstream = **network remarking** |
| TCP Flags donut | Inspect | • Excess SYN+RST ⇒ SYN-flood blocked • Excess **ACK+PSH** with low bytes ⇒ app window shrink |
| Session Established metric | Inspect | • **False + surge** ⇒ handshake back-pressure (network saturation) • **True + high latency** ⇒ server busy / code bottleneck |

Tip: Clear the DSCP pill (❌) before moving on if you want to restore the full traffic view.
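
Some tools show DSCP as a name (EF, AF21, CS0), others as a decimal code point or as the raw ToS / Traffic Class byte. The following sketch converts between them using the standard encodings (EF = 46, CSn = 8n, AFxy = 8x + 2y, DSCP = upper six bits of the ToS byte). The helper names are hypothetical, and only these standard names are handled.

```python
def dscp_value(name):
    """Return the 6-bit DSCP code point for a standard PHB name."""
    name = name.upper()
    if name == "EF":
        return 46
    if name.startswith("CS") and name[2:].isdigit():
        n = int(name[2:])
        if 0 <= n <= 7:
            return 8 * n
    if name.startswith("AF") and len(name) == 4 and name[2:].isdigit():
        cls, drop = int(name[2]), int(name[3])
        if 1 <= cls <= 4 and 1 <= drop <= 3:
            return 8 * cls + 2 * drop
    raise ValueError(f"unknown DSCP name: {name}")


def dscp_from_tos(tos_byte):
    """Extract the DSCP (upper six bits) from a ToS / Traffic Class byte."""
    return (tos_byte & 0xFF) >> 2


print(dscp_value("AF21"), dscp_value("EF"), dscp_value("CS0"))
print(dscp_from_tos(0xB8))
```

If the value seen near the source equals `dscp_value("EF")` but a later hop reports `dscp_value("CS0")`, that matches the network-remarking pattern described in the table.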


4 Flow Records → Flow Records (src/dst) (Volume & burst analysis)

Menu: Flow Records ▸ ElastiFlow (flow): Flow Records (src/dst)

| Check | Why it matters |
|-------|----------------|
| Flow Records/s (src/dst) line | • Tall, narrow spikes = short-lived DoS • Wide plateau = legitimate surge |
| Flow Record Count metric | Quantifies burst vs baseline |
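
The spike-versus-plateau distinction can be made less subjective with a small check on exported flow-records/s samples. This is a minimal sketch under stated assumptions: `baseline` is a rate you measured during a known-quiet period, and the factor of 3 and the 30 % plateau fraction are illustrative defaults, not recommendations from the dashboards.

```python
def burst_shape(rates, baseline, factor=3.0, plateau_fraction=0.3):
    """Classify flow-records/s samples as 'spike', 'plateau' or 'normal'.

    A sample is 'elevated' when it exceeds factor * baseline. Few elevated
    samples suggest a narrow spike; many suggest a wide plateau.
    """
    if not rates:
        return "normal"
    elevated = sum(1 for r in rates if r > factor * baseline)
    fraction = elevated / len(rates)
    if fraction == 0:
        return "normal"
    if fraction >= plateau_fraction:
        return "plateau"
    return "spike"


print(burst_shape([100, 100, 900, 100, 100, 100, 100, 100, 100, 100], baseline=100))
print(burst_shape([100, 100, 400, 420, 410, 430, 400, 410, 100, 100], baseline=100))
```

The first series has one elevated sample out of ten and is reported as a spike; the second has six and is reported as a plateau.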


5 Flows → Flows (src/dst) (Pinpoint offenders & symmetry)

Menu: Flows ▸ ElastiFlow (flow): Flows (src/dst)

| # | Action | Meaning |
|---|--------|---------|
| 1 | (Optional) Src/Dst – input list ➟ server IP or suspect client, Apply | Focus graph |
| 2 | Sankey: Flows (src/dst) | • Many thin inbound edges = botnet DDoS • One thick edge = internal flood • Balanced edges + high volume = organic |


6 Geo IP → Geo Location (src/dst) (External vs internal)

Menu: Geo IP ▸ ElastiFlow (flow): Geo Location (src/dst)

Map & donuts reveal whether sources are worldwide (DDoS) or a few corporate sites (organic / internal DoS).


7 Exporters → Flow Exporters (traffic) (Collector / exporter health)

Menu: Exporters ▸ ElastiFlow (flow): Flow Exporters (traffic)

| Panel | Use it for |
|-------|------------|
| Exporter Throughput (bits/s) | Confirm exporter links aren’t saturating |
| Exporter Packet Drop counters | High drops = visibility loss; real congestion could be higher |
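
To get a rough feel for how much traffic the drops are hiding, you can scale the observed throughput by the drop ratio. This sketch assumes the dropped export packets were representative of the ones that arrived, which may not hold during an attack; the function names and inputs are hypothetical.

```python
def drop_ratio(received, dropped):
    """Fraction of export packets that were lost (0.0 when nothing was sent)."""
    total = received + dropped
    return dropped / total if total else 0.0


def corrected_throughput(observed_bps, received, dropped):
    """Estimate true throughput, assuming drops hit all flows evenly."""
    ratio = drop_ratio(received, dropped)
    if ratio >= 1.0:
        raise ValueError("all export packets were dropped; no estimate possible")
    return observed_bps / (1.0 - ratio)


print(drop_ratio(received=800, dropped=200))
print(corrected_throughput(800e6, received=800, dropped=200))
```

Treat the result as a lower-confidence estimate and fix the drops before trusting any of the other dashboards.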


8 Decision Matrix

| Evidence | Likely Root Cause | Immediate Response |
|----------|-------------------|--------------------|
| Threats dashboard alerts + many source IPs | External DDoS | Engage ISP / enable scrubbing |
| Threats empty, one internal IP dominates | Internal DoS / runaway job | Quarantine host, rate-limit |
| Throughput & Flow-records plateau, balanced edges, Session Established = True | Organic usage spike | Scale app / infra |
| Session Established = False surge, SYN-only spike | SYN-flood | Enable SYN-cookies / ACL |
| Expected DSCP absent at source (§ 3) | App not marking QoS | Fix DSCP policy on server / container |
| DSCP present at source but reset mid-path (§ 3 + 4) | Network remarking | Audit QoS / policy-map on offending hop |
| Exporter packet drops high | Collector / ingress bottleneck | Load-balance exporters, add collectors |
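
For runbooks or chat-ops bots, the matrix can be encoded as a simple rule list. The sketch below is one possible encoding: the `Evidence` field names are made up for illustration, each flag is something you set by hand after reading the dashboards, and more than one row can match at once.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Observations from sections 1-7 (field names are illustrative)."""
    threats_alerting: bool = False
    many_source_ips: bool = False
    one_internal_ip_dominates: bool = False
    plateau_balanced_edges: bool = False
    session_established: bool = True
    syn_only_spike: bool = False
    dscp_present_at_source: bool = True
    dscp_reset_mid_path: bool = False
    exporter_drops_high: bool = False


def diagnose(e):
    """Return every (likely root cause, immediate response) row that matches."""
    rules = [
        (e.threats_alerting and e.many_source_ips,
         "External DDoS", "Engage ISP / enable scrubbing"),
        (not e.threats_alerting and e.one_internal_ip_dominates,
         "Internal DoS / runaway job", "Quarantine host, rate-limit"),
        (e.plateau_balanced_edges and e.session_established,
         "Organic usage spike", "Scale app / infra"),
        (not e.session_established and e.syn_only_spike,
         "SYN-flood", "Enable SYN-cookies / ACL"),
        (not e.dscp_present_at_source,
         "App not marking QoS", "Fix DSCP policy on server / container"),
        (e.dscp_present_at_source and e.dscp_reset_mid_path,
         "Network remarking", "Audit QoS / policy-map on offending hop"),
        (e.exporter_drops_high,
         "Collector / ingress bottleneck", "Load-balance exporters, add collectors"),
    ]
    return [(cause, response) for matched, cause, response in rules if matched]


print(diagnose(Evidence(threats_alerting=True, many_source_ips=True)))
```

An empty result means none of the rows matched and the evidence needs a closer manual look.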


9 90-Second Drill (Quick-look order)

  1. Top Applications – dropdown filter to app; watch throughput & top talkers.

  2. Threats (TCP/UDP) – confirm / rule out attacks fast.

  3. Traffic Details (attributes) – handshake health and DSCP correctness.

  4. Flow Records (src/dst) – burst vs plateau.

  5. Flows (src/dst) & Geo Location – offender patterns.

  6. Exporters (traffic) – ensure telemetry itself isn’t the bottleneck.

With these steps—including DSCP validation—you can tell within minutes whether to page the network QoS engineer, the application owner, or the security team.
