Where AI Actually Helps in the SOC — and Where It Does Not

Table of Contents

Past the hype cycle
Where it genuinely helps
Where it quietly makes things worse
The standard I hold

Past the hype cycle

Every security vendor now sells "AI-powered" something. Having spent the last couple of years building and owning ML-assisted detection in a production cloud environment — and measuring the results — I have a clearer view than I would like of where the marketing ends and the engineering begins.

The honest summary: machine learning has earned a permanent place in my detection stack, but in narrower territory than the industry suggests.

Where it genuinely helps

Ranking, not deciding. The single highest-value application I have shipped is using models to score and rank alerts rather than to open or close them. High-volume telemetry produces more candidate signals than any team can read; a model that pushes the likely-true positives to the top of the queue changes the economics of the whole SOC. In my own work this is where the measurable wins came from — true-positive accuracy up roughly 30%, false positives down 75% — and crucially, the final call stayed with a human.
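A minimal sketch of the rank-don't-decide pattern, in Python. The Alert fields, the rule names, and the RULE_PRIOR table standing in for a trained model are all hypothetical; the point is only that the function reorders the queue and never opens, closes, or drops anything.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    alert_id: str
    rule: str
    score: float = 0.0  # model's estimate that the alert is a true positive


def rank_alerts(alerts, score_fn):
    """Attach a score to every alert and return them highest score first.

    Every alert that goes in comes out; only the order changes, so the
    decision to close or escalate stays with the analyst.
    """
    for alert in alerts:
        alert.score = score_fn(alert)
    return sorted(alerts, key=lambda a: a.score, reverse=True)


# Hypothetical stand-in for a trained model: a fixed prior per rule.
RULE_PRIOR = {"impossible_travel": 0.82, "new_admin_role": 0.64, "port_scan": 0.07}


def toy_score(alert):
    return RULE_PRIOR.get(alert.rule, 0.5)


queue = rank_alerts(
    [
        Alert("a1", "port_scan"),
        Alert("a2", "impossible_travel"),
        Alert("a3", "new_admin_role"),
    ],
    toy_score,
)
for alert in queue:
    print(f"{alert.score:.2f}  {alert.alert_id}  {alert.rule}")
```

Because the low-scoring alerts stay in the queue, a wrong score costs an analyst some time rather than costing the team a missed incident.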

Anomaly detection where rules cannot reach. Signatures describe what you have seen before. For behaviours with no stable signature — unusual IAM activity, a service suddenly talking to infrastructure it never touched — unsupervised methods like Isolation Forests surface candidates that no rule would ever have flagged. The output is a lead, not a verdict, and treating it as a lead is what keeps it valuable.
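To make the idea concrete, here is a small standard-library sketch of an isolation forest: points that random splits separate after very few cuts get scores near 1. The two features (API calls per hour, distinct services touched) and the synthetic data are assumptions for illustration, and a production system would more likely use a maintained implementation such as scikit-learn's IsolationForest.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(row, node, depth=0):
    """Number of splits needed to reach a leaf, adjusted for unsplit leaf size."""
    if "feature" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["feature"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Return one score per row in (0, 1]; higher means easier to isolate."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    return [
        2 ** (-(sum(path_length(row, t) for t in trees) / n_trees) / norm)
        for row in rows
    ]


# Synthetic data: 200 ordinary principals plus one behaving very differently.
data_rng = random.Random(1)
rows = [(data_rng.gauss(50, 8), data_rng.gauss(4, 1)) for _ in range(200)]
rows.append((400.0, 30.0))

scores = anomaly_scores(rows)
top = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)[:3]
for i in top:
    print(f"row {i}: score={scores[i]:.3f} features={rows[i]}")
```

The output is a ranked list of candidates for an analyst to look at, which matches the lead-not-verdict framing: nothing in the sketch applies a threshold or takes an action.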

Enrichment and drafting. Language models are good at the work around the decision: summarising a noisy alert cluster, drafting the first version of an incident timeline, translating a technical finding into an executive paragraph. Every one of those drafts gets reviewed — but starting from a draft is faster than starting from a blank page, and during an incident, minutes matter.

Where it quietly makes things worse

As an excuse to skip the fundamentals. A model trained on bad telemetry learns bad telemetry. Teams often reach for ML because their log pipelines are inconsistent and their alert quality is poor, which is precisely the condition under which ML will underperform. Fix the pipelines first; the model is the last mile, not the foundation.

Unauditable verdicts. Any system that closes alerts and cannot show its reasoning is a liability with a dashboard. When a regulator, an insurer, or your own post-incident review asks why an alert was dismissed, "the model scored it low" is not an answer. If you cannot explain it, you cannot defend it.

Drift, silently. Cloud environments change weekly; models trained on last quarter's baseline degrade without announcing it. An ML detection without a retraining schedule and a monitoring story is a rule that rewrites itself at random. Treat model performance as something you measure continuously, exactly as you would detection coverage.
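One way to give drift a monitoring story is to compare the distribution of model scores in a recent window against the training-time baseline. The sketch below uses the Population Stability Index (PSI) for that comparison. The choice of PSI, the ten equal-width bins, and the 0.25 alert threshold are assumptions of this example (0.25 is a commonly quoted rule of thumb, not a figure from my own measurements), and scores are assumed to lie in [0, 1].

```python
import math


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of scores in [0, 1].

    0 means the two histograms are identical; larger values mean the
    current window looks less like the baseline.
    """
    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = max(0, min(int(v * bins), bins - 1))  # clamp to a valid bin
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    p = proportions(baseline)
    q = proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


DRIFT_THRESHOLD = 0.25  # rule-of-thumb value, an assumption of this sketch

# Hypothetical score samples: last quarter's baseline vs. this week's window.
baseline_scores = [i / 100 for i in range(100)]        # spread across [0, 1)
current_scores = [0.5 + i / 200 for i in range(100)]   # bunched in the upper half

value = psi(baseline_scores, current_scores)
if value > DRIFT_THRESHOLD:
    print(f"PSI {value:.2f} exceeds {DRIFT_THRESHOLD}: review or retrain the model")
```

A score-distribution check like this needs no labels, so it can run daily; it complements, rather than replaces, tracking precision and recall against analyst dispositions as they arrive.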

The standard I hold

Three questions before any model ships into my detection path:

Can I measure it? Precision and recall against labelled outcomes, not vibes. If the baseline is not measured, the improvement is fiction.
Can I explain it? An analyst should be able to see why the score is high — the contributing features, the comparison baseline — in the alert itself.
Can I turn it off? Every ML component needs a deterministic fallback. When the model misbehaves at 3am, reverting must be one change, not an architecture discussion.
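The three questions can be sketched in a few lines of Python. Everything named here is hypothetical: the feature names, the linear WEIGHTS standing in for a model, the SEVERITY_ORDER fallback, and the model_enabled flag. The sketch shows one possible shape for each answer, not a reference design.

```python
def precision_recall(predicted, actual):
    """Measure it: compare model calls against labelled analyst outcomes."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical linear model: each feature's contribution is weight * value.
WEIGHTS = {"new_geo": 0.5, "privileged_role": 0.3, "off_hours": 0.2}
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def toy_score(alert):
    """Explain it: return the score together with the features that drove it."""
    contributions = {
        name: weight * alert["features"].get(name, 0)
        for name, weight in WEIGHTS.items()
    }
    reasons = sorted(
        ((n, c) for n, c in contributions.items() if c > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    return sum(contributions.values()), reasons


def order_queue(alerts, model_enabled, score_fn=toy_score):
    """Turn it off: one flag reverts to a deterministic severity ordering."""
    if model_enabled:
        try:
            scored = []
            for alert in alerts:
                score, reasons = score_fn(alert)
                scored.append(
                    {**alert, "score": score, "reasons": reasons, "ranked_by": "model"}
                )
            return sorted(scored, key=lambda a: a["score"], reverse=True)
        except Exception:
            pass  # any model failure falls through to the fallback below
    fallback = [{**a, "ranked_by": "severity"} for a in alerts]
    return sorted(fallback, key=lambda a: SEVERITY_ORDER.get(a.get("severity"), 99))


alerts = [
    {"id": "a1", "severity": "high", "features": {"off_hours": 1}},
    {"id": "a2", "severity": "low", "features": {"new_geo": 1, "privileged_role": 1}},
]

with_model = order_queue(alerts, model_enabled=True)
without_model = order_queue(alerts, model_enabled=False)
print([a["id"] for a in with_model], with_model[0]["reasons"])
print([a["id"] for a in without_model])
print(precision_recall([True, True, False, False], [True, False, True, False]))
```

Carrying the reasons and a ranked_by field on the alert itself means the analyst and any later review can see which path produced the ordering, and the 3am revert is a single flag flip.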

AI in the SOC is neither saviour nor snake oil. It is a power tool: genuinely transformative on the tasks it fits, dangerous when swung at everything. The teams getting value from it are the ones that did the unglamorous data engineering first and kept humans on the decisions that matter.