Anomalous System Call Detection via Dynamic Analysis

Table of Contents

The idea
Building the dataset
The detection model
Honest limitations
Why it mattered

The idea

Signature-based host intrusion detection has a structural weakness: it can only catch what someone has already seen, named, and written a rule for. Anomaly detection inverts the problem. Instead of describing attacks, describe normal — and treat meaningful deviation from it as a signal.

For a UNIX process, one of the most faithful descriptions of "normal" is its system-call behaviour. Whatever a program does (open files, talk to the network, spawn children), it ultimately does through a narrow, observable interface to the kernel. This is the insight behind Forrest et al.'s classic "A Sense of Self for Unix Processes", and it was the foundation for my final-year project at the Secure Systems Lab at King's College London, supervised by Prof. Lorenzo Cavallaro.

Building the dataset

The first half of the project was data plumbing. Using strace and ptrace, I captured the system-call sequences of target programs under normal operation, then wrote a parsing pipeline in Python and C to turn raw traces into structured training data — sequences and frequency distributions suitable for modelling. Vagrant kept the capture environment reproducible: an Ubuntu i386 target that could be reset to a clean state between runs.
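To make the parsing step concrete, here is a minimal sketch of turning strace text into sequences and frequency distributions. It is not the project's actual pipeline: the regular expression, the function names (parse_trace, ngrams, frequencies) and the sample trace are all assumptions for illustration, and the pattern only covers the common line shapes strace prints.

```python
import re
from collections import Counter

# Assumed line shape: an optional PID prefix (as printed when following
# children with `strace -f`), then the syscall name and an opening parenthesis.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")

def parse_trace(lines):
    """Return the ordered list of syscall names found in strace output lines."""
    calls = []
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:  # signal lines, exit notices and "<... resumed>" lines are skipped
            calls.append(match.group(1))
    return calls

def ngrams(calls, n=3):
    """Slide a window of length n over the call sequence."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]

def frequencies(calls):
    """Relative frequency of each syscall name in the trace."""
    counts = Counter(calls)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}

# Hypothetical trace of `cat notes.txt`, invented for this example.
sample = [
    'execve("/bin/cat", ["cat", "notes.txt"], 0x7ffd /* 20 vars */) = 0',
    'openat(AT_FDCWD, "notes.txt", O_RDONLY) = 3',
    'read(3, "hello\\n", 131072) = 6',
    'write(1, "hello\\n", 6) = 6',
    'read(3, "", 131072) = 0',
    'close(3) = 0',
    '+++ exited with 0 +++',
]

calls = parse_trace(sample)
print(calls)
print(ngrams(calls, 3)[0])
print(frequencies(calls)["read"])
```

Keeping only the syscall name discards arguments and return values. That keeps the vocabulary small, at the cost of hiding differences such as which file was opened.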

Two practical lessons from this phase stick with me:

Tracing is invasive. ptrace changes timing and can change behaviour, so the "normal" you record is never quite the "normal" that runs untraced. You design around it by capturing many runs under varied conditions.
Coverage is everything. A model of normal behaviour built from a narrow set of program inputs will scream anomaly at the first legitimate code path it has never seen. Most false positives are really dataset gaps.
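One way to put a number on the coverage problem is to hold back a legitimate run and ask how many of its call windows the training runs never produced. The sketch below is an assumption about how such a check could look, not something the project necessarily did; the function names and the toy traces are hypothetical.

```python
def windows(calls, n=2):
    """Set of distinct length-n windows in a syscall name sequence."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}

def unseen_fraction(training_runs, held_out_run, n=2):
    """Fraction of the held-out run's windows that no training run contains."""
    known = set()
    for run in training_runs:
        known |= windows(run, n)
    candidate = windows(held_out_run, n)
    if not candidate:
        return 0.0
    return len(candidate - known) / len(candidate)

# Toy traces, invented for illustration: the held-out run takes a
# legitimate write path that the training inputs never exercised.
training_runs = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
]
held_out_run = ["open", "read", "write", "close"]

gap = unseen_fraction(training_runs, held_out_run)
print(f"{gap:.2f} of held-out windows were never seen in training")
```

A high value on a run known to be benign points at a gap in the dataset rather than at an attack, which is a cue to capture more varied inputs before trusting any alerts.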

The detection model

With the dataset in place, the detector applied a probabilistic model over system-call patterns: learn the likelihood of observed call sequences during training, then score live traces against that baseline. Sequences that fell far enough outside the learned distribution were flagged as anomalous.
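The text above does not say which probabilistic model was used, so the following is only one plausible instance: a first-order Markov chain over syscall names with add-alpha smoothing, scored by average negative log-likelihood per transition. The class name MarkovBaseline, the smoothing constant and the toy traces are all assumptions.

```python
import math
from collections import defaultdict

class MarkovBaseline:
    """First-order Markov model over syscall names (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength, an assumed default
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def fit(self, traces):
        """Count transitions between consecutive calls in normal traces."""
        for trace in traces:
            self.vocab.update(trace)
            for prev, nxt in zip(trace, trace[1:]):
                self.counts[prev][nxt] += 1
        return self

    def prob(self, prev, nxt):
        """Smoothed P(nxt | prev); one extra slot stands for unseen syscalls."""
        size = len(self.vocab) + 1
        row = self.counts.get(prev, {})
        total = sum(row.values())
        return (row.get(nxt, 0) + self.alpha) / (total + self.alpha * size)

    def score(self, trace):
        """Average negative log-likelihood per transition; higher is stranger."""
        pairs = list(zip(trace, trace[1:]))
        if not pairs:
            return 0.0
        return -sum(math.log(self.prob(p, n)) for p, n in pairs) / len(pairs)

# Toy traces, invented for illustration.
normal = [
    ["open", "read", "read", "close"],
    ["open", "read", "write", "close"],
]
suspicious = ["open", "read", "execve", "socket", "connect"]

model = MarkovBaseline().fit(normal)
print(model.score(normal[0]), model.score(suspicious))
```

Averaging per transition keeps scores comparable across traces of different lengths. Smoothing matters because a single zero-probability transition would otherwise make the log-likelihood undefined.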

The evaluation target was a deliberately planted stack-based buffer overflow in a program running on the Ubuntu i386 VM. Exploiting it changes the program's control flow — and changed control flow shows up as system-call patterns the model had never assigned meaningful probability to. The detector caught it: the exploited run scored as anomalous against the trained baseline, with no signature, no IOC, and no prior knowledge of the vulnerability.
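Deciding that a run "scored as anomalous" needs a cut-off. A simple and common choice, assumed here rather than taken from the project, is to score held-out normal runs and flag anything more than k standard deviations above their mean. The scores below are made-up placeholders, not results from the evaluation.

```python
import statistics

def fit_threshold(normal_scores, k=3.0):
    """Cut-off at mean + k * standard deviation of held-out normal scores."""
    mean = statistics.mean(normal_scores)
    spread = statistics.pstdev(normal_scores)
    return mean + k * spread

def is_anomalous(score, threshold):
    """A run is flagged when its anomaly score exceeds the cut-off."""
    return score > threshold

# Placeholder anomaly scores for held-out normal runs (invented values).
normal_scores = [1.18, 1.21, 1.25, 1.20, 1.23]
threshold = fit_threshold(normal_scores)

print(is_anomalous(1.22, threshold))  # a run close to the baseline
print(is_anomalous(1.54, threshold))  # a run far above the baseline
```

The value of k trades false positives against missed detections, so in practice it would be tuned on benign runs that were not used for training.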

Honest limitations

A research prototype that catches one planted exploit is a proof of concept, not a product, and the literature is clear about where this approach struggles. Wagner and Soto's work on mimicry attacks showed that an attacker who knows the model can craft an exploit whose system-call sequence stays inside the learned distribution. Sequence-based detection raises the bar; it does not make exploitation impossible.
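The mimicry idea is easiest to see on a toy detector. The sketch below uses a sliding-window set-membership check in the style of Forrest et al., which is a simplification and not the project's probabilistic model; the traces are invented. The naive exploit jumps straight from read to execve and produces a window never seen in training, while the mimicry variant inserts a harmless close so that every window matches normal behaviour.

```python
def windows(calls, n=2):
    """Set of distinct length-n windows in a syscall name sequence."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}

def unseen_windows(known, trace, n=2):
    """Windows in the trace that the normal profile does not contain."""
    return windows(trace, n) - known

# Hypothetical normal trace: the program handles two files, then launches a helper.
normal = ["open", "read", "write", "close",
          "open", "read", "close", "execve", "exit_group"]
known = windows(normal)

# Hijacked during read: the jump to execve creates an unseen window.
naive_attack = ["open", "read", "execve"]

# Same goal, padded with a harmless close so every window is already known.
mimicry_attack = ["open", "read", "close", "execve"]

print(unseen_windows(known, naive_attack))
print(unseen_windows(known, mimicry_attack))
```

Because only syscall names are modelled, the detector cannot tell a padded call with harmless arguments from a genuine one, which is the gap this class of attack exploits.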

The other cost is operational: per-program baselines are expensive to build and maintain, and they drift as software updates change legitimate behaviour. Any real deployment needs a retraining story, not just a detection one.

Why it mattered

This project is where detection stopped being abstract for me. It forced the full pipeline — instrumentation, data engineering, modelling, evaluation, and an honest accounting of failure modes — on a system small enough to understand completely. Everything I have done in detection engineering since follows the same shape; only the telemetry and the scale have changed.