Building ScamIntelli: Fraud Detection with Graph ML

Fraud detection is one of those domains where the cost of a false negative is measured in stolen money and broken trust, and the cost of a false positive is a frustrated customer and lost revenue. Getting the balance right requires more than a good model — it requires a system that can reason about context, adapt to new patterns, and explain its decisions.

ScamIntelli is my attempt to build that system.

The Core Problem

Traditional fraud detection relies on rule-based systems: if a transaction exceeds a threshold, or comes from a flagged IP, flag it. These systems are fast and auditable, but they age poorly. Fraudsters learn the rules and route around them. Within weeks of a new ruleset going live, adversaries have usually found the gaps.

Machine learning helps, but naive classifiers still treat each transaction in isolation. The real signal in fraud is relational. A single transaction from a new account looks only mildly suspicious on its own. But when you can see that the account shares a device fingerprint with 40 other accounts that all transacted with the same merchant in a 10-minute window, you have evidence of an organized ring, not just one bad actor.
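
To make the relational signal concrete, here is a minimal sketch in plain Python. It is an illustration, not ScamIntelli code: the record layout (account, device, merchant, ts), the function name find_ring_candidates, and the min_accounts cutoff are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def find_ring_candidates(transactions, window=timedelta(minutes=10), min_accounts=3):
    """Group transactions by (device, merchant) and report groups where at least
    min_accounts distinct accounts transacted inside one time window."""
    groups = defaultdict(list)
    for tx in transactions:
        groups[(tx["device"], tx["merchant"])].append(tx)

    rings = []
    for (device, merchant), txs in groups.items():
        txs.sort(key=lambda tx: tx["ts"])
        start = 0
        best = set()
        for end in range(len(txs)):
            # Advance the left edge until the span fits inside the window.
            while txs[end]["ts"] - txs[start]["ts"] > window:
                start += 1
            accounts = {tx["account"] for tx in txs[start:end + 1]}
            if len(accounts) > len(best):
                best = accounts
        if len(best) >= min_accounts:
            rings.append({"device": device, "merchant": merchant, "accounts": sorted(best)})
    return rings

# Hypothetical sample: three accounts on one device hit the same merchant in 7 minutes.
t0 = datetime(2024, 1, 1, 12, 0)
sample = [
    {"account": "a1", "device": "d1", "merchant": "m1", "ts": t0},
    {"account": "a2", "device": "d1", "merchant": "m1", "ts": t0 + timedelta(minutes=3)},
    {"account": "a3", "device": "d1", "merchant": "m1", "ts": t0 + timedelta(minutes=7)},
    {"account": "a4", "device": "d2", "merchant": "m1", "ts": t0 + timedelta(minutes=1)},
]
rings = find_ring_candidates(sample)
```

None of the four sample transactions is remarkable alone. The group only stands out once the transactions are joined on device and merchant, and that join is exactly what a per-transaction classifier never sees.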

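Graph ML captures this by modeling accounts, devices, merchants, and IPs as connected entities rather than independent rows.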

The Architecture

ScamIntelli is built around three core layers:

Ingestion layer: A FastAPI service receives transaction events via a REST endpoint. Events are validated with Pydantic, enriched with metadata (device, IP reputation, velocity features), and pushed onto a Redis stream.
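
The validate, enrich, and push steps can be sketched with the standard library alone. This is a simplified stand-in under stated assumptions: a dataclass replaces the Pydantic model, a plain list replaces the Redis stream, and the field names, the VelocityTracker class, and the tx_count_10m feature are hypothetical.

```python
from collections import deque
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class TransactionEvent:
    account_id: str
    device_id: str
    merchant_id: str
    ip: str
    amount: float
    ts: float  # Unix timestamp in seconds

    def __post_init__(self):
        # Minimal validation; a Pydantic model would do this declaratively.
        if self.amount <= 0:
            raise ValueError("amount must be positive")
        for name in ("account_id", "device_id", "merchant_id", "ip"):
            if not getattr(self, name):
                raise ValueError(f"{name} must be non-empty")

class VelocityTracker:
    """Counts how many events each account produced in the trailing window."""

    def __init__(self, window_seconds=600.0):
        self.window_seconds = window_seconds
        self._seen = {}  # account_id -> deque of timestamps

    def observe(self, account_id, ts):
        stamps = self._seen.setdefault(account_id, deque())
        stamps.append(ts)
        # Drop timestamps that have fallen out of the trailing window.
        while stamps and ts - stamps[0] > self.window_seconds:
            stamps.popleft()
        return len(stamps)

def ingest(raw, tracker, stream):
    """Validate a raw event, attach a velocity feature, append it to the stream."""
    event = TransactionEvent(**raw)
    record = asdict(event)
    record["tx_count_10m"] = tracker.observe(event.account_id, event.ts)
    stream.append(record)  # stand-in for pushing onto a Redis stream
    return record

tracker = VelocityTracker()
stream = []
base = {"account_id": "a1", "device_id": "d1", "merchant_id": "m1", "ip": "203.0.113.7"}
first = ingest({**base, "amount": 25.0, "ts": 1000.0}, tracker, stream)
second = ingest({**base, "amount": 40.0, "ts": 1300.0}, tracker, stream)
```

With a real Redis stream, the flat record dict is the kind of field map a client call such as redis-py's xadd accepts, which is why the enriched event is kept as a single-level dictionary here.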

Graph construction: A worker process consumes from the Redis stream and builds a property graph in real time. Nodes represent entities — accounts, devices, merchants, IPs. Edges represent interactions — transactions, logins, shared attributes. The graph lives in a PostgreSQL database using a custom adjacency schema, with a Redis cache for hot subgraphs.
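
One way an adjacency schema like this could look is sketched below. The table names, column names, and edge kinds are assumptions, not the actual ScamIntelli schema, and SQLite's in-memory database stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3

SCHEMA = """
CREATE TABLE nodes (
    id   TEXT PRIMARY KEY,
    kind TEXT NOT NULL          -- 'account', 'device', 'merchant', 'ip'
);
CREATE TABLE edges (
    src  TEXT NOT NULL REFERENCES nodes(id),
    dst  TEXT NOT NULL REFERENCES nodes(id),
    kind TEXT NOT NULL,         -- 'uses_device', 'transacted_with', ...
    ts   REAL NOT NULL
);
CREATE INDEX edges_src ON edges (src);
CREATE INDEX edges_dst ON edges (dst);
"""

def add_edge(conn, src, src_kind, dst, dst_kind, kind, ts):
    # Create both endpoints if they are new, then record the interaction.
    conn.executemany(
        "INSERT OR IGNORE INTO nodes (id, kind) VALUES (?, ?)",
        [(src, src_kind), (dst, dst_kind)],
    )
    conn.execute(
        "INSERT INTO edges (src, dst, kind, ts) VALUES (?, ?, ?, ?)",
        (src, dst, kind, ts),
    )

def accounts_sharing_device(conn, account_id):
    # Other accounts that have used any device this account has used.
    rows = conn.execute(
        """
        SELECT DISTINCT other.src
        FROM edges AS mine
        JOIN edges AS other ON other.dst = mine.dst AND other.kind = mine.kind
        WHERE mine.src = ? AND mine.kind = 'uses_device' AND other.src != ?
        ORDER BY other.src
        """,
        (account_id, account_id),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_edge(conn, "acct:1", "account", "dev:9", "device", "uses_device", 1.0)
add_edge(conn, "acct:2", "account", "dev:9", "device", "uses_device", 2.0)
add_edge(conn, "acct:3", "account", "dev:4", "device", "uses_device", 3.0)
add_edge(conn, "acct:1", "account", "mer:7", "merchant", "transacted_with", 4.0)
shared = accounts_sharing_device(conn, "acct:1")
```

INSERT OR IGNORE is SQLite syntax; on PostgreSQL the equivalent is INSERT ... ON CONFLICT DO NOTHING. The two indexes matter more than the tables themselves, because every neighborhood lookup is a scan of edges by src or dst.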

Detection engine: For each new transaction, the engine extracts a local subgraph (k-hop neighborhood), computes structural features (degree centrality, clustering coefficient, shared neighbor counts), and feeds them into an ensemble of models: a gradient boosting classifier for tabular features and a Graph Neural Network for structural embeddings. The two outputs are merged into a single decision with a calibrated confidence score.
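
The subgraph extraction and the three structural features named above can be sketched in a few lines. This assumes an undirected graph held as a dict of neighbor sets; the function names are hypothetical, and the model ensemble itself is left out.

```python
from collections import deque

def k_hop_subgraph(adj, root, k):
    """Return the induced subgraph on all nodes within k hops of root (BFS)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand past the k-hop frontier
        for nbr in adj.get(node, ()):
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    keep = set(depth)
    return {n: adj.get(n, set()) & keep for n in keep}

def clustering_coefficient(adj, node):
    """Fraction of pairs of node's neighbors that are themselves connected."""
    nbrs = list(adj.get(node, ()))
    if len(nbrs) < 2:
        return 0.0
    links = sum(
        1
        for i, u in enumerate(nbrs)
        for v in nbrs[i + 1:]
        if v in adj.get(u, ())
    )
    return 2.0 * links / (len(nbrs) * (len(nbrs) - 1))

def structural_features(adj, node, other):
    """Features for node, plus its neighbor overlap with one other node."""
    size = len(adj)
    return {
        "degree_centrality": len(adj[node]) / (size - 1) if size > 1 else 0.0,
        "clustering": clustering_coefficient(adj, node),
        "shared_neighbors": len(adj[node] & adj.get(other, set())),
    }

# Hypothetical graph: accounts a1 and a2 share device d1 and merchant m1.
graph = {
    "a1": {"d1", "m1"},
    "a2": {"d1", "m1"},
    "d1": {"a1", "a2"},
    "m1": {"a1", "a2", "a3"},
    "a3": {"m1", "d2"},
    "d2": {"a3"},
}
sub = k_hop_subgraph(graph, "a1", 2)
features = structural_features(sub, "a1", "a2")
```

Here d2 sits three hops from a1, so it is excluded from the 2-hop subgraph. The shared_neighbors count of 2 for a1 and a2 is the kind of value that never appears in either account's own transaction row.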

What Graph ML Actually Buys You

The GNN learns to recognize fraud topologies — patterns like star graphs (one bad actor transacting with many victims), cliques (coordinated ring behavior), and bridges (money mule accounts connecting otherwise unconnected clusters). These patterns don’t show up in per-transaction feature vectors.
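
The shapes themselves are easy to state in code, which helps show why they are invisible to per-transaction features. The checks below are hand-written heuristics for illustration only; the GNN learns such patterns from data and does not run rules like these. The function names and the min_leaves cutoff are assumptions.

```python
def components_without(adj, removed):
    """Connected components of the graph after deleting one node."""
    seen, components = set(), []
    for start in adj:
        if start == removed or start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nbr in adj[node]:
                if nbr != removed and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(comp)
    return components

def looks_like_bridge(adj, node):
    """True if deleting node disconnects some of its neighbors from each other."""
    touched = [c for c in components_without(adj, node) if c & adj[node]]
    return len(touched) > 1

def looks_like_star_hub(adj, node, min_leaves=3):
    """True if node has many neighbors and none of them are linked to each other."""
    nbrs = adj[node]
    return len(nbrs) >= min_leaves and all(not (adj[n] & nbrs) for n in nbrs)

# Two tight clusters joined only through a hypothetical mule account.
ring = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "mule"},
    "mule": {"c", "x"},
    "x": {"mule", "y", "z"}, "y": {"x", "z"}, "z": {"x", "y"},
}
star = {"hub": {"v1", "v2", "v3"}, "v1": {"hub"}, "v2": {"hub"}, "v3": {"hub"}}

mule_is_bridge = looks_like_bridge(ring, "mule")
hub_is_star = looks_like_star_hub(star, "hub")
```

The two checks overlap: a star hub also disconnects its leaves when removed, so it passes the bridge test too. Hand-coded rules blur into each other like this quickly, which is part of the case for learning the patterns instead.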

On my test dataset, the graph-augmented ensemble catches roughly 23% more fraud rings than the tabular-only baseline at the same false positive rate. That gap widens as fraud patterns become more coordinated.

The Hard Parts

Latency: Building and querying a graph per transaction is expensive. The solution: precompute and cache ego-graphs for high-velocity accounts. For new accounts, fall back to a fast tabular-only model and queue the graph computation asynchronously.
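
The cache-then-fallback routing can be sketched as follows. The ScoringRouter class, the model callables, and the ego-graph representation are placeholders for illustration; in a real deployment the drain step would run in a separate worker rather than inline.

```python
from collections import deque

class ScoringRouter:
    def __init__(self, graph_model, tabular_model):
        self.graph_model = graph_model      # callable(tx, ego_graph) -> float
        self.tabular_model = tabular_model  # callable(tx) -> float
        self.ego_cache = {}                 # account_id -> precomputed ego-graph
        self.pending = deque()              # accounts awaiting graph computation

    def score(self, tx):
        ego = self.ego_cache.get(tx["account_id"])
        if ego is not None:
            return self.graph_model(tx, ego), "graph"
        # Cache miss: answer quickly and defer the expensive graph build.
        self.pending.append(tx["account_id"])
        return self.tabular_model(tx), "tabular"

    def drain(self, build_ego_graph):
        """Background step: build and cache ego-graphs for queued accounts."""
        while self.pending:
            account_id = self.pending.popleft()
            if account_id not in self.ego_cache:
                self.ego_cache[account_id] = build_ego_graph(account_id)

# Stub models with fixed scores, just to show the routing.
router = ScoringRouter(
    graph_model=lambda tx, ego: 0.9,
    tabular_model=lambda tx: 0.4,
)
cold = router.score({"account_id": "new-1"})
router.drain(lambda account_id: {"nodes": [account_id]})
warm = router.score({"account_id": "new-1"})
```

Returning the path taken ("graph" or "tabular") alongside the score makes it possible to track how often the fallback fires, which is the number to watch when tuning what gets precomputed.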

Concept drift: Fraud patterns shift. The detection engine logs all predictions with confidence scores and flags low-confidence decisions for human review. Periodically, reviewed samples feed back into retraining.
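
A minimal sketch of that logging and review loop is below. The 0.5 threshold, the 0.15 margin, and the log entry fields are assumptions; "low confidence" is taken here to mean a score close to the decision threshold.

```python
def needs_review(score, threshold=0.5, margin=0.15):
    """A decision is low-confidence when the score sits close to the threshold."""
    return abs(score - threshold) < margin

def log_prediction(log, tx_id, score, threshold=0.5, margin=0.15):
    entry = {
        "tx_id": tx_id,
        "score": score,
        "flagged_fraud": score >= threshold,
        "needs_review": needs_review(score, threshold, margin),
        "label": None,  # filled in later by a human reviewer
    }
    log.append(entry)
    return entry

def retraining_batch(log):
    """Reviewed samples only: (tx_id, label) pairs ready to feed back into training."""
    return [(e["tx_id"], e["label"]) for e in log if e["label"] is not None]

log = []
log_prediction(log, "t1", 0.97)
log_prediction(log, "t2", 0.55)
log_prediction(log, "t3", 0.03)
review_queue = [e["tx_id"] for e in log if e["needs_review"]]

# A reviewer confirms t2 as fraud; it becomes a training example.
log[1]["label"] = True
batch = retraining_batch(log)
```

Only the borderline score lands in the review queue, which keeps the human workload proportional to the model's uncertainty instead of to transaction volume.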

Explainability: “The GNN said so” is not an acceptable answer for a compliance team. I built a simple attribution layer that traces which graph features drove the decision — which neighbor accounts, which shared attributes — and renders them as a human-readable case summary.
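
The rendering half of such a layer can be quite small. The sketch below assumes the attribution step has already produced (description, weight) pairs; how those weights are computed is out of scope here, and the function name and output format are illustrative only.

```python
def render_case_summary(tx_id, score, attributions, top_n=3):
    """Turn per-feature attribution weights into a short plain-text case summary.

    attributions: list of (description, weight) pairs; a larger weight means
    the feature pushed the score further toward fraud.
    """
    ranked = sorted(attributions, key=lambda item: item[1], reverse=True)[:top_n]
    lines = [f"Transaction {tx_id} scored {score:.2f}. Top contributing factors:"]
    for rank, (description, weight) in enumerate(ranked, start=1):
        lines.append(f"  {rank}. {description} (weight {weight:+.2f})")
    return "\n".join(lines)

# Hypothetical attributions for one flagged transaction.
summary = render_case_summary(
    "t-42",
    0.91,
    [
        ("shares device d1 with 40 other accounts", 0.48),
        ("account is 2 days old", 0.12),
        ("neighbor account a7 previously confirmed as fraud", 0.27),
        ("amount below account average", -0.05),
    ],
)
```

Capping the list at the top few factors is deliberate: a compliance reviewer needs the strongest reasons in order, not the full feature vector.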

Current Status

ScamIntelli is under active development. The ingestion and graph construction layers are production-ready. The GNN training pipeline is operational. The explainability module is in progress.

The code is on GitHub. If you’re working in fraud detection or anomaly detection on graphs, I’d love to compare notes.