Why Discord is the surface
Most ops dashboards are surfaces nobody opens. The CFO doesn't log into Datadog. The CEO doesn't open Sentry. Half the engineering team has Slack alert fatigue and another quarter has muted the channel. Status pages get checked once during an incident and never again. Discord is open during the workday for most engineering, ops, and revenue teams — and it has the right primitives for a bus: persistent channels, threaded conversations, slash-commands, embed-rich messages, role-based mentions.
The bus moves notifications, alerts, and lead handoffs to where the team already is — and adds slash-commands so the team can act from the same surface without context-switching to four other tools. The unit of work is "the channel where the event lives," not "the dashboard you have to open."
Sentinel runs as a bot in your existing Discord workspace (or a fresh one we provision in week 1 of onboarding). It listens for webhooks, fans them to lane-aware channels, and exposes a slash-command suite to the team. The infrastructure is yours — the bot runs on your Cloudflare account, the queues live in your R2, the secrets sit in your wrangler vault. Garnet has read access for diagnostics; cancellation removes our access without breaking the bus.
Why not Slack?
Slack works — Sentinel can target it. The reason Discord is the default: Slack's webhook rate limits are tighter, its slash-command UX is less rich (no select menus on free tier, no full-screen modals), and its file-upload + bot-permission model treats third-party bots as second-class. Discord's infra is more permissive without being chaotic, and its hierarchical role mentions ("operator-on-call", "ops-team", "leadership-only") map cleanly to escalation logic. Pro tier includes one cross-platform migration; Scale and Enterprise can run the same bus across multiple platforms.
Daily — event ingestion + integration health
A Cloudflare Worker per integration ingests webhooks from configured sources (a minimal ingest sketch follows the list):
- Stripe — payment events, subscription lifecycle, dispute notifications
- Calendly — bookings, cancellations, no-show flags
- GitHub — PR open/merge, CI failures, deployment status
- Linear / Jira — ticket assignments, blockers, escalations
- Postal / Resend — bounce + complaint feedback for outbound mail
- Cloudflare — Workers + Pages deploy events, error spikes
- Datadog / Grafana / Sentry — alert routing with severity grouping
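In practice each ingest Worker follows the same shape: audit the raw payload to R2 first, then fan out to the lane's channel. A minimal TypeScript sketch, assuming an R2 binding named AUDIT_BUCKET and a DISCORD_WEBHOOK_URL secret (both names illustrative, not the shipped config):

```ts
// Minimal ingest Worker sketch. A production Worker would also verify the
// source's webhook signature before touching the payload; that's elided here.
export interface Env {
  AUDIT_BUCKET: R2Bucket;       // illustrative binding name
  DISCORD_WEBHOOK_URL: string;  // illustrative secret name
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== "POST") {
      return new Response("method not allowed", { status: 405 });
    }

    const body = await request.text();
    const event = JSON.parse(body);

    // Audit first: the raw payload lands in R2 before anything else happens,
    // keyed by event ID so /replay can find it later.
    const key = `stripe/${event.id ?? crypto.randomUUID()}.json`;
    await env.AUDIT_BUCKET.put(key, body);

    // Fan out: render a compact embed and post it to the lane's channel.
    await fetch(env.DISCORD_WEBHOOK_URL, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({
        embeds: [{ title: event.type ?? "stripe event", description: key }],
      }),
    });

    return new Response("ok", { status: 202 });
  },
};
```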
Each integration carries a heartbeat probe; if three consecutive heartbeats are missed, an "integration health degraded" alert fires to the ops channel. Heartbeat state lives in R2, queryable via the /sentinel-status slash command.
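A sketch of how the three-miss check could run as a Cron-triggered Worker, assuming each integration writes a heartbeat/<name> timestamp to R2 on success; the binding names, integration list, and 5-minute interval are all illustrative:

```ts
interface Env {
  AUDIT_BUCKET: R2Bucket;   // illustrative binding name
  OPS_WEBHOOK_URL: string;  // illustrative secret name
}

const INTEGRATIONS = ["stripe", "github", "sentry"]; // placeholder list
const INTERVAL_MS = 5 * 60 * 1000; // one heartbeat expected per 5 minutes

export default {
  async scheduled(_controller: ScheduledController, env: Env): Promise<void> {
    for (const name of INTEGRATIONS) {
      // Each integration Worker writes its last-success timestamp (ms) here.
      const obj = await env.AUDIT_BUCKET.get(`heartbeat/${name}`);
      const last = obj ? Number(await obj.text()) : 0;
      const missed = Math.floor((Date.now() - last) / INTERVAL_MS);

      // Three consecutive missed heartbeats => degraded alert to ops channel.
      if (missed >= 3) {
        await fetch(env.OPS_WEBHOOK_URL, {
          method: "POST",
          headers: { "content-type": "application/json" },
          body: JSON.stringify({
            content: `:warning: integration health degraded: ${name} (${missed} heartbeats missed)`,
          }),
        });
      }
    }
  },
};
```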
Weekly — routing tuning + slash-command analytics
Once a week, the engineer reviews routing metrics:
- Lead → operator latency — median time from inbound webhook to operator acknowledging in-channel. Target: <60s for hot leads, <5min for warm.
- Slash-command usage histogram — which commands the team actually uses, which sit unused. Unused commands get pruned; heavily-used ones get response-shape tuning (1-screen vs. multi-screen embed).
- Alert noise audit — alert sources that fire >10×/day get rate-limited or filtered; signatures that always co-fire get clustered into a single embed.
Tuning ships as Worker code commits — the bot doesn't get config-edited, it gets redeployed (see the routing-rule sketch below). Engineering posture, not knob-twisting.
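To make "tuning ships as code" concrete, here is one plausible shape for a routing rule set as typed TypeScript rather than config; the channel IDs, rule predicates, and role mentions are all hypothetical:

```ts
// Routing rules as code: changing a lane means a commit and a redeploy,
// which gives every tuning change a diff, a review, and a rollback path.
type Severity = "info" | "warn" | "critical";

interface BusEvent {
  source: string;
  type: string;
  severity?: Severity;
}

interface Route {
  match: (event: BusEvent) => boolean;
  channelId: string;
  mention?: string; // role to ping, e.g. "@operator-on-call"
}

const ROUTES: Route[] = [
  {
    match: (e) => e.source === "stripe" && e.type.startsWith("charge.dispute"),
    channelId: "C_REVENUE",
    mention: "@leadership-only",
  },
  {
    match: (e) => e.severity === "critical",
    channelId: "C_OPS",
    mention: "@operator-on-call",
  },
  { match: () => true, channelId: "C_FIREHOSE" }, // default lane
];

export function route(event: BusEvent): Route {
  // First matching rule wins; the catch-all guarantees every event gets a lane.
  return ROUTES.find((r) => r.match(event))!;
}
```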
Monthly — executive PDF
On the 1st of each month, a Workflow renders an executive PDF (a sketch follows the list):
- Event volume — total webhooks routed, breakdown per integration
- Integration health — per-integration uptime % + downtime windows + notes
- Routing metrics — rules fired, leads routed, median lead-to-operator latency
- Top alerts — most-frequent alert signatures + median resolution time
- Slash-command usage — top-N commands + invocation count
- Uptime — overall bot uptime % + incidents (with cause + fix)
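A hedged sketch of that render as a Cloudflare Workflow; the step names and the aggregateMonth / renderPdf helpers are placeholders for the real pipeline:

```ts
import { WorkflowEntrypoint, WorkflowEvent, WorkflowStep } from "cloudflare:workers";

interface Env {
  AUDIT_BUCKET: R2Bucket; // illustrative binding name
}

type Params = { month: string }; // e.g. "2025-06"

export class MonthlyReport extends WorkflowEntrypoint<Env, Params> {
  async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
    // Aggregate the month's audit log into report metrics; the step is
    // checkpointed, so a failure retries without re-reading everything.
    const metrics = await step.do("aggregate", async () =>
      aggregateMonth(this.env.AUDIT_BUCKET, event.payload.month)
    );

    // Render and publish in one step so the PDF bytes never cross a
    // step boundary (step results must be serializable).
    await step.do("render-and-publish", async () => {
      const pdf = await renderPdf(metrics);
      await this.env.AUDIT_BUCKET.put(`reports/${event.payload.month}.pdf`, pdf);
    });
  }
}

// Placeholder helpers: the real aggregation and render logic is elided.
declare function aggregateMonth(bucket: R2Bucket, month: string): Promise<unknown>;
declare function renderPdf(metrics: unknown): Promise<ArrayBuffer>;
```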
Slash-command suite (Pro)
- /lead-status — pull most-recent inbound leads + their routing trail
- /route-test — dry-run a lead through current routing rules to debug misclassification
- /quiet — mute non-critical alerts for N minutes (e.g., during a maintenance window)
- /replay — replay a specific webhook by event ID (idempotent — won't double-fulfill)
- /sentinel-status — bot health + integration heartbeats + last incidents
Scale tier adds /escalate, /customer-lookup, and a customer-facing /garnet command for paying tenants to query subscription state. Enterprise adds a custom command suite scoped to the engagement.
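The /replay guarantee is worth making concrete. One way the idempotency guard could look, assuming the audit log keys events under events/<id> and fulfillment markers under fulfilled/<id> (both layouts illustrative):

```ts
interface Env {
  AUDIT_BUCKET: R2Bucket; // illustrative binding name
}

export async function replay(eventId: string, env: Env): Promise<string> {
  // The marker, not the event, is the source of truth for "already done":
  // if it exists, the replay is a no-op and says so.
  const markerKey = `fulfilled/${eventId}`;
  if (await env.AUDIT_BUCKET.head(markerKey)) {
    return `event ${eventId} already fulfilled — replay skipped (idempotent)`;
  }

  const payload = await env.AUDIT_BUCKET.get(`events/${eventId}.json`);
  if (!payload) return `event ${eventId} not found in audit log`;

  // Re-run routing + fan-out, then write the marker only after success.
  await dispatch(JSON.parse(await payload.text()));
  await env.AUDIT_BUCKET.put(markerKey, new Date().toISOString());
  return `event ${eventId} replayed`;
}

// Placeholder: the routing core shared with the ingest path, elided here.
declare function dispatch(event: unknown): Promise<void>;
```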
What success looks like
Across the first 90 days, Sentinel Pro typically routes 1,000–10,000 webhooks/month across 5–10 integrations with >99.9% uptime. Median lead-to-operator latency settles in the 30–60 second range. The routing rule set converges within ~6 weeks; after that, it's drift-only changes (new integration on, old one off).
What it isn't
- Not a chatbot. We're not building you an LLM-fronted customer support assistant. The bot is an event bus + slash-command surface for the operator team.
- Not a CRM. If you need a CRM, use one. Sentinel routes events into the CRM you already have (HubSpot, Salesforce, Pipedrive, Attio).
- Not a SaaS dashboard. Discord is the surface; we're not building a web UI you'd never open.
- Not vendor-locked. If you outgrow Discord and want Slack or Microsoft Teams, the same routing core re-targets — Pro tier includes one cross-platform migration; Scale tier includes ongoing multi-platform.
- Not a Zapier replacement. Zapier is great for "trigger A → call B," but its observability is shallow and its rate limits + cost scale badly past a few thousand events/month. Sentinel is purpose-built for the operator-bus pattern: durable queue, full audit trail in R2, idempotent replays, slash-command introspection.
Day 1, Day 30, Day 90
Day 1 — onboarding kickoff
- 30-min intake: integration list, channel layout, escalation roles, on-call schedule
- Bot provisioned in your Discord workspace (or fresh workspace if needed)
- Cloudflare Worker per integration deployed in your account; R2 audit log bucket created; secrets vaulted in your wrangler config
- Day-1 integration set bootstrapped: Stripe + GitHub + 1 of (Datadog | Sentry | Grafana)
- First slash-command suite (/sentinel-status, /replay, /quiet) live
Day 30 — full integration set + first executive PDF
- 5–10 integrations live with heartbeat probes
- Routing rules tuned against the first month's traffic — false-positive alerts down significantly, lead-to-operator latency under target
- First monthly executive PDF lands
Day 90 — bus stabilized
- Lead-to-operator latency settles in the 30–60 second range across all inbound channels
- Routing rule set converges (week-over-week diff goes from "many edits" to "drift only")
- Slash-command usage histogram identifies the team's actual workflow — unused commands pruned, heavily-used commands tuned for response shape
- Quarter-end review: tier escalation decision, custom command requests for next cycle
FAQ
What if our team is already on Slack?
Sentinel works on Slack. The reason Discord is the default surface is documented above (rate limits, slash-command UX, role hierarchy) but Slack-targeted Sentinel is supported from day one. The migration cost is part of Pro tier.
What if we don't have webhooks for some of our tools?
For tools without webhooks (older systems, internal-only services, on-prem databases), Sentinel can poll on a schedule via Cloudflare Workers Cron Triggers. Polling costs more in compute than webhooks but is vastly cheaper than running stateful middleware. Polling integrations land on the same surface as webhook integrations.
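A minimal polling integration under those assumptions: a hypothetical legacy endpoint (LEGACY_STATUS_URL) is polled on a Cron schedule, and only diffs become bus events, so downstream routing can't tell it apart from a webhook source:

```ts
interface Env {
  AUDIT_BUCKET: R2Bucket;     // illustrative binding name
  LEGACY_STATUS_URL: string;  // hypothetical legacy endpoint
}

export default {
  async scheduled(_controller: ScheduledController, env: Env): Promise<void> {
    const res = await fetch(env.LEGACY_STATUS_URL);
    const snapshot = await res.text();

    // Diff against the last snapshot; only changes become bus events, so the
    // polling integration looks identical to a webhook one downstream.
    const prev = await env.AUDIT_BUCKET.get("poll/legacy/last");
    if (prev && (await prev.text()) === snapshot) return;

    await env.AUDIT_BUCKET.put("poll/legacy/last", snapshot);
    await dispatch({ source: "legacy", type: "status.changed", body: snapshot });
  },
};

// Placeholder: the same routing core the webhook Workers call, elided here.
declare function dispatch(event: unknown): Promise<void>;
```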
How do you handle PII in the audit log?
The R2 audit log stores webhook payloads verbatim by default — meaning if Stripe sends a customer's email in the event payload, it ends up in the log. For PII-sensitive deployments (Scale and Enterprise) we run a configurable redaction pass on ingest: regex-based for emails/phones, structured-field-based for customer addresses, and full tokenization with a reversible vault for everything else. Configurable per integration.
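A sketch of the regex tier only; the patterns are illustrative, and the structured-field and tokenization tiers are elided:

```ts
// Illustrative patterns; the production pass is configured per integration.
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

export function redact(raw: string): string {
  return raw.replace(EMAIL, "[email:redacted]").replace(PHONE, "[phone:redacted]");
}

// Applied on ingest, before the payload touches R2:
//   await env.AUDIT_BUCKET.put(key, redact(body));
```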
What happens if Discord goes down?
Discord uptime is high but not perfect. When the platform is degraded, Sentinel buffers events in R2 and replays them when Discord recovers (idempotent — a webhook never double-fires). For Enterprise customers we provision a backup channel surface (Slack, Teams, or email) that fires only during prolonged outages.
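One plausible shape for that buffer-and-replay loop, with delivery failures parked under a buffer/ prefix in R2 and a Cron sweep draining it; the key layout and delivery helper are illustrative:

```ts
interface Env {
  AUDIT_BUCKET: R2Bucket; // illustrative binding name
}

// Normal path: try Discord; on failure, park the rendered message in R2.
async function deliverOrBuffer(eventId: string, message: string, env: Env): Promise<void> {
  try {
    await postToDiscord(message);
  } catch {
    await env.AUDIT_BUCKET.put(`buffer/${eventId}`, message);
  }
}

// Cron sweep: replay buffered messages once Discord recovers.
async function drainBuffer(env: Env): Promise<void> {
  const listed = await env.AUDIT_BUCKET.list({ prefix: "buffer/" });
  for (const obj of listed.objects) {
    const msg = await env.AUDIT_BUCKET.get(obj.key);
    if (!msg) continue;
    await postToDiscord(await msg.text());   // throws again if still degraded
    await env.AUDIT_BUCKET.delete(obj.key);  // delete only after success => no double-fire
  }
}

// Placeholder: the Discord delivery helper, elided here.
declare function postToDiscord(message: string): Promise<void>;
```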
Can the team add their own slash commands?
Pro tier includes the standard suite. Scale tier includes 3 custom commands per quarter (typical custom commands: /refund, /cancel-trial, /comp-customer, /escalate-to-eng). Enterprise tier includes unlimited custom commands within the cycle's audit hours.
Can Sentinel handle high-volume traffic?
The bus is built on Cloudflare Workers and R2 — both scale near-linearly. Pro tier carries a soft cap at 10K events/month; Scale tier 100K/month; Enterprise tier scales with the engagement. Above 100K/month is uncommon but supported with durable queues + batched fan-out.
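A sketch of what batched fan-out could look like on Cloudflare Queues, with illustrative binding names: ingest Workers enqueue instead of posting directly, and the consumer collapses a batch into one Discord post:

```ts
interface Env {
  EVENT_QUEUE: Queue; // illustrative binding name
}

export default {
  // Producer: at high volume, ingest Workers enqueue instead of posting
  // to Discord directly.
  async fetch(request: Request, env: Env): Promise<Response> {
    await env.EVENT_QUEUE.send(await request.json());
    return new Response("queued", { status: 202 });
  },

  // Consumer: one Discord post per batch instead of per event keeps the bot
  // under Discord's per-channel rate limits at 100K+ events/month.
  async queue(batch: MessageBatch<unknown>, _env: Env): Promise<void> {
    const lines = batch.messages.map((m) => JSON.stringify(m.body));
    await postToDiscord(lines.join("\n"));
    batch.messages.forEach((m) => m.ack());
  },
};

// Placeholder: the Discord delivery helper, elided here.
declare function postToDiscord(content: string): Promise<void>;
```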
What if we want to bring our own bot?
You can. Sentinel's Worker layer can target any bot endpoint — the bot itself is a thin renderer, the heavy lifting (routing, queueing, audit, slash-command logic) sits in Workers. If you've already invested in a custom bot, we wrap it instead of replacing it.
Adjacent lanes
Sentinel-aaS is one of four production lanes. It's the connective tissue between the others:
- GEO Methodology — citation-drift alerts on the GEO panel are routed via Sentinel. The 15-minute drift alert pipeline lives here.
- Audit Retainer — schema-drift alerts, weekly drift reports, and engineering-ticket notifications all flow through the Sentinel bus. The /audit-status slash-command is a Sentinel deliverable.
- Cluster Ops — node-down alerts, eviction notices, thermal throttle warnings, and the cluster monthly PDF preview all land in the Sentinel bus.
Most customers running multiple lanes consolidate alerts into a single Sentinel deployment — there's a discount for cross-lane bundles. Talk to engineering for the math.
See Sentinel-aaS pricing → See the 30-day onboarding walkthrough → or talk to engineering