Why Discord is the surface
Most ops dashboards are surfaces nobody opens. The CFO doesn't log into Datadog. The CEO doesn't open Sentry. Half the engineering team has Slack alert fatigue and another quarter has muted the channel. Status pages get checked once during an incident and never again. Discord is open during the workday for most engineering, ops, and revenue teams — and it has the right primitives for a bus: persistent channels, threaded conversations, slash-commands, embed-rich messages, role-based mentions.
The bus moves notifications, alerts, and lead handoffs to where the team already is — and adds slash-commands so the team can act from the same surface without context-switching to four other tools. The unit of work is "the channel where the event lives," not "the dashboard you have to open."
Sentinel runs as a bot in your existing Discord workspace (or a fresh one we provision in week 1 of onboarding). It listens for webhooks, fans them to lane-aware channels, and exposes a slash-command suite to the team. The infrastructure is yours — the bot runs on your Cloudflare account, the queues live in your R2, the secrets sit in your wrangler vault. Garnet has read access for diagnostics; cancellation removes our access without breaking the bus.
Why not Slack?
Slack works — Sentinel can target it. The reason Discord is the default: Slack's webhook rate limits are tighter, its slash-command UX is less rich (no select menus on free tier, no full-screen modals), and its file-upload + bot-permission model treats third-party bots as second-class. Discord's infra is more permissive without being chaotic, and its hierarchical role mentions ("operator-on-call", "ops-team", "leadership-only") map cleanly to escalation logic. Pro tier includes one cross-platform migration; Scale and Enterprise can run the same bus across multiple platforms.
Daily — event ingestion + integration health
A Cloudflare Worker per integration ingests webhooks from configured sources (a minimal ingest sketch follows the list):
- Stripe — payment events, subscription lifecycle, dispute notifications
- Calendly — bookings, cancellations, no-show flags
- GitHub — PR open/merge, CI failures, deployment status
- Linear / Jira — ticket assignments, blockers, escalations
- Postal / Resend — bounce + complaint feedback for outbound mail
- Cloudflare — Workers + Pages deploy events, error spikes
- Datadog / Grafana / Sentry — alert routing with severity grouping
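In practice each ingest Worker follows the same shape: audit the raw payload to R2 first, then fan out to the lane's channel. A minimal TypeScript sketch, assuming an R2 binding named AUDIT_BUCKET and a DISCORD_WEBHOOK_URL secret (both names illustrative, not the shipped config):

```ts
// Minimal ingest Worker sketch. A production Worker would also verify the
// source's webhook signature before touching the payload; that's elided here.
export interface Env {
  AUDIT_BUCKET: R2Bucket;       // illustrative binding name
  DISCORD_WEBHOOK_URL: string;  // illustrative secret name
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== "POST") {
      return new Response("method not allowed", { status: 405 });
    }

    const body = await request.text();
    const event = JSON.parse(body);

    // Audit first: the raw payload lands in R2 before anything else happens,
    // keyed by event ID so /replay can find it later.
    const key = `stripe/${event.id ?? crypto.randomUUID()}.json`;
    await env.AUDIT_BUCKET.put(key, body);

    // Fan out: render a compact embed and post it to the lane's channel.
    await fetch(env.DISCORD_WEBHOOK_URL, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({
        embeds: [{ title: event.type ?? "stripe event", description: key }],
      }),
    });

    return new Response("ok", { status: 202 });
  },
};
```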
Each integration carries a heartbeat probe; if three consecutive heartbeats are missed, an "integration health degraded" alert fires to the ops channel. Heartbeat state lives in R2, queryable via the /sentinel-status slash command.
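A sketch of how the three-miss check could run as a Cron-triggered Worker, assuming each integration writes a heartbeat/<name> timestamp to R2 on success; the binding names, integration list, and 5-minute interval are all illustrative:

```ts
interface Env {
  AUDIT_BUCKET: R2Bucket;   // illustrative binding name
  OPS_WEBHOOK_URL: string;  // illustrative secret name
}

const INTEGRATIONS = ["stripe", "github", "sentry"]; // placeholder list
const INTERVAL_MS = 5 * 60 * 1000; // one heartbeat expected per 5 minutes

export default {
  async scheduled(_controller: ScheduledController, env: Env): Promise<void> {
    for (const name of INTEGRATIONS) {
      // Each integration Worker writes its last-success timestamp (ms) here.
      const obj = await env.AUDIT_BUCKET.get(`heartbeat/${name}`);
      const last = obj ? Number(await obj.text()) : 0;
      const missed = Math.floor((Date.now() - last) / INTERVAL_MS);

      // Three consecutive missed heartbeats => degraded alert to ops channel.
      if (missed >= 3) {
        await fetch(env.OPS_WEBHOOK_URL, {
          method: "POST",
          headers: { "content-type": "application/json" },
          body: JSON.stringify({
            content: `:warning: integration health degraded: ${name} (${missed} heartbeats missed)`,
          }),
        });
      }
    }
  },
};
```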
Weekly — routing tuning + slash-command analytics
Once a week, the engineer reviews routing metrics:
- Lead → operator latency — median time from inbound webhook to operator acknowledging in-channel. Target: <60s for hot leads, <5min for warm.
- Slash-command usage histogram — which commands the team actually uses, which sit unused. Unused commands get pruned; heavily-used ones get response-shape tuning (1-screen vs. multi-screen embed).
- Alert noise audit — alert sources that fire >10×/day get rate-limited or filtered; signatures that always co-fire get clustered into a single embed.
Tuning ships as Worker code commits — the bot doesn't get config-edited, it gets redeployed (see the routing-rule sketch below). Engineering posture, not knob-twisting.
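To make "tuning ships as code" concrete, here is one plausible shape for a routing rule set as typed TypeScript rather than config; the channel IDs, rule predicates, and role mentions are all hypothetical:

```ts
// Routing rules as code: changing a lane means a commit and a redeploy,
// which gives every tuning change a diff, a review, and a rollback path.
type Severity = "info" | "warn" | "critical";

interface BusEvent {
  source: string;
  type: string;
  severity?: Severity;
}

interface Route {
  match: (event: BusEvent) => boolean;
  channelId: string;
  mention?: string; // role to ping, e.g. "@operator-on-call"
}

const ROUTES: Route[] = [
  {
    match: (e) => e.source === "stripe" && e.type.startsWith("charge.dispute"),
    channelId: "C_REVENUE",
    mention: "@leadership-only",
  },
  {
    match: (e) => e.severity === "critical",
    channelId: "C_OPS",
    mention: "@operator-on-call",
  },
  { match: () => true, channelId: "C_FIREHOSE" }, // default lane
];

export function route(event: BusEvent): Route {
  // First matching rule wins; the catch-all guarantees every event gets a lane.
  return ROUTES.find((r) => r.match(event))!;
}
```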
Monthly — executive PDF
On the 1st of each month, a Workflow renders an executive PDF (a sketch follows the list):
- Event volume — total webhooks routed, breakdown per integration
- Integration health — per-integration uptime % + downtime windows + notes
- Routing metrics — rules fired, leads routed, median lead-to-operator latency
- Top alerts — most-frequent alert signatures + median resolution time
- Slash-command usage — top-N commands + invocation count
- Uptime — overall bot uptime % + incidents (with cause + fix)
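A hedged sketch of that render as a Cloudflare Workflow; the step names and the aggregateMonth / renderPdf helpers are placeholders for the real pipeline:

```ts
import { WorkflowEntrypoint, WorkflowEvent, WorkflowStep } from "cloudflare:workers";

interface Env {
  AUDIT_BUCKET: R2Bucket; // illustrative binding name
}

type Params = { month: string }; // e.g. "2025-06"

export class MonthlyReport extends WorkflowEntrypoint<Env, Params> {
  async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
    // Aggregate the month's audit log into report metrics; the step is
    // checkpointed, so a failure retries without re-reading everything.
    const metrics = await step.do("aggregate", async () =>
      aggregateMonth(this.env.AUDIT_BUCKET, event.payload.month)
    );

    // Render and publish in one step so the PDF bytes never cross a
    // step boundary (step results must be serializable).
    await step.do("render-and-publish", async () => {
      const pdf = await renderPdf(metrics);
      await this.env.AUDIT_BUCKET.put(`reports/${event.payload.month}.pdf`, pdf);
    });
  }
}

// Placeholder helpers: the real aggregation and render logic is elided.
declare function aggregateMonth(bucket: R2Bucket, month: string): Promise<unknown>;
declare function renderPdf(metrics: unknown): Promise<ArrayBuffer>;
```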
Slash-command suite (Pro)
- /lead-status — pull most-recent inbound leads + their routing trail
- /route-test — dry-run a lead through current routing rules to debug misclassification
- /quiet — mute non-critical alerts for N minutes (e.g., during a maintenance window)
- /replay — replay a specific webhook by event ID (idempotent — won't double-fulfill)
- /sentinel-status — bot health + integration heartbeats + last incidents
Scale tier adds /escalate, /customer-lookup, and a customer-facing /garnet command for paying tenants to query subscription state. Enterprise adds a custom command suite scoped to the engagement.
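The /replay guarantee is worth making concrete. One way the idempotency guard could look, assuming the audit log keys events under events/<id> and fulfillment markers under fulfilled/<id> (both layouts illustrative):

```ts
interface Env {
  AUDIT_BUCKET: R2Bucket; // illustrative binding name
}

export async function replay(eventId: string, env: Env): Promise<string> {
  // The marker, not the event, is the source of truth for "already done":
  // if it exists, the replay is a no-op and says so.
  const markerKey = `fulfilled/${eventId}`;
  if (await env.AUDIT_BUCKET.head(markerKey)) {
    return `event ${eventId} already fulfilled — replay skipped (idempotent)`;
  }

  const payload = await env.AUDIT_BUCKET.get(`events/${eventId}.json`);
  if (!payload) return `event ${eventId} not found in audit log`;

  // Re-run routing + fan-out, then write the marker only after success.
  await dispatch(JSON.parse(await payload.text()));
  await env.AUDIT_BUCKET.put(markerKey, new Date().toISOString());
  return `event ${eventId} replayed`;
}

// Placeholder: the routing core shared with the ingest path, elided here.
declare function dispatch(event: unknown): Promise<void>;
```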
What success looks like
Across the first 90 days, Sentinel Pro typically routes 1,000–10,000 webhooks/month across 5–10 integrations with >99.9% uptime. Median lead-to-operator latency settles in the 30–60 second range. The routing rule set converges within ~6 weeks; after that, it's drift-only changes (new integration on, old one off).
What it isn't
- Not a chatbot. We're not building you an LLM-fronted customer support assistant. The bot is an event bus + slash-command surface for the operator team.
- Not a CRM. If you need a CRM, use one. Sentinel routes events into the CRM you already have (HubSpot, Salesforce, Pipedrive, Attio).
- Not a SaaS dashboard. Discord is the surface; we're not building a web UI you'd never open.
- Not vendor-locked. If you outgrow Discord and want Slack or Microsoft Teams, the same routing core re-targets — Pro tier includes one cross-platform migration; Scale tier includes ongoing multi-platform.
- Not a Zapier replacement. Zapier is great for "trigger A → call B," but its observability is shallow and its rate limits + cost scale badly past a few thousand events/month. Sentinel is purpose-built for the operator-bus pattern: durable queue, full audit trail in R2, idempotent replays, slash-command introspection.
Day 1, Day 30, Day 90
Day 1 — onboarding kickoff
- 30-min intake: integration list, channel layout, escalation roles, on-call schedule
- Bot provisioned in your Discord workspace (or fresh workspace if needed)
- Cloudflare Worker per integration deployed in your account; R2 audit log bucket created; secrets vaulted in your wrangler config
- Day-1 integration set bootstrapped: Stripe + GitHub + 1 of (Datadog | Sentry | Grafana)
- First slash-command suite (/sentinel-status, /replay, /quiet) live
Day 30 — full integration set + first executive PDF
- 5–10 integrations live with heartbeat probes
- Routing rules tuned against the first month's traffic — false-positive alerts down significantly, lead-to-operator latency under target
- First monthly executive PDF lands
Day 90 — bus stabilized
- Lead-to-operator latency settles in the 30–60 second range across all inbound channels
- Routing rule set converges (week-over-week diff goes from "many edits" to "drift only")
- Slash-command usage histogram identifies the team's actual workflow — unused commands pruned, heavily-used commands tuned for response shape
- Quarter-end review: tier escalation decision, custom command requests for next cycle
FAQ
What if our team is already on Slack?
Sentinel works on Slack. The reason Discord is the default surface is documented above (rate limits, slash-command UX, role hierarchy) but Slack-targeted Sentinel is supported from day one. The migration cost is part of Pro tier.
What if we don't have webhooks for some of our tools?
For tools without webhooks (older systems, internal-only services, on-prem databases), Sentinel can poll on a schedule via Cloudflare Workers Cron Triggers. Polling costs more in compute than webhooks but is vastly cheaper than running stateful middleware. Polling integrations land on the same surface as webhook integrations.
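A minimal polling integration under those assumptions: a hypothetical legacy endpoint (LEGACY_STATUS_URL) is polled on a Cron schedule, and only diffs become bus events, so downstream routing can't tell it apart from a webhook source:

```ts
interface Env {
  AUDIT_BUCKET: R2Bucket;     // illustrative binding name
  LEGACY_STATUS_URL: string;  // hypothetical legacy endpoint
}

export default {
  async scheduled(_controller: ScheduledController, env: Env): Promise<void> {
    const res = await fetch(env.LEGACY_STATUS_URL);
    const snapshot = await res.text();

    // Diff against the last snapshot; only changes become bus events, so the
    // polling integration looks identical to a webhook one downstream.
    const prev = await env.AUDIT_BUCKET.get("poll/legacy/last");
    if (prev && (await prev.text()) === snapshot) return;

    await env.AUDIT_BUCKET.put("poll/legacy/last", snapshot);
    await dispatch({ source: "legacy", type: "status.changed", body: snapshot });
  },
};

// Placeholder: the same routing core the webhook Workers call, elided here.
declare function dispatch(event: unknown): Promise<void>;
```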
How do you handle PII in the audit log?
The R2 audit log stores webhook payloads verbatim by default — meaning if Stripe sends a customer's email in the event payload, it ends up in the log. For PII-sensitive deployments (Scale and Enterprise) we run a configurable redaction pass on ingest: regex-based for emails/phones, structured-field-based for customer addresses, and full tokenization with a reversible vault for everything else. Configurable per integration.
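A sketch of the regex tier only; the patterns are illustrative, and the structured-field and tokenization tiers are elided:

```ts
// Illustrative patterns; the production pass is configured per integration.
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

export function redact(raw: string): string {
  return raw.replace(EMAIL, "[email:redacted]").replace(PHONE, "[phone:redacted]");
}

// Applied on ingest, before the payload touches R2:
//   await env.AUDIT_BUCKET.put(key, redact(body));
```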
What happens if Discord goes down?
Discord uptime is high but not perfect. When the platform is degraded, Sentinel buffers events in R2 and replays them when Discord recovers (idempotent — a webhook never double-fires). For Enterprise customers we provision a backup channel surface (Slack, Teams, or email) that fires only during prolonged outages.
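One plausible shape for that buffer-and-replay loop, with delivery failures parked under a buffer/ prefix in R2 and a Cron sweep draining it; the key layout and delivery helper are illustrative:

```ts
interface Env {
  AUDIT_BUCKET: R2Bucket; // illustrative binding name
}

// Normal path: try Discord; on failure, park the rendered message in R2.
async function deliverOrBuffer(eventId: string, message: string, env: Env): Promise<void> {
  try {
    await postToDiscord(message);
  } catch {
    await env.AUDIT_BUCKET.put(`buffer/${eventId}`, message);
  }
}

// Cron sweep: replay buffered messages once Discord recovers.
async function drainBuffer(env: Env): Promise<void> {
  const listed = await env.AUDIT_BUCKET.list({ prefix: "buffer/" });
  for (const obj of listed.objects) {
    const msg = await env.AUDIT_BUCKET.get(obj.key);
    if (!msg) continue;
    await postToDiscord(await msg.text());   // throws again if still degraded
    await env.AUDIT_BUCKET.delete(obj.key);  // delete only after success => no double-fire
  }
}

// Placeholder: the Discord delivery helper, elided here.
declare function postToDiscord(message: string): Promise<void>;
```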
Can the team add their own slash commands?
Pro tier includes the standard suite. Scale tier includes 3 custom commands per quarter (typical custom commands: /refund, /cancel-trial, /comp-customer, /escalate-to-eng). Enterprise tier includes unlimited custom commands within the cycle's audit hours.
Can Sentinel handle high-volume traffic?
The bus is built on Cloudflare Workers and R2 — both scale near-linearly. Pro tier carries a soft cap at 10K events/month; Scale tier 100K/month; Enterprise tier scales with the engagement. Above 100K/month is uncommon but supported with durable queues + batched fan-out.
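A sketch of what batched fan-out could look like on Cloudflare Queues, with illustrative binding names: ingest Workers enqueue instead of posting directly, and the consumer collapses a batch into one Discord post:

```ts
interface Env {
  EVENT_QUEUE: Queue; // illustrative binding name
}

export default {
  // Producer: at high volume, ingest Workers enqueue instead of posting
  // to Discord directly.
  async fetch(request: Request, env: Env): Promise<Response> {
    await env.EVENT_QUEUE.send(await request.json());
    return new Response("queued", { status: 202 });
  },

  // Consumer: one Discord post per batch instead of per event keeps the bot
  // under Discord's per-channel rate limits at 100K+ events/month.
  async queue(batch: MessageBatch<unknown>, _env: Env): Promise<void> {
    const lines = batch.messages.map((m) => JSON.stringify(m.body));
    await postToDiscord(lines.join("\n"));
    batch.messages.forEach((m) => m.ack());
  },
};

// Placeholder: the Discord delivery helper, elided here.
declare function postToDiscord(content: string): Promise<void>;
```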
What if we want to bring our own bot?
You can. Sentinel's Worker layer can target any bot endpoint — the bot itself is a thin renderer, the heavy lifting (routing, queueing, audit, slash-command logic) sits in Workers. If you've already invested in a custom bot, we wrap it instead of replacing it.
Adjacent lanes
Sentinel-aaS is one of four production lanes. It's the connective tissue between the others:
- GEO Methodology — citation-drift alerts on the GEO panel are routed via Sentinel. The 15-minute drift alert pipeline lives here.
- Audit Retainer — schema-drift alerts, weekly drift reports, and engineering-ticket notifications all flow through the Sentinel bus. The /audit-status slash-command is a Sentinel deliverable.
- Cluster Ops — node-down alerts, eviction notices, thermal throttle warnings, and the cluster monthly PDF preview all land in the Sentinel bus.
Most customers running multiple lanes consolidate alerts into a single Sentinel deployment — there's a discount for cross-lane bundles. Talk to engineering for the math.
See Sentinel-aaS pricing → See the 30-day onboarding walkthrough → or talk to engineering