Troubleshoot missed Schedule Actions
When a Schedule does not start a Workflow Execution at its expected time, the Action was either skipped intentionally (paused, overlap policy, end time reached) or the Temporal Service could not take the Action within the Catchup Window. This guide covers the second case.
Alert on missed catchup window
The Temporal Service emits a counter each time it skips a scheduled Action because it could not run it within the configured Catchup Window. Alert on any non-zero value.
Temporal Cloud
Alert on temporal_cloud_v1_schedule_missed_catchup_window_count grouped by temporal_namespace.
Example PromQL:
sum by (temporal_namespace) (
  increase(temporal_cloud_v1_schedule_missed_catchup_window_count[5m])
) > 0
Self-hosted
Alert on schedule_missed_catchup_window grouped by namespace.
Example PromQL:
sum by (namespace) (
  increase(schedule_missed_catchup_window[5m])
) > 0
The metric is scoped to the Namespace, not to individual Schedules. A non-zero value tells you that at least one Schedule in the Namespace missed an Action, but not which one.
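To turn the alert into a list of affected Namespaces, the same expression can be run as an instant query. The following is a minimal sketch, assuming a Prometheus-compatible server that exposes the standard /api/v1/query endpoint. The helper names and the base URL are hypothetical. Pass temporal_namespace as the label for Temporal Cloud or namespace for self-hosted.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request


def build_query(metric: str, label: str, window: str = "5m") -> str:
    """Build the alert expression used in this guide for a metric and label."""
    return f"sum by ({label}) (increase({metric}[{window}])) > 0"


def namespaces_with_misses(response: dict, label: str) -> list[str]:
    """Extract label values from a Prometheus instant-query (vector) response."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    results = response.get("data", {}).get("result", [])
    # Each vector sample looks like {"metric": {...labels}, "value": [ts, "1"]}.
    return sorted(
        sample["metric"][label]
        for sample in results
        if label in sample.get("metric", {}) and float(sample["value"][1]) > 0
    )


def query_prometheus(base_url: str, query: str) -> dict:
    """Run an instant query. base_url is a placeholder for your own server."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

The parsing step is kept separate from the HTTP call so it can be checked against a saved response before it is wired into an alert handler.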
Investigate which Schedule missed an Action
Once the alert fires, narrow down to the affected Schedule in two steps.
1. List Schedules in the Namespace
Enumerate the Schedules in the alerting Namespace:
temporal schedule list --namespace <your-namespace>
ListSchedules returns Schedule Ids and summary information. It does not return per-Schedule miss counters, so use it only to produce the set of Schedule Ids to inspect.
2. Describe each Schedule
For each Schedule Id returned, run:
temporal schedule describe \
  --schedule-id <your-schedule-id> \
  --namespace <your-namespace>
DescribeSchedule returns full Schedule state, including the info block with cumulative counters. The relevant fields:
| Field | Meaning |
|---|---|
| missedCatchupWindow | Actions skipped because they could not run within the Catchup Window. Non-zero here identifies the Schedule responsible for the alert. |
| overlapSkipped | Actions skipped because the previous run was still in progress and the Overlap Policy is Skip. |
| bufferDropped | Buffered Actions dropped because the buffer was full under BufferOne or BufferAll. |
| bufferSize | Current depth of the Action buffer. |
| recentActions | Most recent Action times and results. |
| runningWorkflows | Workflow Executions currently running for this Schedule. |
Scripting the fan-out against the JSON output (temporal schedule describe -o json) is usually faster than inspecting each Schedule interactively.
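The fan-out could be scripted roughly as follows. This is a sketch under stated assumptions, not a supported tool: it assumes the temporal CLI is on the PATH, that the list command's JSON output is an array whose entries carry a scheduleId key, and that the describe output exposes the counter at info.missedCatchupWindow. Check both shapes against your CLI version. The function names are hypothetical.

```python
from __future__ import annotations

import json
import subprocess


def run_json(args: list[str]):
    """Run a temporal CLI command with JSON output and parse its stdout."""
    out = subprocess.run(
        ["temporal", *args, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)


def missed_count(description: dict) -> int:
    """Read info.missedCatchupWindow, treating an absent field as zero.

    Assumption: 64-bit counters may be serialized as JSON strings, so coerce.
    """
    info = description.get("info") or {}
    return int(info.get("missedCatchupWindow") or 0)


def schedules_with_misses(namespace: str) -> dict[str, int]:
    """Describe every Schedule in the Namespace and keep those with misses."""
    # Assumption: the list output is a JSON array with a "scheduleId" per entry.
    listed = run_json(["schedule", "list", "--namespace", namespace])
    hits: dict[str, int] = {}
    for entry in listed:
        schedule_id = entry["scheduleId"]
        described = run_json([
            "schedule", "describe",
            "--schedule-id", schedule_id,
            "--namespace", namespace,
        ])
        count = missed_count(described)
        if count > 0:
            hits[schedule_id] = count
    return hits
```

Because the counters are cumulative, a non-zero value may predate the current alert. Recording the previous value per Schedule and comparing against it is one way to isolate new misses.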
Interpret the result
Once you have identified the Schedule with a non-zero missedCatchupWindow, use the rest of the DescribeSchedule output to determine impact and root cause.
Assess impact
- Compare recentActions to the Schedule's Spec to determine how many Actions were skipped and over what time period.
- If the Schedule uses the Skip Overlap Policy and the preceding run was long-running, the miss may reflect that run exceeding the Catchup Window, not a Service outage.
- For business-critical Schedules, Backfill the skipped interval once the underlying cause is resolved.
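The comparison between recentActions and the Spec can be approximated with a small helper. The sketch below assumes a plain interval Spec (no calendar or cron expressions, no jitter) and assumes you have already parsed the nominal schedule times from recentActions into timezone-aware datetimes. The function names and the one-second tolerance are illustrative choices. Because recentActions holds only the most recent entries, keep the comparison window inside the period it covers.

```python
from datetime import datetime, timedelta, timezone


def expected_times(start, end, interval):
    """Nominal Action times in [start, end) for a plain interval Spec."""
    times = []
    t = start
    while t < end:
        times.append(t)
        t += interval
    return times


def skipped_actions(expected, actual, tolerance=timedelta(seconds=1)):
    """Expected times that have no recorded Action within the tolerance."""
    return [
        e for e in expected
        if not any(abs(a - e) <= tolerance for a in actual)
    ]


# Example: a 15-minute interval over one hour, with two Actions recorded.
start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
expected = expected_times(start, start + timedelta(hours=1), timedelta(minutes=15))
actual = [start, start + timedelta(minutes=45)]
missing = skipped_actions(expected, actual)
# missing holds 00:15 and 00:30, so the skipped interval runs from
# missing[0] to missing[-1].
```

The first and last entries of the result bound the interval to consider for a Backfill.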
Common root causes
- Service or Namespace outage longer than the Catchup Window. The default Catchup Window is one year, so a miss typically means the Schedule is configured with a tighter window (minimum ten seconds) and the outage exceeded it.
- Namespace rate limiting. If scheduled starts are throttled, Actions can queue past the Catchup Window. Cross-check temporal_cloud_v1_schedule_rate_limited_count (Cloud) or schedule_rate_limited (self-hosted) in the same time range.
- Buffer overruns under BufferAll. Long-running Workflow Executions under BufferAll can push buffered Actions past the Catchup Window. Cross-check temporal_cloud_v1_schedule_buffer_overruns_count (Cloud) or schedule_buffer_overruns (self-hosted) and examine bufferSize.
Remediate
- Widen the Catchup Window if the current value is tighter than your Service's worst-case unavailability. The trade-off is that more late Actions will fire during recovery.
- Revisit the Overlap Policy if runs routinely exceed the Spec interval. BufferAll and Skip have different failure modes under sustained delay.
- Increase Namespace throughput limits if rate limiting is the contributing factor.
- Backfill the missed interval if the skipped Actions need to run.
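A Backfill over the skipped interval can be issued with the temporal schedule backfill command. The sketch below only builds the argument list, so it can be reviewed before anything runs. The helper name is hypothetical, the BufferAll default is an illustrative choice rather than a recommendation, and the flag names should be confirmed with temporal schedule backfill --help for your CLI version.

```python
import shlex
from datetime import datetime, timezone


def backfill_command(schedule_id, namespace, start, end, overlap_policy="BufferAll"):
    """Build the argv for a Backfill covering the interval from start to end."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end < start:
        raise ValueError("end must not precede start")

    def rfc3339(t):
        # Normalize to UTC so the timestamp is unambiguous.
        return t.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return [
        "temporal", "schedule", "backfill",
        "--schedule-id", schedule_id,
        "--namespace", namespace,
        "--start-time", rfc3339(start),
        "--end-time", rfc3339(end),
        "--overlap-policy", overlap_policy,
    ]


cmd = backfill_command(
    "<your-schedule-id>",
    "<your-namespace>",
    datetime(2024, 1, 1, 0, 15, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 0, 30, tzinfo=timezone.utc),
)
print(shlex.join(cmd))  # review the command, then run it with subprocess.run(cmd)
```

The Overlap Policy passed here governs how the backfilled Actions interact with each other and with any running Workflow Execution, so choose it with the Schedule's normal policy in mind.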