Troubleshoot missed Schedule Actions

When a Schedule does not start a Workflow Execution at its expected time, the Action was either skipped intentionally (paused, overlap policy, end time reached) or the Temporal Service could not take the Action within the Catchup Window. This guide covers the second case.

Alert on missed catchup window

The Temporal Service emits a counter each time it skips a scheduled Action because it could not run it within the configured Catchup Window. Because the counter is cumulative, alert on any non-zero increase over the evaluation window rather than on the raw value.

Temporal Cloud

Alert on temporal_cloud_v1_schedule_missed_catchup_window_count grouped by temporal_namespace.

Example PromQL:

sum by (temporal_namespace) (
increase(temporal_cloud_v1_schedule_missed_catchup_window_count[5m])
) > 0

Self-hosted

Alert on schedule_missed_catchup_window grouped by namespace.

Example PromQL:

sum by (namespace) (
increase(schedule_missed_catchup_window[5m])
) > 0

The metric is scoped to the Namespace, not to individual Schedules. A non-zero value tells you that at least one Schedule in the Namespace missed an Action, but not which one.

Investigate which Schedule missed an Action

Once the alert fires, narrow down to the affected Schedule in two steps.

1. List Schedules in the Namespace

Enumerate the Schedules in the alerting Namespace:

temporal schedule list --namespace <your-namespace>

ListSchedules returns Schedule Ids and summary information. It does not return per-Schedule miss counters, so use it only to produce the set of Schedule Ids to inspect.

2. Describe each Schedule

For each Schedule Id returned, run:

temporal schedule describe \
--schedule-id <your-schedule-id> \
--namespace <your-namespace>

DescribeSchedule returns full Schedule state, including the info block with cumulative counters. The relevant fields:

Field                Meaning
missedCatchupWindow  Actions skipped because they could not run within the Catchup Window. Non-zero here identifies the Schedule responsible for the alert.
overlapSkipped       Actions skipped because the previous run was still in progress and the Overlap Policy is Skip.
bufferDropped        Buffered Actions dropped because the buffer was full under BufferOne or BufferAll.
bufferSize           Current depth of the Action buffer.
recentActions        Most recent Action times and results.
runningWorkflows     Workflow Executions currently running for this Schedule.

Scripting the fan-out against the JSON output (temporal schedule describe -o json) is usually faster than inspecting each Schedule interactively.
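The fan-out can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: it assumes the temporal CLI is on PATH, that the JSON output nests the counters under an info object with the field names shown in the table above, and that 64-bit counters may arrive as strings. The function names are hypothetical; check the actual JSON shape against your CLI version before relying on it.

```python
import json
import subprocess
import sys


def describe_schedule(schedule_id, namespace):
    """Run `temporal schedule describe -o json` and return the parsed JSON."""
    completed = subprocess.run(
        ["temporal", "schedule", "describe",
         "--schedule-id", schedule_id,
         "--namespace", namespace,
         "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


def missed_count(describe_doc):
    """Return missedCatchupWindow as an int, or 0 when the field is absent."""
    # Assumption: counters live under "info" and may be encoded as strings.
    info = describe_doc.get("info") or {}
    return int(info.get("missedCatchupWindow") or 0)


def schedules_with_misses(schedule_ids, describe):
    """Map Schedule Id to miss count for every Schedule with a non-zero count."""
    hits = {}
    for schedule_id in schedule_ids:
        count = missed_count(describe(schedule_id))
        if count > 0:
            hits[schedule_id] = count
    return hits


# Only runs when a namespace and at least one Schedule Id are passed.
if __name__ == "__main__" and len(sys.argv) > 2:
    namespace, *ids = sys.argv[1:]
    found = schedules_with_misses(ids, lambda sid: describe_schedule(sid, namespace))
    for schedule_id, count in found.items():
        print(f"{schedule_id}\t{count}")
```

Saved under a name of your choosing (for example find_misses.py), it would be invoked with the Namespace followed by the Schedule Ids gathered in step 1. Passing the describe function as a parameter keeps the filtering logic testable without a running Temporal Service.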

Interpret the result

Once you have identified the Schedule with a non-zero missedCatchupWindow, use the rest of the DescribeSchedule output to determine impact and root cause.

Assess impact

  • Compare recentActions to the Schedule's Spec to determine how many Actions were skipped and over what time period.
  • If the Schedule uses the Skip Overlap Policy and the preceding run was long-running, the miss may reflect that run exceeding the Catchup Window, not a Service outage.
  • For business-critical Schedules, Backfill the skipped interval once the underlying cause is resolved.
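The comparison between recentActions and the Spec can be sketched as a set difference between expected and observed Action times. The sketch below assumes a simple interval-based Spec and assumes you have already extracted the nominal scheduled time of each recent Action into timezone-aware datetime values; calendar or cron Specs would need a different generator. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expected_times(first, interval, start, end):
    """Nominal Action times of an interval Spec that fall within [start, end]."""
    t = first
    # Advance to the first nominal time at or after the start of the period.
    while t < start:
        t += interval
    times = []
    while t <= end:
        times.append(t)
        t += interval
    return times


def skipped_actions(expected, actual):
    """Expected Action times that have no matching observed Action."""
    seen = set(actual)
    return [t for t in expected if t not in seen]
```

The first and last entries of the returned list bound the interval that was skipped, which is the range a Backfill would need to cover. Since recentActions only holds the most recent Actions, a long gap may extend further back than the data shows.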

Common root causes

  • Service or Namespace outage longer than the Catchup Window. The default Catchup Window is one year, so a miss typically means the Schedule is configured with a tighter window (minimum ten seconds) and the outage exceeded it.
  • Namespace rate limiting. If scheduled starts are throttled, Actions can queue past the Catchup Window. Cross-check temporal_cloud_v1_schedule_rate_limited_count (Cloud) or schedule_rate_limited (self-hosted) in the same time range.
  • Buffer overruns under BufferAll. Long-running Workflow Executions under BufferAll can push buffered Actions past the Catchup Window. Cross-check temporal_cloud_v1_schedule_buffer_overruns_count (Cloud) or schedule_buffer_overruns (self-hosted) and examine bufferSize.

Remediate

  • Widen the Catchup Window if the current value is tighter than your Service's worst-case unavailability. The trade-off is that more late Actions will fire during recovery.
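  • Revisit the Overlap Policy if runs routinely exceed the Spec interval. Under sustained delay, Skip drops overlapping Actions (counted in overlapSkipped), while BufferAll queues them, which can push buffered Actions past the Catchup Window or fill the buffer (counted in bufferDropped).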
  • Increase Namespace throughput limits if rate limiting is the contributing factor.
  • Backfill the missed interval if the skipped Actions need to run.
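For the Backfill step, a small helper can turn the skipped interval into a CLI invocation. This is a sketch under assumptions: it assumes the temporal schedule backfill command accepts --schedule-id, --start-time, --end-time, and --overlap-policy as in recent CLI releases, and the helper name and the BufferAll default are illustrative choices. Confirm the flags with temporal schedule backfill --help before running the result.

```python
from datetime import datetime, timezone


def backfill_command(schedule_id, namespace, start, end, overlap_policy="BufferAll"):
    """Build the argument list for a Backfill covering [start, end]."""
    for value in (start, end):
        if value.tzinfo is None:
            raise ValueError("start and end must be timezone-aware")
    if end < start:
        raise ValueError("end must not be earlier than start")
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 timestamp in UTC
    return [
        "temporal", "schedule", "backfill",
        "--schedule-id", schedule_id,
        "--namespace", namespace,
        "--start-time", start.astimezone(timezone.utc).strftime(fmt),
        "--end-time", end.astimezone(timezone.utc).strftime(fmt),
        "--overlap-policy", overlap_policy,
    ]
```

The helper only builds the argument list, so you can review it before passing it to subprocess.run. Choose the overlap policy deliberately: backfilling a long interval starts many Workflow Executions, and the policy decides whether they run concurrently or one after another.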