Metrics and tracing

Surefire emits metrics via System.Diagnostics.Metrics and traces via System.Diagnostics.ActivitySource. Wire them into OpenTelemetry using SurefireDiagnostics.MeterName and SurefireDiagnostics.ActivitySourceName.

Install the OpenTelemetry packages:

dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol

Wire up the meter and activity source in your host builder:

using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics =>
    {
        // Subscribe to Surefire's meter and export its instruments over OTLP.
        metrics.AddMeter(SurefireDiagnostics.MeterName);
        metrics.AddOtlpExporter();
    })
    .WithTracing(tracing =>
    {
        // Subscribe to Surefire's activity source and export its spans over OTLP.
        tracing.AddSource(SurefireDiagnostics.ActivitySourceName);
        tracing.AddOtlpExporter();
    });

builder.Services.AddSurefire();

The meter publishes these instruments:

| Instrument | Type | Unit | Tags | Description |
| --- | --- | --- | --- | --- |
| surefire.runs.claimed | Counter | | surefire.job.name | Runs claimed by workers |
| surefire.runs.completed | Counter | | surefire.job.name | Runs completed successfully |
| surefire.runs.failed | Counter | | surefire.job.name, surefire.dead_letter.reason | Runs that reached the Failed terminal state. Reason is one of retries_exhausted, no_handler_registered, shutdown_interrupted, stale_recovery |
| surefire.runs.canceled | Counter | | surefire.job.name | Runs canceled |
| surefire.runs.duration.ms | Histogram | ms | surefire.job.name | Time from claim to terminal transition |
| surefire.scheduler.lag.ms | Histogram | ms | surefire.job.name | Time between a run’s NotBefore and when it was actually claimed. Growing values mean the cluster is undersized |
| surefire.store.operation.ms | Histogram | ms | surefire.store.operation | Store operation duration |
| surefire.store.operation.failed | Counter | | surefire.store.operation | Failed store operations |
| surefire.store.retries | Counter | | surefire.service | Transient store failure retries |
| surefire.loop.errors | Counter | | surefire.loop | Background loop tick failures (executor, maintenance, retention, log pump) |
| surefire.log_entries.dropped | Counter | | surefire.drop.reason | Log entries dropped before store flush |
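
If you want to inspect these instruments in-process (in a test, say) without an exporter, a plain MeterListener from System.Diagnostics.Metrics can subscribe to the meter directly. The following is a minimal sketch rather than Surefire's own tooling. It tallies surefire.runs.failed by the surefire.dead_letter.reason tag. The meter name "Surefire.Demo", the stand-in counter, its long value type, and the job name "send-email" are assumptions made so the sketch runs without Surefire installed; in a real host you would compare against SurefireDiagnostics.MeterName and drop the stand-in producer.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Assumption: placeholder for SurefireDiagnostics.MeterName.
const string meterName = "Surefire.Demo";

// Hypothetical producer side, present only so the sketch is self-contained.
using var meter = new Meter(meterName);
var failed = meter.CreateCounter<long>("surefire.runs.failed");

var failuresByReason = new Dictionary<string, long>();

using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    // Subscribe only to the failed-runs counter on the chosen meter.
    if (instrument.Meter.Name == meterName && instrument.Name == "surefire.runs.failed")
        l.EnableMeasurementEvents(instrument);
};
// The callback type must match the instrument's value type (assumed long here).
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    var reason = "unknown";
    foreach (var tag in tags)
    {
        if (tag.Key == "surefire.dead_letter.reason" && tag.Value is string s)
            reason = s;
    }
    failuresByReason[reason] = failuresByReason.GetValueOrDefault(reason) + value;
});
listener.Start();

// Simulated measurements using reason values from the table above.
failed.Add(1,
    new KeyValuePair<string, object?>("surefire.job.name", "send-email"),
    new KeyValuePair<string, object?>("surefire.dead_letter.reason", "retries_exhausted"));
failed.Add(1,
    new KeyValuePair<string, object?>("surefire.job.name", "send-email"),
    new KeyValuePair<string, object?>("surefire.dead_letter.reason", "retries_exhausted"));
failed.Add(1,
    new KeyValuePair<string, object?>("surefire.job.name", "send-email"),
    new KeyValuePair<string, object?>("surefire.dead_letter.reason", "stale_recovery"));

foreach (var (reason, count) in failuresByReason)
    Console.WriteLine($"{reason}: {count}");
```

Counter measurements reach the listener synchronously on the thread that calls Add, so the dictionary is already populated by the time the loop prints it. If measurements can arrive from several threads, as they likely would in a running host, guard the dictionary with a lock or use a ConcurrentDictionary.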

The activity source creates surefire.run.execute spans with these tags:

| Tag | Description |
| --- | --- |
| surefire.run.id | The run ID |
| surefire.run.job | The job name |
| surefire.run.attempt | Attempt number |
| surefire.run.parent | Parent run ID (if any) |
| surefire.job.timeout | true when the attempt was canceled by WithTimeout |

Failed runs set the span status to Error with the exception message.
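
To look at these spans without a tracing backend, an ActivityListener from System.Diagnostics can observe them as they stop. The sketch below collects failed surefire.run.execute spans along with their run ID, attempt, and status description. It makes several assumptions: the source name "Surefire.Demo" stands in for SurefireDiagnostics.ActivitySourceName, the simulated span and its tag values (including an integer attempt number) are hypothetical, and the exception message is assumed to surface as Activity.StatusDescription.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Assumption: placeholder for SurefireDiagnostics.ActivitySourceName.
const string sourceName = "Surefire.Demo";

var failedRuns = new List<string>();

using var listener = new ActivityListener
{
    ShouldListenTo = source => source.Name == sourceName,
    // Record every span in full so tags and status are populated.
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = activity =>
    {
        if (activity.OperationName == "surefire.run.execute"
            && activity.Status == ActivityStatusCode.Error)
        {
            var runId = activity.GetTagItem("surefire.run.id");
            var attempt = activity.GetTagItem("surefire.run.attempt");
            failedRuns.Add($"{runId} attempt {attempt}: {activity.StatusDescription}");
        }
    }
};
ActivitySource.AddActivityListener(listener);

// Hypothetical producer side, present only so the sketch is self-contained.
using var source = new ActivitySource(sourceName);
using (var activity = source.StartActivity("surefire.run.execute"))
{
    activity?.SetTag("surefire.run.id", "run-123");
    activity?.SetTag("surefire.run.job", "send-email");
    activity?.SetTag("surefire.run.attempt", 2);
    activity?.SetStatus(ActivityStatusCode.Error, "SMTP connection refused");
}

foreach (var line in failedRuns)
    Console.WriteLine(line);
```

The listener has to be registered before the span starts. Without a listener that samples the source, StartActivity returns null and nothing is recorded, which is why the producer code uses the null-conditional operator.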