
Health

The service exposes two endpoints accessible via HTTP:
  • GET /livez - returns 200 if the service is alive. Healthy response: { "status": "UP" }
  • GET /readyz - returns 200 if the service is ready. Healthy response: { "status": "UP" }

Metrics

The Sidecar exposes a GET /metrics endpoint that returns metrics in Prometheus format, covering both system-level and Sidecar-specific metrics. These are useful for monitoring the health and performance of your Sidecar service.
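As a quick sanity check outside of Prometheus, you can fetch the endpoint directly and sum the counters you care about. A minimal sketch, assuming Node 18+ and a placeholder metrics URL:

```typescript
// Fetch the Sidecar's Prometheus metrics and print selected counters (sketch).
const METRICS_URL = process.env.SIDECAR_METRICS_URL ?? "http://localhost:8080/metrics"; // assumption

// Prometheus text format: "<metric_name>{<labels>} <value>" per line; "#" lines are comments.
function readCounter(text: string, name: string): number {
  let total = 0;
  for (const line of text.split("\n")) {
    if (line.startsWith(name)) {
      const value = Number(line.trim().split(/\s+/).pop());
      if (!Number.isNaN(value)) total += value; // sum across label sets
    }
  }
  return total;
}

async function main() {
  const res = await fetch(METRICS_URL);
  const text = await res.text();
  for (const name of [
    "sidecar_initialization_errors_total",
    "sidecar_network_request_errors_total",
    "sidecar_cache_hits_total",
    "sidecar_cache_misses_total",
  ]) {
    console.log(`${name} = ${readCounter(text, name)}`);
  }
}

main().catch((err) => console.error("Failed to scrape metrics:", err));
```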

Sidecar

The Sidecar exposes standard health and readiness endpoints for monitoring and orchestration. Check health with GET /livez and readiness with GET /readyz, both returning {"status":"UP"} when operational. Metrics are available in Prometheus format at GET /metrics. Key metrics to monitor include:
  • sidecar_initialization_errors_total - errors during Sidecar initialization, such as an invalid API key or a misconfiguration. This counter should always remain at 0; if it rises above 0 in production, it should immediately alert the on-call engineer, and rolling back to the last working configuration is usually the right response
  • sidecar_invalid_api_key_errors_total - authentication failures
  • sidecar_network_request_errors_total - connectivity problems; increments on Stigg API errors. Trigger a warning if any errors (> 0) occur within a 5-minute window, and escalate if the error rate stays elevated for 10 minutes (see the alert-check sketch after this list)
  • sidecar_redis_client_errors_total - Redis connection issues; treat any increase as a signal of connectivity or stability problems between the Sidecar and Redis. A reasonable threshold is to trigger a warning if any errors (> 0) occur within a 5-minute window, and escalate if the error rate stays elevated for 10 minutes
  • sidecar_cache_hits_total and sidecar_cache_misses_total - cache performance
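One way to express the 5-minute windows above is with increase() over a 5m range and the Prometheus HTTP query API. The sketch below is illustrative only: the Prometheus URL is a placeholder, and in practice these conditions would normally live in Prometheus alerting rules rather than a script.

```typescript
// Evaluate the warning conditions described above against Prometheus (sketch).
const PROMETHEUS_URL = process.env.PROMETHEUS_URL ?? "http://localhost:9090"; // assumption

const WARNING_EXPRESSIONS: Record<string, string> = {
  // Any initialization or API-key error is immediately actionable.
  initializationErrors: "increase(sidecar_initialization_errors_total[5m]) > 0",
  invalidApiKeyErrors: "increase(sidecar_invalid_api_key_errors_total[5m]) > 0",
  // Any network or Redis client error within a 5-minute window warrants a warning.
  networkErrors: "increase(sidecar_network_request_errors_total[5m]) > 0",
  redisClientErrors: "increase(sidecar_redis_client_errors_total[5m]) > 0",
};

async function firing(expr: string): Promise<boolean> {
  const url = `${PROMETHEUS_URL}/api/v1/query?query=${encodeURIComponent(expr)}`;
  const res = await fetch(url);
  const body = (await res.json()) as { data?: { result?: unknown[] } };
  // A comparison expression returns a non-empty result set only when it holds.
  return (body.data?.result?.length ?? 0) > 0;
}

async function main() {
  for (const [name, expr] of Object.entries(WARNING_EXPRESSIONS)) {
    if (await firing(expr)) {
      console.warn(`WARNING: ${name} condition is firing (${expr})`);
    }
  }
}

main().catch((err) => console.error("Prometheus query failed:", err));
```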
The sidecar_network_request_errors_total metric is another indication that the Stigg API may be unreachable. For the Sidecar, entitlement check responses containing isFallback: true mean that fallback values are being used, as shown in the sketch below.
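A consuming application can detect such fallback responses and surface them to its own monitoring. The sketch below makes no assumption about which Stigg SDK or API you use beyond the documented isFallback flag; the lookup callback and the hasAccess field are illustrative placeholders.

```typescript
// Sketch: count and log fallback entitlement responses in the consuming application.
interface EntitlementResult {
  hasAccess: boolean;  // placeholder field name for illustration
  isFallback: boolean; // documented flag: true when fallback values are used
}

let fallbackResponses = 0; // expose this as a counter in your own metrics

// Wrap any entitlement lookup (Stigg SDK or Sidecar API call) so that
// fallback responses and errors are logged and counted.
async function checkedEntitlement(
  featureId: string,
  lookup: () => Promise<EntitlementResult>,
): Promise<boolean> {
  try {
    const result = await lookup();
    if (result.isFallback) {
      // Fallback values are in use, which usually means the Stigg API was unreachable.
      fallbackResponses += 1;
      console.warn(`Entitlement for ${featureId} served from fallback values`);
    }
    return result.hasAccess;
  } catch (err) {
    // Wrap calls in try/catch, log, and fail safe (see the guidance at the end of this page).
    console.error(`Entitlement check for ${featureId} failed:`, err);
    return false; // fail closed; choose the behavior that suits your product
  }
}
```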
A Sidecar liveness probe should not trigger automatic Kubernetes pod restarts for network connectivity issues, Stigg API unreachability, or Edge failures. This is an intentional design choice to prevent the Sidecar from entering a restart loop, which would disrupt the main application whenever the upstream Stigg API is temporarily unreachable. The recommended approach is to keep the sidecar pod alive and rely on cached reads and fail-safe fallback modes while surfacing connectivity or write errors into your own observability and alerting stack, rather than coupling them to pod restarts.

Persistent cache service

When Redis is enabled, the persistent cache service provides its own monitoring endpoints: GET /livez, GET /readyz, and GET /metrics. Important metrics include:
  • persistent_cache_write_duration_seconds - write performance; a high duration can indicate delayed propagation of changes to Redis. A write duration > 5 seconds sustained over a 5-minute window should trigger a warning, escalating if the duration remains elevated for 15 minutes (see the threshold sketch after this list)
  • persistent_cache_write_errors_total - write failures; like the Sidecar's Redis client errors, treat these as a signal of connectivity or stability problems with Redis. Trigger a warning if any errors (> 0) occur within a 5-minute window, and escalate if the error rate remains elevated for 10 minutes
  • persistent_cache_hit_ratio - overall cluster hit rate and a good indicator of how well Redis is being utilized. A ratio below 0.8 sustained for 15 minutes should trigger a warning, and a drop below 0.6 sustained for 60 minutes should be escalated
  • persistent_cache_memory_usage_bytes - memory consumption
  • persistent_cache_hits_total and persistent_cache_misses_total - cache effectiveness
  • persistent_cache_messages_processed_total - throughput tracking
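The hit-ratio and write-duration thresholds above (and in the alert table below) can be captured in a small evaluation function. This is only a sketch of the threshold logic; how you sample the metrics and track how long a condition has held is up to your monitoring stack.

```typescript
// Apply the persistent-cache thresholds described above (sketch).
type Severity = "ok" | "warning" | "error" | "critical";

// hitRatio: current value of persistent_cache_hit_ratio (0..1)
// minutesBelow: how long the ratio has stayed below the threshold
function evaluateHitRatio(hitRatio: number, minutesBelow: number): Severity {
  if (hitRatio < 0.6 && minutesBelow >= 60) return "error";
  if (hitRatio < 0.8 && minutesBelow >= 15) return "warning";
  return "ok";
}

// writeDurationSeconds: recent persistent_cache_write_duration_seconds observation
// minutesElevated: how long writes have stayed above 5 seconds
function evaluateWriteDuration(writeDurationSeconds: number, minutesElevated: number): Severity {
  if (writeDurationSeconds <= 5) return "ok";
  if (minutesElevated >= 30) return "critical";
  if (minutesElevated >= 15) return "error";
  if (minutesElevated >= 5) return "warning";
  return "ok";
}

console.log(evaluateHitRatio(0.72, 20));     // "warning": below 0.8 for 15+ minutes
console.log(evaluateWriteDuration(7.5, 35)); // "critical": above 5s for 30+ minutes
```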
These thresholds can be adjusted based on your production utilization patterns. Warnings are typically routed to the application team via Slack for investigation, while critical alerts should trigger PagerDuty notifications to the on-call engineer.
| Metric | Purpose | Warning (Slack) | Error (Slack) | Critical (PagerDuty) | Remediation |
| --- | --- | --- | --- | --- | --- |
| sidecar_initialization_errors_total or sidecar_invalid_api_key_errors_total | Sidecar initialization failures | > 0 | > 0 | > 0 | Note: often caused by a bad deployment; roll back if possible. Check service error logs. Validate that the Stigg API key is correct and active. Review recent configuration changes. Roll back to the last stable settings or image version if needed. |
| sidecar_network_request_errors_total | Stigg API unreachable | > 0 over 5 min | Stays elevated > 10 min | Stays elevated > 15 min | Check error logs. Verify API reachability via the status page. Notify the Stigg team if the issue persists. |
| sidecar_redis_client_errors_total | Redis unreachable from Sidecar | > 0 over 5 min | Stays elevated > 10 min | Stays elevated > 10 min | Check Redis health, memory usage, and instance reachability. |
| persistent_cache_write_errors_total | Redis write failures | > 0 over 5 min | Stays elevated > 10 min | Stays elevated > 10 min | Check Redis health, memory usage, and instance reachability. |
| persistent_cache_write_duration_seconds | Redis write latency | > 5 seconds over 5 min | Stays elevated > 15 min | Stays elevated > 30 min | Check Redis health and reachability. Monitor CPU/memory. If consistently high, consider scaling out. |
| persistent_cache_hit_ratio | Redis cache efficiency | < 80% over 15 min | < 60% sustained > 60 min | Not applicable | Check Redis health and reachability. Note: a low ratio after restarts or cache clearing is expected and should recover as the cache repopulates. |
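If you do route notifications from your own code rather than Alertmanager, the severity split might look roughly like this sketch. The Slack webhook URL and PagerDuty routing key are placeholders for your own integration settings.

```typescript
// Sketch: route alerts by severity, warnings/errors to Slack and criticals to PagerDuty.
const SLACK_WEBHOOK_URL = process.env.SLACK_WEBHOOK_URL ?? "";         // placeholder
const PAGERDUTY_ROUTING_KEY = process.env.PAGERDUTY_ROUTING_KEY ?? ""; // placeholder

async function notify(severity: "warning" | "error" | "critical", summary: string): Promise<void> {
  if (severity === "critical") {
    // PagerDuty Events API v2: pages the on-call engineer.
    await fetch("https://events.pagerduty.com/v2/enqueue", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        routing_key: PAGERDUTY_ROUTING_KEY,
        event_action: "trigger",
        payload: { summary, source: "stigg-sidecar-monitoring", severity: "critical" },
      }),
    });
  } else {
    // Slack incoming webhook: posts to the application team's channel.
    await fetch(SLACK_WEBHOOK_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: `[${severity.toUpperCase()}] ${summary}` }),
    });
  }
}

// Example: surface a sustained hit-ratio drop as a warning.
notify("warning", "persistent_cache_hit_ratio below 0.8 for 15 minutes").catch(console.error);
```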
For auto-scaling, monitor the service’s CPU and memory metrics:
| Metric | Purpose | Trigger |
| --- | --- | --- |
| process_cpu_seconds_total | CPU usage over time | > 60% avg over 5m |
| process_resident_memory_bytes | Memory used by the process | > 80% avg over 5m |
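These triggers can be phrased as PromQL over the process metrics. The sketch below only evaluates and logs them; the Prometheus URL, the 1-core CPU limit, and the 512 MiB memory limit are assumptions, and real autoscaling would normally be driven by a Kubernetes HPA rather than a script.

```typescript
// Sketch: evaluate the auto-scaling triggers above via PromQL.
const PROMETHEUS_URL = process.env.PROMETHEUS_URL ?? "http://localhost:9090"; // assumption
const CPU_LIMIT_CORES = 1;                      // assumption: 1-core CPU limit
const MEMORY_LIMIT_BYTES = 512 * 1024 * 1024;   // assumption: 512 MiB memory limit

const TRIGGERS = {
  cpu: `avg(rate(process_cpu_seconds_total[5m])) > ${0.6 * CPU_LIMIT_CORES}`,
  memory: `avg_over_time(process_resident_memory_bytes[5m]) > ${0.8 * MEMORY_LIMIT_BYTES}`,
};

async function triggerMet(expr: string): Promise<boolean> {
  const url = `${PROMETHEUS_URL}/api/v1/query?query=${encodeURIComponent(expr)}`;
  const res = await fetch(url);
  const body = (await res.json()) as { data?: { result?: unknown[] } };
  return (body.data?.result?.length ?? 0) > 0; // non-empty only when the comparison holds
}

async function main() {
  for (const [name, expr] of Object.entries(TRIGGERS)) {
    console.log(`${name} scale-out trigger: ${(await triggerMet(expr)) ? "MET" : "not met"}`);
  }
}

main().catch((err) => console.error("Prometheus query failed:", err));
```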
Wrap all SDK and API calls in try/catch blocks and log any errors, as shown in the sketch below.
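A minimal pattern for this, assuming nothing about the specific SDK beyond its calls returning promises:

```typescript
// Generic wrapper: every SDK/API call goes through try/catch with logging (sketch).
// `fallback` lets callers fail safe when the Sidecar or Stigg API is unavailable.
async function withErrorLogging<T>(
  operation: string,
  call: () => Promise<T>,
  fallback: T,
): Promise<T> {
  try {
    return await call();
  } catch (err) {
    // Log enough context to correlate with the sidecar_* error metrics above.
    console.error(`${operation} failed:`, err);
    return fallback;
  }
}

// Usage (illustrative; `sdkCall` stands in for your actual SDK/API call):
// const canUse = await withErrorLogging("entitlement check", () => sdkCall(), false);
```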