Category: Operations
Tags: health, readiness, liveness, health-check, kubernetes, probes, operational, uptime
- Expose a /health endpoint.
- The /health endpoint MUST NOT require authentication.
- The /health endpoint MUST return 200 OK when the service is operating normally.
- The /health endpoint MUST return 503 Service Unavailable when the service cannot serve traffic.
- The /health endpoint MUST NOT include sensitive data (PII, credentials, internal addresses).
- Expose separate /health/live (liveness) and /health/ready (readiness) endpoints for Kubernetes probes.
- Response time for /health SHOULD be under 500 milliseconds; health checks MUST NOT perform expensive queries.

Health endpoints allow infrastructure components (load balancers, container orchestrators, monitoring systems) to determine whether a service instance is safe to receive traffic.

| Component | Uses Health Check For |
|---|---|
| Load balancer | Route traffic only to healthy instances |
| Kubernetes | Restart unhealthy pods (liveness); hold traffic until ready (readiness) |
| API gateway | Remove degraded upstreams from the rotation |
| Monitoring / alerting | Alert on-call when service health degrades |

/health: General Health

A combined health check suitable for load balancers and monitoring tools.

GET /health HTTP/1.1

HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: no-store

{
  "status": "Healthy",
  "timestamp": "2024-07-23T10:30:00Z",
  "version": "1.4.2",
  "checks": [
    { "name": "database", "status": "Healthy" },
    { "name": "cache", "status": "Healthy" }
  ]
}

/health/live: Liveness (Kubernetes)

Answers: “Is this process alive and not deadlocked?”

A failing liveness probe causes Kubernetes to restart the pod. Keep this check lightweight: it should verify only that the process is running and responsive. It MUST NOT check external dependencies (database, downstream services), because restarting the pod does not fix an outage in a dependency.

GET /health/live HTTP/1.1

HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: no-store

{
  "status": "Healthy",
  "timestamp": "2024-07-23T10:30:00Z"
}
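
As an illustration of how small a liveness endpoint can be, here is a minimal sketch using only Python's standard library. The handler class and the `probe_once` helper are hypothetical names chosen for this example; a real service would normally register the route in its own web framework. The handler touches no database or downstream service, and it sets the `Cache-Control: no-store` header shown in the response above.

```python
import http.client
import json
import threading
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer


class LivenessHandler(BaseHTTPRequestHandler):
    """Answers GET /health/live without touching any external dependency."""

    def do_GET(self):
        if self.path != "/health/live":
            self.send_error(404)
            return
        body = json.dumps({
            "status": "Healthy",
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        }).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Cache-Control", "no-store")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the log in this sketch


def probe_once():
    """Serve the handler on an ephemeral local port and probe it once."""
    server = HTTPServer(("127.0.0.1", 0), LivenessHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=3)
        conn.request("GET", "/health/live")
        resp = conn.getresponse()
        result = (resp.status, resp.getheader("Cache-Control"), json.loads(resp.read()))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Calling `probe_once()` returns the status code, the `Cache-Control` header value, and the parsed JSON body, which is roughly what a probe or load balancer would see.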

/health/ready: Readiness (Kubernetes)

Answers: “Is this instance ready to serve traffic?”

A failing readiness probe causes Kubernetes to remove the pod from the service’s load balancer without restarting it. Use this to signal temporary unavailability:
GET /health/ready HTTP/1.1

HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: no-store

{
  "status": "Healthy",
  "timestamp": "2024-07-23T10:30:00Z",
  "checks": [
    { "name": "database", "status": "Healthy", "responseTimeMs": 4 },
    { "name": "config-service", "status": "Healthy", "responseTimeMs": 12 }
  ]
}
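
The readiness body above could be assembled along the following lines. This is a sketch under stated assumptions, not a prescribed implementation: `run_check`, `readiness_report`, and the two probe functions are hypothetical names, an in-memory SQLite connection stands in for the real database, and every check is treated as critical for readiness. Each probe is timed to fill `responseTimeMs`, and only the exception type is reported so that error messages cannot leak internal addresses or credentials.

```python
import sqlite3
import time
from datetime import datetime, timezone


def run_check(name, probe):
    """Run one probe callable and return a check entry with its duration."""
    started = time.perf_counter()
    try:
        probe()
        status, description = "Healthy", None
    except Exception as exc:
        # Report only the exception type: messages may contain hosts or secrets.
        status, description = "Unhealthy", type(exc).__name__
    entry = {
        "name": name,
        "status": status,
        "responseTimeMs": round((time.perf_counter() - started) * 1000),
    }
    if description is not None:
        entry["description"] = description
    return entry


def readiness_report(probes):
    """Run every probe; assumption: any failing check makes the instance unready."""
    checks = [run_check(name, probe) for name, probe in probes.items()]
    healthy = all(check["status"] == "Healthy" for check in checks)
    return {
        "status": "Healthy" if healthy else "Unhealthy",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "checks": checks,
    }


def database_probe():
    """Stand-in for the real database: a trivial query on in-memory SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("SELECT 1").fetchone()
    finally:
        conn.close()


def failing_config_probe():
    """Simulated outage of a hypothetical config-service dependency."""
    raise ConnectionError("config-service unreachable")
```

For example, `readiness_report({"database": database_probe, "config-service": failing_config_probe})` yields an overall `Unhealthy` report in which the database check is `Healthy` and the config-service check carries the description `ConnectionError`.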

| Status | HTTP Code | Meaning |
|---|---|---|
| Healthy | 200 OK | All checks passed; service is operating normally. |
| Degraded | 200 OK | One or more non-critical checks failed; service is operating with reduced capability. Load balancers SHOULD continue routing traffic. |
| Unhealthy | 503 Service Unavailable | One or more critical checks failed; service cannot serve requests. Load balancers MUST stop routing traffic. |
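
The table above can be read as a small decision rule. The sketch below is one possible encoding of it; the function names and the idea of passing the critical check names as a set are assumptions made for this example, since the guideline does not say how a service marks a check as critical.

```python
from http import HTTPStatus


def overall_status(checks, critical):
    """Derive the overall status from individual check results.

    checks:   list of dicts with "name" and "status" keys
    critical: set of check names whose failure blocks the service (assumed input)
    """
    failed = [check for check in checks if check["status"] != "Healthy"]
    if any(c["name"] in critical and c["status"] == "Unhealthy" for c in failed):
        return "Unhealthy"
    if failed:
        return "Degraded"
    return "Healthy"


def http_code(status):
    """Map an overall status to the HTTP status code from the table."""
    if status == "Unhealthy":
        return HTTPStatus.SERVICE_UNAVAILABLE  # 503
    return HTTPStatus.OK  # 200 for both Healthy and Degraded
```

With `critical={"database"}`, a failing `email-service` check produces `Degraded` with 200 OK, while a failing `database` check produces `Unhealthy` with 503 Service Unavailable.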

{
  "status": "Healthy",
  "timestamp": "2024-07-23T10:30:00Z",
  "version": "1.4.2",
  "checks": [ ... ]
}

| Field | Required | Type | Description |
|---|---|---|---|
| status | Yes | string | Overall health: Healthy, Degraded, or Unhealthy. |
| timestamp | Yes | string (ISO 8601 UTC) | When the health check was performed. |
| version | No | string | Deployed version of the service. |
| checks | No | array | Individual dependency check results. |

| Field | Required | Type | Description |
|---|---|---|---|
| name | Yes | string | Identifier for this check (e.g. "database", "cache", "payment-service"). |
| status | Yes | string | Healthy, Degraded, or Unhealthy. |
| responseTimeMs | No | integer | Time taken for this check, in milliseconds. |
| description | No | string | Human-readable detail about the check result (never include PII or credentials). |
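
The two field tables above map naturally onto typed records. The following sketch models them with dataclasses and omits optional fields that are not set; the class names and helper functions are illustrative choices for this example, not part of the guideline.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Check:
    name: str                              # required
    status: str                            # required: Healthy, Degraded, Unhealthy
    responseTimeMs: Optional[int] = None   # optional
    description: Optional[str] = None      # optional; never PII or credentials


@dataclass
class HealthResponse:
    status: str                            # required
    timestamp: str                         # required, ISO 8601 UTC
    version: Optional[str] = None          # optional
    checks: Optional[List[Check]] = None   # optional


def utc_timestamp():
    """Current time as ISO 8601 UTC, e.g. 2024-07-23T10:30:00Z."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def _drop_none(value):
    """Recursively remove keys whose value is None so optional fields are omitted."""
    if isinstance(value, dict):
        return {k: _drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [_drop_none(v) for v in value]
    return value


def to_json(response):
    """Serialize a HealthResponse to a JSON string."""
    return json.dumps(_drop_none(asdict(response)))
```

A response built as `HealthResponse("Healthy", utc_timestamp(), checks=[Check("database", "Healthy", 4)])` serializes without a `version` key and without a `description` key on the check.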

HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: no-store

{
  "status": "Degraded",
  "timestamp": "2024-07-23T10:30:00Z",
  "version": "1.4.2",
  "checks": [
    { "name": "database", "status": "Healthy", "responseTimeMs": 5 },
    { "name": "email-service", "status": "Unhealthy", "description": "Connection timeout after 3 retries" }
  ]
}

The email service is non-critical (emails are queued), so the service continues operating and reports Degraded with 200 OK rather than Unhealthy.

HTTP/1.1 503 Service Unavailable
Content-Type: application/json
Cache-Control: no-store
Retry-After: 30

{
  "status": "Unhealthy",
  "timestamp": "2024-07-23T10:30:00Z",
  "version": "1.4.2",
  "checks": [
    { "name": "database", "status": "Unhealthy", "description": "Cannot connect to primary database" },
    { "name": "cache", "status": "Healthy", "responseTimeMs": 2 }
  ]
}

Liveness checks:

| Check | What to Verify |
|---|---|
| Process health | Is the service responding to HTTP? |
| Memory | Is the process within its memory limits? |
| Thread pool | Are worker threads available? |

Readiness checks:

| Check | What to Verify |
|---|---|
| Database | Can the service execute a simple query? |
| Cache | Is the cache connection available? |
| Required config | Has configuration loaded successfully? |
| Startup tasks | Have migrations / warm-ups completed? |

- Use a lightweight query (e.g. SELECT 1), not a business query.
- Report a failing dependency as Degraded rather than Unhealthy unless it completely blocks operation.

/health → General health (combined liveness + readiness)
/health/live → Liveness only
/health/ready → Readiness only
These endpoints SHOULD be at the root of the service path, not under a versioned prefix (e.g. not /v1/health). Health endpoints are operational infrastructure, not part of the API version contract.
Example Kubernetes deployment configuration:

livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3

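
To get a feel for what these probe settings mean in practice, the sketch below estimates how long a running instance can be broken before Kubernetes acts. It is a rough upper bound under simplifying assumptions: the instance fails just after a successful probe, every later probe fails by timing out, and startup delay, scheduling jitter, and termination grace periods are ignored. The function name is hypothetical.

```python
def worst_case_detection_seconds(period_seconds, timeout_seconds, failure_threshold):
    """Rough upper bound on the time from failure to kubelet action.

    Assumes the failure begins right after a successful probe, so the first
    failing probe starts one period later, and the final failing probe only
    completes after its full timeout.
    """
    return period_seconds * failure_threshold + timeout_seconds


# Values taken from the example configuration above.
liveness = worst_case_detection_seconds(period_seconds=10, timeout_seconds=3, failure_threshold=3)
readiness = worst_case_detection_seconds(period_seconds=5, timeout_seconds=3, failure_threshold=3)
print(liveness, readiness)  # 33 18
```

Under these assumptions a deadlocked pod is restarted after roughly 33 seconds, and an unready pod is taken out of rotation after roughly 18 seconds. Lowering periodSeconds or failureThreshold shortens detection at the cost of more probe traffic and a higher chance of reacting to a transient blip.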
Health endpoint responses MUST include Cache-Control: no-store to prevent intermediaries from caching health status. Stale health responses could cause traffic to be routed to an unhealthy instance.
Cache-Control: no-store