
Observability

The observability subject provides logging, error tracking, and monitoring infrastructure for docs.github.com. These tools help monitor system health, catch errors, and provide operational visibility through structured logging and alerting.

Purpose & Scope

This subject is responsible for:

Structured logging with logfmt format in production
Logger abstraction over console.log for server-side code
Error handling and resilience (catch and report errors)
Integration with Sentry for error tracking
Integration with StatsD for metrics
Integration with Failbot for alerts
Automatic request logging middleware
Request context tracking via requestUuid

Note: This tracks system health, not user behavior. User behavior tracking is in src/events.

Logging

Please see the logger README for details on using the logger.

Architecture & Key Assets

Key capabilities and their locations

logger/index.ts - createLogger(): Creates a logger instance for a module
logger/middleware/get-automatic-request-logger.ts - Express middleware for automatic request logging
middleware/handle-errors.ts - Global Express error handler that logs and reports errors
middleware/catch-middleware-error.ts - Wraps async middleware to catch errors
lib/failbot.ts - Reports errors to Failbot for alerting
lib/statsd.ts - Sends metrics to StatsD for monitoring

Setup & Usage

Using the logger

Instead of console.log, use the logger:

import { createLogger } from '@/observability/logger'

// Pass import.meta.url to include the source filename in logs
const logger = createLogger(import.meta.url)

// Log levels: error, warn, info, debug
logger.info('Processing request', { userId: '123' })
// `error` is an Error caught in a surrounding try/catch block
logger.error('Failed to process', { error })

Log levels (highest to lowest):

error - Errors that need attention
warn - Warnings that may need attention
info - Informational messages
debug - Detailed debugging information

Set LOG_LEVEL environment variable to filter logs:

LOG_LEVEL=info npm run dev  # Filters out debug logs
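The filtering rule amounts to a rank comparison: a message is kept when its level is at least as severe as the configured threshold. The sketch below illustrates that idea. `LEVELS`, `resolveThreshold`, and `shouldLog` are hypothetical names, and falling back to `info` when LOG_LEVEL is unset or unrecognized is an assumption, not documented behavior of logger/index.ts.

```typescript
// Levels ordered from highest to lowest severity, matching the list above.
const LEVELS = ['error', 'warn', 'info', 'debug'] as const
type LogLevel = (typeof LEVELS)[number]

// Hypothetical: read the threshold from a LOG_LEVEL value.
// Assumption: unset or unknown values fall back to 'info'.
function resolveThreshold(envValue: string | undefined): LogLevel {
  const value = (envValue ?? '').toLowerCase()
  return (LEVELS as readonly string[]).includes(value) ? (value as LogLevel) : 'info'
}

// A lower index means higher severity, so keep messages whose index
// is less than or equal to the threshold's index.
function shouldLog(level: LogLevel, threshold: LogLevel): boolean {
  return LEVELS.indexOf(level) <= LEVELS.indexOf(threshold)
}
```

For example, `shouldLog('debug', resolveThreshold('info'))` returns false, while `shouldLog('warn', resolveThreshold('info'))` returns true.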

Benefits of structured logging

Logfmt format in production - Easy to query in Splunk with key-value pairs
Log level grouping - Filter by severity (error, warn, info, debug)
Request context - Every log includes path and requestUuid
Sentry integration - Errors in Sentry include requestUuid to find related logs
Development clarity - Simple string logs in development, structured in production
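To make the logfmt point concrete, the following sketch serializes a flat record into logfmt key=value pairs. The project relies on a logfmt library whose exact quoting rules may differ; `toLogfmt`, `formatValue`, and the field names in the example (`level`, `message`) are hypothetical and only illustrate the shape of a production log line.

```typescript
// Hypothetical value types accepted by this sketch.
type LogValue = string | number | boolean | null | undefined

// Quote a value when it is empty or contains whitespace, '=', '"' or '\';
// backslashes and double quotes inside quoted values are escaped.
function formatValue(value: LogValue): string {
  if (value === null || value === undefined) return ''
  const str = String(value)
  if (str === '' || /[\s="\\]/.test(str)) {
    return `"${str.replace(/\\/g, '\\\\').replace(/"/g, '\\"')}"`
  }
  return str
}

// Serialize a flat record as space-separated key=value pairs, in insertion order.
function toLogfmt(record: Record<string, LogValue>): string {
  return Object.entries(record)
    .map(([key, value]) => `${key}=${formatValue(value)}`)
    .join(' ')
}
```

With this sketch, `toLogfmt({ level: 'info', message: 'Cache hit', requestUuid: 'abc-123', path: '/en' })` produces `level=info message="Cache hit" requestUuid=abc-123 path=/en`, which Splunk can split into fields such as `level` and `requestUuid`.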

Automatic request logging

Request logging happens automatically via middleware:

Development: GET /en 200 2ms
Production: Logfmt with full context including requestUuid

All application logs from the same request share the same requestUuid.
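The development-format line can be reproduced by middleware that records a start time and logs once the response emits `finish`. This is a minimal sketch, not the contents of get-automatic-request-logger.ts: `createRequestLogger` is a hypothetical name, and minimal request/response shapes stand in for Express's types so the example runs without Express installed.

```typescript
import { EventEmitter } from 'node:events'

// Minimal stand-ins for Express's Request and Response types.
interface MinimalRequest {
  method: string
  path: string
}
interface MinimalResponse extends EventEmitter {
  statusCode: number
}
type Next = (err?: unknown) => void

// Hypothetical factory: `log` stands in for the module logger and `now`
// is injectable so durations can be tested deterministically.
function createRequestLogger(log: (line: string) => void, now: () => number = Date.now) {
  return (req: MinimalRequest, res: MinimalResponse, next: Next): void => {
    const start = now()
    // 'finish' fires after the response is handed off, so statusCode is final.
    res.once('finish', () => {
      const duration = now() - start
      log(`${req.method} ${req.path} ${res.statusCode} ${duration}ms`)
    })
    next()
  }
}
```

In production the same hook would emit a logfmt record that also carries requestUuid instead of a plain string.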

Error handling

Wrap async middleware to catch errors:

import catchMiddlewareError from '@/observability/middleware/catch-middleware-error'

router.get('/path', catchMiddlewareError(async (req, res) => {
  // Errors here are caught and handled
  const data = await fetchData()
  res.json(data)
}))

The global error handler in middleware/handle-errors.ts catches all Express errors.
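The usual way such a wrapper works is to await the handler and forward any rejection to `next()`, which hands the error to Express's error-handling chain instead of leaving an unhandled promise rejection. The sketch below shows that common pattern under a hypothetical name, `catchMiddlewareErrorSketch`; the actual catch-middleware-error.ts may be implemented differently.

```typescript
type Next = (err?: unknown) => void
type AsyncMiddleware<Req, Res> = (req: Req, res: Res, next: Next) => Promise<void> | void

// Hypothetical wrapper: both rejected promises and synchronous throws inside
// `fn` land in the catch block and are passed to next(err).
function catchMiddlewareErrorSketch<Req, Res>(fn: AsyncMiddleware<Req, Res>) {
  return async (req: Req, res: Res, next: Next): Promise<void> => {
    try {
      await fn(req, res, next)
    } catch (err) {
      next(err)
    }
  }
}
```

Forwarding to `next(err)` rather than responding inside the wrapper keeps all logging and reporting in one place, the global error handler.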

Data & External Dependencies

Data inputs

Application logs from logger.<method>() calls
Request metadata (path, method, status, duration)
Error objects with stack traces
Request context (requestUuid, user agent, etc.)

Dependencies

Splunk - Log aggregation and querying (index: docs-internal)
Sentry - Error tracking and alerting
StatsD - Metrics collection
Failbot - Error reporting and alerting
Logfmt - Log format library

Data outputs

Structured logs sent to Splunk
Errors reported to Sentry with context
Metrics sent to StatsD
Alerts sent via Failbot
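StatsD metrics travel as small text datagrams, usually over UDP, of the form `name:value|type`. The sketch below builds such lines and sends one with Node's `dgram` module. It is not lib/statsd.ts: `formatStatsdLine`, `sendMetric`, and the metric names are hypothetical, the `|#key:value` tag suffix is the DogStatsD extension (plain StatsD servers ignore or reject it), and 8125 is only the conventional default port.

```typescript
import { createSocket } from 'node:dgram'

// c = counter, g = gauge, ms = timing
type MetricType = 'c' | 'g' | 'ms'

// Hypothetical helper: build one StatsD line, appending DogStatsD-style tags if given.
function formatStatsdLine(
  name: string,
  value: number,
  type: MetricType,
  tags: Record<string, string> = {},
): string {
  const tagList = Object.entries(tags).map(([k, v]) => `${k}:${v}`)
  const suffix = tagList.length > 0 ? `|#${tagList.join(',')}` : ''
  return `${name}:${value}|${type}${suffix}`
}

// Hypothetical sender: fire-and-forget UDP, closing the socket once sent.
function sendMetric(line: string, host = '127.0.0.1', port = 8125): void {
  const socket = createSocket('udp4')
  socket.send(line, port, host, () => socket.close())
}
```

For example, `formatStatsdLine('middleware.render_page', 42, 'ms', { path: '/en' })` returns `middleware.render_page:42|ms|#path:/en`.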

Cross-links & Ownership

Related subjects

src/events - User behavior analytics (separate from observability)
src/frame - Middleware pipeline where error handlers run
All subjects - All should use createLogger() instead of console.log

Internal documentation

Splunk dashboard: https://splunk.githubapp.com/en-US/app/gh_reference_app/search
For detailed logging guide, see logger/README.md in this directory
Sentry dashboard: (internal link)
On-call runbooks: (internal Docs Engineering repo)

Ownership

Team: Docs Engineering
Note: We don't own Datadog or the observability infrastructure itself - we're working with what the observability team provides.

Current State & Next Steps

Querying logs in Splunk

All queries should specify the index:

index=docs-internal

Find logs by request:

index=docs-internal requestUuid="abc-123"

Find errors:

index=docs-internal level=error

Find logs from specific module:

index=docs-internal module="src/search/middleware/general-search.ts"
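When these filters are combined programmatically, for example in a debugging script, a small helper can guarantee the index term is always present. `buildSplunkQuery` and `LogQuery` are hypothetical; the field names mirror the queries above, and quoting every value (`level="error"`) is equivalent in Splunk to the unquoted form.

```typescript
type LogQuery = {
  requestUuid?: string
  level?: 'error' | 'warn' | 'info' | 'debug'
  module?: string
}

// Hypothetical helper: always start with the required index, then add one
// key="value" term per provided filter, escaping embedded double quotes.
function buildSplunkQuery(filters: LogQuery): string {
  const terms = ['index=docs-internal']
  for (const [key, value] of Object.entries(filters)) {
    if (value === undefined) continue
    terms.push(`${key}="${String(value).replace(/"/g, '\\"')}"`)
  }
  return terms.join(' ')
}
```

For example, `buildSplunkQuery({ level: 'error', requestUuid: 'abc-123' })` returns `index=docs-internal level="error" requestUuid="abc-123"`.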

Request context

Every log includes:

requestUuid - Unique ID for the request
path - Request path
method - HTTP method
statusCode - Response status
duration - Request duration
module - Source file (from import.meta.url)
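One common way to make every log line in a request carry the same requestUuid without passing it through every function is Node's `AsyncLocalStorage`, which keeps a per-request store across `await` boundaries. The sketch below illustrates that technique; whether this project uses it is an assumption. `requestContext`, `withRequestContext`, and `withContext` are hypothetical names, and the store holds only fields known at the start of a request (statusCode and duration are only available once the response finishes).

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'
import { randomUUID } from 'node:crypto'

interface RequestContext {
  requestUuid: string
  method: string
  path: string
}

// Hypothetical store shared by the request middleware and the logger.
const requestContext = new AsyncLocalStorage<RequestContext>()

// Run `handler` with a fresh context; code it calls or awaits sees the same store.
function withRequestContext<T>(method: string, path: string, handler: () => T): T {
  return requestContext.run({ requestUuid: randomUUID(), method, path }, handler)
}

// Logger-side helper: merge the current request's fields into a log record.
// Outside a request, the fields are returned unchanged.
function withContext(fields: Record<string, unknown>): Record<string, unknown> {
  const ctx = requestContext.getStore()
  return ctx ? { ...ctx, ...fields } : { ...fields }
}
```

Because the store is set once per request, this design also explains the known limitation below: logs emitted before the middleware runs have no requestUuid.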

Error reporting flow

Error occurs in application code
Caught by catchMiddlewareError or global error handler
Logged with logger.error() including stack trace
Reported to Sentry with requestUuid
Critical errors trigger Failbot alerts
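The last three steps of this flow can be sketched as a single function that logs, reports, and escalates. The collaborator interfaces below (`ErrorLogger`, `ErrorReporter`, `ErrorDeps`) and the `isCritical` predicate are hypothetical stand-ins; the real logger, Sentry client, and lib/failbot.ts have their own APIs, which this sketch does not reproduce.

```typescript
// Hypothetical collaborator interfaces, injected so the flow is testable.
interface ErrorLogger {
  error(message: string, fields: Record<string, unknown>): void
}
interface ErrorReporter {
  report(error: Error, context: Record<string, unknown>): void
}
interface ErrorDeps {
  logger: ErrorLogger
  sentry: ErrorReporter
  failbot: ErrorReporter
  isCritical: (error: Error) => boolean
}

// Log with the stack trace, report to Sentry with requestUuid,
// and escalate to Failbot only when the error is deemed critical.
function handleError(err: unknown, requestUuid: string, deps: ErrorDeps): void {
  // Non-Error values (e.g. thrown strings) are normalized so a stack exists.
  const error = err instanceof Error ? err : new Error(String(err))
  deps.logger.error(error.message, { requestUuid, stack: error.stack })
  deps.sentry.report(error, { requestUuid })
  if (deps.isCritical(error)) {
    deps.failbot.report(error, { requestUuid })
  }
}
```

Passing requestUuid to both the log record and the Sentry report is what lets an engineer jump from a Sentry event to the matching Splunk logs.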

Adding observability to new code

Import and create a logger at the top of the file:

import { createLogger } from '@/observability/logger'
const logger = createLogger(import.meta.url)

Log important events:

logger.info('Cache hit', { key })
logger.warn('Rate limit approaching', { count })
logger.error('Database connection failed', { error })

Wrap async middleware:

import catchMiddlewareError from '@/observability/middleware/catch-middleware-error'
router.use(catchMiddlewareError(myMiddleware))

Known limitations

Logs are verbose in production (logfmt includes full context)
requestUuid tracking requires middleware initialization
Development logs are simplified strings (less structured)

Planned work

We have an epic to improve our logging.

Monitoring and alerting

Active monitoring:

Error rates tracked in Sentry
Performance metrics tracked in StatsD
Critical errors trigger Failbot alerts to #docs-ops
On-call rotation notified for production incidents

For on-call procedures and escalation, see internal Docs Engineering runbooks.
