The observability subject provides logging, error tracking, and monitoring infrastructure for docs.github.com. These tools help monitor system health, catch errors, and provide operational visibility through structured logging and alerting.
This subject is responsible for:
- Structured logging with logfmt format in production
- Logger abstraction over console.log for server-side code
- Error handling and resilience (catch and report errors)
- Integration with Sentry for error tracking
- Integration with StatsD for metrics
- Integration with Failbot for alerts
- Automatic request logging middleware
- Request context tracking via requestUuid
Note: This tracks system health, not user behavior. User behavior tracking is in src/events.
Please see the logger README for details on using the logger.
- logger/index.ts - createLogger(): Creates logger instance for a module
- logger/middleware/get-automatic-request-logger.ts - Express middleware for automatic request logging
- middleware/handle-errors.ts - Global Express error handler that logs and reports errors
- middleware/catch-middleware-error.ts - Wraps async middleware to catch errors
- lib/failbot.ts - Reports errors to Failbot for alerting
- lib/statsd.ts - Sends metrics to StatsD for monitoring
Instead of console.log, use the logger:
import { createLogger } from '@/observability/logger'
// Pass import.meta.url to include filename in logs
const logger = createLogger(import.meta.url)
// Log levels: error, warn, info, debug
logger.info('Processing request', { userId: '123' })
logger.error('Failed to process', { error })
Log levels (highest to lowest):
- error - Errors that need attention
- warn - Warnings that may need attention
- info - Informational messages
- debug - Detailed debugging information
Set LOG_LEVEL environment variable to filter logs:
LOG_LEVEL=info npm run dev # Filters out debug logs
- Logfmt format in production - Easy to query in Splunk with key-value pairs
- Log level grouping - Filter by severity (error, warn, info, debug)
- Request context - Every log includes path and requestUuid
- Sentry integration - Errors in Sentry include requestUuid to find related logs
- Development clarity - Simple string logs in development, structured in production
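The level filtering and logfmt output described above can be pictured with a small sketch. This is not the code in logger/index.ts; the names LEVEL_RANK, shouldLog, and toLogfmt are hypothetical, and the quoting rule is a simplified assumption about logfmt rather than the behavior of any particular library.

```typescript
// Hypothetical sketch; the real logger in logger/index.ts may work differently.
type Level = 'error' | 'warn' | 'info' | 'debug'

// Lower rank = higher severity, matching the order error > warn > info > debug.
const LEVEL_RANK: Record<Level, number> = { error: 0, warn: 1, info: 2, debug: 3 }

// A message is emitted when it is at least as severe as the configured threshold.
function shouldLog(level: Level, threshold: Level): boolean {
  return LEVEL_RANK[level] <= LEVEL_RANK[threshold]
}

// Render fields as one logfmt line of space-separated key=value pairs.
function toLogfmt(fields: Record<string, string | number | boolean>): string {
  return Object.entries(fields)
    .map(([key, value]) => {
      const text = String(value)
      // Quote empty values and values containing whitespace, quotes, or '='.
      const needsQuotes = text === '' || /[\s"=]/.test(text)
      return needsQuotes ? `${key}=${JSON.stringify(text)}` : `${key}=${text}`
    })
    .join(' ')
}

const line = toLogfmt({
  level: 'info',
  message: 'Cache hit',
  path: '/en',
  requestUuid: 'abc-123',
})
console.log(line)
// level=info message="Cache hit" path=/en requestUuid=abc-123
```

With a threshold of info, shouldLog('debug', 'info') is false, which mirrors how LOG_LEVEL=info drops debug logs while keeping error, warn, and info.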
Request logging happens automatically via middleware:
- Development: GET /en 200 2ms
- Production: Logfmt with full context including requestUuid
All application logs from the same request share the same requestUuid.
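One way to picture how every log from a single request ends up with the same requestUuid is a log function bound to a per-request context. The sketch below passes that context explicitly; the real logger sets it up through middleware and may use a different mechanism. RequestContext, bindContext, and the in-memory sink are hypothetical names used only for illustration.

```typescript
// Hypothetical sketch: these names are illustrative, not the real logger API.
interface RequestContext {
  requestUuid: string
  path: string
}

type Fields = Record<string, unknown>

// Returns a log function that merges the request context into every entry.
function bindContext(context: RequestContext, sink: (entry: Fields) => void) {
  return (message: string, extra: Fields = {}): void => {
    // Context is spread last so callers cannot overwrite requestUuid or path.
    sink({ ...extra, message, ...context })
  }
}

// Collect entries in memory instead of writing them anywhere.
const entries: Fields[] = []
const log = bindContext({ requestUuid: 'abc-123', path: '/en' }, (entry) => entries.push(entry))

log('Cache hit', { key: 'home' })
log('Rendered page')

// Both entries carry requestUuid "abc-123".
console.log(entries)
```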
Wrap async middleware to catch errors:
import catchMiddlewareError from '@/observability/middleware/catch-middleware-error'
router.get('/path', catchMiddlewareError(async (req, res) => {
// Errors here are caught and handled
const data = await fetchData()
res.json(data)
}))
The global error handler in middleware/handle-errors.ts catches all Express errors.
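For readers wondering why the wrapper is needed: in Express 4, a rejected promise from an async handler is not passed to error-handling middleware unless something calls next with the error. A minimal sketch of such a wrapper follows. It is not the contents of middleware/catch-middleware-error.ts; catchAsync and the simplified Next and AsyncHandler types are hypothetical stand-ins so the example runs without Express installed.

```typescript
// Minimal stand-ins for Express types so the sketch is self-contained.
type Next = (error?: unknown) => void
type AsyncHandler<Req, Res> = (req: Req, res: Res, next: Next) => Promise<void>

// Hypothetical re-implementation; the real catch-middleware-error.ts may differ.
function catchAsync<Req, Res>(handler: AsyncHandler<Req, Res>) {
  return async (req: Req, res: Res, next: Next): Promise<void> => {
    try {
      await handler(req, res, next)
    } catch (error) {
      // Forward the failure to the error handler instead of leaving
      // an unhandled promise rejection.
      next(error)
    }
  }
}

// Simulate a handler whose awaited work fails.
const forwarded: unknown[] = []
const failing = catchAsync(async () => {
  throw new Error('fetch failed')
})

// The wrapper resolves normally and hands the error to next().
const done = failing({}, {}, (error) => forwarded.push(error))
```

Once done resolves, forwarded holds the single Error thrown by the handler, which is what a global error handler would then receive.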
- Application logs from logger.<method>() calls
- Request metadata (path, method, status, duration)
- Error objects with stack traces
- Request context (requestUuid, user agent, etc.)
- Splunk - Log aggregation and querying (index: docs-internal)
- Sentry - Error tracking and alerting
- StatsD - Metrics collection
- Failbot - Error reporting and alerting
- Logfmt - Log format library
- Structured logs sent to Splunk
- Errors reported to Sentry with context
- Metrics sent to StatsD
- Alerts sent via Failbot
- src/events - User behavior analytics (separate from observability)
- src/frame - Middleware pipeline where error handlers run
- All subjects - All should use createLogger() instead of console.log
- Splunk dashboard: https://splunk.githubapp.com/en-US/app/gh_reference_app/search
- For detailed logging guide, see logger/README.md in this directory
- Sentry dashboard: (internal link)
- On-call runbooks: (internal Docs Engineering repo)
- Team: Docs Engineering
- Note: We don't own Datadog or the observability infrastructure itself - we're working with what the observability team provides.
All queries should specify index:
index=docs-internal
Find logs by request:
index=docs-internal requestUuid="abc-123"
Find errors:
index=docs-internal level=error
Find logs from specific module:
index=docs-internal module="src/search/middleware/general-search.ts"
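Because every query needs the index, it can help to build queries through a small helper that always adds it. The sketch below is a hypothetical convenience, not an existing utility in this codebase; buildSplunkQuery and DOCS_INDEX are made-up names, and only the index name and field names come from the examples above.

```typescript
// Hypothetical helper; not part of the codebase.
const DOCS_INDEX = 'docs-internal'

// Build a search string that always starts with the docs index,
// followed by one field="value" term per filter.
function buildSplunkQuery(filters: Record<string, string>): string {
  const terms = Object.entries(filters).map(([field, value]) => {
    // Escape backslashes and double quotes so the value stays one quoted term.
    const escaped = value.replace(/(["\\])/g, '\\$1')
    return `${field}="${escaped}"`
  })
  return [`index=${DOCS_INDEX}`, ...terms].join(' ')
}

const query = buildSplunkQuery({ requestUuid: 'abc-123', level: 'error' })
console.log(query)
// index=docs-internal requestUuid="abc-123" level="error"
```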
Every log includes:
- requestUuid - Unique ID for the request
- path - Request path
- method - HTTP method
- statusCode - Response status
- duration - Request duration
- module - Source file (from import.meta.url)
- Error occurs in application code
- Caught by catchMiddlewareError or global error handler
- Logged with logger.error() including stack trace
- Reported to Sentry with requestUuid
- Critical errors trigger Failbot alerts
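The flow above can be sketched as a single function that logs, reports, and conditionally alerts. This is an illustration only: the Reporters interface, handleError, and the critical flag are hypothetical, the real Sentry and Failbot calls are replaced by injected stubs, and how the codebase decides that an error is critical is not specified here.

```typescript
// Hypothetical reporter interface; real logger, Sentry, and Failbot calls will differ.
interface Reporters {
  logError: (message: string, fields: Record<string, unknown>) => void
  reportToSentry: (error: Error, tags: { requestUuid: string }) => void
  alertFailbot: (error: Error) => void
}

function handleError(
  error: Error,
  requestUuid: string,
  critical: boolean, // Assumption: the caller decides what counts as critical.
  reporters: Reporters,
): void {
  // Log with the stack trace and the request ID.
  reporters.logError(error.message, { requestUuid, stack: error.stack })
  // Report with the same requestUuid so related logs can be found.
  reporters.reportToSentry(error, { requestUuid })
  // Only critical errors raise an alert.
  if (critical) reporters.alertFailbot(error)
}

// Record which steps ran, using stubs instead of real services.
const calls: string[] = []
const reporters: Reporters = {
  logError: () => calls.push('log'),
  reportToSentry: () => calls.push('sentry'),
  alertFailbot: () => calls.push('failbot'),
}

handleError(new Error('boom'), 'abc-123', false, reporters)
handleError(new Error('db down'), 'abc-124', true, reporters)
console.log(calls.join(','))
// log,sentry,log,sentry,failbot
```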
- Import and create logger at top of file:
    import { createLogger } from '@/observability/logger'
    const logger = createLogger(import.meta.url)
- Log important events:
    logger.info('Cache hit', { key })
    logger.warn('Rate limit approaching', { count })
    logger.error('Database connection failed', { error })
- Wrap async middleware:
    import catchMiddlewareError from '@/observability/middleware/catch-middleware-error'
    router.use(catchMiddlewareError(myMiddleware))
- Logs are verbose in production (logfmt includes full context)
- requestUuid tracking requires middleware initialization
- Development logs are simplified strings (less structured)
- We have an epic to improve our logging
Active monitoring:
- Error rates tracked in Sentry
- Performance metrics tracked in StatsD
- Critical errors trigger Failbot alerts to #docs-ops
- On-call rotation notified for production incidents
For on-call procedures and escalation, see internal Docs Engineering runbooks.