Workers Overview

What Lives In The Worker Repo

lumie-worker contains independent FastAPI services for background processing and AI-heavy tasks. The backend stays the orchestration layer: it publishes RabbitMQ jobs, calls worker HTTP endpoints, and remains the only service that writes product data.

The worker repo should be read as a set of product-adjacent processors, not as a second backend. A worker may render images, call an LLM, consume a queue, or store technical checkpoint state, but it should not become the owner of academy business aggregates.

This section currently documents the following services:

| Service | Primary interface | Main job | Key integrations |
|---|---|---|---|
| Grading | RabbitMQ consumer and backend HTTP fallback | Grade OMR answer-sheet images | aio-pika, MinIO, Redis, httpx, OpenTelemetry |
| Report | RabbitMQ consumer and sync fallback HTTP route | Build per-student exam report images | aio-pika, httpx, Jinja2, Playwright, OpenTelemetry |
| Analysis | HTTP API | Generate exam commentary and student feedback | FastAPI, Pydantic, openai.AsyncOpenAI, OpenTelemetry |
| Chatbot | HTTP API with SSE | Run the academy assistant with tool use and confirmation | LangGraph, langchain-openai, httpx, Postgres checkpointer, OpenTelemetry |

Source Paths

| Path | Role |
|---|---|
| lumie-worker/services/grading/ | Production OMR grading worker |
| lumie-worker/services/report/ | Report rendering worker |
| lumie-worker/services/analysis/ | LLM-generated commentary and feedback worker |
| lumie-worker/services/chatbot/ | LangGraph-powered assistant worker |
| lumie-worker/libs/common/mq.py | Shared RabbitMQ message-processing primitive |
| lumie-worker/libs/common/callback_mq.py | Shared callback publisher for worker-to-backend callbacks |
| lumie-worker/libs/common/http_signing.py | HMAC signing helper for backend /internal/** calls |
| lumie-worker/libs/common/observability.py | Shared Prometheus and OpenTelemetry helpers |
| lumie-worker/contracts/mq-schemas-v1.yaml | Hand-maintained MQ contract reference for grading and report |

Shared Runtime Pattern

Most worker services follow the same layout under services/<name>/src/:

  • config.py defines typed BaseSettings so missing environment variables fail at startup.
  • wiring.py builds adapters and the use case so main.py can stay thin.
  • schema.py holds the Pydantic message or request models that define the wire contract.
  • usecase.py orchestrates the workflow without owning HTTP, RabbitMQ, or environment handling.
  • adapters/ contains concrete integrations such as MinIO, Redis, backend HTTP clients, or LLM clients.
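The split between schema, use case, adapters, and wiring can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual Lumie code: GradeCommand, ImageStore, GradeUseCase, InMemoryImageStore, and build_use_case are hypothetical names, and the real services use Pydantic models rather than dataclasses.

```python
from dataclasses import dataclass
from typing import Protocol


# schema.py: wire-facing model (the real services use Pydantic models).
@dataclass(frozen=True)
class GradeCommand:
    job_id: str
    image_key: str


# usecase.py depends on a small interface, not on MinIO or RabbitMQ directly.
class ImageStore(Protocol):
    def load(self, key: str) -> bytes: ...


class GradeUseCase:
    def __init__(self, store: ImageStore) -> None:
        self._store = store

    def execute(self, command: GradeCommand) -> dict:
        image = self._store.load(command.image_key)
        # Placeholder result: real grading logic would go here.
        return {"job_id": command.job_id, "bytes_read": len(image)}


# adapters/: a concrete integration; in-memory here so the sketch runs anywhere.
class InMemoryImageStore:
    def __init__(self, blobs: dict[str, bytes]) -> None:
        self._blobs = blobs

    def load(self, key: str) -> bytes:
        return self._blobs[key]


# wiring.py: the single place that chooses concrete adapters.
def build_use_case() -> GradeUseCase:
    return GradeUseCase(InMemoryImageStore({"sheet-1.png": b"\x89PNG"}))
```

Because the use case only sees the ImageStore interface, a test can pass an in-memory adapter like the one above and run without external services.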

Every service uses FastAPI lifespan hooks to create long-lived resources at startup and close them on shutdown. That is where Lumie opens shared httpx.AsyncClient instances, warms Playwright, connects callback publishers, or attaches a compiled LangGraph.
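The startup and shutdown shape described here can be sketched as an async context manager. The sketch below assumes stand-in classes (FakeClient and FakeApp are hypothetical) so that it runs without FastAPI or httpx installed. In a real service the same function would be passed as FastAPI(lifespan=lifespan), and the resource would be something like an httpx.AsyncClient.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeClient:
    """Stand-in for a long-lived resource such as httpx.AsyncClient."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


class AppState:
    """Stand-in for FastAPI's app.state namespace."""


class FakeApp:
    def __init__(self) -> None:
        self.state = AppState()


@asynccontextmanager
async def lifespan(app):
    # Startup: create the resource once and share it across requests.
    app.state.http = FakeClient()
    try:
        yield
    finally:
        # Shutdown: release the resource even if the app exits with an error.
        await app.state.http.aclose()


async def demo() -> bool:
    app = FakeApp()
    async with lifespan(app):
        # While the app is running, the shared client is open.
        assert app.state.http.closed is False
    return app.state.http.closed
```

Handlers then read the shared resource from app.state instead of constructing a client per request.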

Service Layout Contract

New worker services should follow the same shape unless there is a concrete reason not to:

| File or directory | Expected responsibility |
|---|---|
| main.py | FastAPI app, lifespan, route mounting, and thin handlers |
| src/config.py | Pydantic settings and fail-fast environment validation |
| src/schema.py | Wire-facing Pydantic request, command, and callback models |
| src/usecase.py | Application orchestration without framework ownership |
| src/wiring.py | Construction of concrete adapters and use cases |
| src/adapters/ | HTTP, MQ, storage, LLM, browser, or external-system adapters |
| src/domain/ | Pure data shaping, prompts, rendering models, or domain transforms |
| src/observability/ | Service-local metric names and tracing wrapper |
| tests/ | Use-case and observability tests that run without external services |

Integration Boundaries

The worker repo is intentionally split across two interaction styles:

  • grading-svc and report-svc are RabbitMQ consumers. They subscribe to a command queue, process a job, and publish a callback payload back to the lumie.commands exchange.
  • grading-svc also keeps a synchronous HTTP grading endpoint for backend paths that still call the worker directly for one uploaded OMR image.
  • analysis-svc and chatbot-svc are HTTP services. They are called directly by another application layer and return a response on the same request path.

Workers do not call each other. Shared backend-facing calls go through httpx adapters that sign every /internal/** request with X-Tenant-Slug, X-Signature, and X-Timestamp.
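As an illustration of what signing an /internal/** request might involve, here is a minimal sketch built on the standard hmac module. Only the three header names come from this page. The canonical string (timestamp, tenant slug, method, path, body), the SHA-256 digest, and the hex encoding are assumptions, and sign_headers and verify are hypothetical names. The real scheme lives in libs/common/http_signing.py and may differ.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_headers(secret: str, tenant_slug: str, method: str, path: str,
                 body: bytes, timestamp: Optional[int] = None) -> dict:
    ts = str(int(time.time()) if timestamp is None else timestamp)
    # Assumed canonical string; the real helper may sign different fields.
    message = b"\n".join([
        ts.encode(),
        tenant_slug.encode(),
        method.upper().encode(),
        path.encode(),
        body,
    ])
    signature = hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()
    return {
        "X-Tenant-Slug": tenant_slug,
        "X-Timestamp": ts,
        "X-Signature": signature,
    }


def verify(secret: str, headers: dict, method: str, path: str, body: bytes) -> bool:
    # Recompute the signature from the received headers and compare in
    # constant time to avoid leaking information through timing.
    expected = sign_headers(secret, headers["X-Tenant-Slug"], method, path,
                            body, int(headers["X-Timestamp"]))
    return hmac.compare_digest(expected["X-Signature"], headers["X-Signature"])
```

A receiving side would typically also reject timestamps outside a short window to limit replay. That check is omitted here.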

Backend Ownership Rule

Workers can be CPU-heavy, IO-heavy, or AI-heavy, but backend ownership remains central.

The database edge stays on the backend side. If a worker needs product data, it gets that data from backend /internal/** APIs or from a command payload. If it produces product state, it returns a callback or response for backend-owned handlers to persist.

Reliability And Observability

  • Shared MQ jobs use libs/common/mq.py::universal_process, which standardizes JSON parsing, retries, callback handling, and ack or nack behavior.
  • Shared callback publishing lives in libs/common/callback_mq.py, which keeps one robust RabbitMQ connection open for the service lifetime.
  • Prometheus metrics are mounted at /metrics for every documented worker.
  • OpenTelemetry is enabled per service with OTEL_ENABLED, OTEL_ENDPOINT, and OTEL_SERVICE_NAME.

grading-svc, report-svc, and analysis-svc use the shared tracing helper in libs/common/observability.py. chatbot-svc uses a separate tracing setup so LangGraph and LangChain spans can be emitted without breaking its chat model client wiring.

Failure Semantics

| Surface | Malformed input | Processing failure | Callback failure |
|---|---|---|---|
| RabbitMQ workers | Reject without requeue so the broker DLQ can capture it | nack(requeue=True) for retryable failures through universal_process | nack(requeue=True) after callback retry exhaustion |
| Sync HTTP routes | FastAPI validation error or explicit HTTPException | HTTP error response | Not applicable |
| LLM HTTP routes | FastAPI validation error | Upstream API errors propagate through handler failure path | Not applicable |
| SSE chatbot route | Request validation before stream | Error event or closed stream depending on failure point | Not applicable |

This means queue consumers should treat callback publishing as part of the unit of work. A job is not complete until the backend has received the success or failure payload.
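The RabbitMQ row of the table can be read as a small decision procedure. The sketch below is a simplified model of that procedure, not the body of universal_process. Outcome and handle_message are hypothetical names, every processing exception is treated as retryable, and the default retry count of 3 is an assumption.

```python
import json
from enum import Enum
from typing import Callable


class Outcome(Enum):
    ACK = "ack"
    REJECT_NO_REQUEUE = "reject(requeue=False)"
    NACK_REQUEUE = "nack(requeue=True)"


def handle_message(raw: bytes,
                   process: Callable[[dict], dict],
                   publish_callback: Callable[[dict], None],
                   callback_attempts: int = 3) -> Outcome:
    # Malformed input: retrying cannot help, so let the broker dead-letter it.
    try:
        command = json.loads(raw)
        if not isinstance(command, dict):
            raise ValueError("command must be a JSON object")
    except ValueError:
        return Outcome.REJECT_NO_REQUEUE

    # Processing failure: hand the message back to the queue for another try.
    try:
        result = process(command)
    except Exception:
        return Outcome.NACK_REQUEUE

    # Callback publishing is part of the unit of work: ack only after it succeeds.
    for _ in range(callback_attempts):
        try:
            publish_callback(result)
            return Outcome.ACK
        except Exception:
            continue
    return Outcome.NACK_REQUEUE
```

The ordering is the point of the sketch: the message is acknowledged only on the path where the callback has been published, so a lost callback leads to redelivery rather than a silently finished job.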

Configuration Pattern

Every service reads its environment through Pydantic settings. Defaults are allowed for local-only values such as a localhost RabbitMQ URL, but production-critical secrets and base URLs should fail at startup when missing. Examples:

  • analysis-svc requires LLM_API_KEY.
  • report-svc requires LUMIE_BACKEND_URL and LUMIE_INTERNAL_HMAC_SECRET.
  • queue workers default their queue and prefetch values in code, then override them from deployment values.
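The fail-fast behaviour can be illustrated with a standard-library stand-in for the Pydantic settings classes. LUMIE_BACKEND_URL and LUMIE_INTERNAL_HMAC_SECRET are taken from the list above. RABBITMQ_URL, PREFETCH_COUNT, their default values, and the names ReportSettings, MissingSettingError, and load_settings are assumptions made for this sketch only.

```python
from dataclasses import dataclass
from typing import Mapping


class MissingSettingError(RuntimeError):
    """Raised at startup when a production-critical value is absent."""


@dataclass(frozen=True)
class ReportSettings:
    backend_url: str
    hmac_secret: str
    rabbitmq_url: str
    prefetch: int


def load_settings(env: Mapping[str, str]) -> ReportSettings:
    # Required values have no default: a missing or empty one stops startup.
    required = ("LUMIE_BACKEND_URL", "LUMIE_INTERNAL_HMAC_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise MissingSettingError("missing required settings: " + ", ".join(missing))
    return ReportSettings(
        backend_url=env["LUMIE_BACKEND_URL"],
        hmac_secret=env["LUMIE_INTERNAL_HMAC_SECRET"],
        # Local-only defaults (hypothetical names), overridable per deployment.
        rabbitmq_url=env.get("RABBITMQ_URL", "amqp://guest:guest@localhost:5672/"),
        prefetch=int(env.get("PREFETCH_COUNT", "1")),
    )
```

Calling load_settings(os.environ) once at startup gives the same effect as a settings class with required fields: a misconfigured deployment fails immediately instead of at the first request that needs the value.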

Choosing The Right Page

  • Read Grading for the OMR queue contract, MinIO and Redis usage, and callback lifecycle.
  • Read Report for report rendering, backend data fan-out, and the RabbitMQ plus sync fallback surfaces.
  • Read Analysis for the two LLM-backed HTTP endpoints and prompt-driven generation flow.
  • Read Chatbot for LangGraph state, SSE streaming, human confirmation, and persistence behavior.