Workers Overview
What Lives In The Worker Repo
lumie-worker contains independent FastAPI services for background processing and AI-heavy tasks. The backend stays the orchestration layer: it publishes RabbitMQ jobs, calls worker HTTP endpoints, and remains the only service that writes product data.
The worker repo should be read as a set of product-adjacent processors, not as a second backend. A worker may render images, call an LLM, consume a queue, or store technical checkpoint state, but it should not become the owner of academy business aggregates.
This section currently documents four services:
| Service | Primary interface | Main job | Key integrations |
|---|---|---|---|
| Grading | RabbitMQ consumer and backend HTTP fallback | Grade OMR answer-sheet images | aio-pika, MinIO, Redis, httpx, OpenTelemetry |
| Report | RabbitMQ consumer and sync fallback HTTP route | Build per-student exam report images | aio-pika, httpx, Jinja2, Playwright, OpenTelemetry |
| Analysis | HTTP API | Generate exam commentary and student feedback | FastAPI, Pydantic, openai.AsyncOpenAI, OpenTelemetry |
| Chatbot | HTTP API with SSE | Run the academy assistant with tool use and confirmation | LangGraph, langchain-openai, httpx, Postgres checkpointer, OpenTelemetry |
Source Paths
| Path | Role |
|---|---|
| lumie-worker/services/grading/ | Production OMR grading worker |
| lumie-worker/services/report/ | Report rendering worker |
| lumie-worker/services/analysis/ | LLM-generated commentary and feedback worker |
| lumie-worker/services/chatbot/ | LangGraph-powered assistant worker |
| lumie-worker/libs/common/mq.py | Shared RabbitMQ message-processing primitive |
| lumie-worker/libs/common/callback_mq.py | Shared callback publisher for worker-to-backend callbacks |
| lumie-worker/libs/common/http_signing.py | HMAC signing helper for backend /internal/** calls |
| lumie-worker/libs/common/observability.py | Shared Prometheus and OpenTelemetry helpers |
| lumie-worker/contracts/mq-schemas-v1.yaml | Hand-maintained MQ contract reference for grading and report |
Shared Runtime Pattern
Most worker services follow the same layout under services/<name>/src/:
- config.py defines typed BaseSettings so missing environment variables fail at startup.
- wiring.py builds adapters and the use case so main.py can stay thin.
- schema.py holds the Pydantic message or request models that define the wire contract.
- usecase.py orchestrates the workflow without owning HTTP, RabbitMQ, or environment handling.
- adapters/ contains concrete integrations such as MinIO, Redis, backend HTTP clients, or LLM clients.
Every service uses FastAPI lifespan hooks to create long-lived resources at startup and close them on shutdown. That is where Lumie opens shared httpx.AsyncClient instances, warms Playwright, connects callback publishers, or attaches a compiled LangGraph.
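The startup and shutdown pairing described above can be sketched with a plain async context manager, which is the shape FastAPI accepts through `FastAPI(lifespan=...)`. This is a minimal illustration, not the actual Lumie code: `FakeHttpClient` and `FakeApp` are hypothetical stand-ins (for something like `httpx.AsyncClient` and the FastAPI app object) so the sketch runs with the standard library alone.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator


class FakeHttpClient:
    """Hypothetical stand-in for a long-lived client such as httpx.AsyncClient."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


class AppState:
    """Hypothetical stand-in for the attribute bag FastAPI exposes as app.state."""


class FakeApp:
    def __init__(self) -> None:
        self.state = AppState()


@asynccontextmanager
async def lifespan(app) -> AsyncIterator[None]:
    # Startup: create the shared resource once, before any request is served.
    app.state.http = FakeHttpClient()
    try:
        yield
    finally:
        # Shutdown: close the resource even if the app exits with an error.
        await app.state.http.aclose()


async def demo() -> bool:
    app = FakeApp()
    async with lifespan(app):
        # While the app is running, handlers would reuse app.state.http.
        assert app.state.http.closed is False
    return app.state.http.closed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a real service the same function would be passed as `FastAPI(lifespan=lifespan)`, and the body would open whatever that service needs, such as a Playwright browser or a callback publisher.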
Service Layout Contract
New worker services should follow the same shape unless there is a concrete reason not to:
| File or directory | Expected responsibility |
|---|---|
| main.py | FastAPI app, lifespan, route mounting, and thin handlers |
| src/config.py | Pydantic settings and fail-fast environment validation |
| src/schema.py | Wire-facing Pydantic request, command, and callback models |
| src/usecase.py | Application orchestration without framework ownership |
| src/wiring.py | Construction of concrete adapters and use cases |
| src/adapters/ | HTTP, MQ, storage, LLM, browser, or external-system adapters |
| src/domain/ | Pure data shaping, prompts, rendering models, or domain transforms |
| src/observability/ | Service-local metric names and tracing wrapper |
| tests/ | Use-case and observability tests that run without external services |
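One way to read the split between usecase.py, adapters/, and wiring.py is as a use case that depends only on small interfaces, with concrete adapters injected at construction time. The sketch below is an assumption about how that could look; `ImageStore`, `Grader`, `GradeSheetUseCase`, and the in-memory adapters are hypothetical names, not taken from the repo.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class ImageStore(Protocol):
    """Port the use case needs for loading an image (hypothetical)."""

    def load(self, key: str) -> bytes: ...


class Grader(Protocol):
    """Port the use case needs for scoring an image (hypothetical)."""

    def grade(self, image: bytes) -> int: ...


@dataclass
class GradeSheetUseCase:
    # usecase.py: orchestration only, no HTTP, MQ, or environment access.
    store: ImageStore
    grader: Grader

    def execute(self, image_key: str) -> dict:
        image = self.store.load(image_key)
        return {"image_key": image_key, "score": self.grader.grade(image)}


class InMemoryStore:
    # adapters/: a test double standing in for an object-storage adapter.
    def __init__(self, blobs: Dict[str, bytes]) -> None:
        self._blobs = blobs

    def load(self, key: str) -> bytes:
        return self._blobs[key]


class LengthGrader:
    # adapters/: a toy grader that scores an image by its byte length.
    def grade(self, image: bytes) -> int:
        return len(image)


def build_use_case() -> GradeSheetUseCase:
    # wiring.py: the single place where concrete adapters are chosen.
    store = InMemoryStore({"sheet-1": b"abc"})
    return GradeSheetUseCase(store=store, grader=LengthGrader())


if __name__ == "__main__":
    print(build_use_case().execute("sheet-1"))
```

Because the use case only sees the two protocols, tests under tests/ can run it with in-memory adapters like these and no external services.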
Integration Boundaries
The worker repo is intentionally split across two interaction styles:
- grading-svc and report-svc are RabbitMQ consumers. They subscribe to a command queue, process a job, and publish a callback payload back to the lumie.commands exchange.
- grading-svc also keeps a synchronous HTTP grading endpoint for backend paths that still call the worker directly for one uploaded OMR image.
- analysis-svc and chatbot-svc are HTTP services. They are called directly by another application layer and return a response on the same request path.
Workers do not call each other. Shared backend-facing calls go through httpx adapters that sign every /internal/** request with X-Tenant-Slug, X-Signature, and X-Timestamp.
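As an illustration of what signing an /internal/** request could involve, here is a standard-library sketch that produces the three headers named above. Only the header names come from this page. The canonical string (timestamp, method, path, tenant slug, and body hash joined by newlines), the SHA-256 choice, and both function names are assumptions; the real scheme lives in libs/common/http_signing.py and may differ.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional


def sign_internal_request(
    secret: str,
    tenant_slug: str,
    method: str,
    path: str,
    body: bytes = b"",
    timestamp: Optional[int] = None,
) -> Dict[str, str]:
    """Return signing headers for one request (canonical format is assumed)."""
    ts = str(int(time.time()) if timestamp is None else timestamp)
    body_hash = hashlib.sha256(body).hexdigest()
    # Assumed canonical string: every field that must not be tampered with.
    canonical = "\n".join([ts, method.upper(), path, tenant_slug, body_hash])
    signature = hmac.new(
        secret.encode(), canonical.encode(), hashlib.sha256
    ).hexdigest()
    return {
        "X-Tenant-Slug": tenant_slug,
        "X-Signature": signature,
        "X-Timestamp": ts,
    }


def verify_internal_request(
    secret: str, headers: Dict[str, str], method: str, path: str, body: bytes = b""
) -> bool:
    """Recompute the signature from the received headers and compare safely."""
    expected = sign_internal_request(
        secret,
        headers["X-Tenant-Slug"],
        method,
        path,
        body,
        timestamp=int(headers["X-Timestamp"]),
    )
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected["X-Signature"], headers["X-Signature"])


if __name__ == "__main__":
    hdrs = sign_internal_request("dev-secret", "acme", "GET", "/internal/exams/1")
    print(verify_internal_request("dev-secret", hdrs, "GET", "/internal/exams/1"))
```

Including the timestamp in the signed material lets the receiving side reject stale requests, which is the usual reason for sending an X-Timestamp header alongside a signature.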
Backend Ownership Rule
Workers can be CPU-heavy, IO-heavy, or AI-heavy, but backend ownership remains central:
The database edge stays on the backend side. If a worker needs product data, it
gets that data from backend /internal/** APIs or from a command payload. If it
produces product state, it returns a callback or response for backend-owned
handlers to persist.
Reliability And Observability
- Shared MQ jobs use libs/common/mq.py::universal_process, which standardizes JSON parsing, retries, callback handling, and ack or nack behavior.
- Shared callback publishing lives in libs/common/callback_mq.py, which keeps one robust RabbitMQ connection open for the service lifetime.
- Prometheus metrics are mounted at /metrics for every documented worker.
- OpenTelemetry is enabled per service with OTEL_ENABLED, OTEL_ENDPOINT, and OTEL_SERVICE_NAME.
grading-svc, report-svc, and analysis-svc use the shared tracing helper in libs/common/observability.py. chatbot-svc uses a separate tracing setup so LangGraph and LangChain spans can be emitted without breaking its chat model client wiring.
Failure Semantics
| Surface | Malformed input | Processing failure | Callback failure |
|---|---|---|---|
| RabbitMQ workers | Reject without requeue so the broker DLQ can capture it | nack(requeue=True) for retryable failures through universal_process | nack(requeue=True) after callback retry exhaustion |
| Sync HTTP routes | FastAPI validation error or explicit HTTPException | HTTP error response | Not applicable |
| LLM HTTP routes | FastAPI validation error | Upstream API errors propagate through handler failure path | Not applicable |
| SSE chatbot route | Request validation before stream | Error event or closed stream depending on failure point | Not applicable |
This means queue consumers should treat callback publishing as part of the unit of work. A job is not complete until the backend has received the success or failure payload.
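The RabbitMQ row of the table can be read as a small decision function. The sketch below models only that decision with the standard library. It is not universal_process itself: `handle_message`, `Outcome`, and the `callback_attempts` parameter are hypothetical, and it simplifies by treating every processing exception as retryable.

```python
import json
from enum import Enum
from typing import Callable


class Outcome(Enum):
    ACK = "ack"                    # job and callback both succeeded
    REJECT = "reject"              # malformed input: no requeue, broker DLQ
    NACK_REQUEUE = "nack_requeue"  # retryable failure: put the job back


def handle_message(
    body: bytes,
    process: Callable[[dict], dict],
    publish_callback: Callable[[dict], None],
    callback_attempts: int = 3,
) -> Outcome:
    # 1. Malformed input is never retried, because a retry cannot fix it.
    try:
        command = json.loads(body)
        if not isinstance(command, dict):
            raise ValueError("command must be a JSON object")
    except ValueError:
        return Outcome.REJECT

    # 2. Processing failures are requeued (all treated as retryable here).
    try:
        result = process(command)
    except Exception:
        return Outcome.NACK_REQUEUE

    # 3. The callback is part of the unit of work: ack only after it is sent.
    for _ in range(callback_attempts):
        try:
            publish_callback(result)
            return Outcome.ACK
        except Exception:
            continue
    return Outcome.NACK_REQUEUE


if __name__ == "__main__":
    sent = []
    print(handle_message(b'{"job": 1}', lambda c: {"ok": True}, sent.append))
```

A consumer built on a client such as aio-pika would translate these outcomes into the corresponding ack, reject, or nack call on the incoming message.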
Configuration Pattern
Every service reads environment through Pydantic settings. Defaults are allowed
for local-only values such as localhost RabbitMQ, but production-critical
secrets and base URLs should fail at startup when missing. Examples:
- analysis-svc requires LLM_API_KEY.
- report-svc requires LUMIE_BACKEND_URL and LUMIE_INTERNAL_HMAC_SECRET.
- Queue workers default their queue and prefetch values in code, then override them from deployment values.
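The fail-fast rule can be illustrated without Pydantic. The sketch below swaps in a plain dataclass for the Pydantic settings the services actually use, so it only demonstrates the behavior: required values raise at construction, local-only values fall back to defaults. LUMIE_BACKEND_URL and LUMIE_INTERNAL_HMAC_SECRET come from this page; `RABBITMQ_URL`, `PREFETCH_COUNT`, `ReportSettings`, and `MissingSettingError` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping

# Names taken from this page; these have no safe default.
REQUIRED = ("LUMIE_BACKEND_URL", "LUMIE_INTERNAL_HMAC_SECRET")


class MissingSettingError(RuntimeError):
    """Raised at startup when a production-critical value is absent."""


@dataclass(frozen=True)
class ReportSettings:
    backend_url: str
    hmac_secret: str
    rabbitmq_url: str    # hypothetical variable with a local-only default
    prefetch_count: int  # hypothetical variable with a default set in code

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "ReportSettings":
        # Collect every missing name so one startup failure reports them all.
        missing = [name for name in REQUIRED if not env.get(name)]
        if missing:
            raise MissingSettingError(
                "missing required settings: " + ", ".join(missing)
            )
        return cls(
            backend_url=env["LUMIE_BACKEND_URL"],
            hmac_secret=env["LUMIE_INTERNAL_HMAC_SECRET"],
            rabbitmq_url=env.get(
                "RABBITMQ_URL", "amqp://guest:guest@localhost:5672/"
            ),
            prefetch_count=int(env.get("PREFETCH_COUNT", "1")),
        )


if __name__ == "__main__":
    import os

    try:
        print(ReportSettings.from_env(os.environ))
    except MissingSettingError as exc:
        print(f"startup aborted: {exc}")
```

With Pydantic settings the same effect comes from declaring a field without a default, which makes validation fail when the variable is unset.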
Choosing The Right Page
- Read Grading for the OMR queue contract, MinIO and Redis usage, and callback lifecycle.
- Read Report for report rendering, backend data fan-out, and the RabbitMQ plus sync fallback surfaces.
- Read Analysis for the two LLM-backed HTTP endpoints and prompt-driven generation flow.
- Read Chatbot for LangGraph state, SSE streaming, human confirmation, and persistence behavior.