Workers Overview

What Lives In The Worker Repo

lumie-worker contains independent FastAPI services for background processing and AI-heavy tasks. The backend stays the orchestration layer: it publishes RabbitMQ jobs, calls worker HTTP endpoints, and remains the only service that writes product data.

The worker repo should be read as a set of product-adjacent processors, not as a second backend. A worker may render images, call an LLM, consume a queue, or store technical checkpoint state, but it should not become the owner of academy business aggregates.

This section currently documents four worker services:

Service	Primary interface	Main job	Key integrations
Grading	RabbitMQ consumer and backend HTTP fallback	Grade OMR answer-sheet images	`aio-pika`, MinIO, Redis, `httpx`, OpenTelemetry
Report	RabbitMQ consumer and sync fallback HTTP route	Build per-student exam report PDFs	`aio-pika`, `httpx`, ReportLab, MinIO, OpenTelemetry
Analysis	HTTP API	Generate exam commentary and student feedback	FastAPI, Pydantic, `openai.AsyncOpenAI`, OpenTelemetry
Chatbot	HTTP API with SSE	Run the academy assistant with tool use and confirmation	LangGraph, `langchain-openai`, `httpx`, Postgres checkpointer, OpenTelemetry

Source Paths

Path	Role
`lumie-worker/services/grading/`	Production OMR grading worker
`lumie-worker/services/report/`	Report rendering worker
`lumie-worker/services/analysis/`	LLM-generated commentary and feedback worker
`lumie-worker/services/chatbot/`	LangGraph-powered assistant worker
`lumie-worker/libs/common/mq.py`	Shared RabbitMQ message-processing primitive
`lumie-worker/libs/common/callback_mq.py`	Shared callback publisher for worker-to-backend callbacks
`lumie-worker/libs/common/http_signing.py`	HMAC signing helper for backend `/internal/**` calls
`lumie-worker/libs/common/observability.py`	Shared Prometheus and OpenTelemetry helpers
`lumie-worker/contracts/mq-schemas-v1.yaml`	Hand-maintained MQ contract reference for grading and report

Shared Runtime Pattern

Most worker services follow the same layout under services/<name>/src/:

config.py defines typed BaseSettings so missing environment variables fail at startup.
wiring.py builds adapters and the use case so main.py can stay thin.
schema.py holds the Pydantic message or request models that define the wire contract.
usecase.py orchestrates the workflow without owning HTTP, RabbitMQ, or environment handling.
adapters/ contains concrete integrations such as MinIO, Redis, backend HTTP clients, or LLM clients.

Every service uses FastAPI lifespan hooks to create long-lived resources at startup and close them on shutdown. That is where Lumie opens shared httpx.AsyncClient instances, warms renderers, connects callback publishers, or attaches a compiled LangGraph.
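The startup and shutdown pairing described above can be sketched as follows. This is a minimal illustration, not the code in any Lumie service: `FakeAsyncClient` and the `backend_client` attribute are hypothetical stand-ins (a real service would hold an `httpx.AsyncClient`), and a `SimpleNamespace` replaces the FastAPI app so the sketch runs with the standard library only. In a real service, the same async context manager would be passed as `FastAPI(lifespan=lifespan)`.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class FakeAsyncClient:
    """Hypothetical stand-in for a long-lived client such as httpx.AsyncClient."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


@asynccontextmanager
async def lifespan(app):
    # Startup: create the shared resource once and attach it to app.state.
    app.state.backend_client = FakeAsyncClient()
    try:
        yield
    finally:
        # Shutdown: release the resource even if the app exits on an error.
        await app.state.backend_client.aclose()


async def demo() -> tuple[bool, bool]:
    # Assumption: a bare namespace plays the role of the FastAPI app object.
    app = SimpleNamespace(state=SimpleNamespace())
    async with lifespan(app):
        open_during_run = not app.state.backend_client.closed
    closed_after_shutdown = app.state.backend_client.closed
    return open_during_run, closed_after_shutdown
```

Keeping resource creation in the lifespan hook means handlers only read from `app.state` and never open their own clients per request.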

Service Layout Contract

New worker services should follow the same shape unless there is a concrete reason not to:

File or directory	Expected responsibility
`main.py`	FastAPI app, lifespan, route mounting, and thin handlers
`src/config.py`	Pydantic settings and fail-fast environment validation
`src/schema.py`	Wire-facing Pydantic request, command, and callback models
`src/usecase.py`	Application orchestration without framework ownership
`src/wiring.py`	Construction of concrete adapters and use cases
`src/adapters/`	HTTP, MQ, storage, LLM, browser, or external-system adapters
`src/domain/`	Pure data shaping, prompts, rendering models, or domain transforms
`src/observability/`	Service-local metric names and tracing wrapper
`tests/`	Use-case and observability tests that run without external services

Integration Boundaries

The worker repo is intentionally split across two interaction styles:

grading-svc and report-svc are RabbitMQ consumers. They subscribe to a command queue, process a job, and publish a callback payload back to the lumie.commands exchange.
grading-svc also keeps a synchronous HTTP grading endpoint for backend paths that still call the worker directly for one uploaded OMR image.
analysis-svc and chatbot-svc are HTTP services. They are called directly by another application layer and return a response on the same request path.

Workers do not call each other. Shared backend-facing calls go through httpx adapters that sign every /internal/** request and attach the X-Tenant-Slug, X-Signature, and X-Timestamp headers.
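To make the signed-header idea concrete, here is a hedged sketch of HMAC request signing using only the standard library. The three header names come from this page; everything else is an assumption for illustration, including the function names, the canonical string layout (method, path, tenant, timestamp, body), the SHA-256 digest, and the 300-second skew window. The actual scheme lives in `libs/common/http_signing.py` and may differ.

```python
import hashlib
import hmac
import time


def _canonical(method: str, path: str, tenant_slug: str, ts: str, body: bytes) -> bytes:
    # Assumed layout: newline-joined request fields followed by the raw body.
    return "\n".join([method.upper(), path, tenant_slug, ts]).encode() + b"\n" + body


def sign_internal_request(secret, tenant_slug, method, path, body=b"", timestamp=None):
    """Return the three headers for one /internal/** request (hypothetical helper)."""
    ts = str(int(time.time()) if timestamp is None else timestamp)
    message = _canonical(method, path, tenant_slug, ts, body)
    signature = hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()
    return {"X-Tenant-Slug": tenant_slug, "X-Timestamp": ts, "X-Signature": signature}


def verify_internal_request(secret, headers, method, path, body=b"",
                            max_skew_seconds=300, now=None):
    """Receiver-side check: recompute the signature and bound the timestamp skew."""
    try:
        ts = headers["X-Timestamp"]
        sent_at = int(ts)
        tenant_slug = headers["X-Tenant-Slug"]
        received = headers["X-Signature"]
    except (KeyError, ValueError):
        return False
    current = int(time.time()) if now is None else now
    if abs(current - sent_at) > max_skew_seconds:
        return False  # stale or future-dated request, possibly a replay
    message = _canonical(method, path, tenant_slug, ts, body)
    expected = hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, received)
```

Including the timestamp in the signed message is what lets the receiver reject replayed requests without keeping per-request state.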

Backend Ownership Rule

Workers can be CPU-heavy, IO-heavy, or AI-heavy, but the backend remains the owner of product data.

The database edge stays on the backend side. If a worker needs product data, it gets that data from backend /internal/** APIs or from a command payload. If it produces product state, it returns a callback or response for backend-owned handlers to persist.

Reliability And Observability

Shared MQ jobs use libs/common/mq.py::universal_process, which standardizes JSON parsing, retries, callback handling, and ack or nack behavior.
Shared callback publishing lives in libs/common/callback_mq.py, which keeps one robust RabbitMQ connection open for the service lifetime.
Prometheus metrics are mounted at /metrics for every documented worker.
OpenTelemetry is enabled per service with OTEL_ENABLED, OTEL_ENDPOINT, and OTEL_SERVICE_NAME.

grading-svc, report-svc, and analysis-svc use the shared tracing helper in libs/common/observability.py. chatbot-svc uses a separate tracing setup so LangGraph and LangChain spans can be emitted without breaking its chat model client wiring.

Failure Semantics

Surface	Malformed input	Processing failure	Callback failure
RabbitMQ workers	Reject without requeue so the broker DLQ can capture it	`nack(requeue=True)` for retryable failures through `universal_process`	`nack(requeue=True)` after callback retry exhaustion
Sync HTTP routes	FastAPI validation error or explicit `HTTPException`	HTTP error response	Not applicable
LLM HTTP routes	FastAPI validation error	Upstream API errors propagate through handler failure path	Not applicable
SSE chatbot route	Request validation before stream	Error event or closed stream depending on failure point	Not applicable

This means queue consumers should treat callback publishing as part of the unit of work. A job is not complete until the backend has received the success or failure payload.
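The RabbitMQ row of the table can be read as a small decision procedure. The sketch below is a simplified, synchronous illustration of that procedure and is not the implementation of `universal_process`: the `Outcome` enum, the `process_delivery` name, the callable parameters, and the default of three callback attempts are all assumptions, and every handler exception is treated as retryable for brevity.

```python
import json
from enum import Enum


class Outcome(Enum):
    ACK = "ack"
    REJECT = "reject(requeue=False)"    # the broker DLQ can capture the message
    NACK_REQUEUE = "nack(requeue=True)"  # the broker redelivers the message


def process_delivery(raw_body: bytes, handle, publish_callback,
                     callback_attempts: int = 3) -> Outcome:
    """Decide how to settle one delivery (hypothetical helper)."""
    # 1. Malformed input can never succeed, so it is rejected without requeue.
    try:
        payload = json.loads(raw_body)
        if not isinstance(payload, dict):
            raise ValueError("expected a JSON object")
    except ValueError:  # also covers JSONDecodeError and UnicodeDecodeError
        return Outcome.REJECT

    # 2. A processing failure is assumed retryable here, so the job is requeued.
    try:
        result = handle(payload)
    except Exception:
        return Outcome.NACK_REQUEUE

    # 3. The callback is part of the unit of work: ack only after it is published.
    for _ in range(callback_attempts):
        try:
            publish_callback(result)
            return Outcome.ACK
        except Exception:
            continue
    return Outcome.NACK_REQUEUE
```

Because the message is acknowledged only after the callback is published, a redelivered job may be processed more than once, so backend callback handlers benefit from being idempotent.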

Configuration Pattern

Every service reads its environment through Pydantic settings. Defaults are allowed for local-only values such as a localhost RabbitMQ URL, but production-critical secrets and base URLs should fail at startup when missing. Examples:

analysis-svc requires LLM_API_KEY.
report-svc requires LUMIE_BACKEND_URL and LUMIE_INTERNAL_HMAC_SECRET.
queue workers define default queue and prefetch values in code; deployment values override them.
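The fail-fast behaviour can be illustrated without any third-party dependency. The sketch below swaps Pydantic settings for a plain dataclass so it runs on the standard library alone; the real services use Pydantic `BaseSettings`. `LUMIE_BACKEND_URL` and `LUMIE_INTERNAL_HMAC_SECRET` come from this page, while `ReportSettings`, `load_report_settings`, `RABBITMQ_URL`, `PREFETCH_COUNT`, and the default values are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportSettings:
    # Production-critical values: no defaults, so they must be supplied.
    backend_url: str
    hmac_secret: str
    # Local-only values: defaults are acceptable (names and values are assumptions).
    rabbitmq_url: str = "amqp://guest:guest@localhost:5672/"
    prefetch_count: int = 4


def load_report_settings(env=None) -> ReportSettings:
    """Build settings from the environment, failing fast on missing values."""
    env = os.environ if env is None else env
    required = ("LUMIE_BACKEND_URL", "LUMIE_INTERNAL_HMAC_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Raise at startup instead of failing on the first backend call.
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return ReportSettings(
        backend_url=env["LUMIE_BACKEND_URL"],
        hmac_secret=env["LUMIE_INTERNAL_HMAC_SECRET"],
        rabbitmq_url=env.get("RABBITMQ_URL", ReportSettings.rabbitmq_url),
        prefetch_count=int(env.get("PREFETCH_COUNT", ReportSettings.prefetch_count)),
    )
```

Calling the loader once during startup, before any consumer or route is registered, keeps a misconfigured deployment from accepting work it cannot finish.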

Choosing The Right Page

Read Grading for the OMR queue contract, MinIO and Redis usage, and callback lifecycle.
Read Report for report rendering, backend data fan-out, and the RabbitMQ plus sync fallback surfaces.
Read Analysis for the two LLM-backed HTTP endpoints and prompt-driven generation flow.
Read Chatbot for LangGraph state, SSE streaming, human confirmation, and persistence behavior.
