Platform Roadmap

This note tracks the migration from the current single-process demo flow to a deployment model that can run on Google Cloud, AWS, or a university-managed cluster without rewriting the synthesis logic.

Goals

  • Keep the synthesis API and preprocessing pipeline cloud-agnostic.
  • Replace in-memory session state with durable metadata and artifact storage.
  • Support both public-cloud auth and campus SSO with the same backend claims model.
  • Let the same job abstraction target Cloud Run Jobs, ECS/Fargate or AWS Batch, and Slurm/Kubernetes Jobs.

Phase 1: Durable Foundations

Status: started

  • Add environment-driven settings (web_app/settings.py).
  • Add a metadata persistence layer (web_app/metadata_store.py).
  • Route metadata store construction through a backend factory so SQLite local dev and future Postgres deployments share the same call sites.
  • Add an object storage abstraction with local and S3-compatible backends (web_app/object_storage.py).
  • Keep the existing FastAPI flow unchanged while the new primitives harden under tests.
  • Persist preview/inference bundles so confirmation survives in-memory session loss.
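The object storage abstraction above can be sketched as a small protocol with a local backend and an environment-driven factory. Names here (`ObjectStorage`, `LocalObjectStorage`, `storage_from_env`, the env vars) are illustrative, not the actual contents of web_app/object_storage.py:

```python
import os
import tempfile
from pathlib import Path
from typing import Protocol


class ObjectStorage(Protocol):
    """Minimal artifact interface; S3/GCS backends implement the same methods."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def exists(self, key: str) -> bool: ...


class LocalObjectStorage:
    """Filesystem-backed implementation for local development."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Keys stay relative so the same key works against an S3 bucket later.
        return self.root / key

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def exists(self, key: str) -> bool:
        return self._path(key).is_file()


def storage_from_env() -> ObjectStorage:
    """Backend factory: environment-driven selection, local filesystem by default."""
    backend = os.environ.get("OBJECT_STORAGE_BACKEND", "local")
    if backend == "local":
        root = Path(os.environ.get("OBJECT_STORAGE_ROOT", tempfile.mkdtemp()))
        return LocalObjectStorage(root)
    raise ValueError(f"unsupported object storage backend: {backend}")
```

Because routes only see the protocol, swapping the local backend for an S3-compatible one is a configuration change, not a code change.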

Deliverables:

  • SQLite metadata store for local development.
  • Local artifact storage rooted under temp_synthesis_output/state/artifacts.
  • Tests that lock in user/job/artifact semantics.
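The user/job/artifact semantics those tests lock in can be illustrated with a hypothetical SQLite schema; the actual metadata_store schema may differ, but the ownership chain (users own jobs, jobs own artifacts) is the invariant under test:

```python
import sqlite3

# Hypothetical schema sketch: users own jobs, jobs own artifacts.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    sub TEXT NOT NULL,
    provider TEXT NOT NULL,
    UNIQUE (sub, provider)
);
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    state TEXT NOT NULL DEFAULT 'queued'
);
CREATE TABLE artifacts (
    id INTEGER PRIMARY KEY,
    job_id INTEGER NOT NULL REFERENCES jobs(id),
    storage_key TEXT NOT NULL
);
"""


def make_db() -> sqlite3.Connection:
    """In-memory SQLite database with foreign keys enforced, for tests."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn
```

Tests against this schema can then assert the semantics directly: duplicate (sub, provider) pairs are rejected, jobs start queued, and artifacts cannot reference a nonexistent job.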

Phase 2: Job Model

Status: in progress

  • Introduce explicit job states: queued, running, succeeded, failed, cancelled.
  • Convert /confirm_synthesis into job submission plus status polling.
  • Move synthesized CSVs and uploaded parquet files behind the object storage abstraction.
  • Keep inline execution as the local default backend to preserve the current dev UX.
  • Keep backend selection config-driven through PRIVSYN_JOB_BACKEND so routes do not have to change when Slurm or cloud backends land.
  • Persist confirmed run bundles so remote workers can consume portable input artifacts rather than route-local temp state.
  • Treat queued backends as first-class job submissions: only inline-complete runs should populate the legacy in-memory session payload.
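A minimal sketch of this job model, assuming an interface shaped roughly like the one described above (the class and function names are illustrative, and only the PRIVSYN_JOB_BACKEND variable comes from the plan itself):

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Protocol


class JobState(str, Enum):
    """The explicit job states introduced in Phase 2."""
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"


@dataclass
class Job:
    job_id: str
    state: JobState = JobState.QUEUED
    error: str | None = None


class JobBackend(Protocol):
    """Single submission interface shared by inline, Slurm, and cloud backends."""

    def submit(self, job_id: str, run: Callable[[], None]) -> Job: ...


class InlineBackend:
    """Local default: run synchronously in-process, preserving the current dev UX."""

    def submit(self, job_id: str, run: Callable[[], None]) -> Job:
        job = Job(job_id, JobState.RUNNING)
        try:
            run()
            job.state = JobState.SUCCEEDED
        except Exception as exc:  # record the failure instead of raising to the route
            job.state = JobState.FAILED
            job.error = str(exc)
        return job


def backend_from_env() -> JobBackend:
    """Config-driven selection; new backends slot in without route changes."""
    name = os.environ.get("PRIVSYN_JOB_BACKEND", "inline")
    if name == "inline":
        return InlineBackend()
    raise ValueError(f"unknown job backend: {name}")
```

Under this shape, /confirm_synthesis submits through `backend_from_env().submit(...)` and only inline-complete runs would populate the legacy session payload; queued backends return a job id for status polling instead.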

Phase 3: Auth Model

  • Add a normalized users table keyed by external subject (sub) and provider.
  • Accept OIDC-backed identity claims in the backend.
  • Map cloud auth providers and campus SSO into the same internal user record.
  • Add per-job ownership checks before artifact download and evaluation.
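The claims-normalization and ownership rules above can be sketched as follows; `UserRecord`, `user_from_claims`, and `check_job_owner` are hypothetical names, but the key idea (identity keyed by provider plus `sub`, checked before artifact access) is the one stated in this phase:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserRecord:
    """Normalized identity keyed by (provider, sub), independent of the IdP."""
    provider: str
    sub: str
    email: Optional[str] = None


def user_from_claims(provider: str, claims: dict) -> UserRecord:
    """Map raw OIDC claims from any provider into the same internal record.

    `sub` is the only claim OIDC guarantees; everything else is optional.
    """
    return UserRecord(provider=provider, sub=claims["sub"], email=claims.get("email"))


def check_job_owner(job_owner: UserRecord, requester: UserRecord) -> None:
    """Per-job ownership check gating artifact download and evaluation."""
    if (job_owner.provider, job_owner.sub) != (requester.provider, requester.sub):
        raise PermissionError("requester does not own this job")
```

Because both cloud logins and campus SSO collapse into the same record, the synthesis flow never branches on the identity provider.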

Phase 4: Platform Adapters

Google Cloud

  • Web/API: Cloud Run
  • Jobs: Cloud Run Jobs
  • Object storage: Google Cloud Storage
  • Database: Cloud SQL Postgres or external Postgres
  • Auth: Google login, Clerk, or another OIDC provider

AWS

  • Web/API: App Runner
  • Jobs: ECS/Fargate or AWS Batch
  • Object storage: S3
  • Database: RDS Postgres
  • Auth: Cognito or another OIDC provider

University-managed deployment

  • Web/API: campus VM or Kubernetes ingress
  • Jobs: Slurm or Kubernetes Jobs
  • Object storage: MinIO, Ceph, or shared storage behind the object storage interface
  • Database: campus Postgres
  • Auth: campus SSO via OIDC or SAML-to-OIDC bridge
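The adapter matrix above can be expressed as deployment profiles that differ only in configuration, since routes talk to the storage/job/auth interfaces rather than to platform SDKs. The backend identifiers and keys below are placeholders, not settled configuration values:

```python
# Hypothetical deployment profiles: each target is a set of env overrides.
DEPLOY_PROFILES = {
    "gcp": {
        "PRIVSYN_JOB_BACKEND": "cloud_run_jobs",
        "OBJECT_STORAGE_BACKEND": "gcs",
        "DATABASE_BACKEND": "postgres",
        "AUTH_MODE": "oidc",
    },
    "aws": {
        "PRIVSYN_JOB_BACKEND": "aws_batch",
        "OBJECT_STORAGE_BACKEND": "s3",
        "DATABASE_BACKEND": "postgres",
        "AUTH_MODE": "oidc",
    },
    "campus": {
        "PRIVSYN_JOB_BACKEND": "slurm",
        "OBJECT_STORAGE_BACKEND": "s3",  # MinIO/Ceph via the S3-compatible backend
        "DATABASE_BACKEND": "postgres",
        "AUTH_MODE": "oidc",
    },
}


def profile_env(name: str) -> dict:
    """Return the environment overrides for one deployment target."""
    return dict(DEPLOY_PROFILES[name])
```

Every profile sets the same keys, which is itself a useful test: a new platform adapter that needs a key the others lack is leaking platform details past the interfaces.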

Integration Rules

  • Business logic should not import cloud-specific SDKs directly.
  • Storage code should depend on storage adapters, not filesystem paths.
  • Job submission should go through one backend interface, even for inline local runs.
  • Authenticated user identity should enter the synthesis flow as a normalized user record, not as provider-specific fields.

Safe Rollout and Rollback

  • Introduce each new subsystem as a dual-write or read-fallback layer first.
  • Keep SessionStore and current local run directories working until metadata-backed paths are proven in tests.
  • Preserve current API response shapes while adding job and artifact metadata behind the scenes.
  • Only remove legacy state paths after one full release cycle of stable tests and manual validation.
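The dual-write/read-fallback pattern can be sketched as a thin wrapper over two stores; the class names and the generic key/value interface here are illustrative, not the actual SessionStore API:

```python
from typing import Optional, Protocol


class KVStore(Protocol):
    """Generic key/value view over either the legacy or the new store."""

    def get(self, key: str) -> Optional[bytes]: ...
    def put(self, key: str, value: bytes) -> None: ...


class DualWriteStore:
    """Write to both stores; read from the new one, falling back to legacy.

    Rollback is removing this wrapper; cutover is dropping the legacy store
    once the metadata-backed path has proven itself in tests.
    """

    def __init__(self, new: KVStore, legacy: KVStore) -> None:
        self.new = new
        self.legacy = legacy

    def put(self, key: str, value: bytes) -> None:
        self.new.put(key, value)
        self.legacy.put(key, value)

    def get(self, key: str) -> Optional[bytes]:
        value = self.new.get(key)
        if value is not None:
            return value
        return self.legacy.get(key)  # fallback for pre-migration entries


class DictStore:
    """Toy in-memory store used to exercise the wrapper."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value
```

Because callers only see the `KVStore` shape before, during, and after the migration, API response shapes stay stable while the durable backend takes over underneath.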

University Deployment Checklist

  • Public entrypoint: confirm whether campus IT will host a VM, reverse proxy, or Kubernetes ingress.
  • Job submission: confirm whether web services may submit to Slurm or another scheduler.
  • Identity: confirm OIDC or SAML app registration path.
  • Data services: confirm Postgres and object storage availability.
  • Security review: confirm whether user-uploaded datasets require privacy or compliance review.

Immediate Next Steps

  1. Persist preview/inference artifacts so remote runners do not depend on SessionStore.
  2. Persist scheduler-side diagnostics (exit code, stderr pointer, submission host) back into job metadata.
  3. Add a GCS adapter and production Postgres deployment path behind the existing storage and metadata interfaces.
  4. Layer per-user ownership and auth checks on top of the durable job/artifact APIs.
  5. Add a higher-level deployment guide that compares local, Slurm, AWS Batch, and Cloud Run setup requirements side by side.