Deployment Guide

This document outlines the most common deployment paths for the PrivSyn web application—local Docker usage, Google Cloud Run, and Vercel (frontend). It highlights the environment variables you need to set and the expected directory layout.

1. Local Docker Image

  1. Build the image:

       docker build -t privsyn-tabular .

  2. Run the container, exposing the FastAPI port:

       docker run --rm -p 8080:8080 privsyn-tabular

  3. Open http://localhost:8080. The image now builds the React frontend into web_app/static/, and the backend serves those assets directly.
  4. The backend listens on $PORT (default 8080, for Cloud Run compatibility). To point the bundled frontend at a different backend, pass VITE_API_BASE_URL as a Docker build argument:

       docker build --build-arg VITE_API_BASE_URL="https://api.example.com" -t privsyn-tabular .

  5. docker-compose.yml is also checked in for local prototype runs that want persisted jobs/ and temp_synthesis_output/ mounts.

2. Google Cloud Run

  1. Build and push the container:

       gcloud builds submit --tag gcr.io/<PROJECT_ID>/privsyn

  2. Deploy to Cloud Run:

       gcloud run deploy privsyn \
         --image gcr.io/<PROJECT_ID>/privsyn \
         --platform managed \
         --allow-unauthenticated

  3. Configure Cloud Run service variables:
       • ADDITIONAL_CORS_ORIGINS: optional comma-separated list of additional origins to append to the defaults in web_app/main.py.
       • CORS_ALLOW_ORIGINS: deprecated alias for ADDITIONAL_CORS_ORIGINS, kept for backward compatibility.
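
The way these two variables could combine with the built-in defaults can be sketched as follows. This is a minimal illustration, not the code in web_app/main.py: the DEFAULT_ORIGINS values, the function name resolve_cors_origins, and the rule that the deprecated alias is read only when the newer variable is unset are all assumptions.

```python
import os

# Hypothetical defaults; the real list lives in web_app/main.py.
DEFAULT_ORIGINS = ["http://localhost:5173", "http://localhost:8080"]


def resolve_cors_origins(env=None):
    """Return the default origins plus any extras named in the environment."""
    env = os.environ if env is None else env
    # Assumption: prefer the newer variable, fall back to the deprecated alias.
    raw = env.get("ADDITIONAL_CORS_ORIGINS") or env.get("CORS_ALLOW_ORIGINS") or ""
    # Browsers send Origin without a trailing slash, so strip one if present.
    extras = [item.strip().rstrip("/") for item in raw.split(",") if item.strip()]
    merged = list(DEFAULT_ORIGINS)
    for origin in extras:
        if origin not in merged:  # keep order, drop duplicates
            merged.append(origin)
    return merged
```

Passing a plain dict instead of os.environ makes the merge easy to check before setting the value on the Cloud Run service.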

If you serve the frontend from the same Cloud Run container, the bundled app defaults to same-origin API calls and does not need VITE_API_BASE_URL.

Free tier constraints: the Cloud Run free tier grants limited CPU, RAM, and request duration. Large uploads or AIM runs frequently exceed those caps, leading to timeouts or OOM restarts. For heavy workloads, pull the code locally (or to a beefier VM) and run the backend outside Cloud Run.

2.1 Continuous deployment from GitHub Actions

Automated deployments run via .github/workflows/deploy-cloudrun.yml whenever main is updated. The workflow:

  1. Checks out the repository.
  2. Authenticates to GCP using GCP_SA_KEY (a JSON service-account key stored as a GitHub secret).
  3. Builds and pushes gcr.io/gen-lang-client-0649776758/privsyn-tabular via gcloud builds submit.
  4. Deploys the image to Cloud Run in us-east4 with --allow-unauthenticated.

To keep the workflow running, make sure the following GitHub secret is defined:

Secret     | Value
-----------|------
GCP_SA_KEY | Service-account JSON with the roles Cloud Run Admin, Cloud Build Editor, and Service Account User.

The project (gen-lang-client-0649776758), region (us-east4), and image name are baked into the workflow. If you need a different target, update .github/workflows/deploy-cloudrun.yml.

3. Vercel Frontend + Hosted Backend

  1. Deploy the backend (Docker, Cloud Run, or elsewhere) and note the public base URL.
  2. In the Vercel project (or other static hosting provider):
       • Set VITE_API_BASE_URL to the backend URL.
       • Run npm run build to produce frontend/dist.
       • Serve the built assets or configure Vercel to use the static output directory.
  3. Set ADDITIONAL_CORS_ORIGINS on the backend to include the Vercel production and preview domains.
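
After the last step, it can help to confirm that the backend really accepts the frontend's origin. The sketch below sends a CORS preflight and inspects the Access-Control-Allow-Origin response header. The function names are hypothetical, the /healthz path is borrowed from the Health Checks section, and the check assumes the backend answers preflight requests in the standard way.

```python
import urllib.error
import urllib.request


def allows_origin(response_headers, origin):
    """True if the Access-Control-Allow-Origin header covers `origin`."""
    allowed = response_headers.get("Access-Control-Allow-Origin", "")
    return allowed in ("*", origin)


def preflight_ok(backend_url, origin, timeout=5.0):
    """Send a CORS preflight and report whether `origin` is accepted."""
    request = urllib.request.Request(
        backend_url.rstrip("/") + "/healthz",
        method="OPTIONS",
        headers={
            "Origin": origin,
            "Access-Control-Request-Method": "GET",
        },
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return allows_origin(response.headers, origin)
    except (urllib.error.URLError, OSError):
        # Rejected preflights (HTTP 4xx) and unreachable hosts both count as failure.
        return False
```

Calling preflight_ok once with the production domain and once with a preview domain shows whether both made it into ADDITIONAL_CORS_ORIGINS.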

4. Environment Variables

Variable | Purpose | Default
---------|---------|--------
VITE_API_BASE_URL | Frontend build-time backend URL override. Separate frontend deploys only. | same-origin outside local Vite dev
ADDITIONAL_CORS_ORIGINS | Optional extra origins; comma separated. |
CORS_ALLOW_ORIGINS | Deprecated alias for ADDITIONAL_CORS_ORIGINS. |
PORT | FastAPI listening port (Cloud Run). | 8080
LOG_LEVEL | Optional override for FastAPI logging level. | INFO
PRIVSYN_AUTH_BACKEND | Request auth mode (none or trusted-header). | none
PRIVSYN_METADATA_BACKEND | Durable metadata backend selection (sqlite or postgres). | sqlite
PRIVSYN_DATABASE_URL | SQL connection URL for metadata (sqlite:///... or postgresql://...). |
PRIVSYN_OBJECT_STORAGE_BACKEND | Artifact backend (local or s3). | local
PRIVSYN_OBJECT_STORAGE_BUCKET | Bucket name for S3-compatible artifact storage. |
PRIVSYN_OBJECT_STORAGE_ENDPOINT_URL | Optional MinIO / Ceph / S3-compatible endpoint URL. |
PRIVSYN_OBJECT_STORAGE_FORCE_PATH_STYLE | Set to true for MinIO and many campus object stores. | false
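
To make the defaults concrete, here is one way a backend could read the server-side variables above. It is a sketch under stated assumptions, not the project's actual settings code: DeploySettings, load_settings, and the helper names are hypothetical, and rejecting unknown backend names with ValueError is an illustrative choice. The URL helper mirrors the note in Storage & Sessions that postgresql:// URLs are rewritten for SQLAlchemy's psycopg driver.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DeploySettings:
    port: int
    log_level: str
    auth_backend: str
    metadata_backend: str
    database_url: Optional[str]
    object_storage_backend: str
    object_storage_bucket: Optional[str]
    object_storage_endpoint_url: Optional[str]
    force_path_style: bool


def _choice(env, name, default, allowed):
    """Read an enumerated variable and reject values outside `allowed`."""
    value = env.get(name, default)
    if value not in allowed:
        raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
    return value


def normalize_database_url(url):
    """Point bare postgresql:// URLs at SQLAlchemy's psycopg driver."""
    prefix = "postgresql://"
    if url and url.startswith(prefix):
        return "postgresql+psycopg://" + url[len(prefix):]
    return url


def load_settings(env: Optional[Mapping[str, str]] = None) -> DeploySettings:
    """Collect the documented variables, applying the defaults from the table."""
    env = os.environ if env is None else env
    flag = env.get("PRIVSYN_OBJECT_STORAGE_FORCE_PATH_STYLE", "false")
    return DeploySettings(
        port=int(env.get("PORT", "8080")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        auth_backend=_choice(env, "PRIVSYN_AUTH_BACKEND", "none", ("none", "trusted-header")),
        metadata_backend=_choice(env, "PRIVSYN_METADATA_BACKEND", "sqlite", ("sqlite", "postgres")),
        database_url=normalize_database_url(env.get("PRIVSYN_DATABASE_URL")),
        object_storage_backend=_choice(env, "PRIVSYN_OBJECT_STORAGE_BACKEND", "local", ("local", "s3")),
        object_storage_bucket=env.get("PRIVSYN_OBJECT_STORAGE_BUCKET"),
        object_storage_endpoint_url=env.get("PRIVSYN_OBJECT_STORAGE_ENDPOINT_URL"),
        force_path_style=flag.strip().lower() == "true",
    )
```

VITE_API_BASE_URL is left out because it is consumed at frontend build time rather than by the backend process.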

5. UVA Research Computing / Rivanna

Rivanna is a good target for batch-backed synthesis execution. The repo includes RC-focused scaffolding under deploy/rc/, plus Kubernetes manifests and a handoff checklist for cluster coordination.

Validated RC-facing settings from the prototype deployment path:

export EXECUTION_MODE=slurm
export JOBS_ROOT=/scratch/$USER/privsyn/jobs
export SLURM_ACCOUNT=dplab
export SLURM_PARTITION=standard

If the web/API service is not running on Rivanna itself, set one of:

export SLURM_SSH_TARGET=nkp2mr@login.hpc.virginia.edu

or

export SLURM_SSH_USER=nkp2mr
export SLURM_SSH_HOST=login.hpc.virginia.edu

When either SSH setting is present, the RC prototype executor keeps EXECUTION_MODE=slurm but submits over SSH. Configure these as well:

export SLURM_REMOTE_PROJECT_ROOT=/home/nkp2mr/privsyn-tabular-rc
export SLURM_REMOTE_JOBS_ROOT=/scratch/nkp2mr/privsyn/jobs
export SLURM_REMOTE_RUNNER_COMMAND="/home/nkp2mr/miniconda3/envs/privsyn-rc/bin/python -m web_app.job_runner"
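
The choice between local and SSH submission described above could look roughly like this. It is a hedged sketch rather than the RC prototype executor itself: both function names are hypothetical, requiring SLURM_SSH_USER and SLURM_SSH_HOST together is an assumption, and the sbatch invocation is reduced to the account and partition options.

```python
import os
import shlex


def resolve_ssh_target(env=None):
    """Return 'user@host' when SSH submission is configured, else None."""
    env = os.environ if env is None else env
    target = env.get("SLURM_SSH_TARGET", "").strip()
    if target:
        return target
    user = env.get("SLURM_SSH_USER", "").strip()
    host = env.get("SLURM_SSH_HOST", "").strip()
    if user and host:  # assumption: the pair is only valid when both are set
        return f"{user}@{host}"
    return None


def build_submit_command(script_path, env=None):
    """Build the argv for submitting a batch script, locally or over SSH."""
    env = os.environ if env is None else env
    sbatch = ["sbatch"]
    if env.get("SLURM_ACCOUNT"):
        sbatch.append(f"--account={env['SLURM_ACCOUNT']}")
    if env.get("SLURM_PARTITION"):
        sbatch.append(f"--partition={env['SLURM_PARTITION']}")
    sbatch.append(script_path)
    target = resolve_ssh_target(env)
    if target is None:
        return sbatch
    # ssh hands one string to the remote shell, so quote the command here.
    return ["ssh", target, shlex.join(sbatch)]
```

The returned list can be passed straight to subprocess.run, which avoids a local shell and keeps quoting limited to the remote side.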

See deploy/rc/README.md, docs/nsf-prototype-deployment.md, and docs/rc-handoff-checklist.md for the RC-specific path.

6. Storage & Sessions

  • Temporary artifacts (uploaded parquet files, synthesized CSVs) are stored under temp_synthesis_output/runs/{session_id}.
  • RC prototype job-mode runs also persist metadata, logs, inputs, and outputs under jobs/{job_id}/.
  • Sessions expire automatically after six hours. For production, consider pointing the session store to Redis or another durable cache.
  • If you retain disk between deployments, use a cron job or a Cloud Run job to prune temp_synthesis_output.
  • Shared deployments can now move job metadata off local disk by setting PRIVSYN_METADATA_BACKEND=postgres with a postgresql://... database URL. The backend normalizes this to the psycopg SQLAlchemy driver automatically.
  • Internal deployments can now enforce per-user access by setting PRIVSYN_AUTH_BACKEND=trusted-header and having the ingress/proxy inject X-Privsyn-Subject; optional headers X-Privsyn-Email, X-Privsyn-Name, and X-Privsyn-Admin are also supported.
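
The trusted-header mode in the last bullet can be illustrated with a small helper that turns proxy-injected headers into an identity. This is a sketch under assumptions, not the backend's implementation: Principal, principal_from_headers, and can_access are hypothetical names, and treating X-Privsyn-Admin as the literal string "true" is a guess at the header format.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Principal:
    subject: str
    email: Optional[str] = None
    name: Optional[str] = None
    is_admin: bool = False


def principal_from_headers(headers: Mapping[str, str]) -> Optional[Principal]:
    """Build a Principal from proxy-injected headers, or None without a subject."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value.strip() for key, value in headers.items()}
    subject = lowered.get("x-privsyn-subject", "")
    if not subject:
        return None  # the caller would typically answer 401 here
    return Principal(
        subject=subject,
        email=lowered.get("x-privsyn-email") or None,
        name=lowered.get("x-privsyn-name") or None,
        # Assumption: the admin flag is sent as the string "true".
        is_admin=lowered.get("x-privsyn-admin", "").lower() == "true",
    )


def can_access(principal: Principal, owner_subject: str) -> bool:
    """Allow owners to see their own jobs and admins to see everything."""
    return principal.is_admin or principal.subject == owner_subject
```

Header-based identity is only as trustworthy as the network path: the backend should be reachable solely through the proxy, and the proxy should strip any client-supplied X-Privsyn-* headers before adding its own.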

7. Health Checks

  • Use GET /healthz for a lightweight ping.
  • To simulate the metadata flow without a real upload, POST to /synthesize with dataset_name=debug_dataset (returns stub metadata).
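
A post-deploy probe for the /healthz endpoint might look like the following. It is a minimal sketch using only the standard library; the function names, the retry count, and the rule that only HTTP 200 counts as healthy are assumptions.

```python
import time
import urllib.error
import urllib.request


def healthz_url(base_url):
    """Join the service base URL with the health-check path."""
    return base_url.rstrip("/") + "/healthz"


def check_health(base_url, timeout=5.0):
    """Return True when GET /healthz answers with HTTP 200."""
    try:
        with urllib.request.urlopen(healthz_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False


def wait_until_healthy(base_url, attempts=5, delay=2.0, timeout=5.0):
    """Poll /healthz until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if check_health(base_url, timeout=timeout):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give a cold-starting container time to boot
    return False
```

For the local Docker setup, wait_until_healthy("http://localhost:8080") after docker run gives a simple readiness gate before any further checks.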

Refer to docs/testing.md for CI-friendly commands to verify the deployment image before shipping.