Deployment Guide
This document outlines the most common deployment paths for the PrivSyn web application: local Docker usage, Google Cloud Run, and Vercel (frontend). It highlights the environment variables you need to set and the expected directory layout.
1. Local Docker Image
- Build the image:

  ```bash
  docker build -t privsyn-tabular .
  ```

- Run the container, exposing the FastAPI port:

  ```bash
  docker run --rm -p 8080:8080 privsyn-tabular
  ```

- Open `http://localhost:8080`. The image now builds the React frontend into `web_app/static/`, and the backend serves those assets directly.
- The backend listens on `$PORT` (defaults to `8080` for Cloud Run compatibility). If you want the bundled frontend to call a different backend, set `VITE_API_BASE_URL` as a Docker build arg:

  ```bash
  docker build --build-arg VITE_API_BASE_URL="https://api.example.com" -t privsyn-tabular .
  ```

- `docker-compose.yml` is also checked in for local prototype runs that want persisted `jobs/` and `temp_synthesis_output/` mounts.
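Because the backend reads its port from `$PORT`, running the container on a different port means setting the variable and the published port together. A minimal sketch, assuming the image tag `privsyn-tabular` from the steps above; the helper name `run_privsyn_local` is hypothetical and not part of the repository:

```shell
# Hypothetical helper: run the local image on a chosen port.
# Passes PORT into the container and publishes the same port on the host.
run_privsyn_local() {
  port="${1:-8080}"
  image="${2:-privsyn-tabular}"
  docker run --rm -e "PORT=${port}" -p "${port}:${port}" "${image}"
}

# Usage: run_privsyn_local 9090   # then open http://localhost:9090
```

Keeping the host and container ports identical avoids having to remember two numbers; map them differently only if the host port is already taken.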
2. Google Cloud Run
- Build and push the container:

  ```bash
  gcloud builds submit --tag gcr.io/<PROJECT_ID>/privsyn
  ```

- Deploy to Cloud Run:

  ```bash
  gcloud run deploy privsyn \
    --image gcr.io/<PROJECT_ID>/privsyn \
    --platform managed \
    --allow-unauthenticated
  ```

- Configure Cloud Run service variables:
  - `ADDITIONAL_CORS_ORIGINS`: optional comma-separated list of additional origins to append to the defaults in `web_app/main.py`.
  - `CORS_ALLOW_ORIGINS`: deprecated alias kept for backward compatibility.
If you serve the frontend from the same Cloud Run container, the bundled app defaults to same-origin API calls and does not need VITE_API_BASE_URL.
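Setting a comma-separated value such as `ADDITIONAL_CORS_ORIGINS` through `gcloud` needs care, because `--update-env-vars` itself splits its argument on commas. The sketch below joins the origins and uses gcloud's alternate-delimiter syntax (`^@^`, described in `gcloud topic escaping`) so the commas stay inside the value. The service name `privsyn` comes from the deploy step above; the function names and example origins are hypothetical, and the command is printed rather than executed so you can review it first:

```shell
# Join the given origins into one comma-separated string.
join_origins() {
  joined=""
  for origin in "$@"; do
    if [ -z "$joined" ]; then
      joined="$origin"
    else
      joined="${joined},${origin}"
    fi
  done
  printf '%s\n' "$joined"
}

# Print (do not run) the gcloud command that sets ADDITIONAL_CORS_ORIGINS.
# "^@^" switches gcloud's list delimiter from "," to "@" for this flag.
print_cors_update() {
  service="$1"
  shift
  origins="$(join_origins "$@")"
  printf 'gcloud run services update %s --update-env-vars "^@^ADDITIONAL_CORS_ORIGINS=%s"\n' \
    "$service" "$origins"
}

# Usage:
#   print_cors_update privsyn https://app.example.com https://preview.example.com
```

Depending on your gcloud configuration, the printed command may also need a `--region` flag.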
Free tier constraints: the Cloud Run free tier grants limited CPU, RAM, and request duration. Large uploads or AIM runs frequently exceed those caps, leading to timeouts or OOM restarts. For heavy workloads, run the backend locally or on a larger VM instead of Cloud Run.
2.1 Continuous deployment from GitHub Actions
Automated deployments run via .github/workflows/deploy-cloudrun.yml whenever main is updated. The workflow:
- Checks out the repository.
- Authenticates to GCP using `GCP_SA_KEY` (a JSON service-account key stored as a GitHub secret).
- Builds and pushes `gcr.io/gen-lang-client-0649776758/privsyn-tabular` via `gcloud builds submit`.
- Deploys the image to Cloud Run in `us-east4` with `--allow-unauthenticated`.
To keep the workflow running, make sure the following GitHub secret is defined:
| Secret | Value |
|---|---|
| `GCP_SA_KEY` | Service-account JSON with roles Cloud Run Admin, Cloud Build Editor, Service Account User. |
The project (gen-lang-client-0649776758), region (us-east4), and image name are baked into the workflow. If you need a different target, update .github/workflows/deploy-cloudrun.yml.
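The three roles required for `GCP_SA_KEY` correspond to the IAM role IDs `roles/run.admin`, `roles/cloudbuild.builds.editor`, and `roles/iam.serviceAccountUser`. A sketch of granting them with `gcloud projects add-iam-policy-binding`, assuming the service account already exists; the function name and the service-account email in the usage line are hypothetical:

```shell
# Grant the three deploy roles to an existing service account.
# Stops at the first binding that fails.
grant_deploy_roles() {
  project_id="$1"
  sa_email="$2"
  for role in roles/run.admin roles/cloudbuild.builds.editor roles/iam.serviceAccountUser; do
    gcloud projects add-iam-policy-binding "$project_id" \
      --member="serviceAccount:${sa_email}" \
      --role="$role" || return 1
  done
}

# Usage:
#   grant_deploy_roles <PROJECT_ID> deployer@<PROJECT_ID>.iam.gserviceaccount.com
```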
3. Vercel Frontend + Hosted Backend
- Deploy the backend (Docker, Cloud Run, or elsewhere) and note the public base URL.
- In the Vercel project (or other static hosting provider):
  - Set `VITE_API_BASE_URL` to the backend URL.
  - Run `npm run build` to produce `frontend/dist`.
  - Serve the built assets or configure Vercel to use the static output directory.
- Set `ADDITIONAL_CORS_ORIGINS` on the backend to include the Vercel production and preview domains.
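For static hosts other than Vercel, the build step above can be scripted so the backend URL is always supplied at build time. A minimal sketch, assuming the frontend lives in a `frontend/` directory (inferred from the `frontend/dist` output path) and that its dependencies are already installed; `build_frontend` is a hypothetical helper:

```shell
# Hypothetical helper: build the frontend against a given backend URL.
build_frontend() {
  api_base_url="$1"
  frontend_dir="${2:-frontend}"
  if [ -z "$api_base_url" ]; then
    echo "usage: build_frontend <backend-url> [frontend-dir]" >&2
    return 2
  fi
  # The subshell keeps the cd and the exported variable out of the caller's shell.
  (
    cd "$frontend_dir" || exit 1
    export VITE_API_BASE_URL="$api_base_url"
    npm run build
  )
}

# Usage: build_frontend https://api.example.com
```

Because the variable is read at build time, changing the backend URL later means rebuilding the frontend.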
4. Environment Variables
| Variable | Purpose | Default |
|---|---|---|
| `VITE_API_BASE_URL` | Frontend build-time backend URL override. Separate frontend deploys only. | same-origin outside local Vite dev |
| `ADDITIONAL_CORS_ORIGINS` | Optional extra origins; comma separated. | - |
| `CORS_ALLOW_ORIGINS` | Deprecated alias for `ADDITIONAL_CORS_ORIGINS`. | - |
| `PORT` | FastAPI listening port (Cloud Run). | `8080` |
| `LOG_LEVEL` | Optional override for FastAPI logging level. | `INFO` |
| `PRIVSYN_AUTH_BACKEND` | Request auth mode (`none` or `trusted-header`). | `none` |
| `PRIVSYN_METADATA_BACKEND` | Durable metadata backend selection (`sqlite` or `postgres`). | `sqlite` |
| `PRIVSYN_DATABASE_URL` | SQL connection URL for metadata (`sqlite:///...` or `postgresql://...`). | - |
| `PRIVSYN_OBJECT_STORAGE_BACKEND` | Artifact backend (`local` or `s3`). | `local` |
| `PRIVSYN_OBJECT_STORAGE_BUCKET` | Bucket name for S3-compatible artifact storage. | - |
| `PRIVSYN_OBJECT_STORAGE_ENDPOINT_URL` | Optional MinIO / Ceph / S3-compatible endpoint URL. | - |
| `PRIVSYN_OBJECT_STORAGE_FORCE_PATH_STYLE` | Set to `true` for MinIO and many campus object stores. | `false` |
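As an illustration, a shared deployment that keeps metadata in Postgres and artifacts in a MinIO bucket might combine the variables above as follows. Every value here (host names, password, bucket name) is a placeholder, and this table does not cover how object-store credentials are supplied, so check that separately for your environment:

```shell
# Placeholder values only; replace every host, password, and bucket name.
export PRIVSYN_METADATA_BACKEND=postgres
export PRIVSYN_DATABASE_URL="postgresql://privsyn:change-me@db.internal.example:5432/privsyn"
export PRIVSYN_OBJECT_STORAGE_BACKEND=s3
export PRIVSYN_OBJECT_STORAGE_BUCKET=privsyn-artifacts
export PRIVSYN_OBJECT_STORAGE_ENDPOINT_URL="https://minio.internal.example:9000"
export PRIVSYN_OBJECT_STORAGE_FORCE_PATH_STYLE=true
```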
5. UVA Research Computing / Rivanna
Rivanna is a good target for batch-backed synthesis execution. The repo includes RC-focused scaffolding under deploy/rc/, plus Kubernetes manifests and a handoff checklist for cluster coordination.
Validated RC-facing settings from the prototype deployment path:
```bash
export EXECUTION_MODE=slurm
export JOBS_ROOT=/scratch/$USER/privsyn/jobs
export SLURM_ACCOUNT=dplab
export SLURM_PARTITION=standard
```
If the web/API service is not running on Rivanna itself, set one of:
```bash
export SLURM_SSH_TARGET=nkp2mr@login.hpc.virginia.edu
```

or

```bash
export SLURM_SSH_USER=nkp2mr
export SLURM_SSH_HOST=login.hpc.virginia.edu
```
When either SSH setting is present, the RC prototype executor keeps EXECUTION_MODE=slurm but submits over SSH. Configure these as well:
```bash
export SLURM_REMOTE_PROJECT_ROOT=/home/nkp2mr/privsyn-tabular-rc
export SLURM_REMOTE_JOBS_ROOT=/scratch/nkp2mr/privsyn/jobs
export SLURM_REMOTE_RUNNER_COMMAND="/home/nkp2mr/miniconda3/envs/privsyn-rc/bin/python -m web_app.job_runner"
```
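Before starting the service, it can help to confirm that the SSH-related variables are mutually consistent. The function below is a hypothetical preflight check, not the executor's own logic: it only encodes the rules stated in this section (one of the two SSH forms, plus the three `SLURM_REMOTE_*` settings whenever SSH submission is in use), and it assumes the user and host are combined as `user@host`.

```shell
# Hypothetical preflight check for SSH-based Slurm submission settings.
# Prints the resolved SSH target on success; reports problems on stderr.
check_slurm_ssh_env() {
  target="${SLURM_SSH_TARGET:-}"
  if [ -z "$target" ]; then
    if [ -n "${SLURM_SSH_USER:-}" ] && [ -n "${SLURM_SSH_HOST:-}" ]; then
      target="${SLURM_SSH_USER}@${SLURM_SSH_HOST}"
    elif [ -n "${SLURM_SSH_USER:-}${SLURM_SSH_HOST:-}" ]; then
      echo "set both SLURM_SSH_USER and SLURM_SSH_HOST" >&2
      return 1
    else
      echo "no SSH settings present; assuming local submission" >&2
      return 0
    fi
  fi
  status=0
  [ -n "${SLURM_REMOTE_PROJECT_ROOT:-}" ] || { echo "missing SLURM_REMOTE_PROJECT_ROOT" >&2; status=1; }
  [ -n "${SLURM_REMOTE_JOBS_ROOT:-}" ] || { echo "missing SLURM_REMOTE_JOBS_ROOT" >&2; status=1; }
  [ -n "${SLURM_REMOTE_RUNNER_COMMAND:-}" ] || { echo "missing SLURM_REMOTE_RUNNER_COMMAND" >&2; status=1; }
  if [ "$status" -eq 0 ]; then
    printf '%s\n' "$target"
  fi
  return "$status"
}

# Usage: check_slurm_ssh_env && echo "SSH submission settings look complete"
```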
See deploy/rc/README.md, docs/nsf-prototype-deployment.md, and docs/rc-handoff-checklist.md for the RC-specific path.
6. Storage & Sessions
- Temporary artifacts (uploaded parquet files, synthesized CSVs) are stored under `temp_synthesis_output/runs/{session_id}`.
- RC prototype job-mode runs also persist metadata, logs, inputs, and outputs under `jobs/{job_id}/`.
- Sessions expire automatically after six hours. For production, consider pointing the session store to Redis or another durable cache.
- Use a cron or Cloud Run job to prune `temp_synthesis_output` if you retain disk between deployments.
- Shared deployments can now move job metadata off local disk by setting `PRIVSYN_METADATA_BACKEND=postgres` with a `postgresql://...` database URL. The backend normalizes this to the `psycopg` SQLAlchemy driver automatically.
- Internal deployments can now enforce per-user access by setting `PRIVSYN_AUTH_BACKEND=trusted-header` and having the ingress/proxy inject `X-Privsyn-Subject`; optional headers `X-Privsyn-Email`, `X-Privsyn-Name`, and `X-Privsyn-Admin` are also supported.
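A pruning job along the lines suggested above can be sketched with `find`. This version assumes run directories sit directly under `temp_synthesis_output/runs/` and uses each directory's modification time as a rough proxy for session age; the 360-minute default mirrors the six-hour session expiry. `prune_runs` and the paths in the crontab comment are hypothetical, and `-mindepth`, `-maxdepth`, and `-mmin` are GNU/BSD `find` extensions rather than POSIX:

```shell
# Hypothetical helper: delete run directories older than a given age.
prune_runs() {
  root="${1:-temp_synthesis_output}"
  max_age_minutes="${2:-360}"
  # Nothing to do if the runs directory has not been created yet.
  [ -d "$root/runs" ] || return 0
  find "$root/runs" -mindepth 1 -maxdepth 1 -type d \
    -mmin "+${max_age_minutes}" -exec rm -rf {} +
}

# Example crontab entry (hourly), assuming the function is wrapped in a script:
#   0 * * * * /opt/privsyn/prune_runs.sh /srv/privsyn/temp_synthesis_output
```

A directory's modification time changes only when entries are added or removed inside it, so a long-running session that rewrites existing files could look older than it is; lengthen the age threshold if that matters for your workloads.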
7. Health Checks
- Use `GET /healthz` for a lightweight ping.
- To simulate the metadata flow without a real upload, POST to `/synthesize` with `dataset_name=debug_dataset` (returns stub metadata).
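In deploy scripts, the `/healthz` ping is commonly wrapped in a short retry loop so the script waits for the container to come up. A minimal sketch using `curl`; `wait_for_healthz` and its retry defaults are hypothetical, and it treats any response below HTTP 400 as healthy without inspecting the body:

```shell
# Hypothetical helper: poll GET /healthz until it succeeds or attempts run out.
wait_for_healthz() {
  base_url="${1:-http://localhost:8080}"
  attempts="${2:-10}"
  delay_seconds="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors; -sS hides progress but keeps errors.
    if curl -fsS --max-time 5 -o /dev/null "${base_url}/healthz"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay_seconds"
  done
  echo "service at ${base_url} did not become healthy" >&2
  return 1
}

# Usage: wait_for_healthz http://localhost:8080 15 2
```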
Refer to docs/testing.md for CI-friendly commands to verify the deployment image before shipping.