UVA RC Architecture Brief¶
This brief is intended for discussion with UVA Research Computing about where a PrivSyn prototype should live, which parts can be self-managed, and which parts would require RC-administered infrastructure.
Recommendation¶
Use a staged approach rather than asking RC to bless a full public service on day one.
Frame the work as four separate delivery layers:
- Python library / CLI
- Internal web interface
- Full managed internal system
- Broader / external service
The meeting should ask which of these layers RC can support, and where the boundary lies between self-service and RC-administered infrastructure.
Delivery-Layer Decision Frame¶
| Layer | Main question for RC | Likely owner |
|---|---|---|
| 1. Library / CLI | Can this stay group-managed with RC compute + storage only? | Research group |
| 2. Internal web interface | Does RC have a supported home for a small browser-facing service? | Shared between group and RC |
| 3. Full managed internal system | Which identity, storage, and persistence services are standard? | RC plus campus IT / IAM |
| 4. Broader / external service | Does RC want to host this at all, and under what controls? | Needs explicit operating owner |
That makes it easier to ask for permission layer by layer instead of presenting RC with a single all-or-nothing deployment request.
Phase 0: Group-owned prototype on RC¶
- Execution: Slurm jobs on Rivanna or another RC-managed compute resource.
- Interface: CLI and Jupyter workflow, optionally an internal-only lightweight web UI.
- Storage: project storage for example inputs and synthesized outputs.
- Ownership: mostly self-service inside the research group.
This phase proves the workflow and resource profile without requiring RC to operate a campus-facing service.
Phase 0 is also where a second prototype library built on older dependencies can still be useful: even if it is operationally more complex, it can be presented as another Python library / CLI path rather than as a full service requirement.
Phase 1: RC-hosted microservice for prototype demo¶
- Frontend/API: small containerized service in RC container services / microservice hosting.
- Compute: backend submits synthesis jobs to Slurm.
- Storage: project storage or RC-approved persistent storage for datasets and outputs.
- Metadata: SQLite for prototype-scale usage, or Postgres if RC already has a preferred managed option.
- Access: initially campus-only or limited-access, then expanded if needed.
This is the most practical target for the current prototype.
Phase 2: Broader UVA service¶
- Add UVA SSO / NetBadge integration.
- Add quotas, audit logging, and clearer ownership boundaries.
- Decide whether RC staff must administer shared service accounts, ingress, or storage.
Proposed Technical Shape¶
Web/API tier¶
- FastAPI backend plus lightweight frontend.
- Containerized so the same image can run locally, on cloud infrastructure, or in RC microservices.
- Current codebase already separates:
  - job metadata persistence,
  - artifact storage,
  - synthesis execution backend.
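As an illustration, that separation could be expressed as small Python protocols. This is a sketch only; the names (`JobRecord`, `MetadataStore`, `ArtifactStore`, `JobRunner`) are hypothetical and may not match the repository's actual interfaces.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class JobRecord:
    job_id: str
    status: str  # e.g. "queued", "running", "done", "failed"


class MetadataStore(Protocol):
    """Persists job records (SQLite, Postgres, ...)."""
    def save(self, record: JobRecord) -> None: ...
    def load(self, job_id: str) -> JobRecord: ...


class ArtifactStore(Protocol):
    """Stores input bundles and synthesized outputs (project storage, object store, ...)."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class JobRunner(Protocol):
    """Executes synthesis work (inline, cloud batch, Slurm)."""
    def submit(self, record: JobRecord) -> None: ...


class InMemoryMetadataStore:
    """Trivial in-memory implementation, useful for local tests."""
    def __init__(self) -> None:
        self._records: dict[str, JobRecord] = {}

    def save(self, record: JobRecord) -> None:
        self._records[record.job_id] = record

    def load(self, job_id: str) -> JobRecord:
        return self._records[job_id]
```

Because each concern is a separate protocol, swapping SQLite for Postgres or inline execution for Slurm touches only one implementation class, not the application logic.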
Execution tier¶
- Preferred RC model: the service layer submits `sbatch` jobs to Rivanna.
- The service should not do heavy synthesis work inside the web process.
- Long-running work should be isolated from the web tier for scheduling, retries, and quota control.
- The web tier and Slurm worker need a shared metadata/artifact location so the worker can update job state and publish outputs back to the service.
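A minimal sketch of how the service layer might hand work to Slurm, assuming `sbatch` is on `PATH` in the hosting environment (whether it is, and under which identity, is exactly the question for RC). The partition name and helper names are placeholders.

```python
import re
import subprocess


def submit_slurm_job(script_path: str, *, partition: str = "standard") -> int:
    """Submit a batch script with sbatch and return the Slurm job id.

    Assumes sbatch is available to the service; the partition is a
    placeholder for whatever RC allocates to the project.
    """
    result = subprocess.run(
        ["sbatch", "--partition", partition, script_path],
        capture_output=True, text=True, check=True,
    )
    return parse_sbatch_output(result.stdout)


def parse_sbatch_output(stdout: str) -> int:
    """On success, sbatch prints e.g. 'Submitted batch job 12345'."""
    match = re.search(r"Submitted batch job (\d+)", stdout)
    if match is None:
        raise RuntimeError(f"unexpected sbatch output: {stdout!r}")
    return int(match.group(1))
```

The returned job id would be written to the shared metadata store so the worker and web tier agree on job state.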
Storage tier¶
- Input preview bundles, generated CSVs, and run metadata should persist outside process memory.
- Prototype options:
  - project storage plus SQLite,
  - project storage plus Postgres if RC has a preferred service,
  - object-storage-like service if RC offers one.
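For the SQLite option, prototype metadata could live in a single file on project storage that both the web tier and the Slurm worker open. The schema below is illustrative, not the codebase's actual one.

```python
import sqlite3

# Hypothetical prototype-scale schema; column names are illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id     TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    input_key  TEXT,          -- artifact-store key for the uploaded bundle
    output_key TEXT,          -- artifact-store key for the synthesized CSV
    created_at TEXT DEFAULT (datetime('now'))
);
"""


def open_metadata_db(path: str) -> sqlite3.Connection:
    """Open (or create) the prototype metadata database on project storage."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def set_status(conn: sqlite3.Connection, job_id: str, status: str) -> None:
    """Upsert so both the web tier and the Slurm worker can update job state."""
    conn.execute(
        "INSERT INTO jobs (job_id, status) VALUES (?, ?) "
        "ON CONFLICT(job_id) DO UPDATE SET status = excluded.status",
        (job_id, status),
    )
    conn.commit()
```

SQLite's file locking is adequate at prototype concurrency; if RC offers a managed Postgres, the same calls port over with minor SQL changes.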
Self-Service vs RC-Administered¶
| Area | Likely self-service | Likely RC-administered / coordinated |
|---|---|---|
| PrivSyn code and container image | Yes | No |
| Slurm job scripts and resource requests | Yes | No, unless special queues/policies apply |
| Project storage directory | Usually yes, once allocated | RC provides/approves storage resource |
| Microservice container deployment | Possibly, if RC has a self-service path | Often yes for ingress, platform setup, or approvals |
| Public or campus web ingress | Rarely | Yes |
| DNS / TLS / load balancer | No | Yes |
| Shared service identity for submitting jobs | Unclear | Likely yes |
| NetBadge / SSO integration | No | Yes, with campus IAM involvement |
| Security review for uploaded data | No | Yes, with RC + InfoSec |
Permissions and Resources To Ask About¶
- Microservice / container hosting
  - Can RC host a small containerized web service?
  - Is there a self-service microservice path, or does RC staff deploy and manage ingress?
- Slurm execution from a service
  - Can a web-facing service submit jobs to Slurm?
  - Must jobs run as the requesting user, or can a designated project/service identity submit them?
  - Are `sbatch`, `squeue`, and `scancel` available from the hosting environment?
- Storage
  - Can the prototype use existing project storage for inputs/outputs?
  - Is persistent storage available for the service itself?
  - Is there an RC-preferred database option, or should prototype metadata stay in SQLite?
- Identity / access
  - For a campus-facing service, what is the preferred UVA SSO path?
  - For a prototype, can access be limited to a small allowlist or internal-only route first?
- Operations boundary
  - What can the research group own and update directly?
  - What would require RC staff to administer or approve?
Minimal Viable RC Prototype¶
This is the configuration I would recommend proposing in the meeting:
- RC-hosted microservice for the lightweight web/API tier.
- Slurm-backed execution for synthesis jobs.
- Project storage for inputs and outputs.
- SQLite for prototype metadata unless RC already has a standard Postgres path.
- Limited user audience at first: project team or selected UVA researchers.
- No public anonymous access in the first stage.
That keeps the operational burden low while still demonstrating the core collaboration and deployment story for the proposal.
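The minimal configuration above could be captured in one settings object so the same container image runs in every phase. All variable names and default paths here are assumptions for illustration, not the project's real configuration surface.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Deployment knobs for the prototype; defaults sketch the minimal RC setup."""
    runner_backend: str = "slurm"                        # "inline" for local dev
    metadata_db: str = "/project/privsyn/meta.sqlite"    # SQLite on project storage
    artifact_dir: str = "/project/privsyn/artifacts"     # inputs and outputs
    allowlist: tuple[str, ...] = ()                      # empty = project team only


def settings_from_env() -> Settings:
    """Read overrides from environment variables (names are illustrative)."""
    return Settings(
        runner_backend=os.environ.get("PRIVSYN_RUNNER", "slurm"),
        metadata_db=os.environ.get("PRIVSYN_META_DB", "/project/privsyn/meta.sqlite"),
        artifact_dir=os.environ.get("PRIVSYN_ARTIFACTS", "/project/privsyn/artifacts"),
        allowlist=tuple(filter(None, os.environ.get("PRIVSYN_ALLOWLIST", "").split(","))),
    )
```

Environment-driven settings keep the image identical across local development, RC microservice hosting, and any later deployment, which supports the "same image everywhere" claim in the web/API tier.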
Why This Fits The Current Codebase¶
The repository is already being refactored toward:
- persistent preview bundles,
- durable job records,
- pluggable job runners,
- swappable storage backends.
That means the same application logic can later target:
- inline local execution,
- Cloud Run jobs,
- AWS Batch / ECS,
- Slurm-backed RC execution.
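One way the "same application logic, multiple execution targets" claim could look in practice is a small runner registry keyed by configuration. This is a sketch under assumed names, not the repository's actual API.

```python
from typing import Callable, Protocol


class Runner(Protocol):
    def run(self, job_id: str) -> str: ...


class InlineRunner:
    """Executes the synthesis function in-process; useful for local development."""
    def __init__(self, synth_fn: Callable[[str], str]) -> None:
        self._synth_fn = synth_fn

    def run(self, job_id: str) -> str:
        return self._synth_fn(job_id)


class SlurmRunner:
    """Placeholder: an RC deployment would shell out to sbatch here."""
    def run(self, job_id: str) -> str:
        raise NotImplementedError("submit via sbatch in the RC deployment")


RUNNERS: dict[str, Callable[..., Runner]] = {
    "inline": InlineRunner,
    "slurm": SlurmRunner,
}


def make_runner(backend: str, **kwargs) -> Runner:
    """Select an execution backend by configuration key."""
    return RUNNERS[backend](**kwargs)
```

Adding a Cloud Run or AWS Batch target then means registering one more class, with no changes to the web/API tier.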
Open Questions For The Meeting¶
- Which RC platform is the right home for the web/API tier: microservices, Open OnDemand, or another service?
- Can a service submit Slurm jobs directly, and under whose identity?
- What parts of the stack can the project manage without RC staff as day-to-day operators?
- Is project storage sufficient for the prototype, or is another persistent store preferred?
- Does RC want the initial deployment to stay campus-internal rather than broadly exposed?