UVA RC Architecture Brief¶
This brief is intended for discussion with UVA Research Computing about where a PrivSyn prototype should live, which parts can be self-managed, and which parts would require RC-administered infrastructure.
Recommendation¶
Use a staged approach rather than asking RC to bless a full public service on day one.
Frame the work as four separate delivery layers:
- Python library / CLI
- Internal web interface
- Full managed internal system
- Broader / external service
The meeting should ask which of these layers RC can support, and where the boundary lies between self-service and RC-administered infrastructure.
Delivery-Layer Decision Frame¶
| Layer | Main question for RC | Likely owner |
|---|---|---|
| 1. Library / CLI | Can this stay group-managed with RC compute + storage only? | Research group |
| 2. Internal web interface | Does RC have a supported home for a small browser-facing service? | Shared between group and RC |
| 3. Full managed internal system | Which identity, storage, and persistence services are standard? | RC plus campus IT / IAM |
| 4. Broader / external service | Does RC want to host this at all, and under what controls? | Needs explicit operating owner |
That makes it easier to ask for permission layer by layer instead of presenting RC with a single all-or-nothing deployment request.
Phase 0: Group-owned prototype on RC¶
- Execution: Slurm jobs on Rivanna or another RC-managed compute resource.
- Interface: CLI and Jupyter workflow, optionally an internal-only lightweight web UI.
- Storage: project storage for example inputs and synthesized outputs.
- Ownership: mostly self-service inside the research group.
This phase proves the workflow and resource profile without requiring RC to operate a campus-facing service.
Phase 0 is also where a second prototype library built on older dependencies can still be useful: even if it is operationally more complex, it can be presented as another Python library / CLI path rather than as a full service requirement.
Phase 1: RC-hosted microservice for prototype demo¶
- Frontend/API: small containerized service in RC container services / microservice hosting.
- Compute: backend submits synthesis jobs to Slurm.
- Storage: project storage or RC-approved persistent storage for datasets and outputs.
- Metadata: SQLite for prototype-scale usage, or Postgres if RC already has a preferred managed option.
- Access: initially campus-only or limited-access, then expanded if needed.
This is the most practical target for the current prototype.
Phase 2: Broader UVA service¶
- Add UVA SSO / NetBadge integration.
- Add quotas, audit logging, and clearer ownership boundaries.
- Decide whether RC staff must administer shared service accounts, ingress, or storage.
Proposed Technical Shape¶
Web/API tier¶
- FastAPI backend plus lightweight frontend.
- Containerized so the same image can run locally, on cloud infrastructure, or in RC microservices.
- Current codebase already separates:
  - job metadata persistence,
  - artifact storage,
  - synthesis execution backend.
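As an illustration, that separation could be expressed as small Python protocols. This is a sketch only; the names (`JobRecord`, `MetadataStore`, `ArtifactStore`, `JobRunner`) are hypothetical and may not match the repository's actual interfaces.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class JobRecord:
    job_id: str
    status: str  # e.g. "queued", "running", "done", "failed"


class MetadataStore(Protocol):
    """Persists job records (SQLite, Postgres, ...)."""
    def save(self, record: JobRecord) -> None: ...
    def load(self, job_id: str) -> JobRecord: ...


class ArtifactStore(Protocol):
    """Stores input bundles and synthesized outputs (project storage, object store, ...)."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class JobRunner(Protocol):
    """Executes synthesis work (inline, cloud batch, Slurm)."""
    def submit(self, record: JobRecord) -> None: ...


class InMemoryMetadataStore:
    """Trivial in-memory implementation, useful for local tests."""
    def __init__(self) -> None:
        self._records: dict[str, JobRecord] = {}

    def save(self, record: JobRecord) -> None:
        self._records[record.job_id] = record

    def load(self, job_id: str) -> JobRecord:
        return self._records[job_id]
```

Because each concern is a separate protocol, swapping SQLite for Postgres or inline execution for Slurm touches only one implementation class, not the application logic.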
Execution tier¶
- Preferred RC model: the service layer submits `sbatch` jobs to Rivanna.
- The service should not do heavy synthesis work inside the web process.
- Long-running work should be isolated from the web tier for scheduling, retries, and quota control.
- The web tier and Slurm worker need a shared metadata/artifact location so the worker can update job state and publish outputs back to the service.
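A minimal sketch of how the service layer might hand work to Slurm, assuming `sbatch` is on `PATH` in the hosting environment (whether it is, and under which identity, is exactly the question for RC). The partition name and helper names are placeholders.

```python
import re
import subprocess


def submit_slurm_job(script_path: str, *, partition: str = "standard") -> int:
    """Submit a batch script with sbatch and return the Slurm job id.

    Assumes sbatch is available to the service; the partition is a
    placeholder for whatever RC allocates to the project.
    """
    result = subprocess.run(
        ["sbatch", "--partition", partition, script_path],
        capture_output=True, text=True, check=True,
    )
    return parse_sbatch_output(result.stdout)


def parse_sbatch_output(stdout: str) -> int:
    """On success, sbatch prints e.g. 'Submitted batch job 12345'."""
    match = re.search(r"Submitted batch job (\d+)", stdout)
    if match is None:
        raise RuntimeError(f"unexpected sbatch output: {stdout!r}")
    return int(match.group(1))
```

The returned job id would be written to the shared metadata store so the worker and web tier agree on job state.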
Storage tier¶
- Input preview bundles, generated CSVs, and run metadata should persist outside process memory.
- Prototype options:
  - project storage plus SQLite,
  - project storage plus Postgres if RC has a preferred service,
  - object-storage-like service if RC offers one.
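For the SQLite option, prototype metadata could live in a single file on project storage that both the web tier and the Slurm worker open. The schema below is illustrative, not the codebase's actual one.

```python
import sqlite3

# Hypothetical prototype-scale schema; column names are illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id     TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    input_key  TEXT,          -- artifact-store key for the uploaded bundle
    output_key TEXT,          -- artifact-store key for the synthesized CSV
    created_at TEXT DEFAULT (datetime('now'))
);
"""


def open_metadata_db(path: str) -> sqlite3.Connection:
    """Open (or create) the prototype metadata database on project storage."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def set_status(conn: sqlite3.Connection, job_id: str, status: str) -> None:
    """Upsert so both the web tier and the Slurm worker can update job state."""
    conn.execute(
        "INSERT INTO jobs (job_id, status) VALUES (?, ?) "
        "ON CONFLICT(job_id) DO UPDATE SET status = excluded.status",
        (job_id, status),
    )
    conn.commit()
```

SQLite's file locking is adequate at prototype concurrency; if RC offers a managed Postgres, the same calls port over with minor SQL changes.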
Self-Service vs RC-Administered¶
| Area | Likely self-service | Likely RC-administered / coordinated |
|---|---|---|
| PrivSyn code and container image | Yes | No |
| Slurm job scripts and resource requests | Yes | No, unless special queues/policies apply |
| Project storage directory | Usually yes, once allocated | RC provides/approves storage resource |
| Microservice container deployment | Possibly, if RC has a self-service path | Often yes for ingress, platform setup, or approvals |
| Public or campus web ingress | Rarely | Yes |
| DNS / TLS / load balancer | No | Yes |
| Shared service identity for submitting jobs | Unclear | Likely yes |
| NetBadge / SSO integration | No | Yes, with campus IAM involvement |
| Security review for uploaded data | No | Yes, with RC + InfoSec |
Permissions and Resources To Ask About¶
- Microservice / container hosting
  - Can RC host a small containerized web service?
  - Is there a self-service microservice path, or does RC staff deploy and manage ingress?
- Slurm execution from a service
  - Can a web-facing service submit jobs to Slurm?
  - Must jobs run as the requesting user, or can a designated project/service identity submit them?
  - Are `sbatch`, `squeue`, and `scancel` available from the hosting environment?
- Storage
  - Can the prototype use existing project storage for inputs/outputs?
  - Is persistent storage available for the service itself?
  - Is there an RC-preferred database option, or should prototype metadata stay in SQLite?
- Identity / access
  - For a campus-facing service, what is the preferred UVA SSO path?
  - For a prototype, can access be limited to a small allowlist or internal-only route first?
- Operations boundary
  - What can the research group own and update directly?
  - What would require RC staff to administer or approve?
Minimal Viable RC Prototype¶
This is the configuration I would recommend proposing in the meeting:
- RC-hosted microservice for the lightweight web/API tier.
- Slurm-backed execution for synthesis jobs.
- Project storage for inputs and outputs.
- SQLite for prototype metadata unless RC already has a standard Postgres path.
- Limited user audience at first: project team or selected UVA researchers.
- No public anonymous access in the first stage.
That keeps the operational burden low while still demonstrating the core collaboration and deployment story for the proposal.
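The minimal configuration above could be captured in one settings object so the same container image runs in every phase. All variable names and default paths here are assumptions for illustration, not the project's real configuration surface.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Deployment knobs for the prototype; defaults sketch the minimal RC setup."""
    runner_backend: str = "slurm"                        # "inline" for local dev
    metadata_db: str = "/project/privsyn/meta.sqlite"    # SQLite on project storage
    artifact_dir: str = "/project/privsyn/artifacts"     # inputs and outputs
    allowlist: tuple[str, ...] = ()                      # empty = project team only


def settings_from_env() -> Settings:
    """Read overrides from environment variables (names are illustrative)."""
    return Settings(
        runner_backend=os.environ.get("PRIVSYN_RUNNER", "slurm"),
        metadata_db=os.environ.get("PRIVSYN_META_DB", "/project/privsyn/meta.sqlite"),
        artifact_dir=os.environ.get("PRIVSYN_ARTIFACTS", "/project/privsyn/artifacts"),
        allowlist=tuple(filter(None, os.environ.get("PRIVSYN_ALLOWLIST", "").split(","))),
    )
```

Environment-driven settings keep the image identical across local development, RC microservice hosting, and any later deployment, which supports the "same image everywhere" claim in the web/API tier.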
Why This Fits The Current Codebase¶
The repository is already being refactored toward:
- persistent preview bundles,
- durable job records,
- pluggable job runners,
- swappable storage backends.
That means the same application logic can later target:
- inline local execution,
- Cloud Run jobs,
- AWS Batch / ECS,
- Slurm-backed RC execution.
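One way the "same application logic, multiple execution targets" claim could look in practice is a small runner registry keyed by configuration. This is a sketch under assumed names, not the repository's actual API.

```python
from typing import Callable, Protocol


class Runner(Protocol):
    def run(self, job_id: str) -> str: ...


class InlineRunner:
    """Executes the synthesis function in-process; useful for local development."""
    def __init__(self, synth_fn: Callable[[str], str]) -> None:
        self._synth_fn = synth_fn

    def run(self, job_id: str) -> str:
        return self._synth_fn(job_id)


class SlurmRunner:
    """Placeholder: an RC deployment would shell out to sbatch here."""
    def run(self, job_id: str) -> str:
        raise NotImplementedError("submit via sbatch in the RC deployment")


RUNNERS: dict[str, Callable[..., Runner]] = {
    "inline": InlineRunner,
    "slurm": SlurmRunner,
}


def make_runner(backend: str, **kwargs) -> Runner:
    """Select an execution backend by configuration key."""
    return RUNNERS[backend](**kwargs)
```

Adding a Cloud Run or AWS Batch target then means registering one more class, with no changes to the web/API tier.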
Open Questions For The Meeting¶
- Which RC platform is the right home for the web/API tier: microservices, Open OnDemand, or another service?
- Can a service submit Slurm jobs directly, and under whose identity?
- What parts of the stack can the project manage without RC staff as day-to-day operators?
- Is project storage sufficient for the prototype, or is another persistent store preferred?
- Does RC want the initial deployment to stay campus-internal rather than broadly exposed?