Multimodal Architecture

This project should evolve as a shared platform plus multiple modality-specific applications, not as one giant website that tries to present tabular, text, image, and future modalities through a single product surface.

Use one repository or monorepo with a clear split between platform code and modality apps:

repo/
  packages/
    privsyn_platform/        # auth, storage, jobs, ownership, modality routing contracts
    privsyn_tabular/         # tabular library + CLI
    privsyn_text/            # text library + CLI
    privsyn_image/           # image library + CLI
  apps/
    tabular_web/            # tabular UI + API
    text_web/               # text UI + API
    image_web/              # image UI + API
    hub_web/                # optional landing page / submission gateway
  workers/
    tabular_worker/
    text_worker/
    image_worker/
  deploy/
    shared/
    tabular/
    text/
    image/

If you prefer multiple repositories instead of a monorepo, keep the same conceptual split:

  • one shared privsyn_platform package
  • one app/service per modality
  • one worker/runtime environment per modality

What Should Be Shared

Keep these in privsyn_platform:

  • authentication and SSO integration
  • job creation, polling, cancellation, and ownership rules
  • durable metadata and object storage
  • deployment-facing settings and runner contracts
  • modality routing contracts and submission metadata
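To make the shared job-lifecycle and ownership responsibilities concrete, here is a minimal in-memory sketch. The class name, fields, and methods are illustrative assumptions, not the actual privsyn_platform API; a real store would be backed by durable metadata storage.

```python
# Illustrative in-memory sketch of platform-level job ownership;
# names and fields are assumptions, not the privsyn_platform API.
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    owner: str      # ownership is enforced in the platform layer
    modality: str   # "tabular", "text", "image", ...
    status: str     # "queued" | "running" | "done" | "cancelled"


class InMemoryJobStore:
    """Minimal job creation / polling / cancellation with ownership rules."""

    def __init__(self):
        self._jobs = {}

    def create(self, owner, modality):
        job = Job(job_id=str(len(self._jobs) + 1),
                  owner=owner, modality=modality, status="queued")
        self._jobs[job.job_id] = job
        return job

    def poll(self, job_id):
        return self._jobs[job_id]

    def cancel(self, job_id, requester):
        job = self._jobs[job_id]
        # Ownership rule: only the job's owner may cancel it.
        if requester != job.owner:
            raise PermissionError("only the job owner may cancel")
        job.status = "cancelled"
        return job
```

Because every modality app talks to the same store, job status and ownership behave identically whether the job came from the tabular, text, or image product.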

What Should Stay Modality-Specific

Do not force these into the shared layer:

  • model loading and inference code
  • GPU/runtime dependencies
  • prompt or dataset schemas
  • evaluation metrics
  • frontend workflows and result presentation

Tabular, text, and image products can share a login shell and a job-history page, but the generation forms and result views should remain separate.

UI Recommendation

Do not put every modality into one dense workflow page.

Prefer:

  • tabular.example.edu
  • text.example.edu
  • image.example.edu

Optionally add a thin hub such as studio.example.edu that:

  • lets the user choose a modality
  • shows recent jobs across apps
  • routes to the correct product

Automatic Modality Detection

Automatic detection is useful as a fallback, but it should not be the only control path.

Recommended rule:

  1. If the caller explicitly declares a modality, trust it.
  2. Otherwise infer modality from file type, content type, or request schema.
  3. If inputs span multiple modalities, classify the request as multimodal.
  4. If inference is ambiguous, return unknown and ask the caller or UI to choose.
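The four-step rule above can be sketched as a small function. The enum values and the extension map below are assumptions for illustration, not the actual privsyn_platform.modality contract.

```python
# Sketch of the detection rule; enum values and the extension map
# are assumptions, not the actual privsyn_platform.modality API.
from enum import Enum


class Modality(Enum):
    TABULAR = "tabular"
    TEXT = "text"
    IMAGE = "image"
    MULTIMODAL = "multimodal"
    UNKNOWN = "unknown"


# Assumed extension-to-modality map; extend as formats are added.
_BY_EXTENSION = {
    "csv": Modality.TABULAR,
    "parquet": Modality.TABULAR,
    "txt": Modality.TEXT,
    "jsonl": Modality.TEXT,
    "png": Modality.IMAGE,
    "jpg": Modality.IMAGE,
}


def detect_modality(filenames, declared=None):
    # 1. If the caller explicitly declares a modality, trust it.
    if declared is not None:
        return declared
    # 2. Otherwise infer per-file modalities from file type.
    inferred = set()
    for name in filenames:
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        inferred.add(_BY_EXTENSION.get(ext, Modality.UNKNOWN))
    inferred.discard(Modality.UNKNOWN)
    # 3. Inputs spanning multiple modalities are multimodal.
    if len(inferred) > 1:
        return Modality.MULTIMODAL
    # 4. Ambiguous inference: return UNKNOWN so the caller/UI chooses.
    return inferred.pop() if inferred else Modality.UNKNOWN
```

A real implementation would also consult declared content types and request schemas, not just file extensions.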

The shared package now includes a starter contract in privsyn_platform.modality with:

  • Modality
  • InputDescriptor
  • infer_input_modality(...)
  • detect_modality(...)

This is enough to support a future hub or gateway without prematurely coupling all apps together.

Gateway Pattern

If you want a single submission endpoint later, make it a router, not a monolith.

Example flow:

  1. Client submits files and optional declared modality to hub_web.
  2. hub_web uses privsyn_platform.modality.detect_modality(...).
  3. The gateway creates a platform-level job record.
  4. The request is forwarded to the correct modality app or worker backend.
  5. Status and ownership stay consistent because they all use the same platform contracts.

That keeps the user-facing entry unified while still allowing:

  • different runtime images
  • different model dependencies
  • different GPU/CPU scheduling
  • different result UIs
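The gateway flow can be sketched as a routing function. The backend URLs, the in-memory job store, and the stubbed detection helper below are placeholders; in the real flow, detection comes from privsyn_platform.modality and the job record goes to the platform store.

```python
# Sketch of the gateway's routing step; URLs, job store, and the
# detection stub are placeholders, not the real privsyn_platform code.
import uuid

# Assumed mapping from modality to the backend app's job endpoint.
BACKENDS = {
    "tabular": "https://tabular.example.edu/api/jobs",
    "text": "https://text.example.edu/api/jobs",
    "image": "https://image.example.edu/api/jobs",
}

JOBS = {}  # stand-in for the platform's durable job store


def detect_modality(filenames, declared=None):
    # Stand-in for privsyn_platform.modality.detect_modality(...).
    if declared:
        return declared
    by_ext = {"csv": "tabular", "parquet": "tabular",
              "txt": "text", "png": "image", "jpg": "image"}
    exts = {n.rsplit(".", 1)[-1].lower() for n in filenames if "." in n}
    found = {by_ext.get(e, "unknown") for e in exts}
    found.discard("unknown")
    if len(found) > 1:
        return "multimodal"
    return found.pop() if found else "unknown"


def route_submission(filenames, declared_modality=None):
    """Detect the modality, record a platform-level job, and return
    the backend endpoint the request should be forwarded to."""
    modality = detect_modality(filenames, declared=declared_modality)
    if modality not in BACKENDS:
        raise ValueError(f"cannot route modality: {modality!r}")
    # Create the platform-level job record first, so status and
    # ownership stay consistent across all modality apps.
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"modality": modality, "status": "queued"}
    # The real gateway would now forward the request body to this URL.
    return job_id, BACKENDS[modality]
```

Note that the gateway never loads models or touches GPU dependencies; it only detects, records, and forwards, which is what keeps it a router rather than a monolith.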

Practical Next Step

When a new modality project starts:

  1. Build its library/CLI first.
  2. Reuse privsyn_platform for auth, jobs, storage, and ownership.
  3. Create a separate web app only if that modality needs one.
  4. Add the modality to a hub only after the standalone app works well.