Goal

Public-facing customer mail stack. Per-customer. Auto-provisioned via existing EnsureCustomerVms / EnsureProxmoxPostInstall pipeline. Inbound + outbound + webmail.
TLS forced everywhere. Test target: Liberator (192.168.1.251, root / default4244$). Test domain: test01.iam.free. Manual send/receive verified against info@osisa.com.

Scope

In scope

  • PBS (Proxmox Backup Server) installed on Proxmox host. Datastore enabled for pull from other machines.

  • PMG (Proxmox Mail Gateway) as LXC container pmg@<hostSerial> on the Liberator. Inbound filter, relays to mail VM port 26.

  • Mail VM (NixOS) mail@<hostSerial>: postfix (25 + 26 + 587), dovecot2 (143 + 993 + lmtp), nginx + roundcube.

  • DNS VM (NixOS) dns@<hostSerial>: coredns authoritative, step-ca internal CA, lego (LE DNS-01 via configured provider), ddns-updater (configured provider).

  • Per-customer MailRelayOptions POCO on Customer model.

  • TestCompany.cs additions: mail@lib, dns@lib, pmg@lib.

  • Automated unit tests + Liberator e2e test ProxmoxPublicMailTests.Liberator.

Out of scope

  • rspamd-on-mail-VM fallback. Note as fallback path only; not implemented.

  • unbound (DNS). coredns first; switch later if it falls short.

  • GPO distribution of step-ca root cert to Windows clients. Later iteration.

  • DKIM signing enforcement. Keys generated + logged for manual DNS publication only. Smarthost relay expected to handle signing where set.

  • Edge router NAT / port-forward 25/80/443. Deployment prerequisite, not NETSetup code.

  • Automated send from info@osisa.com. User does this by hand.

Architecture

Liberator host layout

+--------------------------------------------------------------------+
| Proxmox VE host (Liberator, 192.168.1.251)                         |
|                                                                    |
|  +--------------------+   +-------------------------------------+  |
|  | PBS (apt pkg)      |   | LXC: pmg@lib                        |  |
|  | datastore at       |   |  Debian 12 + proxmox-mailgateway    |  |
|  | /var/lib/proxmox-  |   |  features=nesting=1, unprivileged   |  |
|  | backup/datastore   |   |  ports: 25 (in), 26 (out -> mail)   |  |
|  +--------------------+   +-------------------------------------+  |
|                                                                    |
|  +-------------------------------+   +--------------------------+  |
|  | VM: mail@lib (NixOS)          |   | VM: dns@lib (NixOS)      |  |
|  |  postfix 25 (smtpd, STARTTLS) |   |  coredns (authoritative) |  |
|  |  postfix 26 (SMTP from PMG)   |   |  step-ca (internal CA)   |  |
|  |  postfix 587 (submission)     |   |  lego (LE DNS-01)        |  |
|  |  dovecot 143/993/lmtp         |   |  ddns-updater            |  |
|  |  nginx + roundcube (443)      |   |    (configured provider) |  |
|  +-------------------------------+   +--------------------------+  |
+--------------------------------------------------------------------+
        ^ 25/443                                ^ DNS pull (split)
        |                                       |
   +----+----+   edge NAT 25/80/443        +----+----+
   |  WAN    |  <----------------------->  | LAN     |
   +---------+                             +---------+
        ^                                       ^
        |                                       |
   internet senders                       internal clients
   (MX -> public IP)                      (resolver -> dns@lib)

Public flow

Outbound mail:

client (MUA)
   --(587, SUBMISSION, STARTTLS + SASL)-->
mail VM postfix
   --(if MailRelayOptions set)-->  smarthost (STARTTLS + SASL)  --> internet
   --(else)-->                     direct MX lookup              --> internet

Inbound mail:

internet sender
   --(MX lookup -> mx.<domain> -> A record -> current public IP)-->
edge router (NAT 25)
   --(forward 25)-->
PMG LXC postfix
   --(filter, SpamAssassin/ClamAV)-->
   --(SMTP, port 26)-->
mail VM postfix
   --(LMTP local socket)-->
dovecot
   --(maildir / sdbox)-->
mailbox

Webmail:

user browser
   --(https://mail.<domain>, LE cert)-->
edge router (NAT 443)
   --(forward 443)-->
mail VM nginx
   --(php-fpm + roundcube)-->
   --(IMAP local 143 STARTTLS)-->
dovecot

Components

PBS (host)

  • Install: apt install proxmox-backup-server on Proxmox host directly. Officially supported coexistence with PVE.

  • Datastore: auto-create at /var/lib/proxmox-backup/datastore if missing.

  • Datastore is enabled (pull-target). Other machines push backups in.

  • New stage: EnsurePbsInstalled runs from EnsureProxmoxPostInstall on hosts that opt in (every Liberator does in v1).

  • No new VM. No new container. Host service only.
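
The install and datastore steps above are both meant to be idempotent, which can be sketched as a pure planning function that returns only the commands still needed. This is a minimal sketch, not the actual EnsurePbsInstalled stage: the class name, the datastoreName parameter, and the assumption that the PBS apt repository is already configured on the host are all hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical planner: given what already exists on the host, list the
// shell commands that still have to run. An empty list means nothing to do.
public static class PbsInstallPlan
{
    // Datastore path taken from the spec above.
    public const string DatastorePath = "/var/lib/proxmox-backup/datastore";

    // datastoreName is an assumed parameter; the spec does not name the datastore.
    public static IReadOnlyList<string> Build(
        bool packageInstalled, bool datastoreExists, string datastoreName)
    {
        var commands = new List<string>();

        if (!packageInstalled)
        {
            // Assumes the PBS apt repository is already configured.
            commands.Add("apt-get update");
            commands.Add("apt-get install -y proxmox-backup-server");
        }

        if (!datastoreExists)
        {
            commands.Add(
                $"proxmox-backup-manager datastore create {datastoreName} {DatastorePath}");
        }

        return commands;
    }
}
```

Keeping the decision separate from command execution lets the unit tests assert on the returned list without touching a real host.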

PMG LXC

  • Container name: pmg@<hostSerial>.

  • Provisioner: ProvisionPmgLxc (new). Distinct from ProvisionNixosVm / ProvisionWindowsVmOnProxmox because it is pct not qm.

  • Base: Debian 12 LXC template from /var/lib/vz/template/cache/.

  • Required pct flags:

    • --features nesting=1 → PMG needs nested namespaces for SpamAssassin helper sandboxes.

    • --unprivileged 1 → still acceptable; no PMG component requires root on host.

    • AppArmor: default lxc-container-default-cgns profile; loosen only if PMG fails to start. Document override in feature memory if hit.

  • Post-create install: PMG apt repo + key + apt install proxmox-mailgateway. Run inside LXC via pct exec.

  • PMG config: minimal. Inbound on 25, relay-to mail.<domain>:26. Default rule sets fine for v1.

  • TLS: PMG uses LE cert for pmg.<domain> issued by DNS VM lego. Cert pulled to LXC at first boot via SSH from DNS VM.
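
The required pct flags can be captured in a small argument builder so the unit tests can assert on them. This is a sketch under assumptions: PmgLxcSpec and PctCreateArgs are hypothetical names (the real LxcCommand builder may look different), and the net0 settings (eth0, DHCP) are illustrative defaults that the spec does not prescribe.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical input record; field names are illustrative only.
public sealed record PmgLxcSpec(
    int Ctid, string Template, string Hostname, string Storage, string Bridge);

public static class PctCreateArgs
{
    // Returns the argument vector for `pct`, one element per argv entry,
    // so no shell quoting is needed when the process is started directly.
    public static IReadOnlyList<string> Build(PmgLxcSpec spec) => new[]
    {
        "create",
        spec.Ctid.ToString(CultureInfo.InvariantCulture),
        spec.Template,
        "--hostname", spec.Hostname,
        "--features", "nesting=1",   // required by the spec above
        "--unprivileged", "1",       // required by the spec above
        "--storage", spec.Storage,
        // Assumed network layout: single NIC on the given bridge, DHCP.
        "--net0", $"name=eth0,bridge={spec.Bridge},ip=dhcp",
    };
}
```

A call such as PctCreateArgs.Build(new PmgLxcSpec(200, "local:vztmpl/<template file>", "pmg-lib", "local-lvm", "vmbr0")) would yield the argv for pct create; the CTID, storage, bridge, and hostname values here are placeholders.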

Mail VM (NixOS)

  • VM name: mail@<hostSerial>. Routes to new MailHostTemplate via NixHostTemplateFactory (DeviceTypes.Server + Description "Mail").

  • Modules added under src/NETSetup/nixos/modules/services/:

    • mail-postfix.nix → host.services.mail.postfix

    • mail-dovecot.nix → host.services.mail.dovecot

    • mail-roundcube.nix → host.services.mail.roundcube

  • Postfix config:

    • smtpd_tls_security_level=encrypt on 25 (server-to-server STARTTLS).

    • smtpd_tls_security_level=encrypt on 587 (submission), smtpd_sasl_auth_enable=yes, smtpd_sasl_type=dovecot.

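    • Internal port 26 listener (SMTP), accepts mail only from PMG IP, no SASL. Accepted mail is handed to dovecot LMTP through the delivery transport (e.g. mailbox_transport or virtual_transport pointing at the dovecot LMTP socket); smtpd_recipient_restrictions only gates which recipients are accepted and does not route mail.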

    • relayhost set when host.services.mail.postfix.relay non-null. Smarthost uses STARTTLS + SASL (smtp_sasl_password_maps, smtp_sasl_security_options=noanonymous, smtp_tls_security_level=encrypt).

  • Dovecot config:

    • ssl=required, disable_plaintext_auth=yes.

    • Listeners: imap (143 STARTTLS), imaps (993), lmtp (unix socket /var/run/dovecot/lmtp).

    • Auth: PAM against local users for v1. SOPS-managed user/pass scope later.

  • Webmail: nginx vhost mail.<domain> → roundcube (php-fpm). Cert is the same LE cert mounted from the DNS VM cert distribution module.

  • DKIM: opendkim package installed. Keypair generated at first boot per domain. Public key emitted to journalctl -u opendkim-genkey AND written to /var/lib/opendkim/<domain>.txt for manual DNS publication. Signing is wired, but when a relay is set the message leaves the mail VM unsigned and the smarthost signs it.

MailRelayOptions wiring

NixOS option shape (mirrors C# POCO, see "Customer model additions"):

host.services.mail.postfix.relay = lib.mkOption {
  type = lib.types.nullOr (lib.types.submodule {
    options = {
      host     = lib.mkOption { type = lib.types.str; };
      port     = lib.mkOption { type = lib.types.port; default = 587; };
      user     = lib.mkOption { type = lib.types.str; };
      passFile = lib.mkOption { type = lib.types.path; };
    };
  });
  default = null;
};

When null: postfix does direct MX lookup. When set: postfix routes everything outbound through that relay. Per-customer (one relay shared by all VMs of the customer; in practice only the mail VM cares).

DNS VM (NixOS)

  • VM name: dns@<hostSerial>. Routes to new DnsHostTemplate via NixHostTemplateFactory (DeviceTypes.Server + Description "Dns").

  • Modules added under src/NETSetup/nixos/modules/services/:

    • dns-coredns.nix → host.services.dns.coredns

    • dns-stepca.nix → host.services.dns.stepca

    • dns-lego.nix → host.services.dns.lego (LE DNS-01 via configured provider; lego speaks ~80 providers, see lego docs)

    • Reuse existing ddns-updater.nix (already in repo; nixpkgs services.ddns-updater supports Cloudflare, DuckDNS, Dyn, NoIP, Gandi, GoDaddy, Hetzner, Namecheap, OVH, Porkbun, and more — any provider with a usable HTTP API).

  • coredns:

    • Authoritative for <customerDomain> (e.g. test01.iam.free).

    • Split-horizon: serves internal A records pointing to LAN IPs of mail VM and PMG LXC; external resolution stays at the configured DNS provider (Cloudflare, Porkbun, Namecheap, …; chosen per customer).

    • TTL on all records: 60 s (DDNS-driven IP changes).

  • step-ca:

    • Internal CA. Issues certs for non-public services.

    • Root cert published to a known path; later wiring distributes to NixOS trust store on every customer VM and to GPO.

  • lego:

    • DNS-01 challenge against the configured DNS provider (per customer; lego speaks ~80 providers natively: Cloudflare, Porkbun, Namecheap, OVH, Hetzner, Gandi, GoDaddy, DuckDNS, …). Provider + credentials flow in from Customer.DnsProvider.Provider / Customer.DnsProvider.Credentials (see "Customer model additions"). In nearly all cases this is the same provider + credentials as ddns-updater (same DNS, same API).

    • Issues certs for mail.<domain> and pmg.<domain>.

    • Cert distribution: lego writes cert + key to a shared path; NixOS module on mail VM and PMG LXC pulls via SSH on a timer (or systemd path unit).

    • TESTING: lego --server flag points to LE staging by default in test builds to avoid LE rate limits. Production flips to prod endpoint.

  • ddns-updater:

    • Existing nixpkgs module already wrapped at src/NETSetup/nixos/modules/services/ddns-updater.nix.

    • Pushes current customer site public IP to the configured DNS provider’s API on change (Cloudflare shown as example; provider chosen per customer via Customer.DnsProvider).

    • Records updated:

      • mail.<domain> A → public IP

      • pmg.<domain> A → public IP

      • mx.<domain> A → public IP

      • <domain> MX → mx.<domain> (priority 10); published once, since the target name never changes and only the A records follow the public IP

    • All TTLs 60 s.

Configuration

Customer model additions

POCO under osisa.Enterprise.Entities (or NETSetup-side wrapper if the entity lives in netbase and we cannot extend there yet):

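public sealed class MailRelayOptions
{
    // Smarthost host name.
    public string Host { get; init; }
    // Submission port on the smarthost.
    public int Port { get; init; } = 587;
    // SASL login name for the smarthost.
    public string User { get; init; }
    // SOPS-encrypted secret path; builder code never reads the file.
    public AbsolutePath PassFile { get; init; }
}

  • Nullable on Customer: MailRelayOptions? Relay { get; init; }.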

  • Per-customer (one relay per Customer instance).

  • PassFile is a SOPS-encrypted secret path; resolved on the mail VM via the existing SOPS-nix wiring. Builder code does not read the file.

DNS provider config (drives both ddns-updater and lego DNS-01):

public sealed class DnsProviderOptions
{
    // Provider key: "cloudflare", "duckdns", "porkbun", "namecheap",
    // "gandi", "godaddy", "hetzner", "ovh", "noip", "dyn", ...
    // Must match a provider supported by both nixpkgs
    // services.ddns-updater AND lego (95% overlap; pick one in the
    // intersection).
    public string Provider { get; init; }

    // Provider-specific credential dict. Keys depend on the provider:
    //   cloudflare -> { "CF_API_TOKEN": "<sops-path>" }
    //   duckdns    -> { "token":       "<sops-path>" }
    //   namecheap  -> { "username":    "<plain>",
    //                   "apiKey":      "<sops-path>" }
    //   porkbun    -> { "apiKey":      "<sops-path>",
    //                   "secretKey":   "<sops-path>" }
    // Values that are secrets MUST be SOPS-encrypted secret paths (same
    // pattern as MailRelayOptions.PassFile). Plain-text values (usernames,
    // zone IDs, customer numbers) are inlined.
    public IReadOnlyDictionary<string, string> Credentials { get; init; }
}

  • Nullable on Customer: DnsProviderOptions? DnsProvider { get; init; }.

  • When null: no public DNS automation. Lego falls back to HTTP-01 (port 80 must be reachable), ddns-updater is disabled.

  • When set: ddns-updater + lego both consume the same provider + credential dict on the DNS VM. Provider key is mapped to ddns-updater’s settings.providers.<n>.provider and to lego’s --dns <provider> flag; credentials are written into the per-service env file via SOPS.

  • Customer.DnsProvider.Provider and Customer.DnsProvider.Credentials are the only knobs the customer config touches; everything else (DNS-01 vs HTTP-01 routing, env var names) is derived inside the DNS VM module.
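
Because the credential keys differ per provider, a cheap guard is to check the dict against the documented key set before anything is rendered. The following is a minimal sketch, assuming a hypothetical DnsCredentialCheck helper; the key lists are copied from the comment on DnsProviderOptions and cover only the four providers spelled out there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DnsCredentialCheck
{
    // Required keys per provider, as listed in the DnsProviderOptions comment.
    private static readonly IReadOnlyDictionary<string, string[]> RequiredKeys =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            ["cloudflare"] = new[] { "CF_API_TOKEN" },
            ["duckdns"]    = new[] { "token" },
            ["namecheap"]  = new[] { "username", "apiKey" },
            ["porkbun"]    = new[] { "apiKey", "secretKey" },
        };

    // Returns the keys that are missing; an empty list means the dict has
    // the documented shape. Throws for providers this sketch does not know.
    public static IReadOnlyList<string> MissingKeys(
        string provider, IReadOnlyDictionary<string, string> credentials)
    {
        if (!RequiredKeys.TryGetValue(provider, out var required))
        {
            throw new ArgumentException(
                $"No key list for provider '{provider}'.", nameof(provider));
        }

        return required.Where(key => !credentials.ContainsKey(key)).ToList();
    }
}
```

Passing a Cloudflare-shaped dict (only CF_API_TOKEN) with provider "porkbun" returns apiKey and secretKey as missing, which surfaces a provider swap that forgot to update the credentials.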

Per-VM hostname conventions

Host      Public hostname   Purpose
--------  ----------------  ------------------------------------------
mail VM   mail.<domain>     submission + IMAP + webmail
PMG LXC   pmg.<domain>      PMG admin UI (8006) and inbound MX target
MX        mx.<domain>       MX target A record (points to public IP)
DNS       (internal only)   not published; LAN reachable

<domain> = Customer.Domain (new field on Customer / Company; for the test fixture this is test01.iam.free).

MX record format: <domain>. IN MX 10 mx.<domain>.
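
To make the naming convention concrete, the public record set for one domain can be rendered as zone-file style lines. This is an illustrative sketch only: PublicDnsRecords is a hypothetical helper, and the real records are pushed through the provider API rather than written as a zone file.

```csharp
using System.Collections.Generic;

public static class PublicDnsRecords
{
    // TTL used for all public records in this design.
    public const int TtlSeconds = 60;

    // Renders the three A records and the MX record in
    // "name TTL class type rdata" form.
    public static IReadOnlyList<string> Render(string domain, string publicIp)
    {
        var lines = new List<string>();

        foreach (var label in new[] { "mail", "pmg", "mx" })
        {
            lines.Add($"{label}.{domain}. {TtlSeconds} IN A {publicIp}");
        }

        lines.Add($"{domain}. {TtlSeconds} IN MX 10 mx.{domain}.");
        return lines;
    }
}
```

For Render("test01.iam.free", "203.0.113.10") the last line is "test01.iam.free. 60 IN MX 10 mx.test01.iam.free."; the IP is from the documentation range and stands in for the customer's current public address.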

TLS / cert flow

Public certs (LE via DNS VM lego, DNS-01 against the configured DNS provider — Cloudflare shown as example, real provider is whatever Customer.DnsProvider.Provider says):

DNS VM lego (DNS-01, provider = Customer.DnsProvider.Provider)
   --> issues cert for mail.<domain>
   --> writes /var/lib/lego/certs/mail.<domain>.crt + .key
   --> systemd path unit triggers ssh-push to mail VM
   --> mail VM nginx + postfix + dovecot reload

DNS VM lego (DNS-01, same provider as above)
   --> issues cert for pmg.<domain>
   --> ssh-push to pmg LXC
   --> pmg-tls-reload

Internal certs (step-ca):

step-ca on DNS VM
   --> root cert at /var/lib/step-ca/certs/root_ca.crt
   --> NixOS module bakes root into security.pki.certificateFiles on every
       customer VM
   --> later: GPO distributes same root to Windows clients (out of scope v1)

Postfix TLS (forced):

  • Port 25: smtpd_tls_security_level = encrypt. Plain unencrypted SMTP rejected.

  • Port 587: smtpd_tls_security_level = encrypt, smtpd_sasl_auth_enable = yes, plaintext auth refused without TLS.

  • Outbound smarthost: smtp_tls_security_level = encrypt.

Dovecot TLS (forced):

  • ssl = required

  • disable_plaintext_auth = yes

  • Plain IMAP login over 143-without-STARTTLS rejected.

DDNS flow

Provider-agnostic. The actual API call shape depends on Customer.DnsProvider.Provider; ddns-updater abstracts it. Cloudflare is shown below as one concrete example — swap for DuckDNS / Porkbun / Namecheap / Hetzner / OVH / etc. by changing Customer.DnsProvider.

ddns-updater on DNS VM
   reads current public IP (whatismyip / opendns)
   compares with last-pushed IP (state file)
   if changed:
      <provider> API call (e.g. Cloudflare PATCH, DuckDNS GET,
                           Porkbun POST, Namecheap GET):
        mail.<domain>  A   -> new IP
        pmg.<domain>   A   -> new IP
        mx.<domain>    A   -> new IP
   sleep N seconds (default 300; lower for test)
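
ddns-updater performs the compare-then-push step internally, so nothing like this needs to be written for production. The sketch below only illustrates the state-file comparison from the flow above, for example as a test double; DdnsChangeCheck and its method names are hypothetical.

```csharp
using System.IO;

public static class DdnsChangeCheck
{
    // True when no IP has been recorded yet, or when the recorded IP
    // differs from the current one.
    public static bool NeedsPush(string currentIp, string stateFilePath)
    {
        if (!File.Exists(stateFilePath))
        {
            return true;
        }

        var lastPushed = File.ReadAllText(stateFilePath).Trim();
        return lastPushed != currentIp;
    }

    // Call after a successful provider API update.
    public static void RecordPush(string currentIp, string stateFilePath) =>
        File.WriteAllText(stateFilePath, currentIp);
}
```

Recording the IP only after the provider call succeeds means a failed push is retried on the next cycle.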

Supported providers (intersection of nixpkgs services.ddns-updater and lego, picked per customer): Cloudflare, DuckDNS, Dyn, NoIP, Gandi, GoDaddy, Hetzner, Namecheap, OVH, Porkbun, and others — any provider with a usable HTTP API and a plugin in both tools.

Provider credentials:

  • Provided via Customer.DnsProvider.Credentials (string → string dict). Provider-specific keys: Cloudflare wants CF_API_TOKEN, DuckDNS wants token, Namecheap wants username + apiKey, Porkbun wants apiKey + secretKey, etc.

  • Secret values stored as SOPS secrets on the DNS VM; non-secret values (usernames, zone IDs) inlined.

  • Customer-edge has dynamic IP, no static. ddns-updater is the bridge.

  • TTL on all touched records: 60 s. MX TTL too.

Test strategy

Unit

  • Template renderers:

    • MailHostTemplate → emits flake + hardware + default with mail modules enabled, relay options threaded through.

    • DnsHostTemplate → emits flake + hardware + default with dns modules enabled, configured DNS provider + credentials wired (provider-agnostic; secret credential values resolved to SOPS secret references, plain values inlined).

  • LxcCommand builder + ProvisionPmgLxc orchestrator:

    • pct create args (template, hostname, features, unprivileged, net, storage).

    • post-create script (apt repo add, install, basic config).

    • Idempotency (existing CTID → skip).

  • EnsurePbsInstalled:

    • apt install proxmox-backup-server (idempotent: skip if installed).

    • datastore-create (idempotent: skip if path exists with marker).

  • MailRelayOptions POCO:

    • null → postfix template emits no relayhost line.

    • set → postfix template emits relayhost = [host]:port plus smtp_sasl_password_maps.
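
The two MailRelayOptions cases can be sketched as a renderer that returns the main.cf lines for the relay. This is a sketch under assumptions: RelaySketch stands in for the real POCO (PassFile is a plain string because AbsolutePath is not defined here), PostfixRelayLines is a hypothetical name, and the texthash: table type assumes the file at PassFile is already in Postfix lookup-table form ("[host]:port user:password").

```csharp
using System.Collections.Generic;

// Stand-in for MailRelayOptions so the example compiles on its own.
public sealed record RelaySketch(string Host, int Port, string User, string PassFile);

public static class PostfixRelayLines
{
    // null  -> no relay lines at all (postfix does direct MX lookup).
    // set   -> relayhost plus the SASL/TLS settings named in the spec.
    public static IReadOnlyList<string> Render(RelaySketch? relay)
    {
        if (relay is null)
        {
            return new List<string>();
        }

        return new List<string>
        {
            // Brackets suppress the MX lookup for the relay host itself.
            $"relayhost = [{relay.Host}]:{relay.Port}",
            "smtp_sasl_auth_enable = yes",
            // Assumption: PassFile is a ready-made Postfix lookup table.
            $"smtp_sasl_password_maps = texthash:{relay.PassFile}",
            "smtp_sasl_security_options = noanonymous",
            "smtp_tls_security_level = encrypt",
        };
    }
}
```

A unit test can then assert that Render(null) is empty and that the first line for a configured relay is relayhost = [host]:port.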

E2E (automated, Liberator)

New test class: ProxmoxPublicMailTests.Liberator under src/NETSetup.Tests/.

Steps:

  1. SSH/SCP base image to Liberator. Run netsetup install on Liberator (same pattern as ProxmoxWithUSBStickTests.Liberator).

  2. EnsureProxmoxPostInstall runs PBS step → assert systemctl is-active proxmox-backup-proxy returns active on host.

  3. EnsureCustomerVms runs → assert via pct list that pmg@lib exists, via qm list that mail@lib and dns@lib exist.

  4. SSH into mail VM (DiscoverVmIP three-tier fallback):

    • systemctl is-active postfix dovecot nginx → all active.

    • ss -tlnp → 25, 26, 143, 587, 993, 443 listening on expected binaries.

    • doveadm user info@test01.iam.free → userdb lookup through dovecot itself (confirms the mailbox user resolves).

  5. Run inside PMG LXC via pct exec from the host (no SSH needed):

    • systemctl is-active pmg-smtp-filter → active.

    • Port 25 listening.

  6. SSH into DNS VM:

    • systemctl is-active coredns step-ca lego ddns-updater → all active.

    • dig @localhost mail.test01.iam.free → resolves.

  7. https probe (raw HttpClient from test harness):

LE rate limits:

  • lego configured to LE staging in test runs. LE staging certs are not publicly trusted. Test harness either accepts staging chain or skips cert validation for the staging hostname.
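
One way to "accept the staging chain" without disabling validation globally is a callback that tolerates a chain error only for the one staging hostname. This is a minimal sketch, assuming a hypothetical StagingCertPolicy helper; the test harness may handle staging certs differently.

```csharp
using System;
using System.Net.Http;
using System.Net.Security;

public static class StagingCertPolicy
{
    // Accept a clean chain for any host. For the staging hostname only,
    // also accept a chain error (untrusted staging root), but never a
    // name mismatch or a missing certificate.
    public static bool Accept(string? host, SslPolicyErrors errors, string stagingHost)
    {
        if (errors == SslPolicyErrors.None)
        {
            return true;
        }

        return string.Equals(host, stagingHost, StringComparison.OrdinalIgnoreCase)
            && errors == SslPolicyErrors.RemoteCertificateChainErrors;
    }

    // Builds an HttpClient that applies the policy above.
    public static HttpClient CreateClient(string stagingHost)
    {
        var handler = new HttpClientHandler
        {
            ServerCertificateCustomValidationCallback =
                (request, _, _, errors) =>
                    Accept(request.RequestUri?.Host, errors, stagingHost),
        };

        return new HttpClient(handler);
    }
}
```

Keeping the decision in a pure Accept method makes the policy unit-testable without a network connection, and it keeps the bypass from silently widening to other hosts.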

Manual (info@osisa.com)

User-driven, not automated:

  1. After test run leaves the stack live, user sends mail from info@osisa.com to someuser@test01.iam.free.

  2. Open https://mail.test01.iam.free/, log in as someuser, verify the message arrived in the inbox.

  3. Reply from roundcube, verify delivery back to info@osisa.com (relay path if configured, else direct MX — see DKIM caveat under risks).

Risks and known limitations

  • PMG inside LXC requires nesting and may need AppArmor profile loosened. pct flags: --features nesting=1 --unprivileged 1. If PMG components fail under default AppArmor, switch to a custom profile (or as a last resort --unprivileged 0). Document the chosen path in a Serena memory once the first run lands.

  • DDNS lag: customer-edge dynamic IP changes are not instant. DNS TTL + MX TTL pinned at 60 s. There is still a window of inbound mail loss during IP churn; sender retries cover most of it.

  • LE rate limits: test re-issues certs every run. Tests use LE staging endpoint to avoid the production rate limit (50 certs/registered domain/week). Staging certs are not publicly trusted; test harness handles this.

  • DKIM not enforced v1: keypair generated and printed for manual DNS publication. If MailRelayOptions is null AND the customer has not published the DKIM record, direct-MX outbound mail will fail SPF/DKIM alignment at common receivers (Gmail, M365). The expected production pattern is "set MailRelayOptions to a smarthost that signs". Document this as a known limitation; revisit in a v2 iteration that wires rspamd/opendkim signing on the mail VM itself.

  • Edge router NAT: Liberator deployments need 25 / 80 / 443 forwarded from the customer’s edge router to the Liberator. NETSetup does not configure customer-edge routers. Deployment prerequisite, captured in appendix.

  • coredns may not cover everything we need long-term (no DNSSEC signing, limited views). Switch to unbound + nsd if it falls short. Not implemented; recorded here so the next iteration knows where to look.

  • pmg@<hostSerial> is a new "kind" of hosted device (LXC). Routing in EnsureCustomerVms must distinguish LXC from VM, e.g. by Description ("PMG") or a new flag on the device builder. Decision goes to backend agent at implementation time; flag in TestCompany.cs comment so it is visible.

  • DNS provider lock-in is shallow (lego and ddns-updater both abstract via per-provider plugins, and we picked our supported list from the intersection), but credential format varies per provider. Customer config must match the chosen provider’s required env vars / fields; a Cloudflare-shaped credential dict will not work against Porkbun. Provider swaps require updating Customer.DnsProvider.Credentials to the new provider’s keys — not just changing Provider.

Open questions

  • LXC distinction: Description string "PMG" or a new explicit IsLxc/AsLxc() builder method on the device builder? Backend agent to pick at implementation; Description-string is the lower-friction option but conflates routing with display.

  • Where does Customer.Domain live — on Company or on Customer? Brief said "Customer", but the existing TestCompany.cs builds a Company (LocalNet). Likely a new WithDomain("test01.iam.free") on Company, surfaced as Customer.Domain if the entity model already projects. Architect to confirm before backend writes the POCO.

  • Cert-distribution transport from DNS VM to mail VM and PMG LXC: SSH-push on systemd path unit OR pull on systemd timer? Push has lower latency on renewal; pull has simpler firewall story (DNS VM never initiates). Default to SSH-push unless backend hits a blocker.

Appendix: deployment prerequisites

  • Edge router (customer-side) port-forward to Liberator LAN IP:

    • TCP 25 → Liberator (PMG inbound)

    • TCP 80 → Liberator (HTTP-01 fallback / redirect)

    • TCP 443 → Liberator (webmail + PMG admin)

    • TCP 587 → Liberator (submission, optional; many ISPs block 25 outbound)

  • DNS provider account holding <domain> as a zone (provider chosen per customer via Customer.DnsProvider.Provider; Cloudflare shown as example):

    • API credentials scoped to "edit DNS records for that zone" (provider-specific: Cloudflare token scoped to Zone:DNS:Edit; DuckDNS token; Namecheap username + apiKey; Porkbun apiKey + secretKey; etc.).

    • Secret credential values stored as SOPS secrets consumed by ddns-updater + lego; non-secret values inlined in Customer.DnsProvider.Credentials.

  • Registrar NS delegation: zone <domain> delegated to the chosen DNS provider’s NS. ddns-updater + lego both depend on that provider being authoritative for the public view.

  • Reverse DNS (PTR) for the Liberator’s public IP → mail.<domain>. ISP-side or BYOIP. Without PTR, Gmail/M365 will spam-bin direct-MX outbound. Captured here so deployment ops sets it.

  • Optional but recommended: SPF TXT on <domain>:

    • Direct-MX path: v=spf1 a:mail.<domain> -all

    • Smarthost path: whatever the smarthost vendor publishes (e.g. v=spf1 include:mailgun.org -all).