[PART-18] MailStack Enterprise v3

Post by **Murali Krishna** » Mon Jun 15, 2026 7:50 am

NETAPORT MailStack Enterprise v3 — 00-run-all.sh (orchestrator)

Runs all 17 install scripts in order, stops on the first failure with an exact resume instruction, and — because every script is idempotent — re-running simply fast-skips everything that already passed. Handles the two special cases in the build: Dovecot with POP3 enabled, and the AIDE baseline refresh that must follow the hardening step.

What this script is for

This is the conductor for the whole platform. Instead of running 01 through 17 by hand and remembering which ones need special arguments, you run this one script and it drives the entire deployment in the correct order. Its key property is being resume-aware: each installer tracks its own completed stages with state markers, so if something fails at, say, script 11, you fix the problem and just run the orchestrator again — scripts 01-10 skip in seconds and it picks up where it stopped. It also bakes in the two build-specific quirks so you can't forget them: Dovecot is launched with POP3 on, and right after hardening it refreshes the integrity baseline so script 16's own changes don't trip the nightly AIDE check.
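The installers themselves are not shown in this post, so the following is only a minimal sketch of how a per-stage marker could make a script resume-aware. The names `STATE_DIR` and `run_stage` are hypothetical, and a temp directory stands in for the real state directory.

```shell
#!/usr/bin/env bash
# Sketch only: STATE_DIR and run_stage are hypothetical names, not taken from
# the NETAPORT installers. A temp dir stands in for the real state directory.
set -Eeuo pipefail

STATE_DIR="$(mktemp -d)"

# Run a stage once; on later runs, skip it if its marker file exists.
run_stage() {
  local name="$1"; shift
  local marker="${STATE_DIR}/${name}.done"
  if [[ -f "${marker}" ]]; then
    echo "skip ${name}"
    return 0
  fi
  "$@" || return "$?"   # the stage's real work; no marker is written on failure
  : > "${marker}"
  echo "done ${name}"
}

first="$(run_stage install_packages true)"    # first run does the work
second="$(run_stage install_packages true)"   # second run sees the marker
rm -f "${STATE_DIR}/install_packages.done"    # clearing the marker forces a redo
third="$(run_stage install_packages true)"
echo "${first} / ${second} / ${third}"
```

Under these assumptions, a re-run costs one file test per stage, which is why already-passed scripts skip in seconds.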

How to run

Code:

./00-run-all.sh                 # run from the beginning (resume-aware)
./00-run-all.sh --from 11       # force-start at a specific script number
./00-run-all.sh --list          # show the run order and exit

How it works

Ordered run list
Holds the canonical 01→17 sequence in one array. The run order lives in exactly one place, so it's easy to audit and adjust.

Per-script invocation
Most scripts run plainly; the special cases are centralised and explicit. 04-dovecot.sh is invoked with ENABLE_POP3=1 to match this build. The orchestrator deliberately does not pass any --force — it relies on each script's own state markers, so to truly redo a stage you clear that marker (as each script's docs describe) and re-run.
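The `ENABLE_POP3=1 "${path}"` form sets the variable for that one child process only. A small sketch of the scoping, using a throwaway `child.sh` as a stand-in for 04-dovecot.sh (the stand-in and its output format are assumptions for illustration):

```shell
#!/usr/bin/env bash
# Sketch only: child.sh is a throwaway stand-in for 04-dovecot.sh.
set -Eeuo pipefail

workdir="$(mktemp -d)"
cat > "${workdir}/child.sh" <<'EOF'
#!/usr/bin/env bash
# Report the value the child process actually sees.
echo "ENABLE_POP3=${ENABLE_POP3:-unset}"
EOF

unset ENABLE_POP3
with_flag="$(ENABLE_POP3=1 bash "${workdir}/child.sh")"   # set for this one command only
without_flag="$(bash "${workdir}/child.sh")"              # the caller's environment is unchanged
echo "${with_flag}"
echo "${without_flag}"
rm -rf "${workdir}"
```

Because the assignment never leaks into the orchestrator's own environment, later scripts in the chain do not inherit it.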

Stop-on-failure
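set -Eeuo pipefail guards the orchestrator's own commands, and an explicit if/else around each script catches its exit code (errexit is not applied to a command tested by if), so any non-zero exit halts the chain immediately. On failure it prints which script failed, its exit code, the two ways to resume, and exactly where to look (the install log and that script's report).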
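A short sketch of why the explicit check matters: Bash ignores errexit for anything running as the condition of an `if`, so the caller has to read the exit status itself. The function `step` is a hypothetical stand-in for one installer script.

```shell
#!/usr/bin/env bash
# Sketch only: step() is a hypothetical stand-in for one installer script.
set -Eeuo pipefail

step() {
  false            # fails, but errexit is ignored while step runs as an `if` condition
  echo "still running after the failed command"
  return 3         # the function's status is what the caller sees
}

if step; then
  rc=0
else
  rc=$?            # in the else branch, $? still holds the condition's exit status
fi
echo "step exited with ${rc}"
```

Each real installer runs as a separate process, so its own `set -e` still applies inside it; only the orchestrator's view of the result depends on this check.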

Post-hardening AIDE refresh
After 16-hardening.sh succeeds, the orchestrator runs aide --update and moves the new database into place, then runs rkhunter --propupd, so the post-hardening state becomes the new known-good baseline. If aide is not installed the step is skipped, and if the new database isn't produced that is non-fatal: a warning is logged for review.
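The "roll the new database into place" step can be sketched in isolation as below. `promote_db` is a hypothetical helper, and the files live in a temp directory rather than /var/lib/aide, so this is safe to run anywhere.

```shell
#!/usr/bin/env bash
# Sketch only: promote_db is a hypothetical helper, and the files live in a
# temp dir rather than /var/lib/aide.
set -Eeuo pipefail

# Move the freshly generated database over the active one, if it exists.
promote_db() {
  local new_db="$1" active_db="$2"
  if [[ -f "${new_db}" ]]; then
    mv -f "${new_db}" "${active_db}"
    echo "refreshed"
  else
    echo "not refreshed"     # non-fatal: report it and carry on
  fi
}

dir="$(mktemp -d)"
echo "old baseline" > "${dir}/aide.db.gz"
echo "new baseline" > "${dir}/aide.db.new.gz"

first="$(promote_db "${dir}/aide.db.new.gz" "${dir}/aide.db.gz")"
second="$(promote_db "${dir}/aide.db.new.gz" "${dir}/aide.db.gz")"  # nothing left to promote
echo "${first}, ${second}, active: $(cat "${dir}/aide.db.gz")"
```

The second call shows the non-fatal path: with no new database present, the active baseline is left untouched.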

Resume state
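Records the last successfully completed script to /var/lib/netaport/state/00-run-all.last-completed, in the form N:script-name. The file is informational: the orchestrator writes it but never reads it, and resuming relies on each script's own state markers.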
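Since the state file is only written, anything that consumes it has to be added separately. As a sketch, a hypothetical helper `next_from` could turn the recorded `N:script-name` line into a suggested `--from` value:

```shell
#!/usr/bin/env bash
# Sketch only: next_from is a hypothetical helper; the orchestrator itself
# writes this file but never reads it.
set -Eeuo pipefail

# Print the script number that follows the last completed one.
next_from() {
  local state_file="$1" line num
  [[ -s "${state_file}" ]] || { echo 1; return 0; }   # no record yet: start at 1
  IFS= read -r line < "${state_file}"
  num="${line%%:*}"                                   # "10:10-grafana.sh" -> "10"
  echo $(( 10#${num} + 1 ))
}

state="$(mktemp)"
echo "10:10-grafana.sh" > "${state}"     # same "N:script" format the orchestrator writes
echo "resume with: ./00-run-all.sh --from $(next_from "${state}")"
rm -f "${state}"
```

After a fully successful run the file holds the last script, so the suggested number would fall outside the valid range; a real helper would need to handle that case.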

Post-deploy reminders (printed on success)

Publish DKIM/SPF/DMARC DNS + PTR before sending production mail (DKIM record at /root/netaport-reports/dkim-dns-records.txt)
Once the FQDN's DNS is live, re-run 08-roundcube.sh to obtain a real Let's Encrypt cert (Certbot auto-engages, stapling switches on)
After deploying SSH keys, re-run 16-hardening.sh with KEYS_ONLY=1
Store the restic repo password (from 17) OFF this host — it is unrecoverable if lost
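For the DNS reminder, a rough sketch of which names to query before sending production mail. `dns_checklist` is a hypothetical helper, and the DKIM selector is deployment-specific: take it from dkim-dns-records.txt rather than from this example, where `SELECTOR` is only a placeholder.

```shell
#!/usr/bin/env bash
# Sketch only: dns_checklist is a hypothetical helper. The DKIM selector is
# deployment-specific; read it from dkim-dns-records.txt rather than guessing.
set -Eeuo pipefail

# Print one "type name" pair per record to verify before going live.
dns_checklist() {
  local domain="$1" selector="$2"
  printf 'TXT %s\n' "${domain}"                               # SPF is a TXT record at the domain
  printf 'TXT %s._domainkey.%s\n' "${selector}" "${domain}"   # DKIM public key
  printf 'TXT _dmarc.%s\n' "${domain}"                        # DMARC policy
}

# Print the dig commands; run them by hand once DNS has propagated.
dns_checklist example.com SELECTOR | while read -r type name; do
  echo "dig +short ${type} ${name}"
done
# PTR is checked against the server's public IP: dig +short -x <public-ip>
```

The script only prints the commands, so it makes no network calls itself.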

Engineering notes

Resume-aware via per-script state markers — no wasted work on re-run
Stop-on-first-failure with an exact, actionable resume message
Special cases (POP3, post-16 baseline refresh) centralised and auditable
--list and --from N for inspection and targeted restarts
Must run as root; logs to /var/log/netaport-install.log
set -Eeuo pipefail throughout
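A minimal sketch of `--from N` validation, written as a standalone function so the edge cases are easy to see. `parse_from` is a hypothetical helper, not part of 00-run-all.sh. It forces base 10 so that a zero-padded value such as 08 (matching the script filenames) is not parsed as octal.

```shell
#!/usr/bin/env bash
# Sketch only: parse_from is a hypothetical helper, not part of 00-run-all.sh.
set -Eeuo pipefail

# Turn a 1-based script number into a 0-based array index, or fail.
parse_from() {
  local value="$1" count="$2"
  [[ "${value}" =~ ^[0-9]+$ ]] || return 1      # digits only
  local n=$(( 10#${value} ))                    # force base 10 so "08" is not read as octal
  (( n >= 1 && n <= count )) || return 1
  echo $(( n - 1 ))
}

for candidate in 11 08 0 18 abc; do
  if index="$(parse_from "${candidate}" 17)"; then
    echo "--from ${candidate} -> index ${index}"
  else
    echo "--from ${candidate} -> rejected"
  fi
done
```

With 17 scripts, 11 and 08 map to indexes 10 and 7, while 0, 18 and abc are rejected.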

The full script

Code:


#!/usr/bin/env bash
#==============================================================================
# NETAPORT MailStack Enterprise v3 — 00-run-all orchestrator
#
# Runs the 17 install scripts in order. STOPS on the first script that fails,
# printing exactly which one and how to resume. Because each script is
# idempotent (it skips stages already marked done), simply re-running this
# orchestrator after you fix the failing script resumes from that point:
# everything that already passed is fast-skipped.
#
# Usage:
#   ./00-run-all.sh                 # run from the beginning (resume-aware)
#   ./00-run-all.sh --from 11       # force-start at a specific script number
#   ./00-run-all.sh --list          # show the run order and exit
#
# Notes:
#   * Script 04 (Dovecot) is invoked with ENABLE_POP3=1 to match this build.
#   * After script 16 (hardening), the AIDE baseline is refreshed inline so the
#     nightly integrity check does not flag 16's own changes as violations.
#   * This orchestrator does NOT pass --force; it relies on each script's own
#     state markers. To truly redo a stage, clear its marker as the script docs
#     describe, then re-run this orchestrator.
#==============================================================================
set -Eeuo pipefail

#------------------------------ CONFIG ----------------------------------------
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
LOG_FILE="/var/log/netaport-install.log"
RUNNER_STATE="/var/lib/netaport/state/00-run-all.last-completed"

# Deployment identity. Script 01 needs a real FQDN + primary domain; it writes
# them into /etc/netaport/netaport.env, which every later script sources, so
# these only need to be correct for the very first run of 01. Override on the
# command line if your domain differs, e.g.:
#   MAIL_FQDN=mail.yourdomain.com PRIMARY_DOMAIN=yourdomain.com ./00-run-all.sh
#   ACME_EMAIL=postmaster@yourdomain.com ./00-run-all.sh
export MAIL_FQDN="${MAIL_FQDN:-mail.example.com}"
export PRIMARY_DOMAIN="${PRIMARY_DOMAIN:-example.com}"
export ACME_EMAIL="${ACME_EMAIL:-postmaster@${PRIMARY_DOMAIN}}"

# The ordered list of installer scripts. Edit here if filenames differ.
SCRIPTS=(
  "01-prerequisites.sh"
  "02-mariadb.sh"
  "03-postfix.sh"
  "04-dovecot.sh"
  "05-redis.sh"
  "06-rspamd.sh"
  "07-clamav.sh"
  "08-roundcube.sh"
  "09-prometheus.sh"
  "10-grafana.sh"
  "11-exporters.sh"
  "12-alertmanager.sh"
  "13-loki-promtail.sh"
  "14-crowdsec.sh"
  "15-aide-rkhunter.sh"
  "16-hardening.sh"
  "17-backup-validation.sh"
)

#------------------------------ LOGGING ---------------------------------------
ts()   { date '+%Y-%m-%d %H:%M:%S'; }
log()  { printf '%s [%-5s] [run-all] %s\n' "$(ts)" "$1" "$2" | tee -a "${LOG_FILE}"; }
info() { log INFO  "$*"; }
warn() { log WARN  "$*"; }
err()  { log ERROR "$*"; }

#------------------------------ HELPERS ---------------------------------------
require_root() {
  [[ "${EUID}" -eq 0 ]] || { echo "This orchestrator must run as root." >&2; exit 1; }
}

# Per-script invocation: most are plain, a couple need env/args. Centralised so
# the run loop stays simple and the special cases are explicit and auditable.
run_one() {
  local script="$1" path="${SCRIPT_DIR}/$1"

  [[ -f "${path}" ]] || { err "Missing script: ${path}"; return 127; }
  chmod +x "${path}" 2>/dev/null || true

  case "${script}" in
    04-dovecot.sh)
      info "Running ${script} (ENABLE_POP3=1) ..."
      ENABLE_POP3=1 "${path}"
      ;;
    *)
      info "Running ${script} ..."
      "${path}"
      ;;
  esac
}

# The AIDE baseline refresh that must follow script 16. Runs only if aide is
# present; non-fatal if the new DB was not produced (logged for review).
refresh_aide_baseline() {
  command -v aide >/dev/null 2>&1 || { warn "aide not found; skipping baseline refresh."; return 0; }
  info "Refreshing AIDE baseline after hardening (16) ..."
  aide --update >>"${LOG_FILE}" 2>&1 || warn "aide --update returned non-zero (often normal: it reports diffs)."
  if [[ -f /var/lib/aide/aide.db.new.gz ]]; then
    mv -f /var/lib/aide/aide.db.new.gz /var/lib/aide/aide.db.gz
    info "AIDE baseline refreshed (now reflects post-hardening state)."
  else
    warn "aide.db.new.gz not found; baseline NOT refreshed — run 'aide --init' manually if needed."
  fi
  if command -v rkhunter >/dev/null 2>&1; then
    if rkhunter --propupd >>"${LOG_FILE}" 2>&1; then
      info "rkhunter file-property baseline updated."
    else
      warn "rkhunter --propupd returned non-zero."
    fi
  fi
}

#------------------------------ ARG PARSING -----------------------------------
START_INDEX=0
case "${1:-}" in
  --list)
    echo "NETAPORT run order:"
    i=1; for s in "${SCRIPTS[@]}"; do printf '  %2d. %s\n' "${i}" "${s}"; i=$((i+1)); done
    exit 0
    ;;
  --from)
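    [[ -n "${2:-}" ]] || { echo "--from requires a script number (1-${#SCRIPTS[@]})." >&2; exit 1; }
    # Compare the argument itself (not the literal 2), and force base 10 so a
    # zero-padded value such as 08 is not parsed as octal.
    if ! [[ "${2}" =~ ^[0-9]+$ ]] || (( 10#${2} < 1 )) || (( 10#${2} > ${#SCRIPTS[@]} )); then
      echo "Invalid --from value '${2}'. Use 1-${#SCRIPTS[@]} (see --list)." >&2; exit 1
    fi
    START_INDEX=$(( 10#${2} - 1 ))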
    ;;
  "" ) : ;;  # default: run from start, resume-aware via per-script markers
  * )
    echo "Unknown option '${1}'. Use: (no args) | --from N | --list" >&2; exit 1
    ;;
esac

#------------------------------ MAIN ------------------------------------------
require_root
mkdir -p "$(dirname "${RUNNER_STATE}")"

info "================================================================"
info "NETAPORT MailStack v3 — orchestrated run of ${#SCRIPTS[@]} scripts"
info "Identity: FQDN=${MAIL_FQDN} | domain=${PRIMARY_DOMAIN} | acme=${ACME_EMAIL}"
info "Mode: stop-on-failure, resume-aware. Start at #$((START_INDEX + 1))."
info "================================================================"

total="${#SCRIPTS[@]}"
for (( idx = START_INDEX; idx < total; idx++ )); do
  num=$(( idx + 1 ))
  script="${SCRIPTS[idx]}"

  info "---------- [${num}/${total}] ${script} ----------"

  # Run the script. errexit does not apply to a command tested by `if`, so
  # this explicit check is what halts the chain with a clear, actionable message.
  if run_one "${script}"; then
    echo "${num}:${script}" > "${RUNNER_STATE}"
    info "[${num}/${total}] ${script} OK."
  else
    rc=$?
    err "================================================================"
    err "STOPPED: [${num}/${total}] ${script} failed (exit ${rc})."
    err "Fix the issue, then resume with EITHER:"
    err "    ./00-run-all.sh                # re-run; passed stages fast-skip"
    err "    ./00-run-all.sh --from ${num}        # jump straight to this script"
    err "Logs: ${LOG_FILE}  and  /root/netaport-reports/${script%.sh}-report.txt"
    err "================================================================"
    exit "${rc}"
  fi

  # Special post-step: refresh the integrity baseline right after hardening.
  if [[ "${script}" == "16-hardening.sh" ]]; then
    refresh_aide_baseline
  fi
done

info "================================================================"
info "ALL ${total} scripts completed successfully."
info "NETAPORT MailStack Enterprise v3 deployment is complete."
info "================================================================"
info "Post-deploy reminders:"
info "  * Publish DKIM/SPF/DMARC DNS + PTR before sending production mail"
info "    (DKIM record: /root/netaport-reports/dkim-dns-records.txt)"
info "  * When DNS for the FQDN is live, re-run 08-roundcube.sh to obtain a"
info "    real Let's Encrypt cert (Certbot auto-engages, stapling switches on)."
info "  * After deploying SSH keys, re-run 16-hardening.sh with KEYS_ONLY=1."
info "  * Store the restic repo password (from 17) OFF this host — it is"
info "    unrecoverable if lost."

This orchestrates the full series: 01-prerequisites.sh → 17-backup-validation.sh

Code:

chmod +x *.sh
MAIL_FQDN=mail.example.com PRIMARY_DOMAIN=example.com ./00-run-all.sh