[$ xmrhost] _

$ man 7 uptime

[$ ] Uptime — operational reliability, honestly stated

// NAME

uptime — operator reliability posture. SLA target, service-credit ladder, monitoring baseline, incident communication. Documents what is, not what sounds good in marketing copy.

// POLICY

This page does not publish a synthetic "99.99%" badge or a live dashboard with hand-tuned green pixels. Synthetic uptime numbers do not clear the bar for honest operator communication: every provider publishes them, and none of them mean what they appear to mean (the denominator is usually elastic). The brand publishes what it actually commits to and what an operator can verify independently.

// SLA TARGET

$ cat /etc/xmrhost/sla.txt

// metric                            // target                                      // measurement window
node availability                    99.9%                                          rolling 30-day, per-node
network availability (DC transit)    99.95%                                         rolling 30-day, per-region
control-plane (console, billing)     99.5%                                          rolling 30-day
support response (operational)       24h business / 72h weekend first response      per ticket
scheduled-maintenance notice         ≥ 72h advance                                  per-affected-tenant email

// 99.9% on a rolling 30-day window allows ~43 minutes of unavailability per month before the service-credit ladder triggers; 99.95% on network allows ~22 minutes. These are achievable targets on properly engineered single-region infrastructure; they are not the "five nines" / "six nines" numbers that require multi-region active-active, which this operator does not run.
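
// sanity-checking those budgets is one multiplication. A minimal Python sketch (window length and targets are taken from the table above; nothing here is operator-specific):

  # downtime budget for a rolling 30-day window, per SLA target
  WINDOW_MIN = 30 * 24 * 60          # 43,200 minutes per window

  for name, target in [("node", 0.999), ("network", 0.9995), ("control-plane", 0.995)]:
      budget = WINDOW_MIN * (1 - target)
      print(f"{name:14s} {target:.2%} -> {budget:.1f} min/window")

  # node           99.90% -> 43.2 min/window
  # network        99.95% -> 21.6 min/window
  # control-plane  99.50% -> 216.0 min/window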

// SERVICE-CREDIT LADDER

$ man service-credit

// measured availability    // service credit                                 // scope
≥ 99.9%                     0% (target met)                                   no credit
≥ 99.0% and < 99.9%         10% of monthly fee                                credited to next invoice
≥ 95.0% and < 99.0%         25% of monthly fee                                credited to next invoice
< 95.0%                     50% of monthly fee + cancel-with-refund option    customer choice

// credits are issued in XMR (refund mechanics at /legal/refund) on the next billing cycle, or as the equivalent USD-denominated credit-as-discount if the customer prefers. The customer initiates the claim; the operator does not auto-credit (the claim itself is the signal that the customer actually experienced the unavailability).
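
// the ladder reduces to a few threshold checks. A minimal Python sketch (the function name is hypothetical; the thresholds are from the table above):

  # service-credit ladder; thresholds per the table above
  def service_credit(availability: float) -> float:
      """Credit owed, as a fraction of the monthly fee."""
      if availability >= 0.999:
          return 0.00                # target met, no credit
      if availability >= 0.990:
          return 0.10
      if availability >= 0.950:
          return 0.25
      return 0.50                    # plus the cancel-with-refund option

  # e.g. a node measured at 99.4% over the window:
  # service_credit(0.994) -> 0.10, i.e. 10% of the monthly fee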

// MONITORING BASELINE

$ man monitoring

Operator-side monitoring is the baseline that makes an outage visible to the operator within minutes, rather than relying on the customer to call. The stack:

  • Synthetic probes from three external jurisdictions (US East, Asia, EU South) check every node's control surface (ICMP + TCP/22 + TCP/443 where applicable) on a 60-second cadence. Two-of-three failure triggers operator alerting (a minimal quorum sketch follows this list).
  • Self-hosted Uptime Kuma instance runs the probes; the operator owns the alerting path (no SaaS monitoring vendor in the loop, no third-party data-subpoena surface).
  • Tor hidden-service liveness separately probed via a dedicated client maintaining circuit-build measurements on a 5-minute cadence (hidden services have different failure modes than clearnet endpoints).
  • Per-region transit smokes — BGP-route divergence + AS-level latency + DNS-resolution time tracked from three external vantage points per region.
  • Audit-log replication off the production hosts on an hourly cadence so an outage that affects monitoring does not also blind the post-incident forensic trail.
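
The two-of-three rule in the first bullet is a small quorum check. A minimal Python sketch (the hostname and helper names are hypothetical, and a single machine stands in for the three external vantage points):

  # two-of-three alerting quorum over per-vantage TCP probe results
  import socket

  VANTAGES = ["us-east", "asia", "eu-south"]   # external probe locations
  QUORUM = 2                                   # failures needed to page

  def tcp_alive(host: str, port: int, timeout: float = 5.0) -> bool:
      """One probe attempt: does a TCP handshake complete?"""
      try:
          with socket.create_connection((host, port), timeout=timeout):
              return True
      except OSError:
          return False

  def should_alert(results: dict) -> bool:
      """Page the operator only when >= QUORUM vantages see a failure."""
      failures = sum(1 for ok in results.values() if not ok)
      return failures >= QUORUM

  # in the real stack each vantage runs its own probe on a 60-second
  # cadence; here one machine fakes the per-vantage results:
  results = {v: tcp_alive("node1.example.net", 443) for v in VANTAGES}
  if should_alert(results):
      print("ALERT: node1 unreachable from two or more vantages")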

// HOW AN INCIDENT IS COMMUNICATED

$ man incident-response

Incident communication uses three channels, applied consistently across incidents large and small:

  • Email to affected tenants — within 30 minutes of operator confirmation. The mail describes what is observed (not yet what is hypothesised), the scope (which regions / which plans / which addons), and the next scheduled update time. No marketing apologia, no "we apologise for any inconvenience". Operator-speak; a sample follows this list.
  • Matrix support room — real-time updates as the incident develops. The room URI is on /contact.
  • Post-incident write-up — material incidents (≥ 30 minutes user-visible) get a /notes entry within 7 days. Format: timeline, root cause, what changed on the operator side as a result. No naming-and-shaming of upstream providers; no scapegoating; honest engineering review.
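
To make the email format concrete, a hypothetical first notification; every detail below is invented for illustration:

  Subject: [incident] Romania transit degradation, investigating

  observed:     20-40% packet loss to Romania nodes since 14:02 UTC
  scope:        Romania region, all plans; Tor endpoints unaffected
  status:       investigating upstream transit, no hypothesis published yet
  next update:  15:00 UTC, or earlier if scope changes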

// DEPENDENCIES OUTSIDE OPERATOR CONTROL

$ ls /etc/xmrhost/dependencies

Honest reliability accounting names what the operator does NOT control. The dependencies below influence node availability and are explicitly outside operator control:

  • Upstream provider transit — Iceland: FARICE-1, DANICE, IRIS submarine cables. Romania: NXData, M247, ITS carriers. A multi-hour transit-provider incident is on the provider, not on the operator; service credits still apply per the ladder above.
  • BGP / AS-level routing — peering disputes, BGP hijacks, mass-route-withdraw events. Operator monitors; operator cannot unilaterally resolve.
  • DNS — DNSSEC infrastructure, registrar availability. Operator runs primary + secondary DNS on different ASNs to bound this dependency (a quick verification sketch follows this list).
  • OxaPay processor — invoice settlement requires the OxaPay hosted checkout to be reachable. A control-plane outage may delay new orders but does not affect already-provisioned services.
  • Tor network — for /node/tor-* plans, the Tor network's own health (consensus distribution, sufficient guard + middle relays) bounds hidden-service reachability regardless of operator-side state.
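
The DNS bullet is one of the few dependencies a tenant can verify independently. A minimal Python sketch, assuming the third-party dnspython package and a placeholder zone; mapping each address to its origin AS stays a manual step:

  # list the zone's nameservers and their addresses; confirming that the
  # origin ASNs differ is then a whois / route-server lookup, which this
  # sketch deliberately does not automate
  import dns.resolver                # third-party: dnspython

  ZONE = "example.net"               # placeholder, not the real zone

  for ns in dns.resolver.resolve(ZONE, "NS"):
      name = str(ns.target)
      addrs = [str(a) for a in dns.resolver.resolve(name, "A")]
      print(f"{name}: {', '.join(addrs)}")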

// LIVE STATUS PAGE

$ curl status.xmrhost.io

A separate live-status surface (status.xmrhost.io) is on the operator's roadmap and ships once the self-hosted Uptime Kuma federation across the three monitoring vantage points stabilises. The page will run on infrastructure independent of the brand's production hosts (different region, different upstream) so that a full production outage does not also take down the status page — the classic failure mode of co-located status surfaces.

// until that ships, operator-confirmed incidents are communicated via the email + Matrix channels documented above. Tenants subscribed to the support email channel receive notice of every operator-confirmed incident even if their plan is not affected.

// SEE ALSO

$ ls /usr/share/doc/xmrhost/reliability

  • /legal/sla — full SLA contractual text.
  • /legal/refund — service-credit-as-XMR-refund mechanics.
  • /notes — past post-incident write-ups (when material).
  • /contact — support / abuse / legal channels.
  • /threat-models — operational threat models per workload.