[$ xmrhost] _

$ pwd

/playbook/scraping

[$ ] use-case: scraping

// NAME

scraping — web scraping & data extraction servers.

// SYNOPSIS

xmrhost-cli playbook describe --workload=scraping
xmrhost-cli provision --workload=scraping --region=<is|ro>

// TL;DR

$ head -n1 README

// stable-asn vps for ethical crawling: clean ip reputation, generous egress, no per-target rate limits.

// DESCRIPTION

$ man playbook(scraping)

// ASN reputation + clean egress > exotic proxy stacks

Most scraping blocking is not a residential-vs-datacenter question — it is ASN reputation. A VPS in a clean Romanian datacenter ASN routinely outperforms a stale residential proxy because the target site's bot-detection layer has not yet poisoned the /24 the VPS lives in. Cloudflare's bot-management scoring, Datadome, and PerimeterX all weight ASN reputation heavily; a brand-new IP in a low-noise ASN starts with a clean slate.

The technical posture for ethical scraping (robots.txt honored, rate limits respected, no PII without consent) does not need exotic infrastructure: vps-4 with Playwright + Chromium preinstalled handles 90% of the workload. CAPTCHA solvers (2Captcha, CapMonster) integrate at the application layer over the network we offer — there is no CAPTCHA-defeat magic at the host level. Bandwidth is the constraint that bites at scale: scraping pipelines routinely hit 10–30 TB egress / month, which is why the catalog defaults to generous monthly allowances and a no-overage policy on legitimate growth.
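The 10–30 TB figure is easy to sanity-check with back-of-envelope arithmetic. A minimal sketch — the page weight and crawl rate below are illustrative assumptions, not catalog figures:

```python
# Back-of-envelope egress estimate for a steady headless-browser crawl.
# PAGE_KB and PAGES_PER_SEC are illustrative assumptions, not xmrhost numbers.

PAGE_KB = 2_048           # ~2 MB per page once Chromium pulls HTML + JS + images
PAGES_PER_SEC = 5         # a modest, rate-limit-respecting crawl speed
SECONDS_PER_MONTH = 30 * 24 * 3600

def monthly_egress_tb(page_kb: float, pages_per_sec: float) -> float:
    """Estimated egress in TB/month for a continuous crawl."""
    kb_per_month = page_kb * pages_per_sec * SECONDS_PER_MONTH
    return kb_per_month / 1024**3  # KB -> TB

print(f"{monthly_egress_tb(PAGE_KB, PAGES_PER_SEC):.1f} TB/month")
# -> 24.7 TB/month, squarely inside the 10-30 TB range quoted above
```

Even a single mid-size pipeline at these modest assumptions lands in the quoted range, which is why the egress allowance, not CPU, is usually the binding constraint.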

Where scraping targets EU sites containing personal data, GDPR Articles 6 and 14 apply regardless of where the scraper runs — a Romanian VPS does not absolve the operator of a lawful-basis analysis. We do not host scraping operations targeting personally identifiable data without documented consent, and we cooperate with substantiated CFAA-style complaints where they apply (most scraping operations never see one).

// see also

  • Cloudflare — How Bot Management Works (developers.cloudflare.com/bots)
  • GDPR Articles 6 and 14 (Regulation 2016/679)
  • robots.txt — RFC 9309 (IETF, 2022)
  • hiQ v. LinkedIn — 9th Cir. 2019 (CFAA scraping precedent)

// THREAT MODEL + AUP BOUNDARY

$ xmrhost-cli scope --workload=scraping

// the hosting layer is one component of the threat model. what we cover, and what we explicitly don't:

// scope: in

  • Clean ASN reputation against Cloudflare / Datadome / PerimeterX bot-detection scoring
  • Generous monthly egress allowance with no-overage policy on legitimate growth
  • Playwright + Chromium + Puppeteer preinstalled, version-pinned
  • Free IP swap once per month on vps-4 and above for IPs that get burned

// scope: out

  • robots.txt compliance — that is the operator's responsibility, enforced at the application layer
  • GDPR Article 6 / 14 analysis when the scrape touches PII (we are not your DPO)
  • CAPTCHA solving — 2Captcha / CapMonster integrate at the application layer
  • Target-site ToS interpretation — read the precedent yourself (hiQ v. LinkedIn is the start)
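
Since robots.txt handling is explicitly the operator's job, here is a minimal sketch of what it looks like at the application layer, using Python's standard-library parser (RFC 9309 is the current spec for the format). The policy file is inlined so the sketch is self-contained; in production you would point `set_url()` at the live target and call `read()`. The user-agent string is a placeholder.

```python
import urllib.robotparser

# Inlined policy for a self-contained example; real code fetches the live
# file with rp.set_url("https://target/robots.txt") followed by rp.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

UA = "my-crawler/1.0"  # hypothetical user-agent

print(rp.can_fetch(UA, "https://example.com/public/page"))   # True
print(rp.can_fetch(UA, "https://example.com/private/data"))  # False
print(rp.crawl_delay(UA))                                    # 2 -> sleep this long between requests
```

The `crawl_delay` value feeds straight into the fetch loop's sleep; honoring it is what keeps a crawl on the "ethical" side of the AUP boundary below.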

// AUP boundary

Customers are responsible for compliance with target sites' Terms of Service, robots.txt directives, and applicable computer-misuse and data-protection law (CFAA, GDPR for personal-data scraping, country-specific equivalents). We do not host scraping operations targeting personally identifiable data without consent.

// SEE ALSO

  • playbook — full workload list
  • node — full catalog
  • location — region posture
  • why-monero — billing rationale