Web Scraping Cloudflare March 2026 · 8 min read

How to Scrape Cloudflare-Protected Sites in 2026

Cloudflare is one of the most common reasons scrapers get blocked. Here's why its protection works, which techniques fail, and which approaches still return clean data.

Why Cloudflare is so effective at blocking scrapers

Cloudflare's bot protection works on several layers simultaneously, which is why simple approaches fail:

JavaScript challenges: Cloudflare serves a JS page that computes a proof-of-work token before redirecting to the actual site. A plain HTTP fetch gets the challenge page, not the content.
Browser fingerprinting: Even with a headless browser, Cloudflare checks for automation signals such as missing WebGL, fake navigator properties, impossible screen dimensions, and missing plugins.
IP reputation: Data-center IPs start out with a poor reputation score. Residential IPs start out with a much better one, although they can still be flagged.
Behavioural analysis: Mouse movement, scroll events, and time-on-page are all observed. A page that is fetched and abandoned within 100 ms with no interaction looks suspicious.

Approaches that don't work in 2026

Plain HTTP requests

requests.get(), curl, and fetch() typically get a 403 or a challenge page straight away on domains that have Cloudflare's bot protection enabled. The HTML that comes back is a Cloudflare interstitial, not the site content.
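
Before escalating to a heavier technique, it helps to know whether a response is actually an interstitial. The following is a minimal sketch of such a check, not Cloudflare's own logic. The function name is hypothetical, the `cf-mitigated: challenge` header is the signal Cloudflare documents for challenge responses, and the body markers are assumptions based on commonly seen challenge pages that may change over time.

```python
def looks_like_cloudflare_challenge(status, headers, body):
    """Heuristically decide whether a response is a Cloudflare interstitial."""
    # Header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}

    # Documented signal: challenge responses carry `cf-mitigated: challenge`.
    if h.get("cf-mitigated", "").lower() == "challenge":
        return True

    served_by_cf = "cloudflare" in h.get("server", "").lower()

    # Assumed markers: typical challenge page title and script path.
    markers = ("just a moment", "/cdn-cgi/challenge-platform/")
    has_marker = any(m in body.lower() for m in markers)

    return served_by_cf and status in (403, 429, 503) and has_marker
```

The function takes plain values rather than a response object, so it can be used with requests, urllib, or any other HTTP client by passing in the status code, header mapping, and body text.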

Vanilla Selenium / Playwright with default settings

A browser started with playwright.chromium.launch() and default settings is trivially detectable. Cloudflare's JS checks navigator.webdriver, looks for specific Chrome DevTools Protocol artifacts, and checks dozens of other automation signals. You'll get blocked within a few requests.

Rotating data-center proxies alone

IP rotation helps with rate limits, but Cloudflare knows the CIDR ranges of AWS, GCP, DigitalOcean, and the other major data-center providers. Even with fresh IPs, you'll still be fingerprinted and blocked.
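
To see why rotating within a data center does not help, consider how cheap a CIDR membership check is. This is an illustrative sketch only: the two ranges below are reserved documentation blocks (RFC 5737) standing in for real provider ranges, and `DATACENTER_RANGES` and `is_datacenter_ip` are hypothetical names.

```python
import ipaddress

# Placeholder ranges for illustration (RFC 5737 documentation blocks),
# standing in for the CIDR lists that cloud providers publish.
DATACENTER_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]

def is_datacenter_ip(ip):
    """Return True if the address falls inside any known data-center range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_RANGES)
```

Every address in a flagged range fails the same check, so a fresh IP from the same provider starts with the same reputation as the one that was just blocked.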

What actually works

1. Stealth-patched headless Chromium

Libraries like playwright-extra with the stealth plugin (Node.js) or playwright-stealth (Python) patch the automation fingerprints: spoofing navigator.webdriver, adding realistic plugin lists, patching WebGL renderer strings, and randomising viewport and User-Agent. This gets past most Cloudflare configurations.

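# Python example with playwright-stealth
# (stealth_sync is the 1.x API; check the interface of your installed version)
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    # Patch common automation fingerprints before the first navigation
    stealth_sync(page)
    page.goto("https://protected-site.com")
    print(page.content())
    browser.close()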

This works but requires you to manage browser instances, keep up with Cloudflare updates, and run your own infrastructure.

2. Residential proxy + stealth browser

Combining a residential IP (from a provider such as Bright Data, ProxyJet, or Smartproxy) with a stealth-patched browser is the most reliable combination for heavily protected sites. Residential IPs have genuine ISP reputation, so the IP reputation check passes even for aggressive Cloudflare configs.
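
One way to wire the two together is Playwright's `proxy` launch option. The sketch below assumes an HTTP proxy with username and password authentication. The host, port, and credentials are placeholders, both function names are hypothetical, and `stealth_sync` is the playwright-stealth 1.x call, which may differ in your installed version.

```python
def build_proxy_settings(host, port, username, password):
    """Return a dict in the shape Playwright's `proxy` launch option expects."""
    return {
        "server": f"http://{host}:{port}",
        "username": username,
        "password": password,
    }

def fetch_via_residential_proxy(url, proxy):
    """Load a page through the given proxy with stealth patches applied."""
    # Imported lazily so build_proxy_settings works without Playwright installed.
    from playwright.sync_api import sync_playwright
    from playwright_stealth import stealth_sync  # 1.x API (assumption)

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, proxy=proxy)
        try:
            page = browser.new_page()
            stealth_sync(page)
            page.goto(url)
            return page.content()
        finally:
            # Always release the browser, even if navigation fails.
            browser.close()
```

A call would look like `fetch_via_residential_proxy("https://protected-site.com", build_proxy_settings("proxy.example.com", 8000, "user", "pass"))`, with the real endpoint and credentials taken from your proxy provider's dashboard.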

3. FlareSolverr

FlareSolverr is an open-source proxy server that solves Cloudflare challenges using a real browser session and returns the result. You self-host it and it handles the JS challenge automatically.

curl -X POST http://localhost:8191/v1 \
  -H "Content-Type: application/json" \
  -d '{"cmd":"request.get","url":"https://protected-site.com","maxTimeout":60000}'
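
The same request can be made from Python using only the standard library. This sketch assumes FlareSolverr is listening on its default local port and that the JSON reply carries a top-level `status` and `message` plus a `solution` object whose `response` field holds the page HTML. The function names are hypothetical.

```python
import json
import urllib.request

def build_flaresolverr_payload(url, max_timeout_ms=60000):
    """Build the JSON body for a FlareSolverr request.get command."""
    return {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}

def extract_html(result):
    """Pull the page HTML out of a parsed FlareSolverr reply."""
    if result.get("status") != "ok":
        raise RuntimeError(f"FlareSolverr failed: {result.get('message')}")
    return result["solution"]["response"]

def solve(url, endpoint="http://localhost:8191/v1"):
    """POST the command to a self-hosted FlareSolverr and return the HTML."""
    data = json.dumps(build_flaresolverr_payload(url)).encode("utf-8")
    req = urllib.request.Request(
        endpoint, data=data, headers={"Content-Type": "application/json"}
    )
    # Allow longer than maxTimeout so the solver can report its own timeout.
    with urllib.request.urlopen(req, timeout=90) as resp:
        return extract_html(json.load(resp))
```

Keeping the payload builder and the reply parser separate from the network call makes both easy to check without a running FlareSolverr instance.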

4. Use a scraping API that handles it for you

If you don't want to manage browser infrastructure and proxy pools or keep up with Cloudflare updates yourself, a scraping API handles the entire escalation stack for you.

xtrct runs a 7-step auto-escalation chain: plain HTTP → HTTP with datacenter proxy → HTTP with residential proxy → stealth Playwright → Playwright with datacenter proxy → Playwright with residential proxy → FlareSolverr. It tries the cheapest method first and only escalates when needed.

curl -X POST https://api.xtrct.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://cloudflare-protected-site.com","output":"markdown","wait":true}'

You get back clean Markdown (or HTML, JSON, screenshot, PDF — your choice) without managing any infrastructure.
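
The cheapest-first escalation idea is easy to reproduce in your own code. The sketch below is a generic illustration of the pattern and makes no claim about how xtrct implements it. `scrape_with_escalation`, `strategies`, and `is_blocked` are hypothetical names, and each strategy is any callable that takes a URL and returns HTML.

```python
def scrape_with_escalation(url, strategies, is_blocked):
    """Try each (name, fetch) pair in order; return the first unblocked result.

    strategies: list of (name, callable) pairs, ordered cheapest first.
    is_blocked: callable that returns True if the HTML is a block page.
    """
    errors = []
    for name, fetch in strategies:
        try:
            html = fetch(url)
        except Exception as exc:
            # A failing strategy should not stop the chain; record and move on.
            errors.append(f"{name}: {exc}")
            continue
        if not is_blocked(html):
            return name, html
        errors.append(f"{name}: blocked")
    raise RuntimeError("all strategies failed: " + "; ".join(errors))
```

Returning the name of the strategy that succeeded makes it possible to log which tier each domain needs and to skip the cheaper tiers on later requests.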

Choosing the right approach

Approach	Works on Cloudflare?	Setup effort	Cost
Plain HTTP	No	None	Free
Stealth Playwright	Usually	High	Server + proxies
Residential proxy + stealth	Yes	High	Server + proxies
FlareSolverr	Yes	Medium	Self-host
xtrct API	Yes	None	Pay-per-use

Summary

Getting past Cloudflare protection in 2026 requires, at minimum, a stealth-patched browser. For heavily protected sites, combine that with a residential proxy. If you'd rather not manage infrastructure, a scraping API like xtrct handles the entire stack automatically and is free to start.
