How to Scrape Cloudflare-Protected Sites in 2026
Cloudflare blocks more scrapers than any other technology on the web. Here's exactly why it works, what techniques fail, and the approaches that still get clean data.
Why Cloudflare is so effective at blocking scrapers
Cloudflare's bot protection works on several layers simultaneously, which is why simple approaches fail:
- JavaScript challenges: Cloudflare serves a JS page that computes a proof-of-work token before redirecting to the actual site. A plain HTTP fetch gets the challenge page, not the content.
- Browser fingerprinting: Even with a headless browser, Cloudflare checks for automation signals such as missing WebGL, inconsistent navigator properties, impossible screen dimensions, and missing plugins.
- IP reputation: Data-center IP ranges are treated as suspect from the start, while residential IPs generally carry a clean ISP reputation.
- Behavioural analysis: Cloudflare looks at mouse movement, scroll events, and time-on-page. A page that is fetched in 100ms with no interaction is suspicious.
Approaches that don't work in 2026
Plain HTTP requests
requests.get(), curl, fetch(): on a domain with Cloudflare bot protection enabled, these typically get a 403 or a challenge page straight away. The HTML you get back is a Cloudflare interstitial, not the site content.
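Because the interstitial is still an ordinary HTTP response, a scraper needs a way to tell it apart from real content. The sketch below is one heuristic way to do that, not an official detection method: the function name and the body markers are assumptions chosen for illustration, and Cloudflare can change its challenge page at any time. The `cf-mitigated: challenge` response header is the most reliable of the three signals.

```python
from typing import Mapping

# Heuristic body markers (assumed for illustration); they may change without notice.
CHALLENGE_BODY_MARKERS = ("Just a moment...", "/cdn-cgi/challenge-platform/")


def looks_like_cloudflare_challenge(
    status: int, headers: Mapping[str, str], body: str
) -> bool:
    """Return True if a response looks like a Cloudflare challenge, not real content."""
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}

    # Strongest signal: Cloudflare marks challenge responses with this header.
    if lowered.get("cf-mitigated") == "challenge":
        return True

    # Fallback: Cloudflare server header + blocking status + a known body marker.
    served_by_cloudflare = "cloudflare" in lowered.get("server", "")
    blocked_status = status in (403, 503)
    has_marker = any(marker in body for marker in CHALLENGE_BODY_MARKERS)
    return served_by_cloudflare and blocked_status and has_marker
```

With the requests library, you would call it as `looks_like_cloudflare_challenge(resp.status_code, resp.headers, resp.text)` and treat a `True` result as "blocked", not as page content.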
Vanilla Selenium / Playwright with default settings
Running playwright.chromium.launch() with default settings is trivially detectable. Cloudflare's JS checks navigator.webdriver, looks for specific Chrome DevTools Protocol artifacts, and checks dozens of other automation signals. You'll get blocked within a few requests.
Rotating data-center proxies alone
IP rotation helps with rate limits but Cloudflare knows AWS, GCP, DigitalOcean, and all major data-center CIDR ranges. Even with fresh IPs you'll still be fingerprinted and blocked.
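To see why rotation alone does not help, it is worth looking at how cheap a range lookup is. The sketch below shows the general idea with Python's standard `ipaddress` module. It is an illustration, not Cloudflare's actual logic, and the ranges are placeholder documentation blocks (RFC 5737), not real provider data; a real list would come from published provider ranges.

```python
import ipaddress

# Placeholder ranges from the RFC 5737 documentation blocks, NOT real provider data.
EXAMPLE_DATACENTER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_datacenter_ip(address: str, ranges=EXAMPLE_DATACENTER_RANGES) -> bool:
    """Return True if `address` falls inside any of the given CIDR ranges."""
    ip = ipaddress.ip_address(address)
    # Membership is False when the IP version differs from the network's version.
    return any(ip in network for network in ranges)
```

Every fresh IP drawn from the same provider lands in the same ranges, so rotating within a data-center pool changes the address but not the verdict.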
What actually works
1. Stealth-patched headless Chromium
Libraries like playwright-extra with its stealth plugin (Node.js) and playwright-stealth (Python) patch the automation fingerprints: hiding navigator.webdriver, adding realistic plugin lists, patching WebGL renderer strings, randomising viewport and User-Agent. This gets past many Cloudflare configurations.
# Python example with playwright-stealth (stealth_sync is the 1.x API;
# newer releases may expose a different entry point)
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    stealth_sync(page)  # patch automation fingerprints before navigating
    page.goto("https://protected-site.com")
    print(page.content())
    browser.close()
This works but requires you to manage browser instances, keep up with Cloudflare updates, and run your own infrastructure.
2. Residential proxy + stealth browser
Combining a residential IP (Bright Data, ProxyJet, Smartproxy) with a stealth-patched browser is the most reliable combination for heavily-protected sites. Residential IPs have genuine ISP reputation, so the IP reputation check passes even for aggressive Cloudflare configs.
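A minimal sketch of that combination is shown below, assuming Playwright for Python and the playwright-stealth 1.x `stealth_sync` helper. The function names, host, port, and credentials are hypothetical; your proxy provider supplies the real gateway details. The `proxy` launch option with `server`, `username`, and `password` keys is part of Playwright itself.

```python
from typing import Optional


def build_proxy_settings(
    host: str,
    port: int,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> dict:
    """Build the dict Playwright accepts as the `proxy` launch option."""
    settings = {"server": f"http://{host}:{port}"}
    if username is not None:
        settings["username"] = username
    if password is not None:
        settings["password"] = password
    return settings


def fetch_via_proxy(url: str, proxy: dict) -> str:
    """Load `url` through the proxy in a stealth-patched page and return its HTML."""
    # Imported lazily so build_proxy_settings works without Playwright installed.
    from playwright.sync_api import sync_playwright
    from playwright_stealth import stealth_sync  # playwright-stealth 1.x API

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, proxy=proxy)
        try:
            page = browser.new_page()
            stealth_sync(page)  # patch fingerprints before the first navigation
            page.goto(url)
            return page.content()
        finally:
            browser.close()
```

A call would look like `fetch_via_proxy("https://protected-site.com", build_proxy_settings("gateway.example.com", 8000, "user", "pass"))`, with the gateway values replaced by your provider's.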
3. FlareSolverr
FlareSolverr is an open-source proxy server that solves Cloudflare challenges using a real browser session and returns the result. You self-host it and it handles the JS challenge automatically.
curl -X POST http://localhost:8191/v1 \
-H "Content-Type: application/json" \
-d '{"cmd":"request.get","url":"https://protected-site.com","maxTimeout":60000}'
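The same request can be made from Python. The sketch below uses only the standard library and assumes a FlareSolverr instance listening on the default local port, as in the curl call. The helper names are made up for illustration; the response fields it reads (`status`, `message`, `solution.response`) follow FlareSolverr's documented JSON output, but check them against the version you run.

```python
import json
import urllib.request

FLARESOLVERR_URL = "http://localhost:8191/v1"  # same endpoint as the curl call


def build_payload(url: str, max_timeout_ms: int = 60000) -> bytes:
    """Encode a request.get command as the JSON body FlareSolverr expects."""
    body = {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}
    return json.dumps(body).encode("utf-8")


def extract_html(result: dict) -> str:
    """Return the solved page HTML, or raise if FlareSolverr reported a failure."""
    if result.get("status") != "ok":
        raise RuntimeError(f"FlareSolverr failed: {result.get('message')}")
    return result["solution"]["response"]


def solve(url: str) -> str:
    """POST the command to FlareSolverr and return the page HTML."""
    request = urllib.request.Request(
        FLARESOLVERR_URL,
        data=build_payload(url),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Wait slightly longer than maxTimeout so FlareSolverr can report its own timeout.
    with urllib.request.urlopen(request, timeout=70) as response:
        return extract_html(json.load(response))
```

Solving a challenge launches a real browser, so expect each call to take seconds, not milliseconds.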
4. Use a scraping API that handles it for you
If you don't want to manage browser infrastructure and proxy pools or keep up with Cloudflare updates yourself, a scraping API handles the entire escalation stack automatically.
xtrct runs a 7-step auto-escalation chain: plain HTTP → HTTP with datacenter proxy → HTTP with residential proxy → stealth Playwright → Playwright with datacenter proxy → Playwright with residential proxy → FlareSolverr. It tries the cheapest method first and only escalates when needed.
curl -X POST https://api.xtrct.io/v1/scrape \
-H "X-API-Key: YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"url":"https://cloudflare-protected-site.com","output":"markdown","wait":true}'
You get back clean Markdown (or HTML, JSON, screenshot, PDF — your choice) without managing any infrastructure.
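The cheapest-first escalation idea is easy to reproduce in your own code. The sketch below is a generic illustration of the pattern, not xtrct's actual implementation: the function name and the `(name, strategy)` shape are assumptions, and each strategy is a callable you supply that returns HTML, or `None` when it was blocked.

```python
from typing import Callable, Optional, Sequence, Tuple

# A strategy takes a URL and returns HTML, or None if it was blocked.
Strategy = Callable[[str], Optional[str]]


def fetch_with_escalation(
    url: str, strategies: Sequence[Tuple[str, Strategy]]
) -> Tuple[str, str]:
    """Try strategies from cheapest to most expensive; return (name, html)."""
    errors = []
    for name, strategy in strategies:
        try:
            html = strategy(url)
        except Exception as exc:  # a failing step should not abort the chain
            errors.append(f"{name}: {exc}")
            continue
        if html is not None:
            return name, html  # stop at the first strategy that succeeds
        errors.append(f"{name}: blocked")
    raise RuntimeError("all strategies failed: " + "; ".join(errors))
```

Ordering the list by cost means expensive steps such as a residential proxy or a full browser only run for the pages that actually need them.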
Choosing the right approach
| Approach | Works on Cloudflare? | Setup effort | Cost |
|---|---|---|---|
| Plain HTTP | No | None | Free |
| Stealth Playwright | Usually | High | Server + proxies |
| Residential proxy + stealth | Yes | High | Server + proxies |
| FlareSolverr | Yes | Medium | Self-host |
| xtrct API | Yes | None | Pay-per-use |
Summary
Cloudflare protection in 2026 requires at minimum a stealth-patched browser. For heavily protected sites, combine that with a residential proxy. If you'd rather not manage infrastructure, a scraping API like xtrct handles the entire stack automatically — free to start.
Skip the infrastructure — use xtrct
500 free credits. No credit card required. Cloudflare bypass included.