Tutorial · Web Scraping · March 2026 · 7 min read

How to Extract Data from Any Website Using an API

Whether you need clean Markdown for an LLM, structured JSON from a product page, or a screenshot of a site — this guide shows how to get it with a single API call.


What you can extract

xtrct supports 9 output formats from any URL:

html · cleaned_html · markdown · text · screenshot · pdf · links · metadata · structured

You can request multiple formats in a single call and get them all back at once.

Getting started

Get a free API key at xtrct.io — no credit card needed, 500 free credits included.

Basic extraction

Get Markdown from any page

Perfect for feeding into LLMs or storing readable content.

curl -X POST https://api.xtrct.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "output": "markdown",
    "wait": true
  }'

Get clean text

Strips all HTML — just the human-readable text content.

{
  "url": "https://example.com/article",
  "output": "text",
  "wait": true
}

Take a screenshot

{
  "url": "https://example.com",
  "output": "screenshot",
  "screenshot_options": {
    "format": "png",
    "full_page": true
  },
  "wait": true
}
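If the screenshot comes back base64-encoded in the response body (a common convention for JSON APIs, but an assumption here — check the response schema for your account), decoding and saving it takes a few lines. The result key name below is also an assumption:

```python
import base64

def save_screenshot(b64_data: str, path: str) -> int:
    """Decode a base64-encoded screenshot and write it to disk.

    Returns the number of bytes written.
    """
    raw = base64.b64decode(b64_data)
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)

# Hypothetical usage, assuming the screenshot lives under result.screenshot:
# save_screenshot(resp["result"]["screenshot"], "page.png")
```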

Structured data extraction

Use CSS selectors to extract specific fields from a page — great for product data, job listings, news articles, and anything with a consistent structure.

{
  "url": "https://shop.example.com/product/123",
  "output": "structured",
  "selectors": {
    "title":       "h1.product-title",
    "price":       ".price-now",
    "description": ".product-description",
    "rating":      "[data-rating]"
  },
  "wait": true
}

Response:

{
  "result": {
    "structured": {
      "title":       "Blue Widget Pro",
      "price":       "£49.99",
      "description": "The best widget for all your widget needs.",
      "rating":      "4.7"
    }
  }
}

Multiple formats at once

Get everything you need in one request:

{
  "url": "https://example.com/article",
  "output": ["markdown", "metadata", "links"],
  "wait": true
}
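Inferring the response shape from the structured-data example above, each requested format should appear under its own key in result. A small helper can unpack exactly the formats you asked for and fail loudly if one is missing — the response layout here is an assumption, not documented behaviour:

```python
def unpack_result(response: dict, formats: list[str]) -> dict:
    """Pull the requested output formats out of a scrape response.

    Raises KeyError if any expected format is absent, so a partial
    response is caught immediately instead of surfacing as None later.
    """
    result = response.get("result", {})
    missing = [f for f in formats if f not in result]
    if missing:
        raise KeyError(f"formats missing from response: {missing}")
    return {f: result[f] for f in formats}
```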

Handling JavaScript-heavy sites

For sites that load content dynamically (React, Vue, etc.), use wait_for to wait for specific content before extracting:

{
  "url": "https://spa-site.com/data",
  "output": "markdown",
  "wait_for": { "type": "selector", "value": ".data-loaded" },
  "wait": true
}

Or wait for network activity to settle:

{
  "url": "https://spa-site.com/data",
  "output": "html",
  "wait_for": { "type": "networkidle" },
  "wait": true
}

Using presets for common sites

xtrct ships with 20+ zero-config presets for popular sites. They automatically set the right strategy, wait conditions, and selectors.

{
  "url": "https://amazon.com/dp/B0EXAMPLE",
  "preset": "amazon",
  "wait": true
}

Or let xtrct auto-detect the right preset from the URL:

{
  "url": "https://amazon.com/dp/B0EXAMPLE",
  "output": "structured",
  "wait": true
}

Async jobs for large batches

For bulk scraping, drop the "wait": true flag to get a job ID back immediately, then poll for the result or use a webhook:

# Submit job
POST /v1/scrape
{ "url": "...", "output": "markdown", "webhook_url": "https://your-site.com/hook" }

# Response immediately:
{ "job_id": "abc123", "status": "queued" }

# Poll result:
GET /v1/jobs/abc123
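A polling loop for the flow above might look like the sketch below. The status values ("queued", "processing") and the fetch_status injection point are assumptions — in practice fetch_status would be a GET to /v1/jobs/{job_id} with your API key:

```python
import time

def poll_job(fetch_status, interval: float = 2.0, timeout: float = 120.0) -> dict:
    """Call fetch_status() until the job leaves the queue or we time out.

    fetch_status should return the parsed JSON of the job-status endpoint.
    Returns the final job document; the caller inspects success vs. failure.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job.get("status") not in ("queued", "processing"):
            return job  # finished (or failed) -- stop polling either way
        time.sleep(interval)
    raise TimeoutError("job did not finish within the timeout")
```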

Python example

import requests

API_KEY = "your_key_here"
BASE    = "https://api.xtrct.io/v1"

def scrape(url, output="markdown"):
    r = requests.post(
        f"{BASE}/scrape",
        headers={"X-API-Key": API_KEY},
        json={"url": url, "output": output, "wait": True}
    )
    r.raise_for_status()
    return r.json()["result"][output]

content = scrape("https://news.ycombinator.com", "markdown")
print(content[:500])
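For production use you may also want to retry transient failures (timeouts, rate limits). A generic backoff wrapper like the one below works with the scrape function above; which errors are actually worth retrying depends on your plan, so treat the catch-all here as a starting point, not a recommendation:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Run fn(), retrying on any exception with exponential backoff.

    Sleeps base_delay, 2*base_delay, 4*base_delay, ... between attempts
    and re-raises the last exception once attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical usage with the scrape helper defined above:
# content = with_retries(lambda: scrape("https://example.com", "markdown"))
```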

Ready to start extracting?

500 free credits. No credit card. Works on any website.

Get your free API key →