How to Extract Data from Any Website Using an API
Whether you need clean Markdown for an LLM, structured JSON from a product page, or a screenshot of a site, this guide shows how to get it with a single API call.
What you can extract
xtrct supports 9 output formats from any URL. This guide uses seven of them: markdown, text, html, screenshot, structured, metadata, and links.
You can request multiple formats in a single call and get them all back at once.
Getting started
Get a free API key at xtrct.io: no credit card needed, 500 free credits included.
Basic extraction
Get Markdown from any page
Perfect for feeding into LLMs or storing readable content.
curl -X POST https://api.xtrct.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "output": "markdown",
    "wait": true
  }'
Get clean text
Strips all HTML and returns only the human-readable text content.
{
  "url": "https://example.com/article",
  "output": "text",
  "wait": true
}
Take a screenshot
{
  "url": "https://example.com",
  "output": "screenshot",
  "screenshot_options": {
    "format": "png",
    "full_page": true
  },
  "wait": true
}
Structured data extraction
Use CSS selectors to extract specific fields from a page. This works well for product data, job listings, news articles, and anything else with a consistent structure.
{
  "url": "https://shop.example.com/product/123",
  "output": "structured",
  "selectors": {
    "title": "h1.product-title",
    "price": ".price-now",
    "description": ".product-description",
    "rating": "[data-rating]"
  },
  "wait": true
}
Response:
{
  "result": {
    "structured": {
      "title": "Blue Widget Pro",
      "price": "£49.99",
      "description": "The best widget for all your widget needs.",
      "rating": "4.7"
    }
  }
}
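Every field in the structured response above comes back as a string, so numeric fields usually need converting before they can be compared or stored. The following is a minimal Python sketch of that step, assuming the response shape shown above. `parse_product` is a hypothetical helper written for this guide, not part of xtrct.

```python
import re
from decimal import Decimal


def parse_product(response: dict) -> dict:
    """Convert string fields from a structured response into typed values.

    Hypothetical helper: assumes the response shape shown above, with
    "title", "price" and "rating" all present as strings.
    """
    fields = response["result"]["structured"]
    # Keep only digits and the decimal point, dropping currency symbols and commas.
    price_digits = re.sub(r"[^0-9.]", "", fields["price"])
    return {
        "title": fields["title"].strip(),
        "price": Decimal(price_digits),
        "rating": float(fields["rating"]),
    }


# Sample data copied from the response above, so the sketch runs offline.
sample = {
    "result": {
        "structured": {
            "title": "Blue Widget Pro",
            "price": "£49.99",
            "description": "The best widget for all your widget needs.",
            "rating": "4.7",
        }
    }
}

product = parse_product(sample)
print(product["title"], product["price"], product["rating"])
```

The price cleanup assumes a dot as the decimal separator. Pages that format prices as "49,99" would need a different rule.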
Multiple formats at once
Get everything you need in one request:
{
  "url": "https://example.com/article",
  "output": ["markdown", "metadata", "links"],
  "wait": true
}
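When several formats come back together, client code has to pick each one out of the response. Here is a rough Python sketch of building such a request and splitting the result. It assumes each format is returned under "result" keyed by its format name, as in the structured example earlier in this guide; that layout for multi-format responses is an assumption, and both helper names are hypothetical.

```python
def build_multi_request(url: str, formats: list[str]) -> dict:
    """Build a request body asking for several output formats at once."""
    return {"url": url, "output": formats, "wait": True}


def split_formats(response: dict, formats: list[str]) -> dict:
    """Return {format: content}, with None for formats missing from the response.

    Assumption: results are keyed by format name under "result".
    """
    result = response.get("result", {})
    return {fmt: result.get(fmt) for fmt in formats}


formats = ["markdown", "metadata", "links"]
body = build_multi_request("https://example.com/article", formats)

# Illustrative response only; real field contents will differ.
fake_response = {"result": {"markdown": "# Title", "links": ["https://example.com/a"]}}
parts = split_formats(fake_response, formats)
print(parts["markdown"])
print(parts["metadata"])
```

Returning None for a missing format keeps one absent key from raising a KeyError partway through processing a batch.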
Handling JavaScript-heavy sites
For sites that load content dynamically (React, Vue, etc.), use wait_for to wait for specific content before extracting:
{
  "url": "https://spa-site.com/data",
  "output": "markdown",
  "wait_for": { "type": "selector", "value": ".data-loaded" },
  "wait": true
}
Or wait for network activity to settle:
{
  "url": "https://spa-site.com/data",
  "output": "html",
  "wait_for": { "type": "networkidle" },
  "wait": true
}
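The two wait_for variants above differ only in one field, so a small builder can keep client code tidy. This is a sketch, not an official client: `build_scrape_body` and its parameter names are hypothetical. The guide does not say whether wait conditions can be combined, so the sketch allows only one at a time.

```python
from typing import Optional


def build_scrape_body(url: str, output: str,
                      wait_selector: Optional[str] = None,
                      network_idle: bool = False) -> dict:
    """Build a /v1/scrape body with an optional wait_for condition."""
    if wait_selector and network_idle:
        # Combining conditions is not documented in this guide, so refuse it.
        raise ValueError("choose either wait_selector or network_idle, not both")
    body = {"url": url, "output": output, "wait": True}
    if wait_selector:
        body["wait_for"] = {"type": "selector", "value": wait_selector}
    elif network_idle:
        body["wait_for"] = {"type": "networkidle"}
    return body


selector_body = build_scrape_body("https://spa-site.com/data", "markdown",
                                  wait_selector=".data-loaded")
idle_body = build_scrape_body("https://spa-site.com/data", "html",
                              network_idle=True)
print(selector_body["wait_for"])
print(idle_body["wait_for"])
```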
Using presets for common sites
xtrct ships with 20+ zero-config presets for popular sites. They automatically set the right strategy, wait conditions, and selectors.
{
  "url": "https://amazon.com/dp/B0EXAMPLE",
  "preset": "amazon",
  "wait": true
}
Or let xtrct auto-detect the right preset from the URL:
{
  "url": "https://amazon.com/dp/B0EXAMPLE",
  "output": "structured",
  "wait": true
}
Async jobs for large batches
For bulk scraping, drop the "wait": true flag to get a job ID back immediately, then poll for the result or use a webhook:
# Submit job
POST /v1/scrape
{ "url": "...", "output": "markdown", "webhook_url": "https://your-site.com/hook" }
# Response immediately:
{ "job_id": "abc123", "status": "queued" }
# Poll result:
GET /v1/jobs/abc123
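The polling step can be sketched in Python as a loop with a fixed delay and an attempt limit. Only the "queued" status appears in this guide; the "processing" and "completed" names below are guesses used for illustration, and `wait_for_job` is a hypothetical helper. The HTTP call is passed in as a function so the sketch runs without network access.

```python
import time
from typing import Callable


def wait_for_job(job_id: str, fetch_job: Callable[[str], dict],
                 interval: float = 2.0, max_attempts: int = 30) -> dict:
    """Poll until the job leaves the pending states or attempts run out.

    fetch_job is any callable returning the parsed JSON of GET /v1/jobs/<job_id>.
    Assumption: "queued" comes from the response above; "processing" is a
    guessed in-progress status name. Any other status is treated as final.
    """
    pending = {"queued", "processing"}
    for _ in range(max_attempts):
        job = fetch_job(job_id)
        if job.get("status") not in pending:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still pending after {max_attempts} polls")


# Stand-in for the real HTTP call; "completed" is a hypothetical final status.
_responses = iter([
    {"job_id": "abc123", "status": "queued"},
    {"job_id": "abc123", "status": "queued"},
    {"job_id": "abc123", "status": "completed", "result": {"markdown": "# Done"}},
])
job = wait_for_job("abc123", lambda _id: next(_responses), interval=0.01)
print(job["status"])
```

In real use, fetch_job would wrap a GET request to /v1/jobs/<job_id> with the X-API-Key header. For large batches a webhook avoids polling altogether.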
Python example
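import requests

API_KEY = "your_key_here"
BASE = "https://api.xtrct.io/v1"


def scrape(url, output="markdown"):
    """Scrape one URL synchronously and return the requested output format."""
    r = requests.post(
        f"{BASE}/scrape",
        headers={"X-API-Key": API_KEY},
        json={"url": url, "output": output, "wait": True},
        timeout=60,  # arbitrary client-side limit so a stalled request cannot hang forever
    )
    r.raise_for_status()  # raise on 4xx/5xx responses
    # The result is keyed by the requested format name.
    return r.json()["result"][output]


content = scrape("https://news.ycombinator.com", "markdown")
print(content[:500])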
Ready to start extracting?
500 free credits. No credit card. Works on any website.