Web Scraping

CorsProxy includes a built‑in content extraction feature so you can pull structured data from pages directly in the browser.

Production plan required: requests that use extract need a Production plan and a valid API key.

Extract content from HTML

Enable extraction by adding extract=1 to the query string of the proxy URL:

https://corsproxy.io/?url=https://example.com&extract=1

Parameters

Parameter	Description	Example
`extract`	Enable extraction (`1`)	`extract=1`
`selector`	CSS selector for main content	`selector=article`
`titleSelector`	CSS selector for title	`titleSelector=h1`
`bylineSelector`	CSS selector for author/byline	`bylineSelector=.byline`
`strip`	Comma‑separated selectors to remove	`strip=.ads,.promo`
`format`	`json` (default) or `text`	`format=text`
`maxChars`	Max characters in output	`maxChars=5000`
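
The parameters above can be assembled into a request URL programmatically. The following is a minimal sketch, not an official client: `ExtractOptions` and `buildExtractUrl` are hypothetical names introduced here for illustration, and the sketch assumes the service accepts percent-encoded query values (including an encoded `url` value and an encoded comma in `strip`). How the API key is supplied is not covered in this section, so it is left out.

```typescript
// Hypothetical option bag mirroring the parameter table above.
interface ExtractOptions {
  selector?: string;
  titleSelector?: string;
  bylineSelector?: string;
  strip?: string[]; // joined with commas, e.g. [".ads", ".promo"]
  format?: "json" | "text";
  maxChars?: number;
}

// Hypothetical helper: builds a CorsProxy extraction URL from a target URL
// and optional extraction parameters. Assumes percent-encoded values are accepted.
function buildExtractUrl(targetUrl: string, options: ExtractOptions = {}): string {
  const pairs: Array<[string, string]> = [
    ["url", targetUrl],
    ["extract", "1"],
  ];
  if (options.selector !== undefined) pairs.push(["selector", options.selector]);
  if (options.titleSelector !== undefined) pairs.push(["titleSelector", options.titleSelector]);
  if (options.bylineSelector !== undefined) pairs.push(["bylineSelector", options.bylineSelector]);
  if (options.strip !== undefined && options.strip.length > 0) {
    pairs.push(["strip", options.strip.join(",")]);
  }
  if (options.format !== undefined) pairs.push(["format", options.format]);
  if (options.maxChars !== undefined) pairs.push(["maxChars", String(options.maxChars)]);

  // Percent-encode each value so selectors with spaces or ">" stay valid in a URL.
  const query = pairs
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `https://corsproxy.io/?${query}`;
}

const exampleUrl = buildExtractUrl("https://example.com", {
  selector: "article",
  strip: [".ads", ".promo"],
  maxChars: 5000,
});
console.log(exampleUrl);
```

Only the parameters that are set end up in the query string, so calling `buildExtractUrl("https://example.com")` with no options yields a URL carrying just `url` and `extract=1`.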

Example (structured JSON)

https://corsproxy.io/?url=https://news.ycombinator.com&extract=1&selector=.titleline%20%3E%20a
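
The `selector` value in this URL is the percent-encoded form of the CSS selector `.titleline > a`: the spaces become `%20` and `>` becomes `%3E`. A short sketch of producing that encoding with the standard `encodeURIComponent` function (the variable names are illustrative only):

```typescript
// CSS selector as you would write it in a stylesheet.
const selector = ".titleline > a";

// encodeURIComponent escapes characters that are not safe inside a query value.
const encodedSelector = encodeURIComponent(selector);

const requestUrl =
  "https://corsproxy.io/?url=https://news.ycombinator.com&extract=1&selector=" +
  encodedSelector;

console.log(encodedSelector); // .titleline%20%3E%20a
console.log(requestUrl);
```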

Example (plain text)

https://corsproxy.io/?url=https://example.com&extract=1&format=text

The response content type is application/json;charset=UTF-8 by default, or text/plain;charset=UTF-8 when format=text.
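
Because the body is either JSON or plain text, client code can branch on the `Content-Type` header. The sketch below is one possible way to do that; `decodeExtractBody` and `fetchExtract` are hypothetical helpers, not part of CorsProxy. The field names of the JSON response are not listed in this section, so the parsed result is typed as `unknown` rather than guessed, and API key handling is omitted for the same reason.

```typescript
// Hypothetical helper: parse the body as JSON when the media type is
// application/json, otherwise return it unchanged as a string.
function decodeExtractBody(contentType: string | null, body: string): unknown {
  // Drop parameters such as ";charset=UTF-8" before comparing.
  const mediaType = ((contentType ?? "").split(";")[0] ?? "").trim().toLowerCase();
  if (mediaType === "application/json") {
    return JSON.parse(body);
  }
  return body;
}

// Hypothetical wrapper around fetch: requests an extraction URL and decodes
// the body according to the response's Content-Type header.
async function fetchExtract(requestUrl: string): Promise<unknown> {
  const response = await fetch(requestUrl);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const body = await response.text();
  return decodeExtractBody(response.headers.get("content-type"), body);
}
```

A caller would then narrow the result with `typeof result === "string"` to tell the plain-text case from the JSON case before using it.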

CSV/XML/RSS conversion is documented separately in File Conversion.
