Web scraping is automated copying of information from websites — the same thing you'd do by hand, done by a script at scale. The question "is it legal?" comes up constantly, and the honest answer is: it depends. Not on the technology, but on what you collect, where, and how. This is the practical version, not legal advice — when real money or sensitive data is involved, talk to a lawyer.
How scraping works
A script sends a request to a page, downloads the HTML, and extracts the parts you want — text, prices, images, reviews. It doesn't break into anything; it reads public, openly served information. That's the whole mechanism, and it's also where the legal grey area starts: the act of reading a public page is rarely the problem. What you do around it is.
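To make that mechanism concrete, here is a minimal sketch that uses only Python's standard library. One function downloads a page, and a small `HTMLParser` subclass pulls out the text of elements carrying a given CSS class. The URL, the `example-bot/0.1` user-agent string and the `price` class are hypothetical placeholders. The parser also assumes reasonably well-nested markup, so messy real-world pages usually call for a more forgiving parsing library.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.request import Request, urlopen

# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def fetch_html(url: str, user_agent: str = "example-bot/0.1") -> str:
    """Step 1: request the page and return its HTML as text."""
    # The user-agent string is a hypothetical placeholder; identify your bot honestly.
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class ClassTextExtractor(HTMLParser):
    """Step 2: collect the text inside every element whose class list contains target_class."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # > 0 while we are inside a matching element
        self.results: List[str] = []
        self._buffer: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag in VOID_TAGS:
            return
        if self.depth:
            # A tag nested inside a match: count it so we know when the match closes.
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self._buffer = []

    def handle_endtag(self, tag: str) -> None:
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self._buffer).strip())

    def handle_data(self, data: str) -> None:
        if self.depth:
            self._buffer.append(data)


def extract(html: str, target_class: str) -> List[str]:
    """Run the parser over a page and return the matched text snippets."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

For example, `extract(fetch_html("https://example.com/products"), "price")` would return the text of every element marked with the `price` class on that hypothetical page. Note that nothing in this code "breaks in". It reads exactly what the server sends to any browser.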
What pushes scraping onto the right side of the line
Scraping is generally accepted when:
- the data is publicly available without logging in;
- nothing in the site's terms of service or robots.txt explicitly forbids automated collection;
- you're not copying copyrighted, original content for commercial reuse without permission;
- your bot behaves — reasonable request rates, no server-hammering.
Tracking market trends, monitoring price changes and watching public reviews, done within technical and legal limits, usually sit in a defensible "grey but not a violation" zone.
What pushes it onto the wrong side
It can be treated as illegal when:
- you breach terms of service that explicitly ban automated collection;
- you bypass a security measure — a login, a captcha, access controls;
- you scrape personal data (emails, phone numbers, profiles);
- you reuse copyrighted material without a licence;
- the bot is aggressive enough to degrade the target's service.
Bypassing protections and collecting personal data are the two fastest ways to turn a routine job into a real problem.
The jurisdiction matters
There's no single global rule. In the US, it's largely case law. In the well-known hiQ Labs v. LinkedIn dispute, the Ninth Circuit concluded that scraping genuinely public data likely isn't access "without authorization" under the Computer Fraud and Abuse Act (CFAA) when no access protection is circumvented. Violating terms of service can still trigger civil claims such as breach of contract. In the EU, the GDPR governs anything involving personal data, including personal data that is publicly visible. It requires a lawful basis, data minimisation and transparency, and imposes heavy fines for getting it wrong. Other regimes, such as China's Data Security Law, Brazil's LGPD and Canada's PIPEDA, add their own constraints. If you operate across borders, you inherit all of them.
How to keep it clean
A few habits separate sustainable scraping from the kind that ends in a block or a letter:
- Read the terms and robots.txt before you start. Public doesn't automatically mean "yours to take in bulk."
- Prefer official APIs where they exist: structured data, explicit limits, far less legal ambiguity.
- Throttle. Limit request frequency; don't try to vacuum a whole site at once.
- Collect only what you need, and don't pass personal data to third parties.
- Anonymise responsibly with proxies. Routing through clean IPs distributes load and keeps you from hammering a target from a single address — which is both good etiquette and good for staying unblocked. A dedicated static IPv4 or ISP address behaves predictably and keeps your reputation separate from shared, already-flagged pools.
The point of proxies here isn't to "get away with" anything — it's to scrape responsibly and stay accessible. Legality comes down to purpose, context and restraint. Done thoughtfully, scraping is a powerful and defensible tool. Done by bypassing protections and grabbing personal data, it isn't — and no proxy changes that.
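You can automate the robots.txt check from the habits above with `urllib.robotparser` from Python's standard library. This sketch parses a robots.txt body that is already in memory, so it runs offline. Against a live site you would call `set_url(...)` and `read()` instead. The bot name, the sample rules and the five-second fallback delay are hypothetical. Passing this check says nothing about the terms of service, which a person still has to read.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs without network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

AGENT = "example-bot"  # hypothetical bot name


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set."""
    rules = RobotFileParser()
    # For a live site instead: rules.set_url("https://example.com/robots.txt"); rules.read()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_delay(rules: RobotFileParser, agent: str, default: float = 5.0) -> float:
    """Use the site's Crawl-delay if it declares one, otherwise a conservative default."""
    delay = rules.crawl_delay(agent)
    return float(delay) if delay is not None else default


rules = load_rules(SAMPLE_ROBOTS_TXT)
print(rules.can_fetch(AGENT, "https://example.com/products"))      # True
print(rules.can_fetch(AGENT, "https://example.com/private/data"))  # False
print(polite_delay(rules, AGENT))                                  # 5.0
```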
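Throttling boils down to enforcing a minimum gap between requests to the same site. Below is a minimal sketch. It assumes one target host and a fixed interval that you choose yourself, since the document prescribes no specific rate. The clock and sleep functions are injectable, so the logic can be tested without real delays.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between consecutive requests to one host."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep just long enough to respect min_interval; return how long we slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last = now
        return slept
```

In use, create one `Throttle` per target host, for example `Throttle(min_interval=2.0)` with an arbitrary two-second gap. Then call `wait()` immediately before every request. Requests that are already far apart pass straight through. Only bursts get slowed down.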
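"Collect only what you need" is easiest to enforce at the point where records are stored. This sketch assumes scraped items arrive as Python dicts. It keeps a whitelist of fields, whose names are hypothetical, and masks anything that looks like an email address in the text it keeps. A whitelist fails safe: a new field the site adds later is dropped rather than silently stored. The regex is a rough heuristic, not a GDPR compliance check.

```python
import re
from typing import Any, Dict, Iterable, List, Set

# Hypothetical whitelist: list only the fields your purpose actually requires.
ALLOWED_FIELDS: Set[str] = {"product", "price", "currency", "review"}

# Rough heuristic for email addresses in free text; not a complete personal-data detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def minimise(records: Iterable[Dict[str, Any]],
             allowed: Set[str] = ALLOWED_FIELDS) -> List[Dict[str, Any]]:
    """Keep whitelisted fields only and mask email addresses inside string values."""
    cleaned = []
    for record in records:
        kept = {}
        for key, value in record.items():
            if key not in allowed:
                continue  # e.g. reviewer names, emails, profile URLs are never stored
            if isinstance(value, str):
                value = EMAIL_RE.sub("[redacted]", value)
            kept[key] = value
        cleaned.append(kept)
    return cleaned
```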
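For the proxy habit, Python's standard library can route every request through one fixed address with `urllib.request.ProxyHandler`. This is a minimal sketch. The address comes from the 203.0.113.0/24 documentation range and stands in for whatever endpoint your provider actually gives you. A single static proxy keeps the source IP stable, which fits the predictable, separate-reputation setup described above. It does not make any request more permissible.

```python
from urllib.request import OpenerDirector, ProxyHandler, build_opener

# Hypothetical static proxy endpoint (documentation IP range); replace with your own.
PROXY_URL = "http://203.0.113.10:8080"


def make_opener(proxy_url: str = PROXY_URL) -> OpenerDirector:
    """Build an opener that sends both HTTP and HTTPS traffic through one proxy."""
    return build_opener(ProxyHandler({"http": proxy_url, "https": proxy_url}))
```

Requests then go out via `make_opener().open(url, timeout=10)`. Combine this with the throttling and robots.txt habits rather than using it in place of them.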
This article is general information, not legal advice. Rules vary by jurisdiction and case; consult a qualified lawyer for your specific situation.