Robots.txt Generator Guide: Rules, Syntax, and Common Mistakes
Robots.txt Syntax
- User-agent: specifies which crawler the rules apply to (* means all crawlers).
- Disallow: blocks crawling of the specified path.
- Allow: explicitly permits a path, overriding a broader Disallow.
- Sitemap: references your sitemap by absolute URL.

Each User-agent block can have multiple Disallow and Allow lines. Directive names are matched case-insensitively by major crawlers, but path values are case-sensitive.
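A minimal file tying these directives together (example.com and the paths are placeholders):

```
# Rules for all crawlers
User-agent: *
# Block the private directory...
Disallow: /private/
# ...but permit one page inside it (Allow overrides the broader Disallow)
Allow: /private/public-page.html

# Sitemap location, given as an absolute URL
Sitemap: https://www.example.com/sitemap.xml
```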
What to Block and What Not to Block
Block: /admin/, /login/, /cart/, /checkout/, /private/, staging subdirectories, internal search results (Disallow: /search? — matching is prefix-based, so a trailing * is redundant), and infinite pagination parameters such as ?page=.

Never block: CSS files, JavaScript files, or images. Google must fetch these to render your pages, and blocking them can hurt indexing.

/api/ is a judgment call: block it only if its responses contribute nothing to SEO.
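A sketch of a robots.txt applying these recommendations; every path here is illustrative and should be adapted to your site's actual URL structure:

```
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /cart/
Disallow: /checkout/
Disallow: /private/
Disallow: /staging/
# Internal search results; prefix matching covers any query string
Disallow: /search?
# Pagination parameter on any path (the * wildcard inside paths is
# supported by major crawlers such as Googlebot and Bingbot)
Disallow: /*?page=
# No Disallow lines for CSS, JS, or images: they stay crawlable by default
```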
Common Dangerous Mistakes
- Disallow: / blocks your entire site. Never add it accidentally.
- Blocking the sitemap URL itself in robots.txt.
- Using robots.txt to keep sensitive data out of the index. The file is publicly readable, so it advertises exactly what you are trying to hide; use authentication or a noindex directive instead.
- Capitalization errors. Directive names such as User-agent are matched case-insensitively by major crawlers (User-agent is the conventional spelling), but path values are case-sensitive: Disallow: /Admin/ does not block /admin/.
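One way to catch these mistakes before deploying is to test the draft file programmatically. A minimal sketch using Python's standard-library urllib.robotparser; the file name and URLs are placeholders for illustration. Note that this parser performs simple prefix matching and does not understand the * wildcard extension, so test only plain-prefix rules with it:

```python
from urllib import robotparser

# Load a local robots.txt draft (path is a placeholder)
rp = robotparser.RobotFileParser()
with open("robots.txt") as f:
    rp.parse(f.read().splitlines())

# URLs that must remain crawlable
assert rp.can_fetch("*", "https://www.example.com/")
assert rp.can_fetch("*", "https://www.example.com/assets/styles.css")

# URLs that the rules above should block
assert not rp.can_fetch("*", "https://www.example.com/admin/")
assert not rp.can_fetch("*", "https://www.example.com/checkout/order")

print("robots.txt draft behaves as expected")
```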