Robots.txt Generator Guide: Rules, Syntax, and Common Mistakes
Robots.txt Syntax
- User-agent: specifies which crawler the rules apply to (* means all crawlers).
- Disallow: blocks crawling of the specified path.
- Allow: explicitly permits a path, overriding a broader Disallow.
- Sitemap: references your sitemap by absolute URL.

Each User-agent block can have multiple Disallow and Allow lines. Directive names are matched case-insensitively by major crawlers, but path values are case-sensitive.
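A minimal file tying these directives together (example.com and the paths are placeholders):

```
# Rules for all crawlers
User-agent: *
# Block the private directory...
Disallow: /private/
# ...but permit one page inside it (Allow overrides the broader Disallow)
Allow: /private/public-page.html

# Sitemap location, given as an absolute URL
Sitemap: https://www.example.com/sitemap.xml
```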
What to Block and What Not to Block
Block: /admin/, /login/, /cart/, /checkout/, /private/, staging subdirectories, internal search results (Disallow: /search? — matching is prefix-based, so a trailing * is redundant), and infinite pagination parameters such as ?page=.

Never block: CSS files, JavaScript files, or images. Google must fetch these to render your pages, and blocking them can hurt indexing.

/api/ is a judgment call: block it only if its responses contribute nothing to SEO.
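A sketch of a robots.txt applying these recommendations; every path here is illustrative and should be adapted to your site's actual URL structure:

```
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /cart/
Disallow: /checkout/
Disallow: /private/
Disallow: /staging/
# Internal search results; prefix matching covers any query string
Disallow: /search?
# Pagination parameter on any path (the * wildcard inside paths is
# supported by major crawlers such as Googlebot and Bingbot)
Disallow: /*?page=
# No Disallow lines for CSS, JS, or images: they stay crawlable by default
```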
Common Dangerous Mistakes
- Disallow: / blocks your entire site. Never add it accidentally.
- Blocking the sitemap URL itself in robots.txt.
- Using robots.txt to keep sensitive data out of the index. The file is publicly readable, so it advertises exactly what you are trying to hide; use authentication or a noindex directive instead.
- Capitalization errors. Directive names such as User-agent are matched case-insensitively by major crawlers (User-agent is the conventional spelling), but path values are case-sensitive: Disallow: /Admin/ does not block /admin/.
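One way to catch these mistakes before deploying is to test the draft file programmatically. A minimal sketch using Python's standard-library urllib.robotparser; the file name and URLs are placeholders for illustration. Note that this parser performs simple prefix matching and does not understand the * wildcard extension, so test only plain-prefix rules with it:

```python
from urllib import robotparser

# Load a local robots.txt draft (path is a placeholder)
rp = robotparser.RobotFileParser()
with open("robots.txt") as f:
    rp.parse(f.read().splitlines())

# URLs that must remain crawlable
assert rp.can_fetch("*", "https://www.example.com/")
assert rp.can_fetch("*", "https://www.example.com/assets/styles.css")

# URLs that the rules above should block
assert not rp.can_fetch("*", "https://www.example.com/admin/")
assert not rp.can_fetch("*", "https://www.example.com/checkout/order")

print("robots.txt draft behaves as expected")
```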