Crawl Budget: Why Google Doesn't Index All Your Pages
9 min read
Quick Definition: Crawl budget is the number of pages Google will crawl on your site within a given timeframe (usually measured per day). If your site has 10,000 pages but Google only crawls 100/day, it’ll take 100 days to index everything, assuming you don’t add new pages in the meantime.
Key insight: Small sites (under 1,000 pages) rarely need to worry about crawl budget. Large sites, e-commerce stores, and news sites should optimize it.
TLDR
Crawl budget is how many pages Google will crawl on your site per day. Small sites under 1,000 pages don’t need to worry. Google crawls them fully within days. Large sites waste budget on duplicate content, redirect chains, broken links, and infinite filter combinations. One e-commerce site blocked low-value filter pages and went from 500 products crawled daily to 2,000, cutting new product indexing from months to one week. Optimize by submitting an XML sitemap, blocking low-value pages, and improving server speed.
How Crawl Budget Works
Google’s crawler (Googlebot) has limited resources. It decides:
- How many pages to crawl on your site (crawl rate)
- Which pages to prioritize (crawl demand)
Crawl rate limit:
- Determined by your server’s capacity
- Google won’t crawl so fast it crashes your server
- Higher for sites with fast servers and good hosting
Crawl demand:
- How popular is the page? (traffic, backlinks)
- How frequently does it update?
- Is it already indexed and ranking?
Crawl budget = Rate limit × Demand (loosely: Google crawls as many pages as demand justifies, capped by what your server can handle)
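As a back-of-the-envelope illustration (not a formula Google publishes), you can estimate how long a full crawl takes from your daily crawl rate:

```python
def days_to_full_crawl(total_pages: int, pages_crawled_per_day: int) -> float:
    """Rough estimate of how long Googlebot needs to reach every page,
    assuming a constant crawl rate and no new pages added meanwhile."""
    if pages_crawled_per_day <= 0:
        raise ValueError("crawl rate must be positive")
    return total_pages / pages_crawled_per_day

# The example from the definition above: 10,000 pages at 100 pages/day
print(days_to_full_crawl(10_000, 100))  # 100.0 days
```

In practice the rate isn't constant (Google recrawls popular pages repeatedly), so treat this as a lower bound.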
Who Needs to Care About Crawl Budget?
You SHOULD optimize if:
- E-commerce site with 10,000+ products
- News site publishing 50+ articles/day
- Site with millions of pages (large directories, databases)
- International site with many language/country variations
- Site with heavy URL parameters (filters, sorts, sessions)
- Site suffering from slow indexing (new pages take weeks to appear)
You probably DON’T need to worry if:
- Blog with under 1,000 pages
- Small business site (5-50 pages)
- Portfolio or brochure site
- New site with limited content
Google’s own guidance: Sites under 1,000 URLs are crawled efficiently without intervention.
What Wastes Crawl Budget
1. Duplicate Content
Problem:
example.com/product/blue-widget
example.com/product/blue-widget?ref=homepage
example.com/product/blue-widget?sort=price
example.com/product/blue-widget?color=blue
Google crawls 4 URLs, but they’re all the same content.
Fix:
- Use canonical tags pointing to /product/blue-widget
- Block parameters in robots.txt: Disallow: /*?
- Consolidate parameter handling with canonicals (Google Search Console’s URL Parameters tool was retired in 2022)
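The canonical fix is a single tag in the `<head>` of every parameter variant, pointing at the clean URL:

```html
<!-- On /product/blue-widget?ref=homepage and every other parameter variant -->
<link rel="canonical" href="https://example.com/product/blue-widget">
```

Google then treats the variants as copies of the clean URL and consolidates ranking signals there.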
2. Low-Quality/Thin Pages
Examples:
- Empty category pages
- “No results found” search pages
- Paginated pages with minimal content
- Automatically generated doorway pages
Fix:
- Noindex thin pages
- Consolidate content
- Use robots.txt to block crawling
3. Soft 404s (Fake 404s)
Problem: Pages that don’t exist but return 200 OK instead of 404 Not Found.
Example:
GET /this-page-doesnt-exist
Response: 200 OK
Body: "Sorry, page not found"
Google crawls these thinking they’re real pages, wasting budget.
Fix: Return proper 404 status codes for missing pages.
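One way to audit for soft 404s is to flag pages that return 200 but whose body reads like an error page. A minimal sketch; the phrase list is illustrative, tune it for your own site’s templates:

```python
# Phrases that commonly appear on error pages served with a 200 status.
# Illustrative only -- adjust for your site's actual error templates.
ERROR_PHRASES = ("page not found", "no results found", "doesn't exist")

def is_soft_404(status_code: int, body: str) -> bool:
    """Flag a likely soft 404: a 200 response whose body reads like an error."""
    if status_code != 200:
        return False  # real error codes are already handled correctly
    text = body.lower()
    return any(phrase in text for phrase in ERROR_PHRASES)

print(is_soft_404(200, "Sorry, page not found"))   # True: wastes crawl budget
print(is_soft_404(404, "Not Found"))               # False: proper 404
print(is_soft_404(200, "Blue Widget - $19.99"))    # False: real page
```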
4. Redirect Chains
Problem:
Page A → 301 → Page B → 301 → Page C → 301 → Page D
Google must crawl 4 URLs to reach the final destination.
Fix: Redirect directly:
Page A → 301 → Page D
Page B → 301 → Page D
Page C → 301 → Page D
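If you maintain redirects as a simple old-URL → new-URL map, flattening chains is just resolving each entry to its final destination. A sketch, with a guard against accidental redirect loops:

```python
def flatten_redirects(redirects: dict[str, str]) -> dict[str, str]:
    """Point every redirect directly at its final destination."""
    flattened = {}
    for source in redirects:
        target, hops = redirects[source], 0
        while target in redirects:        # follow the chain to the end
            target = redirects[target]
            hops += 1
            if hops > len(redirects):     # guard against redirect loops
                raise ValueError(f"redirect loop involving {source}")
        flattened[source] = target
    return flattened

chain = {"/page-a": "/page-b", "/page-b": "/page-c", "/page-c": "/page-d"}
print(flatten_redirects(chain))
# {'/page-a': '/page-d', '/page-b': '/page-d', '/page-c': '/page-d'}
```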
5. Infinite Spaces (Faceted Navigation)
Problem: E-commerce filters creating millions of combinations:
/shoes
/shoes?color=red
/shoes?color=red&size=10
/shoes?color=red&size=10&brand=nike
/shoes?color=red&size=10&brand=nike&price=50-100
...
Fix:
- Use noindex on filtered pages
- Implement rel="canonical" to the main category
- Block filter parameters in robots.txt
- Use AJAX filters (that don’t change the URL)
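The noindex fix is one meta tag in the `<head>` of each filtered page:

```html
<!-- On /shoes?color=red&size=10 and other filter combinations -->
<meta name="robots" content="noindex, follow">
```

One caution: pick either noindex or a robots.txt block for a given URL pattern, not both. If robots.txt blocks the URL, Googlebot never fetches the page and never sees the noindex tag.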
6. Broken Links (404s)
Problem: Internal links pointing to non-existent pages.
Why it wastes budget: Google crawls the 404, gets nothing useful, but still counts it against your budget.
Fix:
- Run regular broken link audits (Screaming Frog, Ahrefs)
- Fix internal 404s (update links or redirect)
7. Orphaned Pages
Problem: Pages with zero internal links pointing to them.
Why it matters: If Google can’t find the page through your site navigation, it may never crawl it (unless it has external backlinks).
Fix:
- Add pages to your sitemap
- Link to them from relevant pages
- Check for orphans with crawl tools
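A crawl tool will do this for you, but the orphan check itself is just a set difference between the URLs in your sitemap and the URLs your internal links actually reach. A sketch:

```python
def find_orphans(sitemap_urls: set[str], internally_linked_urls: set[str]) -> set[str]:
    """Pages listed in the sitemap that no internal link points to."""
    return sitemap_urls - internally_linked_urls

sitemap = {"/", "/products/widget", "/blog/old-post"}
linked = {"/", "/products/widget"}
print(find_orphans(sitemap, linked))  # {'/blog/old-post'}
```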
How to Optimize Crawl Budget
1. Submit an XML Sitemap
Why it helps: Tells Google exactly which pages exist and how often they change.
How:
- Generate a sitemap (most CMSs do this automatically)
- Submit via Google Search Console
- Keep it updated (remove deleted pages, add new ones)
Sitemap priorities:
<url>
<loc>https://example.com/important-page</loc>
<priority>1.0</priority>
<changefreq>daily</changefreq>
</url>
Note: Priority and changefreq are hints, not commands. Google may ignore them.
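If your CMS doesn’t generate a sitemap, a few lines of standard-library Python can. A minimal sketch; the URL list is illustrative:

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls: list[str]) -> str:
    """Build a minimal sitemap.xml string from a list of absolute URLs."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode", xml_declaration=True)

print(build_sitemap(["https://example.com/", "https://example.com/important-page"]))
```

Write the output to /sitemap.xml at your site root and submit that URL in Search Console.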
2. Fix Crawl Errors
Check Google Search Console:
- Coverage → Errors
- Look for server errors (500, 503)
- Fix broken redirects
- Resolve DNS issues
Common errors:
- Server error (5xx)
- Redirect error
- Submitted URL not found (404)
3. Improve Site Speed
Why it matters: Faster servers = Google can crawl more pages in the same time.
Optimizations:
- Upgrade hosting (shared → VPS → dedicated)
- Enable gzip compression
- Optimize database queries
- Use a CDN for static assets
- Reduce server response time (aim for <200ms)
Check speed:
- Google Search Console → Settings → Crawl Stats
- Shows avg response time, crawl requests/day
4. Use Robots.txt Strategically
Block low-value pages:
User-agent: *
Disallow: /search?
Disallow: /filter?
Disallow: /cart/
Disallow: /checkout/
Disallow: /admin/
Allow high-value pages:
Allow: /products/
Allow: /blog/
5. Manage URL Parameters
Google Search Console’s URL Parameters tool was retired in 2022, so parameter handling now lives on your site itself:
- Tracking (utm_source): canonical tag pointing to the clean URL
- Sorts (price-low-high): canonical pointing to the default sort order
- Filters (color=red): noindex, or canonical to the main category page
- Pagination (page=2): leave crawlable, with self-referencing canonicals
Example: every sorted or tracked variant of /shoes declares /shoes as its canonical, so Google consolidates signals there and gradually crawls the variants less.
6. Update Content Regularly
Why: Google prioritizes crawling pages that change frequently.
Strategy:
- Refresh old blog posts (add new info, update dates)
- Keep product descriptions current
- Remove outdated seasonal content
- Publish new content consistently
Evidence Google is crawling:
- Google Search Console → Settings → Crawl Stats
- Check “Total crawl requests” over time
7. Internal Linking
Why it helps: Google discovers pages by following links. More internal links = easier discovery.
Best practices:
- Link to new pages from high-authority pages (homepage, popular posts)
- Use descriptive anchor text
- Don’t bury important pages 5+ clicks deep
- Create hub pages linking to related content
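The “don’t bury pages 5+ clicks deep” rule is easy to check if you have your internal link graph: a breadth-first search from the homepage gives the minimum click depth of every reachable page. A sketch, with a toy link graph standing in for a real crawl export:

```python
from collections import deque

def click_depths(links: dict[str, list[str]], home: str = "/") -> dict[str, int]:
    """Breadth-first search from the homepage: minimum clicks to reach each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:           # first visit = shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

site = {"/": ["/blog", "/products"], "/blog": ["/blog/post-1"], "/products": []}
print(click_depths(site))  # {'/': 0, '/blog': 1, '/products': 1, '/blog/post-1': 2}
```

Pages missing from the result are orphans; pages at depth 5+ are candidates for new internal links from hub pages.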
8. Monitor and Adjust Crawl Rate
Google retired the manual crawl-rate setting in Search Console in early 2024; Googlebot now adjusts crawl rate automatically:
- It slows down when your server returns errors or responds slowly
- You can monitor the current rate under Settings → Crawl Stats (requests/day)
- You can’t raise the rate directly, only influence it
If crawl rate is too low:
- Improve server speed
- Fix crawl errors
- Add internal links to important pages
- Update content more frequently
Checking Your Crawl Budget
Google Search Console
Settings → Crawl Stats:
- Total crawl requests: Pages crawled per day
- Total download size: Data transferred
- Average response time: Server speed
- Crawl requests by status: 200, 404, 301, etc.
What good stats look like:
- Crawl requests increasing over time (if adding content)
- Most requests returning 200 OK
- Low 404 and 500 error counts
- Average response time under 500ms
Red flags:
- Decreasing crawl requests (Google losing interest)
- High 500 error counts (server issues)
- Slow response times (over 1 second)
Server Logs
Advanced: Analyze server logs to see exactly what Googlebot crawls.
Tools:
- Screaming Frog Log File Analyzer
- Splunk
- Custom scripts (grep/awk)
What to look for:
- Which pages Google crawls most
- Pages Google never crawls (orphans)
- Crawl frequency per section
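The grep/awk approach boils down to: filter access-log lines to Googlebot, extract the request path, and count. A Python sketch assuming common/combined log format (in production, verify Googlebot by reverse DNS, since the user-agent string can be spoofed):

```python
from collections import Counter

def googlebot_hits(log_lines: list[str]) -> Counter:
    """Count requests per path for lines whose user-agent mentions Googlebot."""
    hits = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        # Combined log format: the request line is the first quoted field,
        # e.g. "GET /shoes?color=red HTTP/1.1"
        try:
            request = line.split('"')[1]
            path = request.split(" ")[1]
        except IndexError:
            continue  # malformed line; skip it
        hits[path] += 1
    return hits

logs = [
    '66.249.66.1 - - [01/Jan/2025:00:00:01] "GET /shoes HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [01/Jan/2025:00:00:02] "GET /shoes?color=red HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '203.0.113.5 - - [01/Jan/2025:00:00:03] "GET /shoes HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(googlebot_hits(logs))  # /shoes: 1, /shoes?color=red: 1
```

Sorting the counter’s totals by site section shows where your crawl budget actually goes, and which sections Googlebot never touches.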
Case Study: E-commerce Site
Problem:
- 50,000 product pages
- Google crawling 500 pages/day
- New products taking 3+ months to index
Investigation:
- 70% of crawl budget wasted on filter pages (/shoes?color=red&size=10...)
- 15% wasted on session IDs (/product?session=abc123)
- 10% on broken images and CSS files
Solution:
- Noindexed all filter combination pages
- Blocked session parameters in robots.txt
- Fixed broken links
- Submitted product-only sitemap
Result:
- Crawl budget shifted to actual product pages
- Google now crawling 2,000+ products/day
- New products indexed within 1 week
Common Myths
Myth: “More pages = better SEO”
Reality: 10,000 thin pages waste crawl budget. 100 high-quality pages rank better.
Myth: “I can increase crawl budget by requesting it”
Reality: Google sets crawl budget based on your site’s authority, server speed, and content quality. You can’t manually increase it.
Myth: “XML sitemaps increase crawl budget”
Reality: Sitemaps help Google discover pages, but don’t increase the total number of pages crawled per day. They help prioritize WHICH pages get crawled.
Myth: “Small sites need to optimize crawl budget”
Reality: If your site has under 1,000 pages, Google crawls it fully within days. Don’t waste time optimizing.
Quick Reference
Crawl budget wasters:
- Duplicate content
- Redirect chains
- Soft 404s
- URL parameters (filters, sorts, tracking)
- Slow server response
- Broken links
Crawl budget optimizations:
- Submit XML sitemap
- Use robots.txt to block low-value pages
- Fix crawl errors (500s, redirects)
- Improve server speed
- Manage URL parameters in Search Console
- Add internal links to important pages
What Surmado Checks
Surmado Scan looks for:
- Crawl errors (500, 404, redirect chains)
- Duplicate content wasting crawl budget
- URL parameters creating infinite spaces
- Slow server response times
- Orphaned pages not linked internally
→ Related: Robots.txt Essentials | XML Sitemaps Explained | Server Response Codes
Help Us Improve This Article
Know a better way to explain this? Have a real-world example or tip to share?
Contribute and earn credits:
- Submit: Get $25 credit (Signal, Scan, or Solutions)
- If accepted: Get an additional $25 credit ($50 total)
- Plus: Byline credit on this article