Crawl Budget: Why Google Doesn't Index All Your Pages

9 min read

Quick Definition: Crawl budget is the number of pages Google will crawl on your site within a given timeframe (usually measured per day). If your site has 10,000 pages but Google only crawls 100/day, it'll take 100 days to index everything, assuming you don't add new pages in the meantime.

Key insight: Small sites (under 1,000 pages) rarely need to worry about crawl budget. Large sites, e-commerce stores, and news sites should optimize it.

TLDR

Crawl budget is how many pages Google will crawl on your site per day. Small sites under 1,000 pages don't need to worry: Google crawls them fully within days. Large sites waste budget on duplicate content, redirect chains, broken links, and infinite filter combinations. One e-commerce site blocked low-value filter pages and went from 500 products crawled daily to 2,000, cutting new product indexing from months to one week. Optimize by submitting an XML sitemap, blocking low-value pages, and improving server speed.


How Crawl Budget Works

Google’s crawler (Googlebot) has limited resources. It decides:

  1. How many pages to crawl on your site (crawl rate)
  2. Which pages to prioritize (crawl demand)

Crawl rate limit:

  • Determined by your server’s capacity
  • Google won’t crawl so fast it crashes your server
  • Higher for sites with fast servers and good hosting

Crawl demand:

  • How popular is the page? (traffic, backlinks)
  • How frequently does it update?
  • Is it already indexed and ranking?

Crawl budget = Rate limit × Demand


Who Needs to Care About Crawl Budget?

You SHOULD optimize if:

  • E-commerce site with 10,000+ products
  • News site publishing 50+ articles/day
  • Site with millions of pages (large directories, databases)
  • International site with many language/country variations
  • Site with heavy URL parameters (filters, sorts, sessions)
  • Site suffering from slow indexing (new pages take weeks to appear)

You probably DON’T need to worry if:

  • Blog with under 1,000 pages
  • Small business site (5-50 pages)
  • Portfolio or brochure site
  • New site with limited content

Google’s own guidance: Sites under 1,000 URLs are crawled efficiently without intervention.


What Wastes Crawl Budget

1. Duplicate Content

Problem:

example.com/product/blue-widget
example.com/product/blue-widget?ref=homepage
example.com/product/blue-widget?sort=price
example.com/product/blue-widget?color=blue

Google crawls 4 URLs, but they’re all the same content.

Fix:

  • Use canonical tags pointing to /product/blue-widget
  • Block crawl-wasting parameters in robots.txt, e.g. Disallow: /*? (this blocks every parameterized URL, so use it carefully)
  • Note: Search Console's URL Parameters tool used to handle this, but Google retired it in 2022
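A canonical tag, placed in the head of every parameterized variant, might look like this (URL taken from the example above):

```html
<!-- In the <head> of every variant of the page, including
     /product/blue-widget?ref=homepage and ?sort=price -->
<link rel="canonical" href="https://example.com/product/blue-widget">
```

All four URLs then consolidate their signals onto the one canonical URL.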

2. Low-Quality/Thin Pages

Examples:

  • Empty category pages
  • “No results found” search pages
  • Paginated pages with minimal content
  • Automatically generated doorway pages

Fix:

  • Noindex thin pages
  • Consolidate content
  • Use robots.txt to block crawling

3. Soft 404s (Fake 404s)

Problem: Pages that don’t exist but return 200 OK instead of 404 Not Found.

Example:

GET /this-page-doesnt-exist
Response: 200 OK
Body: "Sorry, page not found"

Google crawls these thinking they’re real pages, wasting budget.

Fix: Return proper 404 status codes for missing pages.
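As a sketch of the fix (the route and catalog here are hypothetical), the point is to send the 404 status code along with the error body, never 200:

```python
# Hypothetical catalog; in practice this would be a database lookup.
PRODUCTS = {"blue-widget": "Blue Widget product page"}

def handle_request(path):
    """Return (status_code, body) for a request path.

    A missing page must return 404 with the error body.
    Returning 200 + "not found" text is a soft 404.
    """
    prefix = "/product/"
    if path.startswith(prefix) and path[len(prefix):] in PRODUCTS:
        return 200, PRODUCTS[path[len(prefix):]]
    return 404, "Sorry, page not found"
```

The body can stay user-friendly; only the status code needs to change.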

4. Redirect Chains

Problem:

Page A → 301 → Page B → 301 → Page C → 301 → Page D

Google must crawl 4 URLs to reach the final destination.

Fix: Redirect directly:

Page A → 301 → Page D
Page B → 301 → Page D
Page C → 301 → Page D
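If you maintain redirects as a source-to-target map, collapsing chains can be automated; a minimal cycle-safe sketch (the map format is an assumption):

```python
def flatten_redirects(redirects):
    """Rewrite a {source: target} redirect map so that every
    source points directly at its final destination."""
    flat = {}
    for src in redirects:
        seen = {src}           # guard against redirect loops
        dst = redirects[src]
        while dst in redirects and dst not in seen:
            seen.add(dst)
            dst = redirects[dst]
        flat[src] = dst
    return flat
```

Run it over your redirect config whenever you add a new redirect, so A → B → C chains never accumulate.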

5. Infinite Spaces (Faceted Navigation)

Problem: E-commerce filters creating millions of combinations:

/shoes
/shoes?color=red
/shoes?color=red&size=10
/shoes?color=red&size=10&brand=nike
/shoes?color=red&size=10&brand=nike&price=50-100
...

Fix:

  • Use noindex on filtered pages
  • Implement rel="canonical" to main category
  • Block filter parameters in robots.txt
  • Use AJAX filters that don't change the URL
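For example, blocking the filter parameters from the URLs above in robots.txt (parameter names are from the example; adjust to your own):

```text
User-agent: *
Disallow: /*?*color=
Disallow: /*?*size=
Disallow: /*?*brand=
Disallow: /*?*price=
```

Googlebot supports the * wildcard in robots.txt paths, so each rule matches the parameter wherever it appears in the query string.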

6. Broken Internal Links

Problem: Internal links pointing to non-existent pages.

Why it wastes budget: Google crawls the 404, gets nothing useful, but still counts it against your budget.

Fix:

  • Run regular broken link audits (Screaming Frog, Ahrefs)
  • Fix internal 404s (update links or redirect)

7. Orphaned Pages

Problem: Pages with zero internal links pointing to them.

Why it matters: If Google can’t find the page through your site navigation, it may never crawl it (unless it has external backlinks).

Fix:

  • Add pages to your sitemap
  • Link to them from relevant pages
  • Check for orphans with crawl tools
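Crawl tools find orphans by comparing your full page list (e.g. the sitemap) against what's reachable by following links from the homepage; a simplified sketch (the link-graph format is an assumption):

```python
from collections import deque

def find_orphans(all_pages, links, start="/"):
    """Return pages from all_pages (e.g. your sitemap) that are
    unreachable by following internal links from the start page."""
    seen = {start}
    queue = deque([start])
    while queue:                       # breadth-first crawl of the link graph
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(all_pages) - seen)
```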

How to Optimize Crawl Budget

1. Submit an XML Sitemap

Why it helps: Tells Google exactly which pages exist and how often they change.

How:

  • Generate sitemap (most CMS do this automatically)
  • Submit via Google Search Console
  • Keep it updated (remove deleted pages, add new ones)

Sitemap priorities:

<url>
  <loc>https://example.com/important-page</loc>
  <priority>1.0</priority>
  <changefreq>daily</changefreq>
</url>

Note: These fields are hints at best. Google has said it ignores priority and changefreq entirely, and uses lastmod only when it's kept accurate.
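If your CMS doesn't generate a sitemap, a minimal one can be built with Python's standard library (the URL list is a placeholder):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(page_urls):
    """Build a minimal sitemap XML document from a list of page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in page_urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page
    return ET.tostring(urlset, encoding="unicode")
```

Write the result to /sitemap.xml and submit that URL in Search Console.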

2. Fix Crawl Errors

Check Google Search Console:

  • Coverage → Errors
  • Look for server errors (500, 503)
  • Fix broken redirects
  • Resolve DNS issues

Common errors:

  • Server error (5xx)
  • Redirect error
  • Submitted URL not found (404)

3. Improve Site Speed

Why it matters: Faster servers = Google can crawl more pages in the same time.

Optimizations:

  • Upgrade hosting (shared → VPS → dedicated)
  • Enable gzip compression
  • Optimize database queries
  • Use a CDN for static assets
  • Reduce server response time (aim for <200ms)

Check speed:

  • Google Search Console → Settings → Crawl Stats
  • Shows avg response time, crawl requests/day

4. Use Robots.txt Strategically

Block low-value pages:

User-agent: *
Disallow: /search?
Disallow: /filter?
Disallow: /cart/
Disallow: /checkout/
Disallow: /admin/

Allow directives are only needed as exceptions inside the same User-agent group; everything not disallowed is crawlable by default:

Allow: /products/
Allow: /blog/

5. Manage URL Parameters

Google retired Search Console's URL Parameters tool in 2022, so parameter handling now lives on your site:

  • Tracking (utm_source): point canonical tags at the clean URL
  • Sorts (price-low-high): canonicalize to the unsorted URL or block in robots.txt
  • Filters (color=red): canonicalize to the main category page
  • Pagination (page=2): leave crawlable so deep pages stay discoverable

6. Update Content Regularly

Why: Google prioritizes crawling pages that change frequently.

Strategy:

  • Refresh old blog posts (add new info, update dates)
  • Keep product descriptions current
  • Remove outdated seasonal content
  • Publish new content consistently

Evidence Google is crawling:

  • Google Search Console → Settings → Crawl Stats
  • Check “Total crawl requests” over time

7. Internal Linking

Why it helps: Google discovers pages by following links. More internal links = easier discovery.

Best practices:

  • Link to new pages from high-authority pages (homepage, popular posts)
  • Use descriptive anchor text
  • Don’t bury important pages 5+ clicks deep
  • Create hub pages linking to related content

8. Monitor and Adjust Crawl Rate

Google Search Console → Settings → Crawl stats:

  • Shows current crawl rate (requests/day)
  • You can't raise it manually, and Google retired the manual rate limiter in 2024; Googlebot now throttles itself based on how your server responds

If crawl rate is too low:

  • Improve server speed
  • Fix crawl errors
  • Add internal links to important pages
  • Update content more frequently

Checking Your Crawl Budget

Google Search Console

Settings → Crawl Stats:

  • Total crawl requests: Pages crawled per day
  • Total download size: Data transferred
  • Average response time: Server speed
  • Crawl requests by status: 200, 404, 301, etc.

What good stats look like:

  • Crawl requests increasing over time (if adding content)
  • Most requests returning 200 OK
  • Low 404 and 500 errors
  • Average response time under 500ms

Red flags:

  • Decreasing crawl requests (Google losing interest)
  • High 500 errors (server issues)
  • Slow response times (> 1 second)

Server Logs

Advanced: Analyze server logs to see exactly what Googlebot crawls.

Tools:

  • Screaming Frog Log File Analyzer
  • Splunk
  • Custom scripts (grep/awk)

What to look for:

  • Which pages Google crawls most
  • Pages Google never crawls (orphans)
  • Crawl frequency per section
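A custom script can be as small as counting Googlebot hits per path (this assumes common/combined-format access-log lines; adjust parsing to your log format):

```python
from collections import Counter

def googlebot_hits(log_lines):
    """Count Googlebot requests per URL path from access-log lines
    in the common/combined format: ... "GET /path HTTP/1.1" ..."""
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        try:
            request = line.split('"')[1]      # e.g. 'GET /shoes HTTP/1.1'
            counts[request.split()[1]] += 1   # tally the request path
        except IndexError:
            continue                          # skip malformed lines
    return counts
```

Note that the user-agent string can be spoofed; for real audits, also verify the requesting IP resolves back to Google before trusting a line.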

Case Study: E-commerce Site

Problem:

  • 50,000 product pages
  • Google crawling 500 pages/day
  • New products taking 3+ months to index

Investigation:

  • 70% of crawl budget wasted on filter pages (/shoes?color=red&size=10...)
  • 15% wasted on session IDs (/product?session=abc123)
  • 10% on broken images and CSS files

Solution:

  1. Noindexed all filter combination pages
  2. Blocked session parameters in robots.txt
  3. Fixed broken links
  4. Submitted product-only sitemap

Result:

  • Crawl budget shifted to actual product pages
  • Google now crawling 2,000+ products/day
  • New products indexed within 1 week

Common Myths

Myth: “More pages = better SEO”

Reality: 10,000 thin pages waste crawl budget. 100 high-quality pages rank better.

Myth: “I can increase crawl budget by requesting it”

Reality: Google sets crawl budget based on your site’s authority, server speed, and content quality. You can’t manually increase it.

Myth: “XML sitemaps increase crawl budget”

Reality: Sitemaps help Google discover pages, but don’t increase the total number of pages crawled per day. They help prioritize WHICH pages get crawled.

Myth: “Small sites need to optimize crawl budget”

Reality: If your site has under 1,000 pages, Google crawls it fully within days. Don’t waste time optimizing.


Quick Reference

Crawl budget wasters:

  • Duplicate content
  • Redirect chains
  • Soft 404s
  • URL parameters (filters, sorts, tracking)
  • Slow server response
  • Broken links

Crawl budget optimizations:

  • Submit XML sitemap
  • Use robots.txt to block low-value pages
  • Fix crawl errors (500s, redirects)
  • Improve server speed
  • Manage URL parameters in Search Console
  • Add internal links to important pages

What Surmado Checks

Surmado Scan looks for:

  • Crawl errors (500, 404, redirect chains)
  • Duplicate content wasting crawl budget
  • URL parameters creating infinite spaces
  • Slow server response times
  • Orphaned pages not linked internally

Related: Robots.txt Essentials | XML Sitemaps Explained | Server Response Codes

Next Steps

Run Surmado Scan to optimize crawl efficiency

View all Scan features →
