Google says in a new episode of its Search Off the Record podcast that most websites don’t need to worry about crawl budget. Gary Illyes acknowledges that a substantial segment of website owners does need to care about it, but maintains that the vast majority do not.
Gary Illyes’ Explanation
“We’ve been pushing back on the crawl budget, historically, typically telling people that you don’t have to care about it.
And I stand my ground and I still say that most people don’t have to care about it. We do think that there is a substantial segment of the ecosystem that has to care about it.
…but I still believe that – I’m trying to reinforce this here – that the vast majority of the people don’t have to care about it.”
When To Care About The Crawl Budget?
SEOs often want a hard number for crawl budget, that is, a threshold of X webpages above which a website needs to be concerned about it. But that is not exactly how it works, as Gary Illyes explains.
Gary Illyes’ Statement
“… well, it’s not quite like that. It’s like you can do stupid stuff on your site, and then Googlebot will start crawling like crazy.
Or you can do other kinds of stupid stuff, and then Googlebot will just stop crawling altogether.”
When asked to give a number, Illyes said that roughly a million URLs is the benchmark before a website owner actually needs to worry about crawl budget.
A Few Factors That Indicate Crawl Budget Issues
For websites with more than a million URLs, the following factors can indicate crawl budget issues.
Factor #1: Webpages That Have Never Been Crawled
“What would I look at? Probably URLs that were never crawled. That’s a good indicator for how well discovered, how well crawled a site is…
So I would look at pages that were never crawled. For this you probably want to look at your server logs because that can give you the absolute truth.”
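As a rough illustration of the server-log check Illyes describes, the sketch below compares a list of known URL paths against the paths Googlebot has requested. It assumes logs in the common "combined" access-log format and identifies Googlebot by user-agent string only, which can be spoofed; the function names and the known_paths input (for example, paths taken from a sitemap) are hypothetical.

```python
import re

# Assumed "combined" log format: ... "GET /path HTTP/1.1" 200 512 "referer" "user agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def crawled_paths(log_lines):
    """Return the set of paths requested by a user agent containing 'Googlebot'."""
    paths = set()
    for line in log_lines:
        match = LOG_LINE.search(line)
        # User-agent matching is a simplification; it does not verify the requester.
        if match and "Googlebot" in match.group("agent"):
            paths.add(match.group("path"))
    return paths

def never_crawled(known_paths, log_lines):
    """Return known paths that never appear in Googlebot requests, sorted."""
    return sorted(set(known_paths) - crawled_paths(log_lines))
```

In practice, log_lines could be an open log file iterated line by line, and the result gives a starting list of pages to investigate rather than a definitive diagnosis.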
Factor #2: Changed Pages That Haven’t Been Refreshed In A Long Time
“Then I would also look at the refresh rates. Like if you see that certain parts of the site were not refreshed for a long period of time, say months, and you did make changes to pages in that section, then you probably want to start thinking about crawl budget.”
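One way to approximate the refresh-rate check is to record the most recent Googlebot request per site section and flag sections that have gone quiet. The sketch below again assumes the combined log format and user-agent matching; treating the first path segment as a "section" and using a 90-day threshold are illustrative assumptions, not figures from the podcast.

```python
import re
from datetime import datetime, timedelta

# Assumed "combined" log format with a timestamp such as [10/Oct/2023:13:55:36 +0000]
ENTRY = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def last_crawl_by_section(log_lines):
    """Map each top-level section (first path segment) to its latest Googlebot hit."""
    latest = {}
    for line in log_lines:
        match = ENTRY.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        # Drop any query string, then keep only the first path segment.
        path = match.group("path").split("?", 1)[0]
        section = "/" + path.lstrip("/").split("/", 1)[0]
        if section not in latest or ts > latest[section]:
            latest[section] = ts
    return latest

def stale_sections(latest, now, max_age_days=90):
    """Return sections whose latest Googlebot hit is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(section for section, ts in latest.items() if ts < cutoff)
```

A stale section matters mainly when its pages have actually changed in that window, so the output is best cross-checked against the site's own modification dates.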
How To Fix The Crawl Budget Issues?
Gary Illyes gives two suggestions for fixing the crawl budget issues.
First: Try eliminating pages that are not essential. Every page that Googlebot crawls uses up part of the crawl budget available for all other pages, so important content might not get crawled if the site carries a large amount of low-value content.
“Like if you remove, if you chop, if you prune from your site stuff that is perhaps less useful for users in general, then Googlebot will have time to focus on higher quality pages that are actually good for users.”
Second: Try to avoid sending “back off” signals to Googlebot. Back off signals are server responses that prompt Googlebot to slow down or stop crawling a site.
“If you send us back off signals, then that will influence Googlebot crawl. So if your servers can handle it, then you want to make sure that you don’t send us like 429, 50X status codes and that your server responds snappy, fast.”
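To see whether a server is sending the status codes Illyes mentions, a log scan along the following lines can count 429 and 50X responses served to Googlebot. This is a minimal sketch assuming the combined log format and user-agent matching; the function name backoff_share is hypothetical, and the podcast gives no threshold for what share counts as a problem.

```python
import re
from collections import Counter

# Assumed "combined" log format: ... "GET /path HTTP/1.1" 503 0 "referer" "user agent"
STATUS = re.compile(r'" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"')

def backoff_share(log_lines):
    """Count 429 and 50X responses served to Googlebot and their share of its requests."""
    counts = Counter()
    total = 0
    for line in log_lines:
        match = STATUS.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        total += 1
        status = match.group("status")
        # "Back off" codes as named in the quote: 429 and the 50X range.
        if status == "429" or status.startswith("50"):
            counts[status] += 1
    share = sum(counts.values()) / total if total else 0.0
    return counts, share
```

The returned share is only a rough health indicator; response time, which Illyes also mentions, would need a log format that records request duration.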