Does Google Have A Problem With Big Robots.txt Files?

by New York Tech Editorial Team
January 19, 2022
in AI & Robotics

Google addresses the subject of robots.txt files and whether it’s a good SEO practice to keep them within a reasonable size.

This topic is discussed by Google’s Search Advocate John Mueller during the Google Search Central SEO office-hours hangout recorded on January 14.

David Zieger, an SEO manager for a large news publisher in Germany, joins the livestream with concerns about a “huge” and “complex” robots.txt file.

How huge are we talking here?

Zieger says there are over 1,500 lines with a “multitude” of disallows, a number that has kept growing over the years.

The disallows prevent Google from indexing HTML fragments and URLs where AJAX calls are used.

Zieger says it’s not possible to set a noindex directive on these pages, which would be another way to keep the fragments and URLs out of Google’s index, so he’s resorted to filling the site’s robots.txt file with disallows.
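
For illustration, here is a minimal sketch of the kind of file described. The paths are hypothetical, invented for this example rather than taken from Zieger’s site:

    User-agent: *
    # Block server-rendered HTML fragments pulled in via AJAX (hypothetical paths)
    Disallow: /fragments/
    # Block AJAX endpoints that return partial page content
    Disallow: /ajax/
    Disallow: /*?ajax=true

Worth noting: robots.txt blocks crawling rather than indexing, so a disallowed URL can still end up indexed if it’s linked from elsewhere, which is part of why noindex would normally be the preferred tool for this job.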

Are there any negative SEO effects that can result from a huge robots.txt file?

Here’s what Mueller says.

SEO Considerations For Large Robots.txt Files

A large robots.txt file will not directly cause any negative impact on a site’s SEO.

However, a large file is harder to maintain, which may lead to accidental issues down the road.

Mueller explains:

“No direct negative SEO issues with that, but it makes it a lot harder to maintain. And it makes it a lot easier to accidentally push something that does cause issues.

So just because it’s a large file doesn’t mean it’s a problem, but it makes it easier for you to create problems.”
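
Mueller’s point about accidental breakage is easy to illustrate, since robots.txt rules are prefix (and wildcard) matches against the URL path. The paths below are hypothetical:

    User-agent: *
    # Intended to block fragment URLs like /fragment/header
    Disallow: /fragment
    # ...but as a prefix match it also blocks ordinary pages such as:
    #   /fragments-of-history/
    #   /fragment-shader-tutorial.html

In a file with 1,500-plus lines, a collision like that is easy to push live without noticing.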

Zieger follows up by asking if there are any issues with not including a sitemap in the robots.txt file.

Mueller says that’s not a problem:

“No. Those different ways of submitting a sitemap are all equivalent for us.”
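
For reference, the robots.txt method Zieger asks about is a single directive, which can appear anywhere in the file (URL hypothetical):

    Sitemap: https://www.example.com/sitemap.xml

Per Mueller, omitting it and submitting the sitemap through Search Console instead works just as well.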

Zieger then launches into several more follow-up questions that we’ll take a look at in the next section.

Does Google Recognize HTML Fragments?

Zieger asks Mueller what the SEO impact of radically shortening the robots.txt file would be, such as removing all the disallows.

The following questions are asked:

  • Does Google recognize HTML fragments that aren’t relevant to site visitors?
  • Would HTML fragments end up in Google’s search index if they weren’t disallowed in robots.txt?
  • How does Google deal with pages where AJAX calls are used? (Such as a header or footer element)

He sums up his questions by stating that most of what’s disallowed in his robots.txt file is header and footer elements that aren’t interesting for the user.

Mueller says it’s difficult to know exactly what would happen if those fragments were suddenly allowed to be indexed.

A trial and error approach might be the best way of figuring this out, Mueller explains:

“It’s hard to say what you mean with regards to those fragments.

My thought there would be to try to figure out how those fragment URLs are used. And if you’re unsure, maybe take one of these fragment URLs and allow its crawling, and look at the content of that fragment URL, and then check to see what happens in search.

Does it affect anything with regards to the indexed content on your site?
Is some of that content findable within your site suddenly?
Is that a problem or not?

And try to work based on that, because it’s very easy to block things by robots.txt, which actually are not used for indexing, and then you spend a lot of time maintaining this big robots.txt file, but it actually doesn’t change that much for your website.”
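
Mueller’s experiment translates to a narrow exception in robots.txt. For Googlebot, the most specific (longest) matching rule wins, so one test URL can be opened for crawling while the broader block stays in place; the paths are hypothetical:

    User-agent: *
    Disallow: /fragments/
    # The longest matching rule wins, so this single test URL becomes
    # crawlable while everything else under /fragments/ stays blocked:
    Allow: /fragments/header-test.html

From there, watch whether the fragment’s content starts surfacing in search, as Mueller suggests.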

Other Considerations For Building A Robots.txt File

Zieger has one last follow-up regarding robots.txt files, asking if there are any specific guidelines to follow when building one.

Mueller says there’s no specific format to follow:

“No, it’s essentially up to you. Like some sites have big files, some sites have small files, they should all just work.

We have an open source code of the robots.txt parser that we use. So what you can also do is get your developers to run that parser for you, or kind of set it up so that you can test it, and then check the URLs on your website with that parser to see which URLs would actually get blocked and what that would change. And that way you can test things before you make them live.”

The robots.txt parser Mueller refers to can be found on GitHub (google/robotstxt).
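
Google’s parser is a C++ library, but the “test before you go live” workflow Mueller describes can be sketched with Python’s standard-library urllib.robotparser instead. It implements the same basic matching idea, though not every Google-specific extension (wildcards, for instance), and the file path and URLs below are hypothetical:

    # A minimal sketch of testing a proposed robots.txt before deploying it,
    # using Python's standard library in place of Google's C++ parser.
    from urllib.robotparser import RobotFileParser

    parser = RobotFileParser()
    with open("robots.txt") as f:          # local copy of the proposed file
        parser.parse(f.read().splitlines())

    # URLs to check -- in practice, feed in a crawl export of the site.
    test_urls = [
        "https://www.example.com/",
        "https://www.example.com/fragments/header-test.html",
    ]

    for url in test_urls:
        verdict = "allowed" if parser.can_fetch("Googlebot", url) else "BLOCKED"
        print(f"{verdict:7}  {url}")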

Hear the full discussion in the video below:


Featured Image: Screenshot from YouTube.com/GoogleSearchCentral, January 2022.

