Google Introduces Two New Web Crawlers for Image and Video Research

Google has unveiled two new web crawlers designed specifically for gathering image and video content for research and development purposes. The official documentation does not say that blocking these crawlers has any effect on search rankings, which suggests that publishers can opt out without a ranking penalty.

These new crawlers are separate from Google-Extended, the robots.txt token that publishers use to control whether their content is used for AI training, which serves a different function.

GoogleOther Crawlers

The newly announced crawlers are variants of the GoogleOther crawler, which was introduced in April 2023 for use by various Google product teams.

Google's documentation describes the original GoogleOther crawler as follows:

“GoogleOther is a generic crawler that may be used by various product teams to fetch publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.”

New GoogleOther Variants

The two new variants of the GoogleOther crawler are:

  1. GoogleOther-Image
  2. GoogleOther-Video

These new crawlers are designed to fetch binary data, meaning files that are not text-based, such as images, audio, and video. Text files are typically encoded as ASCII or Unicode and can be read in a text editor. Binary files, by contrast, are not human-readable when opened in a text viewer.
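
The text-versus-binary distinction can be sketched in a few lines of Python. This is a rough heuristic rather than a definitive test: it simply checks whether a run of bytes decodes cleanly as UTF-8. The helper name looks_like_text is hypothetical, and some binary data may happen to decode without error.

```python
def looks_like_text(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 text (heuristic only)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The first eight bytes of every PNG image file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# A line of plain text, encoded the way a text file would store it.
TEXT_SAMPLE = "User-agent: GoogleOther-Image".encode("utf-8")

print(looks_like_text(TEXT_SAMPLE))    # True
print(looks_like_text(PNG_SIGNATURE))  # False
```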

The new GoogleOther crawlers are specialized for image and video content. Google has published user agent tokens for both crawlers, so publishers can block them in robots.txt if desired.

GoogleOther-Image

  • User agent tokens:
    • GoogleOther-Image
    • GoogleOther
  • Full user agent string:
    • GoogleOther-Image/1.0

GoogleOther-Video

  • User agent tokens:
    • GoogleOther-Video
    • GoogleOther
  • Full user agent string:
    • GoogleOther-Video/1.0
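
As a rough illustration of how the tokens above might be used, the sketch below feeds a hypothetical robots.txt to Python's standard urllib.robotparser module and checks which crawlers it would turn away. The rules, the example.com URLs, and the helper name is_allowed are assumptions made for illustration. Python's parser also matches tokens more loosely than a production crawler might, so treat the result as a sanity check and not as a statement of how Google interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the two new variants, allow everyone else.
ROBOTS_TXT = """\
User-agent: GoogleOther-Image
Disallow: /

User-agent: GoogleOther-Video
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(token: str, url: str) -> bool:
    """Return True if the parsed rules let this user agent token fetch url."""
    return parser.can_fetch(token, url)


for token in ("GoogleOther-Image", "GoogleOther-Video", "GoogleOther", "Googlebot"):
    print(token, is_allowed(token, "https://example.com/media/photo.jpg"))
```

Because both variants also list the shared GoogleOther token, a single group for GoogleOther would presumably cover the whole family; the per-variant groups shown here are for publishers who want to treat images and video separately.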

Updated GoogleOther User Agent Strings

Google has also updated the user agent strings for the original GoogleOther crawler. Publishers can continue using the existing user agent token (GoogleOther) for blocking purposes. The new user agent strings give more detail about the crawler, including the browser technology it uses. They contain a Chrome version number, shown in the documentation as the placeholder W.X.Y.Z, which is updated periodically to reflect the Chrome version in use.
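
Since the Chrome version in these strings changes over time, any log filter that matches on the full string is likely to break. The sketch below shows one way to read the version separately while matching only on the stable token. The two sample strings are made up for illustration and are not copied from Google's documentation; the function names are likewise hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches "Chrome/" followed by a four-part version such as 120.0.6099.0.
CHROME_VERSION = re.compile(r"Chrome/(\d+)\.(\d+)\.(\d+)\.(\d+)")


def chrome_version(user_agent: str) -> Optional[Tuple[int, ...]]:
    """Return the Chrome version as a tuple of ints, or None if absent."""
    match = CHROME_VERSION.search(user_agent)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def mentions_google_other(user_agent: str) -> bool:
    """Match on the stable token only, ignoring the version number."""
    return "GoogleOther" in user_agent


# Made-up strings for illustration only (an assumption, not Google's real strings).
SAMPLE_OLD = "Mozilla/5.0 (compatible; GoogleOther) Chrome/110.0.5481.0 Safari/537.36"
SAMPLE_NEW = "Mozilla/5.0 (compatible; GoogleOther) Chrome/120.0.6099.0 Safari/537.36"

for sample in (SAMPLE_OLD, SAMPLE_NEW):
    print(mentions_google_other(sample), chrome_version(sample))
```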

GoogleOther Bot Family

These new crawlers might appear in server logs, and the user agent tokens and strings listed above can help identify them as legitimate Google crawlers. This can assist publishers in deciding whether to allow their images and videos to be fetched for research and development purposes.
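
One way to see how often the family shows up in a log is to tally requests by token. The sketch below is a minimal example under stated assumptions: it takes a list of user agent strings already extracted from a log, the sample list is invented, and the helper names classify and count_hits are hypothetical. A user agent match alone does not prove a request came from Google, since any client can set that header.

```python
from collections import Counter
from typing import Iterable, Optional

# Most specific tokens first, so "GoogleOther-Image" is not counted as plain "GoogleOther".
TOKENS = ("GoogleOther-Image", "GoogleOther-Video", "GoogleOther")


def classify(user_agent: str) -> Optional[str]:
    """Return the first GoogleOther family token found in the string, if any."""
    for token in TOKENS:
        if token in user_agent:
            return token
    return None


def count_hits(user_agents: Iterable[str]) -> Counter:
    """Count requests per GoogleOther family token, skipping other clients."""
    counts: Counter = Counter()
    for user_agent in user_agents:
        token = classify(user_agent)
        if token is not None:
            counts[token] += 1
    return counts


# Invented sample data; the last two strings are not real log entries.
SAMPLE_USER_AGENTS = [
    "GoogleOther-Image/1.0",
    "GoogleOther-Video/1.0",
    "GoogleOther-Image/1.0",
    "Mozilla/5.0 (compatible; GoogleOther)",
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/121.0",
]

print(count_hits(SAMPLE_USER_AGENTS))
```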
