Google is committed to maintaining the integrity of the anchor text signal by effectively disregarding links originating from spam sites. During a recent SEO office hours session at Google, Duy Nguyen from the search quality team shed light on the subject of links associated with spam sites and their impact on trust.
What caught attention was the mention of Google’s efforts to safeguard the anchor text signal, an aspect that is rarely addressed or discussed. For many publishers and SEOs, earning Google’s trust holds immense significance, as it directly affects indexing and rankings.
However, it is worth noting that the concept of “trust” lacks a concrete metric, leading to occasional confusion within the search community. How can an algorithm place trust in something if it cannot measure it?
While Googlers have not explicitly answered this question, some insight can be gleaned from patents and research papers. These resources hint at the measures Google takes to counteract the influence of spam.
Not Trusting Links from Spam Sites
One participant in the SEO office hours raised an intriguing question: “If a domain is penalized, does it impact the outbound links from that domain?”
Duy Nguyen, the Googler in question, responded by stating, “By ‘penalize,’ I assume you mean the domain was demoted by our spam algorithms or manual actions. In general, we do not trust links from known spam sites. This approach helps us maintain the quality of our anchor signals.”
Trust and Links
When Googlers mention trust, they are referring to the algorithms’ capacity to trust, or not trust, certain elements. Duy’s answer is specifically about protecting the anchor text signal, not merely about discounting links from spam sites. The SEO community often emphasizes “building trust,” but in this scenario the focus is on keeping spam from polluting a signal.
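To make that concrete, here is a minimal sketch of how a link processor might keep the anchor text signal clean by skipping links from known spam domains when aggregating the signal. The spam-domain list and helper function are hypothetical illustrations of the general idea, not Google’s implementation:

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical set of domains already flagged as spam.
KNOWN_SPAM_DOMAINS = {"spam-example.com", "link-farm.example"}

def aggregate_anchor_text(links, spam_domains=KNOWN_SPAM_DOMAINS):
    """Tally anchor text pointing at a target page, ignoring links
    whose source domain is known spam. `links` is a list of
    (source_url, anchor_text) tuples."""
    counts = Counter()
    for source_url, anchor_text in links:
        domain = urlparse(source_url).netloc.lower()
        if domain in spam_domains:
            continue  # the spam link simply contributes nothing
        counts[anchor_text.strip().lower()] += 1
    return counts

links = [
    ("https://news.example.org/story", "best running shoes"),
    ("https://spam-example.com/page1", "cheap pills"),
    ("https://blog.example.net/review", "best running shoes"),
]
print(aggregate_anchor_text(links))
# Counter({'best running shoes': 2}) -- the spam-sourced anchor is dropped
```

The point of the sketch is that the spam link isn’t punished; it simply never enters the anchor text tally, which is one way of reading “we do not trust links from known spam sites.”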
Determining Spam: Google’s Process
Not every spam website receives a penalty or manual action; some are never indexed in the first place, which is where Google’s SpamBrain comes into play. SpamBrain is an AI platform that assesses webpages at multiple stages, starting with crawling.
SpamBrain’s functions include:
- Indexing Gatekeeper: At the crawling stage, SpamBrain blocks spam sites from being indexed, including content discovered through Search Console and sitemaps.
- Hunting Down Indexed Spam: SpamBrain identifies and addresses spam that has already been indexed when considering sites for ranking.
The SpamBrain platform operates by training an AI model using Google’s extensive knowledge of spam. While Google hasn’t explicitly detailed this “knowledge of spam,” several patents and research papers shed light on the topic.
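Google hasn’t published SpamBrain’s architecture, but the broad pattern described above, training a model on examples of known spam and then using it to gate indexing, can be sketched. In the sketch below, the features, the model choice, and the threshold are all illustrative assumptions, not SpamBrain itself:

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical per-page features:
# [outbound_link_count, hidden_text_ratio, keyword_stuffing_score]
X_train = [
    [450, 0.62, 0.91],  # pages already labeled as spam
    [380, 0.55, 0.87],
    [12,  0.00, 0.05],  # pages already labeled as not spam
    [30,  0.01, 0.10],
]
y_train = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

model = LogisticRegression().fit(X_train, y_train)

def should_index(page_features, threshold=0.5):
    """Gate indexing: skip pages the model scores as likely spam."""
    spam_probability = model.predict_proba([page_features])[0][1]
    return spam_probability < threshold

print(should_index([400, 0.60, 0.90]))  # likely rejected at the gate
print(should_index([15, 0.00, 0.08]))   # likely allowed into the index
```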
For those interested in delving deeper into this subject, I have written an article on link distance ranking algorithms, which explores a method for ranking links. Additionally, I have published a comprehensive piece that covers various research papers discussing link-related algorithms, potentially shedding light on the workings of the Penguin algorithm.
Most of these patents and research papers are now a decade or more old, and little new on the subject has been published by search engines or university researchers since then. Nonetheless, these documents matter because their methods may have been integrated into Google’s algorithm, potentially by informing the training of AI models like SpamBrain.
The patent described in the link distance ranking article outlines a method for assigning ranking scores to pages based on the distances between a set of trusted “seed sites” and the pages they link to. These seed sites serve as starting points for assessing the normality or spamminess of other sites.
The underlying idea is that the farther a site is from a seed site, the more likely it is to be considered spammy. The concept of evaluating spamminess through link distance is further explored in research papers referenced in the aforementioned Penguin article.
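As a rough illustration of the mechanism, the sketch below computes each page’s shortest link distance from a trusted seed set with a breadth-first search, then assigns a score that decays with distance. The decay factor and the toy graph are my assumptions; the scoring in the actual patent is more elaborate:

```python
from collections import deque

def link_distance_scores(graph, seeds, decay=0.5):
    """graph maps each page to the pages it links to; seeds are the
    trusted starting pages. Returns {page: score}, where
    score = decay ** (shortest link distance from any seed).
    Pages unreachable from every seed get no score at all."""
    distances = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:  # breadth-first search fanning out from all seeds
        page = queue.popleft()
        for neighbor in graph.get(page, []):
            if neighbor not in distances:
                distances[neighbor] = distances[page] + 1
                queue.append(neighbor)
    return {page: decay ** dist for page, dist in distances.items()}

graph = {
    "seed.example": ["news.example"],
    "news.example": ["blog.example"],
    "blog.example": ["far-away.example"],
}
print(link_distance_scores(graph, {"seed.example"}))
# {'seed.example': 1.0, 'news.example': 0.5,
#  'blog.example': 0.25, 'far-away.example': 0.125}
```

Sites that no trusted seed can reach through any chain of links end up with no score at all, which matches the intuition that distance from trusted sites serves as a proxy for spamminess.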