Machine Learning and Google Search - History and Significance

We have entered a new era of ML systems in Google Search. In this new era, the only reliably good thing for ranking is to create pages that actually focus on the user. Our founder, Kazushi, writes about the history and significance of machine learning in Google Search today.

Hi, I'm Kazushi, founder of JADE K.K., a technology consulting firm focused on product growth & integrity. I used to work for Google and Twitter, primarily within their spam detection teams. I coined the term ‘render budget’ in a 2019 article. Today I'd like to talk a little bit about what marketers should know about Machine Learning and Google Search.

I have conducted many job interviews for search marketers. One of my go-to questions is: “please describe the significance of Machine Learning for Google Search.” However, it's very rare for people to provide me with a satisfactory answer. One common mistake people tend to make is to simply list the names of ML models they know are being used by Google Search: RankBrain, BERT, MUM, etc., without understanding these models’ significance or behaviour. I have come to the realisation that there are not many articles that comprehensively explain the real significance of Machine Learning for Google Search. In this article, I would like to do precisely that: explain why you should care about ML and why it’s crucial.

A little bit of history. Google Search was actually behind the curve when it came to Machine Learning. The first time that Google Search introduced Machine Learning to their major pipelines was in 2015, with the launch of RankBrain. This effort was driven primarily by researchers from Google Brain and not by search engineers. By this time, Google Ads and other products had already been investing a significant volume of resources to utilise Machine Learning in their systems for years, as Google had been trying to transform itself into an ‘AI-first’ company. For Search, however, it remained a side gig - they were on the receiving end of the movement, not at the forefront. According to a WIRED article, Amit Singhal, who had been leading Search since 2000, was a sceptic of ML, and many saw ranking as ‘too important for ML’. With little investment from Search - it was an “experiment”, “let’s try to compute this extra score from the neural net and see if that’s a useful score”, according to Jeff Dean - RankBrain ended up proving the power of Machine Learning and debuted as the “third most important signal” in Google’s Search ranking.

This coincided with the retirement of Amit Singhal in 2016; he stepped down suddenly, only to be caught up in a sexual harassment scandal that surfaced afterwards. Google Search was now headed up by John Giannandrea, one of the biggest Machine Learning advocates at Google. With the new leadership came a new era: it meant that Google Search was now all-in on applying ML to their systems. Since then, even after JG's departure to Apple in 2018, Google Search has been actively working on launching more Machine Learning related technologies in its pipeline.

In the past 2 to 3 years, we have observed this fundamental transformation of Google Search from being an ML lightweight to an ML champion. It’s not the name of the models du jour they launch that’s important. The biggest significance is that Machine Learning is now baked into every system across Google Search. It's not solely about ranking. ML is being utilised from crawl prioritisation to canonical selection to title generation to site-level scoring. We should treat all of these systems as essentially being driven by Machine Learning and all of Google Search’s launches as being in some way related to a Machine Learning model. We have entered a new era of ML systems, unlike the Google of 2012. This is a fundamental change and a tectonic shift that many have yet to realise the significance of.

You may ask: what’s the significance of Google Search becoming an ML-based system? Here are a few implications:

(a) Google can now make predictions with much better accuracy in earlier phases of its crawling efforts. We started hearing about pages not being indexed in the past couple of years - and I believe this is the result of the introduction of models that decide which pages to crawl and index. In the past, Google systems weren’t able to reliably predict the quality of a given URL before indexing. They had to index the page to collect signals on whether or not to surface it in search results. But now, based on ML output, they can predict whether a page is ranking-worthy or not.

(b) All the debate regarding “is this a ranking signal” is now meaningless. Although it has been for some time, it is even more so in the ML-enabled world. Everything can be a signal, but because the weights assigned to those signals are not hard-coded but learned, no signal will have a constant impact across search queries and documents. Attempts to hack the results are bearing less fruit... The only reliably good thing for ranking is to create pages that actually focus on the user. This has become a reality and is no longer just an ideal.
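
To make the point in (b) a little more concrete, here is a toy sketch in Python. It is not how Google scores pages; the feature names, the weights and both scoring functions are made up purely for illustration. The first scorer uses hard-coded weights, so adding a title match always moves the score by the same amount. The second stands in for a learned model with an interaction between signals, so the same change has a different impact depending on the rest of the page.

```python
import math

# Hypothetical, hand-picked weights for a toy "hard-coded" scorer.
FIXED_WEIGHTS = {"title_match": 0.3, "content_quality": 0.5, "page_speed": 0.2}


def fixed_score(features):
    """Linear score with constant weights: each signal has a constant effect."""
    return sum(FIXED_WEIGHTS[name] * value for name, value in features.items())


def learned_style_score(features):
    """Stand-in for a learned model (coefficients are invented, not trained).

    The title match interacts with content quality and the result passes
    through a sigmoid, so no signal has a single, constant weight.
    """
    quality = features["content_quality"]
    z = (
        4.0 * quality
        + 3.0 * features["title_match"] * quality
        + 1.0 * features["page_speed"]
        - 3.0
    )
    return 1.0 / (1.0 + math.exp(-z))


def signal_impact(score_fn, features, name, delta=1.0):
    """Change in score when one signal is increased by `delta`."""
    bumped = dict(features)
    bumped[name] = features[name] + delta
    return score_fn(bumped) - score_fn(features)


# Two hypothetical pages that differ only in content quality.
low_quality_page = {"title_match": 0.0, "content_quality": 0.1, "page_speed": 0.5}
high_quality_page = {"title_match": 0.0, "content_quality": 0.9, "page_speed": 0.5}

fixed_low = signal_impact(fixed_score, low_quality_page, "title_match")
fixed_high = signal_impact(fixed_score, high_quality_page, "title_match")
learned_low = signal_impact(learned_style_score, low_quality_page, "title_match")
learned_high = signal_impact(learned_style_score, high_quality_page, "title_match")

print(f"fixed weights:  low={fixed_low:.3f}  high={fixed_high:.3f}")
print(f"learned-style:  low={learned_low:.3f}  high={learned_high:.3f}")
```

With the fixed weights, the two impacts are identical. With the learned-style scorer, the title match helps the high-quality page far more than the low-quality one, which is why asking whether something “is a ranking signal” tells you so little.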

(c) The units of learning have become more important. ML scoring must be done for a key, for a unit, and if the key changes, results will change drastically. This is even more visible for Google Ads. In Ads, how you structure your campaigns and ad groups within the account has become increasingly important because they are what the model’s learning is based on. These keys can be derived from URLs - which means that URL restructuring and domain changes can have an increased and significant impact.
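
Here is a minimal sketch of what a “key” means in (c), again in Python and again a toy under my own assumptions. The `url_key` function (host plus first path segment), the `KeyedEstimator` class and the prior values are all hypothetical, chosen only to show the mechanism: whatever has been learned is stored against a key, and a URL that maps to a new key starts again from the prior.

```python
from urllib.parse import urlsplit


def url_key(url, depth=1):
    """Hypothetical learning key: the host plus the first `depth` path segments."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    return "/".join([parts.netloc] + segments[:depth])


class KeyedEstimator:
    """Toy per-key quality estimate that is smoothed towards a prior."""

    def __init__(self, prior=0.5, prior_weight=5.0):
        self.prior = prior
        self.prior_weight = prior_weight
        self.totals = {}
        self.counts = {}

    def observe(self, url, outcome):
        """Record one outcome (e.g. 1.0 = good, 0.0 = bad) against the URL's key."""
        key = url_key(url)
        self.totals[key] = self.totals.get(key, 0.0) + outcome
        self.counts[key] = self.counts.get(key, 0) + 1

    def estimate(self, url):
        """Blend the prior with whatever history exists for the URL's key."""
        key = url_key(url)
        total = self.totals.get(key, 0.0)
        count = self.counts.get(key, 0)
        return (self.prior * self.prior_weight + total) / (self.prior_weight + count)


estimator = KeyedEstimator()

# Twenty good outcomes accumulate under the key "example.com/guides".
for _ in range(20):
    estimator.observe("https://example.com/guides/ml-search", 1.0)

old_estimate = estimator.estimate("https://example.com/guides/ml-search")
# After a hypothetical restructure the same content lives under a new key.
new_estimate = estimator.estimate("https://example.com/articles/ml-search")

print(url_key("https://example.com/guides/ml-search"), old_estimate)
print(url_key("https://example.com/articles/ml-search"), new_estimate)
```

The page itself has not changed, but the restructured URL falls back to the prior because nothing has been learned for its key yet. A real system would be far more sophisticated, but this is the basic reason why restructures and domain changes deserve care.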

Now you can confidently answer the original question: What is the significance of Machine Learning in search engines today? The significance is that the entire pipeline has transformed to become an ML-based system, not just ranking. Because of this, tasks that are handled well by Machine Learning - prediction, classification, clustering, etc. - have been improved across the pipeline, across all components. This is the fundamental understanding you need before proceeding to talk about the models that have been presented by Google.

Machine learning systems, from an observer's point of view, behave very differently from non-ML systems. This is something Ad campaign managers have been confronting for some time now. I often use the analogy of an animal - you have to feed the right amount of food at the right time, give it the right amount of whips and carrots, and give it love in order for the beast to deliver you the best results. Tame the beast well.