Semantic content optimization in SEO involves evaluating the terms of a corpus. One way to determine the relevance of words in a text is to analyze term frequency. Term frequency (TF) is only one half of the well-known TF-IDF method for information retrieval. The other half is the inverse document frequency (IDF), and this week's article explains how it works. My goal is to show you the importance of creating content that is unique. Of course, there are many reasons to do so other than SEO: reputation, brand awareness...
🤔 What is inverse document frequency (IDF)?
Let's take an example:
In practice, the first step is to measure how often each term occurs across the corpus (a set of documents). In this example, the word "the" appears in every document, so it provides no information to distinguish one document from another.
By contrast, the word "child" appears in only 1,000 of the documents, so it helps differentiate the documents that contain it. This is what IDF captures: a measure of a term's rarity.
Document frequency measures similarity (the fact that documents share terms in their content). Here we prefer to measure rarity, hence the inversion.
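To make the counting step concrete, here is a minimal sketch of how document frequency could be measured. The three-sentence corpus and the function name `document_frequency` are hypothetical, chosen only for illustration, and splitting on whitespace is a deliberate simplification of real tokenization.

```python
from collections import Counter

def document_frequency(corpus):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for document in corpus:
        # set() ensures a term repeated inside one document is counted once
        df.update(set(document.lower().split()))
    return df

# Hypothetical toy corpus, for illustration only
corpus = [
    "the child plays the game",
    "the game is outdoors",
    "the weather is nice",
]

df = document_frequency(corpus)
print(df["the"], df["game"], df["child"])  # 3 2 1
```

"the" shows up in all three documents and therefore tells us nothing, while "child" shows up in only one and is the most distinctive of the three terms.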
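The formula looks like this: IDF(t) = log10(N / n_t), where N is the total number of documents in the corpus and n_t is the number of documents containing the term t.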
Don't worry, here's the explanation. For each term, we take the total number of documents in the corpus and divide it by the number of documents containing that term. This gives us a measure of the term's rarity. However, we don't want the result to suggest that the word "child" is 500 times more important than the word "game", so we take the base-10 logarithm of the ratio to compress the scale. From a search engine's point of view, "child" is therefore 10x more relevant than "game" in this corpus.
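The calculation described above can be sketched in a few lines of Python. The corpus size of 1,000,000 documents is an assumption made for this example (the figure is not given here), and the function name `idf` is hypothetical.

```python
import math

def idf(total_docs, docs_with_term):
    """Base-10 log of (total documents / documents containing the term)."""
    if docs_with_term <= 0:
        raise ValueError("the term must appear in at least one document")
    return math.log10(total_docs / docs_with_term)

# Assumed corpus size, for illustration only
TOTAL_DOCS = 1_000_000

idf_the = idf(TOTAL_DOCS, 1_000_000)  # a word present in every document
idf_child = idf(TOTAL_DOCS, 1_000)    # a word present in 1,000 documents

print(round(idf_the, 2), round(idf_child, 2))
```

With these assumed numbers, a word found in every document scores 0, while a word found in 1,000 documents out of a million scores about 3. The raw ratio of 1,000 is reduced to a small, comparable value by the logarithm.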
Here is the IDF table for the terms:
You can see that the best score goes to the rarest term. Interesting...
❓ What is the use of IDF?
IDF works as a measure of uniqueness: with it, search engines can identify what makes a given document unique and special. In my view, IDF provides much more value and information than how often a term occurs (keyword density).
Let's take an example:
Do you want to rank among the 36 million websites that appear for the search query "outdoor games"? That means millions of sites in competition! Your chances of being ranked in the top 10 on Google for this term based on the quality of your content alone are close to zero. The only way to rank on such a competitive SERP is to work on other ranking factors such as link building, social networks...
If you are new to this market, you have no chance of standing out against your competitors! My advice is to look for an alternative: add terms that complete the user's query. In our example, if we add the word "idea" to get "outdoor game idea", the number of results is only 340,000. Admit it, that's much less competitive!
By working around the main topic with rare words, you finally have a chance to appear on the first page of the SERP. This is why the use of long-tail keywords is so important today! Ask the right questions, answer the search intent, and you will be visible on the Web!
🔎 How to find rare words in SEO?
IDF highlights the importance of uniqueness in the content we create. Admittedly, this uniqueness strategy does not generate as many visitors as ranking for a more generic keyword would. But if you are new to a competitive market, you will not be able to rank in the top 10 with your content alone.
In the semantic tool SEOQuantum, our WORDPRINT analyses use an index based on Okapi BM25, an evolved version of TF*IDF that is probably used by Google. This index ranges from 0 to 10,000, where a value of 10,000 means that the lexeme is omnipresent in the analysis. It is thanks to this Wordprint analysis for "outdoor game" that I found the following rare words:
- Idea
- Protection
- Ladder
- Net
- etc.
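For readers curious about how Okapi BM25 extends TF*IDF, here is a minimal sketch of the textbook scoring function. It is not SEOQuantum's implementation and does not reproduce the 0 to 10,000 index. The parameter values `k1=1.5` and `b=0.75` are common defaults assumed here, the smoothed IDF is one widely used variant, and the toy documents and the function name `bm25_scores` are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query_terms, corpus_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: number of documents containing each term
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Smoothed IDF variant; the "+ 1" keeps the value non-negative
            term_idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates and is normalized by document length
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += term_idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy documents, already tokenized
docs = [
    "outdoor game idea for the garden".split(),
    "outdoor game for children".split(),
    "outdoor furniture".split(),
]
scores = bm25_scores(["outdoor", "idea"], docs)
best = scores.index(max(scores))
print(best)  # index of the best-matching document
```

In this toy corpus, "outdoor" appears everywhere and contributes little, while the rarer "idea" carries most of the score of the only document that contains it. That is the same rarity effect as IDF, with term frequency and document length added to the picture.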
If you can choose a smaller number of keywords (or phrases) with much less competition and create content around these queries, you can start to rank more easily, and thus attract visitors and monetize your audience: this is an ROI-driven SEO strategy. Even if the monthly search volume is low, the traffic you attract is highly qualified, which greatly increases your chances of converting your prospects into customers!
When I started in 2003, I was convinced that keyword analysis and strategy were based on search volume. Over time, this strategy proved to be long, painful, and risky. Quite quickly, I understood the importance of getting off the "beaten track" by standing out thanks to the inverse document frequency (IDF). Creating content that brings a new angle is often a very powerful way to start your SEO strategy and quickly attract qualified traffic.