How WordLift Found Relevant Pages to Redirect 404s
When you have tons of broken links to redirect, you want to find, in the least amount of time, the best candidate to fix these links with a redirection strategy. This article is about link reclamation: an SEO tactic to fix lost links. We will learn how to scrape the results of Bing using Google Sheets to automate the discovery of similar content and to restore the broken links using a 301 redirect.
Links are still one of the most important factors for ranking in the search engines and while digital marketers allocate a significant amount of time, effort and budget to building new links, maintaining existing links is often overlooked. Link reclamation is the process of identifying and fixing lost links and building links from citations around the web where a company is mentioned but no link was added.
Broken links are particularly common after site migrations or site restructuring, which can lead to a drop in organic traffic that could be avoided with a well-thought-out migration strategy.
WooRank recently carried out a study analyzing the backlinks of 1 million ecommerce websites and found out that over 12% of backlinks to those sites end up on a 404 page. This is not only bad user experience due to users landing on a non-existing page, but is also a huge loss in terms of link juice. Fixing links repairs the juice stream, which restores ranking power.
Link reclamation, therefore, is an integral part of any SEO strategy, whether you are doing it yourself or using an SEO management service.
What we found
After analyzing the backlinks on one of our VIP clients we found a number of high authority pages with broken links to content that has been removed/moved without redirects being put in place.
In order to tackle this issue, we created a Google Sheet with the details of all broken links identified. This type of spreadsheet helps you prioritize the work and should include:
Referring Page URL: the URL linking to the page that no longer exists on your site
Link Anchor: what text is being linked on the source website
Domain Rating/Priority: this helps you define where to focus the attention
Redirect Link: the new destination where you want to redirect the old broken URL
Now, when working on small websites with a few hundred pages you can consider resurrecting the old pages (which might be your best option), link to similar content or, as a very last resort, redirect the user to the homepage.
On the other hand, when the website you are working on has thousands of web pages and hundreds of broken links, as was the case with our VIP website, finding similar content can become very time-consuming.
As SEOs, search engines are at the very core of our daily job; they can also become an invaluable resource when you need to find "relevant" content for a given query string (the link anchor used in the broken link) on a client website. Let’s get into it.
We have a query string represented by the link anchor of the broken link and the URL of the target website. Major search engines like Google and Bing let you refine your search using commands known as search operators (here is, for example, the list of ways you can refine your searches in Google). If we want to look for a given query string within the WooRank website we will use the site operator (i.e. site:woorank.com link juice will only look for relevant pages that talk about link juice published under the woorank.com domain).
We can add a simple script in Google Sheets to ask the search engine what pages we could redirect to. Our script will need to scrape the search engine result page and bring the relevant URL back into our spreadsheet for each broken link.
The functions that we can use for web scraping using Google Sheets are the following:
Any of these will scrape a website using the different parameters supported by the function.
The main difference is the type of source content that you need to scrape. In our case, we are going to scrape the SERP of Bing. Google is now preventing Google Script from scraping its own SERP; it worked until 2017 but it no longer does. We can still crawl the SERP of Google but you cannot do it from Google Sheet using an out-of-the-box function such as ImportXML. Having said that, Bing does a pretty amazing job when you ask it to find something relevant on a given website.
Scraping Bing’s SERP with Google Sheets using ImportXML
The ImportXML function in Google Sheets is used to pull out specific information using HTML IDs and classes. This requires a general understanding of HTML and it can suck a lot of your time if you’re not familiar with XML parsing but, no worries, you just need to follow these steps:
Find the cell in the spreadsheet that corresponds to the link anchor (in the sample string below is on the D2 cell)
Use the following function on a separate column
=IMPORTXML("https://www.bing.com/search?q=site:https://www.woorank.com "& D2,"//*[@id='b_results']/li/h2/a/@href")
Drag the cell from its bottom right corner to the end of the column to copy the same function to all the cells in the column. When you release the click, the formula from the first cell will be copied into every cell in your selection.
Here is an example of a query on Bing that will help us find the most relevant content on the WooRank website for a given query such as link juice. Using ImportXML we get the href link behind the first result of the search engine.
When doing so you will see that in several cases the search engine is not capable of finding a similar piece of content on your website and it will instead show the homepage of your site. This could happen for instance when the link anchor is not relevant or simply when it’s no longer possible to find a similar page on the whole website. In our example, we have been able to use the results provided by Bing on one-fifth of the total broken links. It seems like a small percentage but it still corresponds to hundreds of pages that we would otherwise have to look for manually.
Are you also using Google Sheets to automate your SEO tasks? Tell us more about it, we are so happy when we can learn new tactics!