How to Crawl Your Website for a Foolproof Migration
Site migration is a process almost every company goes through at some point. Most marketers will work through at least one rebranding, merger or restructure during their careers.
The goal is always the same: make the migration as smooth as possible without losing any of the SEO strength built up on the current site.
Impossible? Not if you are careful to watch out for these three things:
- What do you have on the old site?
- What do you have on the new site?
- How can you connect them?
Here are the 3 phases of site migration and what to do during each phase:
|Pre-migration|During migration|Post-migration|
|---|---|---|
|Know what you have on the current site.|Redirect old site links to new site pages.|Double check all redirects.|
|Understand any issues you have on the current site.|Make sure your new hosting/site/CMS doesn’t repeat any issues you have seen on the old site or introduce new ones.|Check the site performance with WooRank advanced reviews.|
|Map out the structure of the current site.|Double check the on-page, HTTP status, indexing and canonical details.| |
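If your redirect map lives in a spreadsheet, a short script can cross-check it against a crawl of each site before and after launch. This is a minimal Python sketch, assuming you have exported the old site’s URLs, the new site’s URLs and an old-to-new redirect map from your crawler; the function names and URL normalization rules are hypothetical, not part of WooRank.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Normalize a URL for comparison (assumed rules: lower-case scheme and
    host, strip trailing slashes from the path, drop any #fragment)."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def unmapped_urls(old_urls: list[str], redirect_map: dict[str, str]) -> list[str]:
    """Old-site URLs that have no entry in the redirect map."""
    mapped = {normalize(src) for src in redirect_map}
    return sorted(u for u in old_urls if normalize(u) not in mapped)


def dead_end_redirects(redirect_map: dict[str, str], new_urls: list[str]) -> list[str]:
    """Redirect sources whose target is not a known page on the new site."""
    live = {normalize(u) for u in new_urls}
    return sorted(src for src, dst in redirect_map.items() if normalize(dst) not in live)
```

The first function answers “is every old page connected to something?”, the second “does every redirect land on a real new page?”, which mirrors the redirect and double-check steps in the table.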
A crawler is an indispensable tool for site migration. WooRank has just released a site crawler for advanced analysis that helps you map out pages with issues and the actions they require across any site.
Whether you are in the pre-migration or post-migration phase, the WooRank site crawler will help you.
This site crawler delivers results in 4 sections:
- On-Page
- HTTP Status
- Indexing
- Canonical
Each section plays a key role in the migration process. Here’s how to use each one:
1. On-Page
On-page results give you an assessment of what you have on your current site so that you can map your pages and identify any on-page issues.
Use on-page results both before and after migration.
Pre-migration:
- Map out all the current pages that must be migrated to the new site.
- Flag any issues on the current site, such as titles, descriptions or H1 tags, that should be fixed on the new site.
- Identify body content issues and understand how they arose, so you don’t carry them over to the new site.
Post-migration:
- Audit your new site, ensuring all relevant on-page attributes are up to par.
- Verify your new site does not have issues with thin, blank or duplicate pages.
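Part of this on-page audit can be scripted against pages you have already fetched. This is a minimal Python sketch, assuming each page’s HTML is available as a string keyed by URL; it only checks for missing or duplicate titles, missing meta descriptions and pages without exactly one H1, and the class and function names are hypothetical rather than anything WooRank exposes.

```python
from collections import Counter
from html.parser import HTMLParser


class OnPageParser(HTMLParser):
    """Collects the <title>, meta description and <h1> texts of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = None
        self.h1s: list[str] = []
        self._in = None  # "title" or "h1" while we are inside that tag

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "description":
            self.description = (a.get("content") or "").strip()
        elif tag in ("title", "h1"):
            self._in = tag
            if tag == "h1":
                self.h1s.append("")

    def handle_endtag(self, tag):
        if tag == self._in:
            self._in = None

    def handle_data(self, data):
        if self._in == "title":
            self.title += data
        elif self._in == "h1":
            self.h1s[-1] += data


def audit_pages(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each URL to its on-page issues; pages maps URL -> HTML."""
    parsed = {}
    for url, html in pages.items():
        parser = OnPageParser()
        parser.feed(html)
        parsed[url] = parser
    # Count titles across the whole site to spot duplicates.
    title_counts = Counter(p.title.strip() for p in parsed.values() if p.title.strip())
    issues = {}
    for url, p in parsed.items():
        found = []
        title = p.title.strip()
        if not title:
            found.append("missing title")
        elif title_counts[title] > 1:
            found.append("duplicate title")
        if not p.description:
            found.append("missing meta description")
        if len(p.h1s) != 1:
            found.append(f"{len(p.h1s)} H1 tags")
        issues[url] = found
    return issues
```

Run it once on the old site to build the list of issues to fix, then again on the new site to confirm they were not carried over.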
2. HTTP Status
HTTP status is another factor you will need to check both before and after migration. Here’s what to look for in your WooRank results:
- 5xx errors
- 4xx errors
- 3xx redirect errors
- HTTP/HTTPS inconsistencies
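To triage the HTTP status checks above from your own crawl export, you could group URLs by status class and flag HTTPS pages that link to HTTP URLs. This is a minimal Python sketch, assuming the crawl gives you a URL-to-status-code mapping and a list of (source page, link target) pairs; both input shapes and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def bucket_statuses(results: dict[str, int]) -> dict[str, list[str]]:
    """Group crawled URLs by status class; results maps URL -> HTTP status code."""
    buckets: dict[str, list[str]] = {"5xx": [], "4xx": [], "3xx": []}
    for url, status in sorted(results.items()):
        key = f"{status // 100}xx"
        if key in buckets:  # 2xx and other classes are ignored
            buckets[key].append(url)
    return buckets


def mixed_protocol_links(links: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """(source, target) pairs where an HTTPS page links to an HTTP URL."""
    return [
        (src, dst)
        for src, dst in links
        if urlsplit(src).scheme == "https" and urlsplit(dst).scheme == "http"
    ]
```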
5xx errors
- A server returns a 5xx status code when it fails to handle a valid request, for example because of an internal error (500), overload or maintenance (503) or a timeout while waiting on an upstream server (504).
- It is essential to test your new servers to ensure they can withstand high traffic volume. Otherwise, your site will go down every time you experience a traffic spike, which is exactly when you least want it to fail!
4xx errors
- A server returns a 4xx status code when the problem lies with the request from the client (browser), for example a malformed request (400) or a URL that doesn’t exist (404).
- Note which URLs on the current and new sites lead to 404 pages and redirect them to the most relevant pages available.
3xx redirect errors
- Watch out for these factors when it comes to 3xx redirects:
  - Redirect loops, where a series of redirects leads back to a URL already visited.
  - The wrong redirect type: pages that have moved for good should use a permanent redirect (301 or 308), not a temporary one (302 or 307).
  - Redirect chains, where one URL redirects through several others before reaching its destination. This shouldn’t be an issue with a new site, but it is still worth checking.
HTTP/HTTPS inconsistencies
- Linking to HTTP URLs or loading HTTP resources from HTTPS pages creates inconsistency across the site and a poor user experience, as the browser will keep warning users as they move between the two protocols.
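Redirect loops, chains and temporary redirects can all be spotted by following each redirect hop by hop. This Python sketch assumes a hypothetical mapping of each redirecting URL to its (status code, target URL) pair, for example exported from a crawl; it treats 301 and 308 as permanent and everything else as temporary, which assumes the pages have moved for good. The function name is illustrative only.

```python
PERMANENT = {301, 308}  # permanent redirect status codes


def check_redirect(start: str, redirects: dict[str, tuple[int, str]]) -> tuple[list[str], list[str]]:
    """Follow redirects from start; return the visited path and any issues.

    redirects maps a redirecting URL to (status_code, target_url).
    """
    path = [start]
    issues: list[str] = []
    current = start
    while current in redirects:
        status, target = redirects[current]
        if status not in PERMANENT:
            issues.append(f"{current} uses temporary {status}")
        path.append(target)
        if target in path[:-1]:  # already visited: this is a loop
            issues.append("redirect loop")
            return path, issues
        current = target
    if len(path) > 2:  # more than one hop before the final page
        issues.append(f"redirect chain of {len(path) - 1} hops")
    return path, issues
```

Because the loop stops as soon as a URL repeats, it always terminates, even on a misconfigured site.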
3. Indexing
- Keep the new site out of search engine indexes while you are preparing it. You don’t want anyone landing on the new site before it’s ready for visitors!
- Review the current site to get a clear picture of what is indexed, what is not indexed and what is disallowed.
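To see which URLs robots.txt blocks, Python’s standard library ships a robots.txt parser. This minimal sketch assumes you have the robots.txt contents and a URL list from your crawl; the function name is hypothetical. Note that `urllib.robotparser` uses simple path-prefix matching and does not implement the `*` and `$` wildcard extensions that Google supports, so treat its answers as an approximation.

```python
from urllib.robotparser import RobotFileParser


def disallowed_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that robots_txt blocks for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from a string, no network access
    return [u for u in urls if not parser.can_fetch(agent, u)]
```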
When creating the new site, watch for these factors:
- Non-Indexable Pages: These pages are not being crawled and/or indexed because of NOINDEX directives in X-Robots-Tag HTTP headers or meta robots tags, robots.txt rules or canonical tags pointing to another URL. This part of your WooRank results also shows whether each non-indexable page is included in your XML sitemap.
- Disallowed Pages: These pages are disallowed via your site’s robots.txt file.
- NOFOLLOW Pages: This section lists internal links (links on pages within your site that point to other pages within your site) that have the NOFOLLOW attribute. NOFOLLOW tells search engines not to pass link equity (“link juice”) through those links, so the pages they point to may not be fully optimized.
If you have any non-indexable, disallowed or NOFOLLOW pages, make sure there is a good reason for it. Otherwise, all pages should be indexable, allowed and linked without the NOFOLLOW attribute so they can build SEO value.
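Both noindex signals and NOFOLLOW internal links can be read straight from a page’s markup and response headers. This Python sketch is a minimal illustration, assuming you already have each page’s HTML and the raw value of its `X-Robots-Tag` header; it ignores user-agent-specific header values such as `googlebot: noindex`, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


def split_directives(value: str) -> set[str]:
    """'noindex, follow' -> {'noindex', 'follow'}"""
    return {d.strip().lower() for d in value.split(",") if d.strip()}


class RobotsDirectiveParser(HTMLParser):
    """Collects meta robots directives and rel="nofollow" link targets."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()
        self.nofollow_hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            self.directives |= split_directives(a.get("content") or "")
        elif tag == "a" and a.get("href") and "nofollow" in (a.get("rel") or "").lower().split():
            self.nofollow_hrefs.append(a["href"])


def index_report(url: str, html: str, x_robots_tag: str = "") -> dict:
    parser = RobotsDirectiveParser()
    parser.feed(html)
    directives = parser.directives | split_directives(x_robots_tag)
    host = urlsplit(url).netloc
    absolute = (urljoin(url, h) for h in parser.nofollow_hrefs)
    return {
        # "none" is shorthand for "noindex, nofollow"
        "noindex": bool(directives & {"noindex", "none"}),
        # keep only NOFOLLOW links that stay on the same host
        "internal_nofollow_links": sorted(u for u in absolute if urlsplit(u).netloc == host),
    }
```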
4. Canonical
This section of your WooRank results is essential for a successful site migration because:
- Canonical tags help search engines know which pages should be considered original – ‘canonical’ – and which should not.
- Canonical issues occur when the tags cannot be accessed, are missing or point to broken URLs. In each case, address the issue and add the necessary tags so you are not diluting the page’s SEO strength.
- Broken hreflang: a page’s hreflang tag is missing or points to a broken page. Hreflang tells search engines which URL is intended for which language and/or geographical audience.
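A crawl export also makes it easy to check that canonical and hreflang tags point at live pages. This Python sketch assumes `live_urls` is a hypothetical set of URLs that returned a 200 status in your crawl; it reports missing or duplicate canonical tags and any canonical or hreflang target outside that set, but it does not check whether a page that should have hreflang tags is missing them. The names are illustrative only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTagParser(HTMLParser):
    """Collects rel="canonical" and rel="alternate" hreflang <link> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []
        self.hreflangs: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        href = a.get("href")
        if not href:
            return
        if "canonical" in rel:
            self.canonicals.append(href)
        elif "alternate" in rel and a.get("hreflang"):
            self.hreflangs[a["hreflang"]] = href


def canonical_issues(url: str, html: str, live_urls: set[str]) -> list[str]:
    parser = LinkTagParser()
    parser.feed(html)
    issues = []
    if not parser.canonicals:
        issues.append("missing canonical")
    elif len(parser.canonicals) > 1:
        issues.append("multiple canonicals")
    for href in parser.canonicals:
        target = urljoin(url, href)  # resolve relative hrefs against the page URL
        if target not in live_urls:
            issues.append(f"canonical points to broken URL {target}")
    for lang, href in sorted(parser.hreflangs.items()):
        target = urljoin(url, href)
        if target not in live_urls:
            issues.append(f"hreflang {lang} points to broken URL {target}")
    return issues
```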
Site migration is never easy, and its complexity depends on the size and condition of the sites you are moving from and to.
The key to success is preparation.
If you follow all the above steps before, during and after migration, you will be able to secure the smoothest and most SEO-friendly migration possible.