There are three main steps for SEO success with Google, which are as follows:

a) Get your site crawled by Google bots.

b) Get your site indexed.

c) Get high search rankings.

In this article we are going to talk about the two important initial processes, crawling and indexing, which lead to sites being shown in search results. Being seen by Google is important, as so far no other search engine has surpassed Google in popularity and user preference.

What is crawling?

Search engine crawling refers to bots browsing particular pages on the web. If you have a newly launched site, the Google bot will have to find (crawl) your site’s web pages to know of its existence on the web. That said, the bot’s job does not end with crawling. It must index the pages too.
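To illustrate, the discovery step can be sketched in a few lines of Python: a crawler parses a page's HTML and collects the links it will follow next. The markup and URLs below are made-up examples, not a real bot's implementation.

```python
# A minimal sketch of what a crawler does with a fetched page: collect the
# links it will follow next. "https://example.com" is a placeholder URL.
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, the way a bot discovers pages."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL
                    self.links.append(urljoin(self.base_url, value))

html = '<a href="/about">About</a> <a href="https://example.com/blog">Blog</a>'
collector = LinkCollector("https://example.com/")
collector.feed(html)
print(collector.links)
# ['https://example.com/about', 'https://example.com/blog']
```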

What is indexing?

Once a bot has found a page by crawling, it adds the page to the list of other crawled pages belonging to the same category. This process is known as indexing. In a book you will find that the content is systematically arranged by category, word, reference, etc. in the index. This makes it easier for readers to find exactly what they are looking for in the book. Similarly, search engines have an index of pages categorized in various ways. These pages are not the pages from your website exactly, but a snapshot of the pages as they were seen the last time they were crawled. These snapshots are the cached versions of the pages.
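The book-index analogy can be shown as a toy inverted index: each word maps to the pages that contain it, so a lookup never has to rescan every page. The page names and text below are invented for illustration.

```python
# A toy version of a search index: map each word to the set of pages that
# contain it. Page names and contents are made-up examples.
pages = {
    "page1.html": "seo tips for crawling",
    "page2.html": "crawling and indexing basics",
}

index = {}
for url, text in pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(url)

# A query is answered by consulting the index, not the pages themselves
print(sorted(index["crawling"]))
# ['page1.html', 'page2.html']
```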

When a user enters a search query in Google search, Google quickly goes through these indexes to judge which pages are appropriate to return in the results. With the help of complex mathematical algorithms, Google decides where in the search results each page should appear. Google's accuracy in returning pages that match users' queries is what makes it such a dominant search engine.

NOTE: The cached page returned may not be identical to a page that was recently changed on your website. However, when you add new content and make it easily accessible to search engines, they will crawl and index your pages again and return the latest versions in search results.

This all begs the question: How do I get my site indexed by Google? (Here the word “indexed” means crawled and indexed collectively.) There are many ways to get your website crawled and indexed by Google bots. See the steps below (which are in no particular order):

1. Google Search Console Account

Get a Google Search Console account and a Google Analytics account, and submit your site in Search Console. You can check Crawl Stats in Google Search Console to see how frequently Google is crawling your pages.

Crawl stats - Google Search Console

Google Search Console also allows you to see exactly how many pages have been indexed by Google.

Indexed Pages - Google Search Console

2. Fetch as Google

Google Search Console provides the option to ask Google to crawl new pages or pages with updated content. This option is located under the Crawl section and is called Fetch as Google.

Type the URL path in the text box provided and click Fetch. Once the Fetch status updates to Successful, click Submit to Index. You can either submit individual URLs or URLs containing links to all the updated content. With the former you can submit up to 500 URL requests per week, with the latter you can make 10 requests per month.

Fetch as Google - Google Search Console

3. XML Sitemaps

Sitemaps act as maps for the search bots, guiding them to your website’s inner pages. You cannot afford to neglect this significant step toward getting your site indexed by Google. Create an XML sitemap and submit it to Google in your Google Search Console account.

Add & Test XML Sitemap - Google Search Console
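As a sketch of what a sitemap contains, the short script below assembles a minimal file in the standard sitemap protocol format. The URLs are placeholders; a real sitemap would list your site's actual pages.

```python
# A minimal sketch that builds a sitemap in the standard protocol format
# (sitemaps.org). The URLs here are placeholders, not real pages.
from xml.sax.saxutils import escape

urls = ["https://example.com/", "https://example.com/about"]

entries = "\n".join(
    f"  <url><loc>{escape(u)}</loc></url>" for u in urls
)
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</urlset>"
)
print(sitemap)
```

Save the output as sitemap.xml at your site's root and submit that URL in Google Search Console.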

4. Inbound Links

Search engine bots are more likely to find and index your site when websites that are often crawled and indexed link to it. For this to work you need to build quality links to your site from other popular sites. You can learn more about obtaining quality links from the 10 Link Building Strategies blog post from WooRank.

5. Google+ Profile

Create a Google+ profile and add a link to your site in the About section. Add posts to Google+ containing links to your site. Since Google+ is a Google product, the Google bot will pay attention to these links. You will also benefit from building other popular social media profiles for your website.

6. DMOZ.org

Previously Google used data obtained from the DMOZ web directory, so webmasters made a point of submitting their site to it. It is not a significant indexing factor anymore but it still does not hurt to submit to it.

7. Clean Code

Make the Google bot’s job of crawling and indexing your site easy by cleaning up your site’s backend and ensuring your code is W3C compliant. Also, never bloat your code; maintain a good text-to-HTML ratio in your website content.

W3C Errors
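One rough way to estimate the text-to-HTML ratio is to compare the length of the visible text to the length of the full markup. The snippet below is an illustrative sketch using Python's standard HTML parser, with a made-up page.

```python
# A rough text-to-HTML ratio estimate: visible text length divided by
# total markup length. The sample markup is an invented example.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Accumulates the text content between tags."""
    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)

html = "<html><body><h1>Title</h1><p>Some readable content.</p></body></html>"
parser = TextExtractor()
parser.feed(html)
visible = "".join(parser.text)
ratio = len(visible) / len(html)
print(f"text-to-HTML ratio: {ratio:.0%}")
```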

8. Faster Site, Faster Indexing

Sites that are built to load quickly are also optimized for faster indexing by Google.

9. Good Internal Link Structure

Ensure that all pages of your website are interlinked with each other. Especially if your site’s home page has been indexed, make sure all the other pages are interconnected with it so they will be indexed too, but make sure there are not more than 200 links on any given page.
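A quick way to check a page against the 200-link guideline is to count its anchor tags that actually carry an href. The sample markup below is illustrative.

```python
# Count <a href> tags on a page, a quick check against the 200-link
# guideline. The sample markup is a made-up example.
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Only anchors with an href are outgoing links
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1

html = '<a href="/">Home</a><a href="/blog">Blog</a><a name="anchor">x</a>'
counter = LinkCounter()
counter.feed(html)
print(counter.count)  # 2 -- the <a name=...> tag has no href
```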

10. Good Navigation

Good navigation will contribute to the link structure discussed above. As important as navigation structure is for your users, it is equally important for the fast indexing of your site. Quick Tip: Use breadcrumb navigation.

11. Add Fresh Content

Add quality content to your site frequently. Content of value attracts the bots. Even if your site has only been indexed once, the addition of more and more valuable content urges the Google bot to index your site repeatedly. This valuable content is not limited to the visible content on the page; it also includes the metadata and other important SEO components on the website. Keep these SEO tips for Website Content in mind.

These are the basic things you need to do to facilitate faster crawling and indexing by Google bots, but there might be other issues keeping your site from being indexed. Knowing these potential problems will come in handy if you find your site is not being indexed.

Other things to consider

  • Server Issues: Sometimes it is not your website’s fault that it is not getting indexed but the server’s, that is, the server may not be allowing the Google bot to access your content. In this case, either a DNS delegation issue is blocking access to your site or your server is down for maintenance. Check for server issues if no pages have been indexed on your new site.

  • De-indexed Domain: You may have bought a used domain, and if so, this domain may have been de-indexed for unknown reasons (most probably a history of spam). In such cases, send a reconsideration request to Google.

  • Robots.txt: It is imperative that you have a robots.txt file, but you need to cross-check it to see whether it disallows Google bot access to any pages that should be indexed. This is a major reason some web pages do not get indexed.

  • Sitemap Errors: After you have submitted your sitemap to Google Search Console, the tool will alert you of any sitemap errors. Google has listed a few of these errors along with an explanation of how to fix each of them here.

  • Meta Robots: The following meta tag tells search engines not to index a page. If a particular web page is not getting indexed, check its source for this code:

    <meta name="robots" content="noindex, nofollow">

  • URL Parameters: Sometimes certain URL parameters can be restricted from indexing to avoid duplicate content. Be very careful when you use this feature (found in Google Search Console under Configuration), as Google clearly states there that “Incorrectly configuring parameters can result in pages from your site being dropped from our index, so we don’t recommend you use this tool unless necessary.” Clean your URLs to avoid crawling errors.

  • Check .htaccess File: The .htaccess file that is found in the root folder is generally used to fix crawling errors and redirects. An incorrect configuration of this file can lead to the formation of infinite loops, hindering the site from loading and being indexed.

  • Other Errors: Check for broken links, 404 errors, and incorrect redirects on your pages that might be blocking the Google bot from crawling and indexing your site.

    You can use Google Search Console to find out the index status of your site. This tool is free and collects extensive data about the index status of your site on Google. Check the Index Status graph under the Health option in Google Search Console.

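The robots.txt check above can also be done programmatically: Python's standard urllib.robotparser evaluates Disallow rules the same way well-behaved bots do. The rules and URLs below are made-up examples.

```python
# Check whether a robots.txt file would block a crawler from a URL.
# The rules and URLs are illustrative examples.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# parse() accepts the file's lines directly, so no network call is needed
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))    # True
print(rp.can_fetch("Googlebot", "https://example.com/private/page"))  # False
```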

If you want to check which URLs are not indexed, you can do so manually with the SEOquake extension.