Crawling and Indexing Basics

Learn how search engines discover URLs, process pages, select canonical versions, and decide what can appear in search results.

Updated June 25, 2026 · beginner

Crawling and indexing are separate steps. Crawling is discovering URLs and fetching them. Indexing is the decision to store a fetched page as a candidate for search results.

That distinction matters because a page can be crawlable but not indexed, indexed but not ranking, or blocked before search systems can evaluate it.
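One common way a page is blocked before evaluation is a robots.txt disallow rule: compliant crawlers never fetch the page, so they never see its content or any noindex tag on it. The sketch below, assuming a made-up robots.txt for example.com, uses Python's standard-library `urllib.robotparser` to test whether a URL may be fetched. That parser applies rules in file order and may not match every search engine's interpretation (wildcard handling, for example), so treat it as a quick check rather than a verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines instead of fetching the file over HTTP.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/blog/post"))     # True
print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/private/page"))  # False
```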

How discovery usually happens

Search engines discover URLs through links, XML sitemaps, redirects, previously known URLs, and external references.
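XML sitemaps follow the sitemaps.org protocol: a `<urlset>` of `<url>` entries, each with a required `<loc>`. A minimal sketch of reading those URLs with Python's `xml.etree.ElementTree`, assuming an inline, hypothetical sitemap (a real one would be fetched from the site, and large sites may split it across a sitemap index):

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol for <urlset> files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content for illustration.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url>
    <loc>https://example.com/guides/crawling</loc>
    <lastmod>2026-06-25</lastmod>
  </url>
</urlset>"""

def sitemap_urls(xml_text):
    """Return every <loc> value listed in a <urlset> sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]

print(sitemap_urls(SITEMAP_XML))
# ['https://example.com/', 'https://example.com/guides/crawling']
```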

The strongest internal discovery path is a normal HTML link from a crawlable page. Sitemaps help, but they are not a substitute for a clear site architecture.
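To see why a normal HTML link is the strongest discovery path, it helps to look at what a simple link extractor can actually find. This is an illustration under stated assumptions, not how any particular search engine works: it collects `href` values from `<a>` elements in static HTML with Python's `html.parser` and resolves them against a hypothetical page URL. The `onclick` navigation on the `<span>` produces no URL, so this kind of extraction never sees it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> elements in static HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors without an href, same-page fragments, and javascript: pseudo-links.
        if href and not href.startswith(("javascript:", "#")):
            self.links.append(urljoin(self.base_url, href))

HTML = """
<nav>
  <a href="/guides/indexing">Indexing guide</a>
  <a href="https://example.com/tools">Tools</a>
  <span onclick="location.href='/hidden'">Not a real link</span>
</nav>
"""

extractor = LinkExtractor("https://example.com/guides/crawling")
extractor.feed(HTML)
print(extractor.links)
# ['https://example.com/guides/indexing', 'https://example.com/tools']
```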

Common indexing blockers

A noindex directive.
A rel="canonical" tag pointing to a different URL.
Duplicate or near-duplicate content.
Soft 404 behavior, where a missing or empty page returns a 200 status instead of a 404.
Redirects, server errors, or blocked resources.
Very low-value pages that are not worth indexing.
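Two of these blockers, a noindex directive and a canonical tag pointing elsewhere, can be read straight from the page's HTML. The sketch below uses hypothetical names (`IndexSignals`, `indexing_blockers`) and checks only `<meta name="robots">` and `<link rel="canonical">`. It ignores the `X-Robots-Tag` HTTP header, crawler-specific meta tags, and URL normalization: after resolving relative links, it compares the canonical and page URLs as plain strings.

```python
from html.parser import HTMLParser
from typing import Optional, Set
from urllib.parse import urljoin

class IndexSignals(HTMLParser):
    """Record robots meta directives and the rel="canonical" target of one page."""

    def __init__(self):
        super().__init__()
        self.robots_directives: Set[str] = set()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values may be None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            for directive in a.get("content", "").split(","):
                if directive.strip():
                    self.robots_directives.add(directive.strip().lower())
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href") or None

def indexing_blockers(page_url, html):
    """Return human-readable indexing blockers found in the page's HTML."""
    signals = IndexSignals()
    signals.feed(html)
    signals.close()
    problems = []
    # Google documents "none" as shorthand for "noindex, nofollow".
    if signals.robots_directives & {"noindex", "none"}:
        problems.append("noindex directive")
    if signals.canonical:
        target = urljoin(page_url, signals.canonical)
        # Plain string comparison: no trailing-slash or case normalization.
        if target != page_url:
            problems.append(f"canonical points to {target}")
    return problems

EXAMPLE_HTML = """
<head>
  <meta name="robots" content="noindex, follow">
  <link rel="canonical" href="/guides/crawling">
</head>
"""

print(indexing_blockers("https://example.com/guides/crawling?ref=nav", EXAMPLE_HTML))
# ['noindex directive', 'canonical points to https://example.com/guides/crawling']
```

A flagged canonical is not always a mistake. On a tracking-parameter variant like the one above, pointing to the clean URL is usually intended; the flag only explains why this particular variant will not be indexed.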

Practical diagnosis order

Start with the page itself. Check the HTTP response code, robots.txt access and robots meta directives, the canonical tag, the rendered content, internal links pointing to the page, and sitemap inclusion.
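The diagnosis order can be written as an ordered list of checks that stops at the first failure. `PageFacts` and its fields are hypothetical stand-ins for whatever an audit tool or crawler export actually provides, and the content check (any rendered text at all) is a deliberately crude placeholder:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PageFacts:
    """Hypothetical audit record for one URL; field names are illustrative."""
    url: str
    status_code: int
    robots_allowed: bool
    canonical: Optional[str]
    rendered_text_chars: int
    internal_inlinks: int
    in_sitemap: bool

def first_problem(page):
    """Run checks in the order listed above; return the first failure or None."""
    checks = [
        (page.status_code == 200, f"response code is {page.status_code}, not 200"),
        (page.robots_allowed, "robots.txt blocks crawling"),
        (page.canonical in (None, page.url), f"canonical points to {page.canonical}"),
        # Crude placeholder: any rendered text at all counts as content.
        (page.rendered_text_chars > 0, "rendered page has no text content"),
        (page.internal_inlinks > 0, "no internal links point to this URL"),
        (page.in_sitemap, "URL is missing from the XML sitemap"),
    ]
    for passed, message in checks:
        if not passed:
            return message
    return None

orphan = PageFacts(
    url="https://example.com/guides/crawling",
    status_code=200,
    robots_allowed=True,
    canonical="https://example.com/guides/crawling",
    rendered_text_chars=4200,
    internal_inlinks=0,
    in_sitemap=True,
)
print(first_problem(orphan))  # no internal links point to this URL
```

Stopping at the first failure mirrors the order of the checks: if the URL returns a 404 or is blocked by robots.txt, the canonical and content checks are moot until that is fixed.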

Then compare the page against the search intent it is supposed to serve. Indexing is not only a technical permission. It also depends on whether the page is worth storing.
