Technical SEO
Crawling and Indexing Basics
Learn how search engines discover URLs, process pages, select canonical versions, and decide what can appear in search results.
Crawling and indexing are separate steps. Crawling is discovery and fetching. Indexing is the decision to store a page as a candidate for search results.
That distinction matters because a page can be crawlable but not indexed, indexed but not ranking, or blocked before search systems can evaluate it.
How discovery usually happens
Search engines discover URLs through links, XML sitemaps, redirects, previously known URLs, and external references.
The strongest internal discovery path is a normal HTML link from a crawlable page. Sitemaps help, but they are not a substitute for a clear site architecture.
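To make the difference between link discovery and sitemap discovery concrete, here is a minimal Python sketch that lists URLs which appear in a sitemap but are not linked from a given page. The function names (`linked_urls`, `sitemap_urls`, `sitemap_only`) and the example.com URLs are illustrative assumptions, and the sketch looks at a single page rather than a full crawl.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors without a usable href
            self.links.add(urljoin(self.base_url, href))


def linked_urls(html, base_url):
    """Return absolute URLs linked from the given HTML (hypothetical helper)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def sitemap_urls(xml_text):
    """Return the <loc> values of a urlset sitemap (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def sitemap_only(html, base_url, xml_text):
    """URLs listed in the sitemap that this page does not link to."""
    return sitemap_urls(xml_text) - linked_urls(html, base_url)
```

Run across every crawlable page of a site, a check like this would surface URLs that rely on the sitemap alone for discovery, which is usually a hint that the internal linking needs attention.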
Common indexing blockers
- A noindex directive.
- A canonical tag pointing to another URL.
- Duplicate or near-duplicate content.
- Soft 404 behavior.
- Redirects, server errors, or blocked resources.
- Very low-value pages that a search engine decides are not worth indexing.
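Some of the blockers above can be detected mechanically from a response. The following Python sketch checks the status code, a noindex directive in the X-Robots-Tag header or the meta robots tag, and a canonical tag that points elsewhere. The function name `indexing_blockers` and its inputs are assumptions made for illustration. Duplicate content, soft 404s, and page value need judgment and are not covered.

```python
from html.parser import HTMLParser


def _tokens(value):
    """Split a comma-separated directive list into lowercase tokens."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class HeadSignals(HTMLParser):
    """Collect meta robots directives and the first rel=canonical href."""

    def __init__(self):
        super().__init__()
        self.robots_tokens = set()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots_tokens.update(_tokens(a.get("content", "")))
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            if self.canonical is None and a.get("href"):
                self.canonical = a["href"]


def indexing_blockers(url, status, headers, html):
    """Return a list of likely blockers (hypothetical helper, not exhaustive)."""
    blockers = []
    if 300 <= status < 400:
        blockers.append(f"HTTP {status} redirect")
    elif status >= 400:
        blockers.append(f"HTTP {status} error")

    # Header names are case-insensitive. Crawler-specific prefixes such as
    # "googlebot: noindex" are not handled in this sketch.
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    if _tokens(header_value) & {"noindex", "none"}:
        blockers.append("noindex in X-Robots-Tag header")

    signals = HeadSignals()
    signals.feed(html)
    if signals.robots_tokens & {"noindex", "none"}:
        blockers.append("noindex in meta robots")
    # Exact string comparison; real URLs would need normalization first.
    if signals.canonical and signals.canonical != url:
        blockers.append(f"canonical points to {signals.canonical}")
    return blockers
```

An empty list does not mean the page will be indexed. It only means none of these particular signals is standing in the way.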
Practical diagnosis order
Start with the page itself. Check response code, robots access, canonical tag, rendered content, internal links, and sitemap inclusion.
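The order of checks above can be sketched as a small Python function that returns the first check a page fails. The `page` dictionary and its keys (`status`, `robots_txt`, `canonical`, `rendered_text`, `internal_inlinks`, `in_sitemap`) are hypothetical and stand in for data you would collect with your own crawler or tooling. The robots check uses the standard library's `urllib.robotparser`, whose matching rules may differ in detail from a given search engine's.

```python
from urllib.robotparser import RobotFileParser


def robots_allows(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def first_failing_check(page):
    """Return the name of the first failed check, or None if all pass.

    `page` is a hypothetical dict of facts already gathered about one URL.
    """
    checks = [
        ("response code", page["status"] == 200),
        ("robots access", robots_allows(page["robots_txt"], page["url"])),
        # No canonical, or a self-referencing one, counts as a pass here.
        ("canonical tag", page["canonical"] in (None, page["url"])),
        ("rendered content", bool(page["rendered_text"].strip())),
        ("internal links", page["internal_inlinks"] > 0),
        ("sitemap inclusion", page["in_sitemap"]),
    ]
    for name, passed in checks:
        if not passed:
            return name
    return None
```

Stopping at the first failure keeps the diagnosis focused: there is little point in debating sitemap inclusion for a URL that returns an error or is blocked by robots.txt.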
Then compare the page against the search intent it is supposed to serve. Indexing is not only a technical permission. It also depends on whether the page is worth storing.