note · May 12, 2026 · 10 min read

Canonical URL: When You Need It, When It Hurts

Back in March, one of the projects I manage started losing visibility in search. Not the homepage — individual product pages. A page had been picking up impressions for two months, hung around the top 20 for several keywords, and then — crash. In a single week it dropped from 300 impressions a day to five. For one keyword, it fell out of the index entirely.

I jumped into Google Search Console and ran URL Inspection. Google reported: "User-declared canonical: https://example.com/". In other words, my own page was telling Google: "don't index me, index the homepage." I opened DevTools, looked at the <head> — sure enough, there was <link rel="canonical" href="https://example.com/">. On every page. The same canonical, pointing to the root of the domain.

Then I went digging to find out where it came from. Turned out somebody had been editing next.config.js, added a block with default meta tags, and put the canonical in there using process.env.SITE_URL without a path. The template got applied across the entire site. Google dutifully merged everything into one page.

It took three weeks to roll back and re-index. Some of those positions I still haven't recovered.

This is a typical case. Canonical is a powerful signal, and you can break it with a single line in a config. Over the past six months I've untangled stories like this on five different projects. The causes vary every time — sometimes a WordPress plugin, sometimes a Next.js config, sometimes a Shopify theme, sometimes a developer's edit that was "supposed to unify everything." The consequences are always the same: pages drop out of the index, and you can't figure out why until you dig into the source.

Below: what canonical actually is, when you really need it, and where it stops being a helper and turns into a shot in the foot.

What canonical is

Canonical is a hint to Google about which version of a page to treat as the main one when there are several similar or identical pages in the index.

Technically it lives in two places. First, in the HTML, inside <head>:

<link rel="canonical" href="https://example.com/product/red-shoes">

Second, in an HTTP response header, for non-HTML resources (PDFs, images, documents):

Link: <https://example.com/whitepaper.pdf>; rel="canonical"
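For non-HTML files, the header has to be set by whatever serves the file. A minimal sketch of building the header value; canonicalLinkHeader is a hypothetical helper name of mine, not part of any framework:

```typescript
// Hypothetical helper: builds the value of a `Link` response header
// that declares a canonical URL for a non-HTML resource such as a PDF.
function canonicalLinkHeader(canonicalUrl: string): string {
  // new URL() throws on relative or malformed input, so only absolute URLs pass.
  const url = new URL(canonicalUrl);
  return `<${url.href}>; rel="canonical"`;
}

console.log(canonicalLinkHeader("https://example.com/whitepaper.pdf"));
```

How you attach the value to the response depends on your server or CDN, so that part is left out.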

Here's the important part. Canonical is not a redirect. It's not a block. It's not even a directive. It's a signal, a hint. Google looks at the canonical and decides whether to trust it. In URL Inspection you'll see two fields: "User-declared canonical" (what you declared) and "Google-selected canonical" (what Google actually chose). They often don't match.

Google can ignore your canonical if:

it points to a noindex page
it points to a 404 or 5xx
the page contents are radically different
the canonical leads into a chain of other canonicals
you have several canonicals with different URLs on the same page
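The last item on that list is easy to detect mechanically. A rough sketch that pulls every canonical href out of an HTML string and flags conflicts. It uses regular expressions to stay dependency-free, which is an assumption of convenience; a real audit should use a proper HTML parser. Both function names are hypothetical.

```typescript
// Collects every canonical href declared in an HTML string.
function findCanonicalHrefs(html: string): string[] {
  const hrefs: string[] = [];
  const linkTags = html.match(/<link\b[^>]*>/gi) ?? [];
  for (const tag of linkTags) {
    const rel = /\brel\s*=\s*["']?([^"'\s>]+)/i.exec(tag)?.[1];
    const href = /\bhref\s*=\s*["']([^"']*)["']/i.exec(tag)?.[1];
    if (rel?.toLowerCase() === "canonical" && href !== undefined) {
      hrefs.push(href);
    }
  }
  return hrefs;
}

// True when the page declares more than one distinct canonical URL.
function hasConflictingCanonicals(html: string): boolean {
  return new Set(findCanonicalHrefs(html)).size > 1;
}
```

Two identical canonical tags are sloppy but harmless; two different ones are the case worth flagging.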

So it's a recommendation that Google weighs against a bunch of factors. But in the vast majority of cases, it listens. That's exactly why it's dangerous: you set a canonical hoping to "hint" to Google, and what actually happens is you've essentially redirected all your search traffic. Without a redirect, without a trace in your server logs.

Another point of confusion — canonical vs robots.txt vs noindex. These are different tools for different jobs. Robots.txt blocks crawling (Google doesn't load the page at all). Noindex allows crawling but blocks indexing. Canonical allows both crawling and indexing, but says: "in the results, show this version." Mix them up, and you'll end up with either pages Google never sees, or pages Google sees but never shows.

When canonical is needed

There are six scenarios where canonical actually helps.

Duplicate URLs with query parameters. The most common case. You run an online store, the product page is /product/red-shoes. The user applies filters — the URL becomes /product/red-shoes?size=42&color=red&utm_source=google. Same content. Without a canonical, Google can index both versions and split ranking between them. With a canonical pointing to /product/red-shoes — Google understands it's one page, and all the weight goes to the main URL.

The legacy HTTP/HTTPS, www/non-www story. If a site migrated from HTTP to HTTPS or from www to a bare domain, and the redirects aren't fully wired up, canonical can help. But honestly, a 301 redirect is always better in this case. Canonical is your backup parachute; the main mechanism should be a redirect at the nginx or CDN layer.

Print, AMP, and mobile versions. If you have a separate print-friendly version of a page, such as /article?print=1, it should canonicalize to the main one. Same story with the AMP version: it has a canonical pointing to the main HTML variant. Standard practice.

A/B tests. You're testing two versions of a landing page — /landing-a and /landing-b. Version B should canonicalize to version A (or both to a neutral URL). Otherwise you'll create a duplicate yourself, and Google will split ranking between the two halves of the test.

Sticky filters in listings. A category with sorting: /catalog, /catalog?sort=price, /catalog?sort=date. Same content, different order. Canonicalize all variants to /catalog, and Google ranks one page instead of three similar ones.

Cross-domain canonical. This is an interesting case. If you've published an article on your site and then let another site reprint it, the copy should have a canonical pointing back to your original. Google respects this. I've seen cases where an article on a major media site carried a canonical pointing to a small blog as the original, and the blog was the one that ranked. It works.

When canonical hurts

And now — five ways to break everything. I caught all of these by hand, either on my own sites or on client projects.

Canonical pointing to the homepage from every page. My story from the start of the post. This is the most common mistake, and it kills a site completely. It happens because of a poorly configured next.config.js, an Astro template, a WordPress plugin, a Shopify theme: anywhere the canonical is set by default without taking the current URL into account. Google indexes only the homepage, and all the other pages drop out of search results. You can check it with a single command:

curl -sS https://example.com/some-page | grep canonical

If the href in the output points to the root — you have a problem.
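If you want to run the same check across many pages instead of eyeballing curl output, the comparison itself is tiny. A sketch, assuming you have already fetched each page and extracted its canonical href; canonicalCollapsesToRoot is a hypothetical name:

```typescript
// Returns true when a page that is NOT the homepage declares the site root as its canonical.
function canonicalCollapsesToRoot(pageUrl: string, canonicalHref: string): boolean {
  const page = new URL(pageUrl);
  // Resolve against the page URL so relative hrefs like "/" are handled too.
  const canonical = new URL(canonicalHref, page);
  const pageIsRoot = page.pathname === "/";
  const canonicalIsRoot = canonical.pathname === "/" && canonical.search === "";
  return !pageIsRoot && canonicalIsRoot;
}

console.log(canonicalCollapsesToRoot("https://example.com/some-page", "https://example.com/")); // true
```

The homepage pointing at itself is a normal self-canonical, which is why the function returns false for it.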

Self-canonical with a different domain or scheme. The site is on https://example.com, but the canonical is set to http://example.com/page (no s) or https://www.example.com/page (with www, even though the site has no www). It happens because of a hardcoded URL in a template or because of a migration where the canonical never got updated. Google either ignores it or merges the page with a version that doesn't exist. Fixing it takes five minutes; finding it sometimes takes hours.
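Finding it gets faster if you compare origins programmatically. A minimal sketch; compareCanonicalOrigin and the OriginMismatch type are names I made up for illustration:

```typescript
type OriginMismatch = "none" | "scheme" | "host";

// Compares the scheme and host of a page with those of its declared canonical.
function compareCanonicalOrigin(pageUrl: string, canonicalHref: string): OriginMismatch {
  const page = new URL(pageUrl);
  const canonical = new URL(canonicalHref, page); // relative hrefs resolve against the page
  if (canonical.protocol !== page.protocol) return "scheme";
  if (canonical.host !== page.host) return "host";
  return "none";
}

console.log(compareCanonicalOrigin("https://example.com/page", "http://example.com/page")); // "scheme"
```

A "host" result is only a bug when it's unintended: a deliberate cross-domain canonical on a syndicated copy will trigger it too.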

Canonical to a noindex page. Self-contradiction. You're saying "here's the main version," and on that very main version — <meta name="robots" content="noindex">. In this case Google ignores the canonical, but it might not index the original either — because you explicitly said this isn't the main version. Usually happens on staging environments that accidentally end up in the index.

Canonical chains. Page A canonicalizes to B, B canonicalizes to C. Don't count on Google following the chain to the end: it may act on A → B and never apply B → C, so B stays in the index even though you wanted C. Chains pop up on sites with a history of migrations, where canonicals have been edited multiple times and never cleaned up.
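Chains are easy to find once you have a crawl export. A sketch that walks declared canonicals through a map of page URL to canonical URL; the map is assumed to come from your own crawl, and walkCanonicalChain is a hypothetical name:

```typescript
// Follows declared canonicals through a map of page URL -> canonical URL.
// Returns every URL visited, so a result longer than 2 means a chain.
function walkCanonicalChain(start: string, declared: Map<string, string>): string[] {
  const path = [start];
  const seen = new Set<string>([start]);
  let current = start;
  for (;;) {
    const next = declared.get(current);
    // Stop at a page with no canonical, a self-canonical, or a loop.
    if (next === undefined || next === current || seen.has(next)) break;
    path.push(next);
    seen.add(next);
    current = next;
  }
  return path;
}

const declared = new Map([
  ["/a", "/b"],
  ["/b", "/c"],
  ["/c", "/c"],
]);
console.log(walkCanonicalChain("/a", declared)); // ["/a", "/b", "/c"]
```

The fix for any result longer than two entries is to point the first page straight at the last one.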

Canonical when content is materially different. /search?q=seo has a canonical pointing to /search. Seems logical: it's the same search page, right? But ?q=seo returns results for "seo," while /search without a parameter returns an empty form. The content is different; the query parameter materially changes the page. If Google honors the canonical, it merges them, and you lose all your SERP positions for specific search queries. The same goes for catalog filters that significantly change the listing: /laptops?brand=apple is not a duplicate of /laptops, it's a separate page that can rank for "macbook."
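One way to encode that distinction is an allowlist of parameters that change what the page shows. A sketch under a loud assumption: which parameters count as content-changing is specific to each site, and "q" and "brand" are only the examples from this section. CONTENT_PARAMS and canonicalFor are hypothetical names.

```typescript
// ASSUMPTION: site-specific list of parameters that materially change page content.
const CONTENT_PARAMS = new Set(["q", "brand"]);

// Builds a canonical URL that keeps content-changing parameters
// and drops everything else (tracking, sorting, and so on).
function canonicalFor(rawUrl: string): string {
  const url = new URL(rawUrl);
  const kept = new URLSearchParams();
  url.searchParams.forEach((value, name) => {
    if (CONTENT_PARAMS.has(name)) kept.append(name, value);
  });
  kept.sort(); // stable order, so ?a=1&b=2 and ?b=2&a=1 map to one canonical
  url.search = kept.toString();
  url.hash = "";
  return url.href;
}

console.log(canonicalFor("https://example.com/laptops?brand=apple&utm_source=google"));
// https://example.com/laptops?brand=apple
console.log(canonicalFor("https://example.com/catalog?sort=price"));
// https://example.com/catalog
```

An allowlist fails safe: a new tracking parameter nobody told you about gets stripped by default, while a denylist would let it through.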

The core principle: canonical is needed where content is duplicated. If the content is different — canonical hurts.

How to check canonical

Three methods, in order of increasing accuracy.

curl. The fastest. On any page:

curl -sS https://example.com/page | grep -i canonical

It should return a <link rel="canonical" href="..."> line. If the href matches the requested URL (accounting for scheme, www, trailing slash) — that's self-canonical, all good. If it points somewhere else — figure out whether that's intentional or a bug.

DevTools. Open the page, F12, Elements tab, Ctrl+F, search for "canonical." You see what the browser actually rendered. This matters: sometimes canonical is injected via JavaScript and curl doesn't see it — but it's there in the DOM. Especially on SPAs like React/Next with client-side rendering.

Google Search Console → URL Inspection. The most honest method. You enter a URL — Google shows two fields:

User-declared canonical — what you have in the HTML
Google-selected canonical — what Google actually considers the main one

They should match. If they don't, Google ignored your canonical and picked something else. That's a signal: either your canonical is obviously broken, or the content looks like another page and Google decided to merge them. And the Google-selected value is what actually affects indexing and ranking, not what you declared.

Check this once a month on your key pages. On large catalogs, check a sample instead: script the Search Console URL Inspection API, which takes one URL per request and is subject to a daily quota.
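When scripting this, the comparison is the only interesting part. The sketch below works on an already-parsed API response and makes no network call. The field names (inspectionResult.indexStatusResult.userCanonical and googleCanonical) reflect my understanding of the URL Inspection API response, so verify them against the current API reference. compareCanonicals is a hypothetical helper.

```typescript
// Minimal subset of the response shape; field names are assumed, check the API docs.
interface InspectionResponse {
  inspectionResult?: {
    indexStatusResult?: {
      userCanonical?: string;
      googleCanonical?: string;
    };
  };
}

type CanonicalVerdict = "match" | "mismatch" | "unknown";

// Compares what the page declared with what Google selected.
function compareCanonicals(response: InspectionResponse): CanonicalVerdict {
  const status = response.inspectionResult?.indexStatusResult;
  const declared = status?.userCanonical;
  const selected = status?.googleCanonical;
  // Either field can be absent, for example when the URL has not been crawled yet.
  if (!declared || !selected) return "unknown";
  return declared === selected ? "match" : "mismatch";
}
```

Treat "unknown" as its own bucket in reports; lumping it in with "mismatch" hides pages that simply haven't been crawled.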

Self-canonical pages

A best practice I implement on every project. Every unique page has a canonical pointing to itself — with the full absolute URL.

<!-- On the page /article/seo-tips -->
<link rel="canonical" href="https://example.com/article/seo-tips">

Why bother if the page is already unique? Because the page might be reachable through URL variants you don't know about:

?utm_source=... — ad parameters
?fbclid=... — Facebook click ID
#section — anchor
/article/seo-tips/ — trailing slash
/Article/Seo-Tips — different casing

Without self-canonical, Google can index any of these versions as separate pages. With self-canonical — everything collapses into the one you specified. It's insurance against the unexpected.

It's usually implemented in your Next.js / WordPress / any CMS template — with a function that takes request.url, normalizes it (strips query, normalizes case, applies trailing slash the way you've standardized), and plugs it into <link rel="canonical">. Set it up once — works on every page.
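A minimal sketch of such a function. The conventions are assumptions, not rules: lowercase paths, no trailing slash, and a single fixed origin. Lowercasing is only safe if your server treats paths as case-insensitive. normalizeCanonical is a hypothetical name.

```typescript
// ASSUMPTIONS: lowercase paths and no trailing slash are this site's chosen standard.
function normalizeCanonical(requestUrl: string, siteOrigin: string): string {
  // Accepts absolute URLs or bare paths; bare paths resolve against siteOrigin.
  const url = new URL(requestUrl, siteOrigin);
  // Always emit the one official scheme and host, whatever the request came in on.
  const origin = new URL(siteOrigin).origin;
  let path = url.pathname.toLowerCase();
  if (path.length > 1 && path.endsWith("/")) path = path.slice(0, -1);
  // Query string and fragment are dropped by not copying them over.
  return origin + path;
}

console.log(
  normalizeCanonical("https://www.example.com/Article/Seo-Tips/?utm_source=x#section", "https://example.com"),
);
// https://example.com/article/seo-tips
```

Dropping the whole query string only suits pages where no parameter changes the content; listings and search pages need a per-parameter decision.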

In Next.js (App Router) you do this through metadata.alternates.canonical, ideally at the page level, because a hardcoded value in a shared layout.tsx is inherited by every page under it:

export const metadata = {
  alternates: {
    canonical: 'https://example.com/article/seo-tips',
  },
}

Important: the URL is absolute, with scheme and domain. Relative canonicals (/article/seo-tips) are technically valid, but I've seen cases where Google interpreted them strangely — especially when the page is reachable through multiple hosts (apex and www). Better to always write the full URL.
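To avoid typing the domain into every page, a small helper can build the object from a path. This is a sketch, not Next.js API: CanonicalMetadata is a local stand-in for the relevant slice of the Next.js Metadata type so the code runs without Next installed, and SITE_URL and canonicalMetadata are names I'm assuming for illustration.

```typescript
// Local stand-in for the part of the Next.js Metadata type used here.
type CanonicalMetadata = { alternates: { canonical: string } };

// ASSUMPTION: in a real project this would come from configuration.
const SITE_URL = "https://example.com";

// Builds an absolute canonical for one specific page path.
function canonicalMetadata(pathname: string): CanonicalMetadata {
  // Require a site-relative path; "//host" would resolve to another domain.
  if (!pathname.startsWith("/") || pathname.startsWith("//")) {
    throw new Error(`Expected a path starting with a single "/": ${pathname}`);
  }
  return { alternates: { canonical: new URL(pathname, SITE_URL).href } };
}

console.log(canonicalMetadata("/article/seo-tips").alternates.canonical);
// https://example.com/article/seo-tips
```

In a page file, the result would be exported as that page's metadata. Because the path is a required argument, the site URL can never end up as a canonical on its own, which is the failure from the start of this post.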

Bottom line

Canonical is a signal to Google: "when you see similar pages, index this one." Useful for query parameters, filters, A/B tests, syndication. Harmful when applied wholesale to the homepage, to a noindex page, to http on an https site, or when query parameters materially change the content.

The rule is simple. If content is duplicated, canonical is needed. If content is unique, use a self-canonical. If content is different, don't point a canonical at another page; let it be its own page with its own URL.

Verify through GSC URL Inspection. "Google-selected canonical" is the only value that actually affects search results. Everything else is your intentions.