30+ SEO Factors: What Google Checks in 2026
One of the projects I dug into back in March was WedInGeorgia. A site about weddings in Georgia, decent domain, decent content, two years of history. For its main keyword — page 7. Some seventy-something position. The owner was writing articles, ordering links, tweaking things, but rankings wouldn't move. I hooked Claude Code up to GSC through the API, ran a technical audit, looked at the on-page, checked schema.org.
Here's what turned up: the sitemap returned 404. Canonical on half the pages pointed to themselves via http, even though the site had been on https for ages. The homepage had three H1s. 80% of articles had no meta description, so Google generated its own from the first paragraph, and it wasn't click-worthy. There was no schema.org at all. The hreflang between ru/en versions was broken. LCP was 4.8 seconds. INP was dragging on mobile because of a heavy JS carousel on the homepage.
This isn't one problem. It's thirty problems at once. And until you fix them all, Google simply won't rank you. Not because any single factor is critical, but because in aggregate the site fails every quality threshold. Google in 2026 doesn't work off a single signal. It looks at dozens of factors simultaneously and computes an integrated score.
This post is a list of what I check on every project. Not theory pulled from someone else's blog. This is what actually moves rankings in 2026, based on working with a dozen production sites and hundreds of runs through GSC. On WedInGeorgia, two months after the fixes the site climbed from page 7 to page 2, and it keeps growing. No new link buys, no new articles — just by fixing what was already there.
Technical factors
This is the foundation. If the technical factors are broken, nothing else matters, because Google never gets to your content. In GSC this shows up as "Discovered – currently not indexed" or "Crawled – currently not indexed": Google knows the page exists but has decided not to put it in the index.
- Sitemap.xml — must return 200, contain every indexable page, and update automatically. A broken sitemap means Google doesn't know about half your pages. On top of that, the sitemap should be registered in GSC, and robots.txt should have the line Sitemap: https://example.com/sitemap.xml.
- Robots.txt — doesn't block what should be indexed. Accidentally locking /blog/ in robots.txt is a classic mistake after migrating from a dev environment where Disallow: / was the default. See what to block in robots.txt.
- Hreflang — if you have multiple language versions, hreflang has to be symmetrical and valid. Broken hreflang means Google shows the ru version to English speakers, and they bounce in 3 seconds. That kills behavioral signals. See hreflang: subdomain or subdirectory.
- Canonical — points to the correct URL. A self-canonical on http when the site is on https leaves Google in limbo. Even worse: different pages with canonical pointing to one — Google will merge them and toss the duplicates. See when you need a canonical URL.
- HTTPS — no exceptions. HTTP in 2026 is a "drop this site" signal. The certificate has to be valid, with no mixed content (images/scripts over http on an https page).
- Mobile-friendly — Google has been indexing mobile-first for several years now. If the site breaks on a phone, you won't get rankings even on desktop. Check: open it in Chrome DevTools at 375px width and walk through the main flows.
- Indexable — <meta name="robots" content="index, follow"> or its absence (default). An accidental noindex on production is another common case, especially when the frontend framework sets it conditionally and the condition fires somewhere it wasn't supposed to.
- Crawl budget — if you have 10,000 pages but Google comes in and crawls 200 a day, new pages will take months to index. Fixed through prioritization in the sitemap, cleaning out junk URLs, and speeding up the server.
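The robots.txt mistake described above is easy to catch before deploy. Here is a minimal sketch using Python's standard urllib.robotparser. The two robots.txt bodies and the list of paths are made-up examples, and the standard-library parser only does simple prefix matching, so treat it as a sanity check rather than a replica of how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt: str, paths: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the paths that this robots.txt forbids for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Leftover from a dev environment: everything is disallowed.
dev_robots = "User-agent: *\nDisallow: /\n"

# Hypothetical production file: only the admin area is closed.
prod_robots = (
    "User-agent: *\n"
    "Disallow: /admin/\n"
    "Sitemap: https://example.com/sitemap.xml\n"
)

# Hypothetical pages that must stay crawlable.
must_be_indexable = ["/", "/blog/", "/blog/wedding-venues"]

print(blocked_paths(dev_robots, must_be_indexable))   # every path is blocked
print(blocked_paths(prod_robots, must_be_indexable))  # nothing is blocked
```

Running a check like this in CI against the file that is about to ship catches the dev default before Google does.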
On-page
This is what Google sees when it lands on a specific page. At the HTML level. Same goes for what the user sees in the SERP before clicking: title and description. Which means it's about both ranking and CTR.
- Title — 50-60 characters, unique for each page, with the main keyword closer to the beginning. Not "Home | Site" but "Wedding in Georgia — Venues, Pricing, Registration." If you have the same title on ten pages, Google will decide they're duplicates and keep just one.
- Meta description — 140-160 characters, written to make people want to click. It's not a direct ranking factor, but it influences CTR, and CTR is a factor. If Google sees no one clicking your snippet, it'll start dropping you. See meta description: length and rules.
- H1 — one per page, reflecting the main topic. Three H1s leave Google confused about what's important. Technically HTML5 allows multiple, but in practice one works better. See H1: one or multiple.
- Structured headings — H2, H3 follow a hierarchy. Not H1 → H4 → H2. Google reads the structure and builds a map of the page from it. If the structure is broken, the map comes out ragged, and the page is harder to understand semantically.
- Internal links — every article links to 3-5 related ones. This redistributes weight between pages and helps new ones get indexed. The anchor text of a link is context for the target page. "Read here" doesn't work. "More on configuring hreflang" works.
- Image alt — every image with a meaningful alt. Not "image1.jpg" but "view of old Tbilisi from the Fabrika rooftop." This is accessibility, Google Images (a separate traffic channel), and context for AI engines, which are increasingly pulling images with descriptions into their answers.
- URL slug — short, meaningful, Latin characters, hyphen-separated. /svadba-v-gruzii is better than /post-id-127394?ref=main. The slug is part of the page's semantics.
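Several of the on-page checks above are mechanical enough to script. The sketch below uses only the standard html.parser module and the length targets quoted in the list; the class name, the function name, and the sample HTML are illustrative assumptions, not the tool mentioned later in this post.

```python
from html.parser import HTMLParser


class OnPageAudit(HTMLParser):
    """Collects the title, meta description, H1 count, and images without alt."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = None  # stays None when the tag is absent
        self.h1_count = 0
        self.images_missing_alt = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.description = attr.get("content") or ""
        elif tag == "img" and not (attr.get("alt") or "").strip():
            self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html: str) -> list[str]:
    """Return human-readable issues for one page."""
    page = OnPageAudit()
    page.feed(html)
    issues = []
    title_len = len(page.title.strip())
    if not 50 <= title_len <= 60:
        issues.append(f"title is {title_len} characters, target is 50-60")
    if page.description is None:
        issues.append("meta description is missing")
    elif not 140 <= len(page.description) <= 160:
        issues.append(f"meta description is {len(page.description)} characters, target is 140-160")
    if page.h1_count != 1:
        issues.append(f"found {page.h1_count} H1 tags, expected 1")
    if page.images_missing_alt:
        issues.append(f"{page.images_missing_alt} image(s) without alt text")
    return issues


# Made-up page with a weak title, no description, three H1s, one image without alt.
sample = """<html><head><title>Home | Site</title></head>
<body><h1>Weddings</h1><h1>Venues</h1><h1>Pricing</h1>
<img src="tbilisi.jpg">
<img src="rooftop.jpg" alt="view of old Tbilisi from the Fabrika rooftop">
</body></html>"""

for issue in audit(sample):
    print(issue)
```

It only checks what can be counted. Whether the title and description actually make someone want to click still needs a human read.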
Content
The murkiest category, because "quality" is subjective. But there are concrete signals that Google can extract algorithmically.
- Length — for commercial landing pages 800-1200 words is enough. For informational articles, 1500-2500. Shorter than that and Google may treat it as "thin content"; longer and few readers reach the end, which drags dwell time down. Since the Helpful Content Update of 2022, thin content can get the entire domain demoted, not just the specific page.
- Uniqueness — no copy-paste from other sites, no AI generation without editing. Google has learned to detect both. Rewrite in your own words or don't publish. AI generation by itself isn't banned (Google officially confirmed this), but without human editing and fact-checking it reads like a template, and users feel it.
- Freshness — Google loves fresh content, especially for queries with "now" intent. Publish date, regular updates, current numbers. An old 2022 article about SEO in 2026 won't work. On Atlas (another project of mine) I regularly refresh old articles — adding an "Updated in [year]" section with current data. Rankings on those articles hold for years.
- Semantic keywords — not just the primary keyword, but related terms too. If you're writing about "buy coffee in Tbilisi," there should be mentions of "arabica," "espresso," "roast," "delivery." Google looks at the semantic field, not exact matches. That's BERT and MUM at work.
- E-E-A-T — Experience, Expertise, Authoritativeness, Trustworthiness. A named author, a bio, source links, contacts, a privacy policy. Especially critical in YMYL niches (money, health, law). The extra E for Experience, which Google added to its quality rater guidelines in December 2022, is about the author's first-hand experience, not theory: "I tried it and here's what happened" carries more weight than "according to experts."
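- FAQ — a section with typical questions at the end of the article. It's useful for the reader, and Google can pull answers from it into featured snippets. Mark it up with FAQPage schema so search and AI engines can parse the questions, but don't count on a rich result: since August 2023 Google shows FAQ rich results only for well-known government and health sites.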
- Dwell time and pogo-sticking — how long the user stayed on the page and whether they bounced back to the SERP to click another result. Direct quality signals. If you don't answer the query, the user leaves in 5 seconds, and Google sees it through Chrome.
Core Web Vitals
Technical speed metrics. Google measures them through the Chrome User Experience Report, which is real user data, not lab data. If your 75th-percentile value is worse than the threshold, you don't get rankings.
- LCP (Largest Contentful Paint) — main content should appear in under 2.5 seconds. Usually drags due to heavy images or a slow server.
- INP (Interaction to Next Paint) — response to a user action under 200 ms. Replaced FID in 2024. Heavy JS is the main killer.
- CLS (Cumulative Layout Shift) — the page shouldn't jump around while loading. Under 0.1.
More on each metric and how to fix them — LCP, INP, CLS: what they are and how to fix them.
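To make the 75th-percentile rule concrete, here is a small sketch that applies the three thresholds above to a handful of samples. The sample values are invented, and the nearest-rank percentile is a simplification chosen for readability. The Chrome User Experience Report aggregates far more data and may compute its percentiles differently.

```python
import math

# "Good" thresholds as quoted in the list above.
GOOD_THRESHOLDS = {"lcp_ms": 2500, "inp_ms": 200, "cls": 0.1}


def percentile_75(samples: list[float]) -> float:
    """Nearest-rank 75th percentile: 75% of samples are at or below this value."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def assess(field_data: dict[str, list[float]]) -> dict[str, bool]:
    """Map each metric to True when its p75 is within the 'good' threshold."""
    return {
        metric: percentile_75(values) <= GOOD_THRESHOLDS[metric]
        for metric, values in field_data.items()
    }


# Hypothetical real-user samples for one page.
field_data = {
    "lcp_ms": [1800, 2100, 2400, 4800, 5200, 1900, 2300, 4900],
    "inp_ms": [90, 120, 150, 180, 110, 160, 140, 130],
    "cls": [0.02, 0.05, 0.08, 0.04, 0.03, 0.09, 0.06, 0.07],
}

print(assess(field_data))
# {'lcp_ms': False, 'inp_ms': True, 'cls': True}
```

Most LCP samples here are under 2.5 seconds, yet the page still fails, because the slow quarter of visits is what sets the 75th percentile.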
Schema.org
Structured markup via JSON-LD. Google uses it to understand what's on your page and to show rich results in search.
- BlogPosting — for articles. Author, date, headline, image, organization.
- BreadcrumbList — breadcrumbs. Displayed directly in search results instead of a long URL.
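- FAQPage — for question-and-answer sections. Helps engines parse your FAQ, but since August 2023 Google shows the FAQ rich result only for well-known, authoritative government and health sites.
- HowTo — for step-by-step instructions. Google retired the HowTo rich result in 2023, so the markup now only helps with parsing.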
Schema isn't about ranking directly, it's about CTR and how AI engines parse you. See JSON-LD and schema.org types.
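As an illustration of the first two types, here is a sketch that builds BlogPosting and BreadcrumbList objects and wraps them in a JSON-LD script tag. The property names come from schema.org; the helper names, URLs, author, and dates are placeholders, and the set of properties is a reasonable minimum rather than a complete list of what Google recommends.

```python
import json


def blog_posting(headline, author, published, image, publisher, url):
    """Minimal BlogPosting object with the fields named in the list above."""
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": published,
        "image": image,
        "publisher": {"@type": "Organization", "name": publisher},
        "mainEntityOfPage": url,
    }


def breadcrumb_list(trail):
    """trail is an ordered list of (name, url) pairs, from the root to the current page."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(trail, start=1)
        ],
    }


def as_script_tag(data):
    """Serialize to a JSON-LD script tag; '<' is escaped so the tag cannot be closed early."""
    body = json.dumps(data, ensure_ascii=False).replace("<", "\\u003c")
    return '<script type="application/ld+json">' + body + "</script>"


# Placeholder values for illustration only.
post = blog_posting(
    headline="Wedding in Georgia: Venues, Pricing, Registration",
    author="Jane Doe",
    published="2026-01-15",
    image="https://example.com/img/cover.jpg",
    publisher="Example Blog",
    url="https://example.com/blog/wedding-in-georgia",
)
crumbs = breadcrumb_list([
    ("Home", "https://example.com/"),
    ("Blog", "https://example.com/blog/"),
])

print(as_script_tag(post))
print(as_script_tag(crumbs))
```

Generating the markup from the same data that renders the page keeps the two from drifting apart. Validate the output with Google's Rich Results Test before shipping.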
Authority, backlinks, AI visibility
The slowest category. Internal factors you can move in a week. Not this one. This is a game of months and years.
- Domain Rating (DR) — Ahrefs' domain authority score (Semrush's counterpart is called Authority Score). It's a third-party estimate, not a number Google uses, but it's a handy proxy that grows with the quality and quantity of referring sites. A new domain needs at least 6-12 months to crack DR 20+. On a young domain it's pointless to chase competitive commercial queries. Work around them through long-tail and informational queries.
- Anchor diversity — link anchors should be varied. 100% anchors of "buy apartment in Tbilisi" — Google flags it as spam and ignores it. A natural profile: 30-40% brand, 20-30% URL, 20% generic ("here," "via this link"), 10-20% exact keywords. If you have 80% exact keywords, that's paid links, and Google sees it.
- Referring domains — what matters is the number of unique domains, not the number of links. A hundred links from one blog = one link. Ten links from ten different topical domains = ten times more valuable.
- Link relevance — a link from a topical site is worth many times more than one from a general site. A link to an SEO article from a marketing blog > a link from the same-DR general news portal.
- Citations in ChatGPT / Perplexity — a new factor in 2026. If AI engines cite you in their answers, that's a signal of authority and traffic. ChatGPT already drives a noticeable flow to articles it treats as a source. Check: ask ChatGPT a question in your niche and see if it cites you. If not, you need to work on entity presence.
- Brand mentions — Google also accounts for brand mentions without links. You can see it in crawl logs. If people write about you on Reddit, on forums, on Twitter, Google takes it as a signal.
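The anchor-diversity ranges above can be checked against a backlink export with a few lines of code. This is a rough sketch: the classification rules, the list of generic phrases, and the sample anchors are all assumptions for illustration, and real anchor text is messier than exact string matching can handle.

```python
from collections import Counter

# Hypothetical list of generic anchor phrases.
GENERIC = {"here", "click here", "read more", "via this link", "this site"}


def classify(anchor: str, brand: str, keywords: set[str]) -> str:
    """Assign an anchor to one bucket: url, brand, exact, generic, or other."""
    text = anchor.strip().lower()
    if text.startswith(("http://", "https://", "www.")):
        return "url"
    if brand.lower() in text:
        return "brand"
    if text in keywords:
        return "exact"
    if text in GENERIC:
        return "generic"
    return "other"


def anchor_shares(anchors: list[str], brand: str, keywords: set[str]) -> dict[str, int]:
    """Percentage of anchors per bucket, rounded to whole percent."""
    if not anchors:
        return {}
    counts = Counter(classify(a, brand, keywords) for a in anchors)
    return {kind: round(100 * n / len(anchors)) for kind, n in counts.items()}


# Made-up profile: eight exact-match anchors out of ten.
anchors = ["buy apartment in Tbilisi"] * 8 + ["Example Realty", "https://example.com/"]
shares = anchor_shares(anchors, brand="Example Realty", keywords={"buy apartment in tbilisi"})

print(shares)
# {'exact': 80, 'brand': 10, 'url': 10}
```

An exact-match share of 80% is the pattern described above as a paid-link footprint. A profile inside the quoted ranges would show exact anchors at 10-20%.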
How to check it all in one pass
Thirty factors is a lot. By hand you'll spend a day on a single page. And that's just the check, no fixes yet.
20 of these factors can be checked automatically. I built my own tool precisely because I got tired of running the same checklist by hand every time. You feed it a URL, you get a list of issues, prioritized by impact. Technical, on-page, schema, Core Web Vitals. Then you go fix them — yourself or by feeding the report to Claude Code so it makes the changes in code.
What the tool doesn't check automatically: E-E-A-T (needs a human to evaluate expertise), backlinks (separate services), AI visibility (a separate pipeline through the ChatGPT and Perplexity APIs). For those, you have to handle it manually or with separate scripts.
What does NOT work in 2026
Old tricks that still produced results five years ago are now either ignored or get you penalized.
- Keyword stuffing — repeating the main keyword every other sentence. Google rolled out BERT in 2019 and MUM in 2021. It understands meaning, it doesn't count repetitions. Stuffing today is a negative, not a positive.
- Exact-match domain — kupit-kvartiru-tbilisi.com used to get a bonus. Now Google ignores it and even regards it with suspicion.
- PBNs and paid guest links — Google has learned to detect private blog networks by pattern (templates, IPs, registration history). Paid links from guest posts in 2026 are a deindex risk, not a ranking boost.
- Duplicate pages — copying the same article with different keywords in the URL. Google will merge them into one or drop both. Only canonical and original content help here.
- Hidden text — white text on a white background, text in display: none. The first crawler catches it and it triggers a manual penalty.
- Sneaky redirects and cloaking — showing one thing to the bot, another to the user. Guaranteed ban.
If someone is offering you "promotion via 500 fat backlinks" or "unique content through spinning" — that's 2014. It doesn't work.
Bottom line
Prioritization is half the win. You can't fix everything at once; you'll drown.
The order is:
- Technical factors — without them, the rest is pointless. Sitemap, robots, canonical, https, hreflang. Fixed in a day.
- On-page — title, H1, meta description, image alt. Fixed in a week.
- Core Web Vitals — LCP first. Often solved by optimizing images and fonts.
- Schema.org — add at least BlogPosting and BreadcrumbList. Fixed in an hour per article.
- Content — rewrite thin pages, add FAQ, refresh the old stuff.
- Authority — that's the long game, months. In parallel with everything else.
Foundation first, then the structure on top. Any work done on top of a broken foundation is wasted.