
How to Generate a Sitemap XML (Manually or Automatically) for Faster Crawling and Indexing

Content Writing & Structure
Admin

Learn to generate sitemap xml manually or automatically, validate it, follow key rules, and submit in Google Search Console for faster crawling.

You’ve published new pages, fixed internal links, and refreshed your categories—then you wait. If search engines don’t find those URLs quickly (or at all), rankings stall and revenue follows. A generate sitemap xml workflow gives crawlers a clean “map” of what exists, what matters, and what changed, so discovery and indexing are more reliable.

In this guide, I’ll show you how to generate sitemap xml files manually and automatically, how to validate them, and how to submit them the right way so Google can crawl with less friction.

[Image: sitemap.xml file open in a code editor beside the Google Search Console "Sitemaps" submission screen. Alt text: generate sitemap xml, sitemap.xml example, Google Search Console sitemap submission]


What a Sitemap XML Is (and What It Isn’t)

A sitemap XML is a machine-readable file that lists indexable URLs you want search engines to crawl, often with metadata like lastmod. Google treats it as a discovery hint—not a guarantee—so it works best when paired with strong internal linking and correct status codes.

A sitemap XML is not:

  • A replacement for internal links (crawlers still follow your site architecture).
  • A place to include noindex pages, redirects, or 404s.
  • A magic indexing button (it helps discovery and prioritization).

For the official guidelines, rely on Google’s sitemap documentation.


When You Should Generate (or Regenerate) a Sitemap XML

If you’re unsure whether it’s worth the effort, use this rule: the more URLs and the more frequently they change, the more important it is to generate sitemap xml files properly.

Common triggers:

  • You launched a new site section (blog, collections, location pages).
  • You run e-commerce with frequently changing inventory.
  • You migrated URLs (redirects, canonical changes).
  • You publish content at scale (programmatic SEO, agency workflows).

In GroMach deployments, I’ve seen indexing become noticeably steadier after we stopped “letting the CMS guess” and started enforcing sitemap hygiene: only canonical 200 URLs, consistent lastmod, and clean segmentation.


The Sitemap XML Rules You Must Follow (So Crawlers Trust It)

Before you generate anything, align with hard constraints:

  • Use absolute URLs (include https://).
  • Include only canonical, indexable pages returning 200.
  • Keep each sitemap ≤ 50,000 URLs and ≤ 50MB uncompressed.
  • Use UTF-8 and valid XML syntax.
  • If you exceed limits, use a sitemap index file (a sitemap of sitemaps).

Reference protocol details: Sitemaps.org protocol.
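Before submitting anything, you can pre-flight a sitemap file against these constraints. Here is a minimal Python sketch (standard library only; `check_sitemap` and the limits passed to it are illustrative names, not part of any official tool):

```python
import xml.etree.ElementTree as ET

# Sitemap protocol namespace, required on <urlset>
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def check_sitemap(xml_bytes, max_urls=50_000, max_bytes=50 * 1024 * 1024):
    """Return a list of rule violations for a sitemap file's raw bytes."""
    problems = []
    if len(xml_bytes) > max_bytes:
        problems.append("file exceeds 50MB uncompressed")
    root = ET.fromstring(xml_bytes)  # raises ParseError on invalid XML
    locs = [el.text.strip() for el in root.iter(NS + "loc")]
    if len(locs) > max_urls:
        problems.append("more than 50,000 URLs")
    for url in locs:
        if not url.startswith("https://"):
            problems.append("not an absolute https URL: " + url)
    return problems

sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>/blog/</loc></url>
</urlset>"""
print(check_sitemap(sample))  # flags the relative /blog/ entry
```

Because `ET.fromstring` raises on malformed XML, an exception here already tells you the file fails the "valid XML syntax" rule before you check anything else.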


How to Generate a Sitemap XML Automatically (Fastest Option)

Automatic generation is best for most sites because it stays updated as URLs change.

Option A: Use Your CMS (WordPress / Shopify / SaaS CMS)

Most modern CMS platforms can generate a sitemap automatically, either built-in or via plugins/apps. Your job is to verify what it includes and excludes.

Checklist (do this even if “it’s automatic”):

  • Confirm the sitemap URL (often /sitemap.xml).
  • Ensure tag pages, internal search pages, and filtered URLs are excluded.
  • Ensure image/video sitemaps are enabled only if you benefit from them.
  • Confirm lastmod updates when content changes (not just when it’s published).

If you’re on WordPress, you can also use established sitemap generators from the plugin ecosystem, such as the XML Sitemap Generator for Google plugin.
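One quick way to verify the last checklist item (lastmod actually updating) is to pull the newest entries out of the sitemap and eyeball them after editing a page. A small Python sketch, assuming a standard sitemap file and a hypothetical `newest_entries` helper:

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def newest_entries(xml_bytes, n=5):
    """Return the n most recently modified (loc, lastmod) pairs."""
    root = ET.fromstring(xml_bytes)
    pairs = []
    for url in root.iter(NS + "url"):
        loc = url.findtext(NS + "loc", default="").strip()
        lastmod = url.findtext(NS + "lastmod", default="").strip()
        pairs.append((loc, lastmod))
    # W3C datetime strings (YYYY-MM-DD...) sort correctly as plain text
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:n]

sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/old/</loc><lastmod>2025-01-01</lastmod></url>
  <url><loc>https://example.com/new/</loc><lastmod>2026-02-01</lastmod></url>
</urlset>"""
print(newest_entries(sample, n=1))
```

Edit a page, fetch the live sitemap, and run this: if the edited URL doesn't surface near the top, your CMS is stamping lastmod at publish time only.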

Option B: Use a Crawler to Generate It (Great for Audits)

Tools that crawl your site like a bot are excellent when:

  • Your CMS sitemap is bloated.
  • You want to include only URLs matching strict rules.
  • You need to split sitemaps by section (e.g., /blog/, /products/).

A common industry option is Screaming Frog; their walkthrough is here: Screaming Frog XML Sitemap Generator guide.

Option C: Use an Online Generator (Good for Small Static Sites)

If you have a small brochure site and no CMS, an online generator can work. Be cautious with very large sites, authentication walls, or JavaScript-heavy rendering.



How to Generate a Sitemap XML Manually (Best for Small or Highly Curated Sites)

Manual sitemaps are useful when you have:

  • Fewer than ~100–300 URLs
  • A need to tightly control what’s indexed
  • A site that rarely changes

Step-by-step manual sitemap (basic example)

  1. Open a text editor and create sitemap.xml.
  2. Add the XML header and <urlset> container.
  3. Add one <url> entry per canonical URL.
  4. Host it at https://example.com/sitemap.xml.

Minimal valid example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-02-11</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/</loc>
    <lastmod>2026-02-01</lastmod>
  </url>
</urlset>
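The four manual steps above can also be scripted, which helps when "manual" really means "regenerated from a short curated list." A Python sketch (standard library only; `build_sitemap` and the page list are illustrative, not a real tool):

```python
from datetime import date
from xml.sax.saxutils import escape  # <loc> values must be entity-escaped

def build_sitemap(pages):
    """pages: list of (absolute_url, lastmod_date) tuples -> sitemap XML string."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url, lastmod in pages:
        lines.append("  <url>")
        lines.append("    <loc>" + escape(url) + "</loc>")
        lines.append("    <lastmod>" + lastmod.isoformat() + "</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)

xml = build_sitemap([
    ("https://example.com/", date(2026, 2, 11)),
    ("https://example.com/blog/", date(2026, 2, 1)),
])
# Write to the web root so it serves at https://example.com/sitemap.xml:
# open("sitemap.xml", "w", encoding="utf-8").write(xml)
```

Keeping the page list in one place (a small file or spreadsheet export) makes the "someone's explicit responsibility" point below much easier to enforce.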

What I’ve learned the hard way: manual sitemaps fail quietly when teams forget to update lastmod or accidentally list redirected URLs after a migration. If you go manual, make it someone’s explicit responsibility.


Special Cases: Large Sites, Multi-Sitemaps, and Sitemap Index Files

If you have many URLs, split your sitemap by type or section:

  • sitemap-pages.xml
  • sitemap-blog.xml
  • sitemap-products.xml

Then create a sitemap index:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-02-11</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-02-11</lastmod>
  </sitemap>
</sitemapindex>

This is the cleanest way to generate sitemap xml coverage at scale without bloating a single file.
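The split-and-index pattern is mechanical enough to automate. A hedged Python sketch (the chunk size mirrors the 50,000-URL limit; `chunk`, `build_index`, and the URL patterns are hypothetical):

```python
def chunk(urls, size=50_000):
    """Split a flat URL list into sitemap-sized chunks."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_index(sitemap_urls, lastmod):
    """Build a sitemap index referencing each child sitemap URL."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url in sitemap_urls:
        lines += ["  <sitemap>",
                  "    <loc>" + url + "</loc>",
                  "    <lastmod>" + lastmod + "</lastmod>",
                  "  </sitemap>"]
    lines.append("</sitemapindex>")
    return "\n".join(lines)

# 120,000 product URLs -> 3 child sitemaps, each under the limit
parts = chunk(["https://example.com/p/" + str(i) for i in range(120_000)])
index = build_index(
    ["https://example.com/sitemap-products-" + str(n) + ".xml"
     for n in range(len(parts))],
    "2026-02-11",
)
```

Each chunk would then be written out with the same `<urlset>` format shown earlier, and only the index URL gets submitted to Search Console.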


Troubleshooting: Common Sitemap Errors and Fixes

  • Submitted URL blocked by robots.txt
    Likely cause: robots.txt disallows key paths (e.g., /products/) or blocks the Googlebot user-agent.
    Fix: Update robots.txt to allow the URLs in the sitemap; ensure there are no conflicting Disallow rules.
    Quick test: Paste a sample URL into Google Search Console → URL Inspection; also run the robots.txt tester.

  • Sitemap contains redirects
    Likely cause: The CMS outputs non-canonical URLs (http, trailing-slash differences, old paths) that 301/302.
    Fix: Update the sitemap generator to output final 200 URLs only (canonical destinations); fix the internal URL source.
    Quick test: Curl a few sitemap URLs (curl -I https://example.com/page) and confirm a 200 with no Location: header.

  • Sitemap has 404/500 URLs
    Likely cause: Stale URL inventory, deleted products/posts, or an unstable backend.
    Fix: Regenerate the sitemap from live routes only; remove dead URLs; stabilize server errors.
    Quick test: Spot-check via a crawl or batch HEAD requests; verify status codes are 200.

  • Wrong canonical URLs
    Likely cause: Canonical tags point elsewhere (parameterized URLs, alternate host, wrong language/region).
    Fix: Correct the canonical tag logic; ensure sitemap URLs match declared canonicals and the preferred host.
    Quick test: View source for a sample URL and compare <link rel="canonical"> to the sitemap entry.

  • lastmod not updating
    Likely cause: lastmod is tied to the publish date only, caching prevents refresh, or database timestamps aren't updated.
    Fix: Use a true "last updated" timestamp; invalidate the cache on content change; regenerate on a schedule.
    Quick test: Edit a page, regenerate the sitemap, and confirm lastmod changed for that URL.

  • Sitemap too large (50k/50MB limit)
    Likely cause: A single sitemap includes too many URLs, or the uncompressed file exceeds the size limit.
    Fix: Split into multiple sitemaps (e.g., /sitemap-pages.xml, /sitemap-products.xml, /sitemap-blog.xml) referenced via a sitemap index; gzip if supported.
    Quick test: Count URLs and file size; ensure each sitemap stays under 50,000 URLs and 50MB uncompressed.

  • Non-HTTPS URLs included
    Likely cause: Mixed environment config, a hardcoded base URL, or legacy http links.
    Fix: Force HTTPS in the generator; set the preferred site URL; verify 301 http→https redirects.
    Quick test: Check sitemap entries for http://; curl a sample http:// URL and confirm it 301s to https://.

Validate Your Sitemap XML (Don’t Skip This)

Validation prevents wasted crawl budget and confusing signals.

Run these checks:

  • Open the sitemap in a browser: it should load without errors.
  • Spot-check random URLs: they should return 200 and match the canonical.
  • Ensure you’re not listing:
    • parameter URLs you don’t want indexed
    • pagination duplicates (unless intentionally canonical)
    • staging domains

If you’re using an automated workflow, add validation as a recurring task (weekly for most sites; daily for fast-moving e-commerce).
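The spot-check step can be scripted with the standard library. This sketch disables redirect-following so a 301/302 shows up as itself rather than as the destination's status (`NoRedirect`, `status_of`, and `flag_bad` are made-up helper names for illustration):

```python
import urllib.request
import urllib.error

class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, *args, **kwargs):
        return None  # surface the 3xx instead of following it

def status_of(url, opener=urllib.request.build_opener(NoRedirect)):
    """Return the raw HTTP status for url without following redirects."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # 3xx/4xx/5xx land here once redirects are disabled

def flag_bad(statuses):
    """statuses: {url: code} -> sorted URLs that don't belong in the sitemap."""
    return sorted(url for url, code in statuses.items() if code != 200)

# Usage sketch: statuses = {u: status_of(u) for u in sampled_sitemap_urls}
# then flag_bad(statuses) lists everything that isn't a clean 200.
```

Anything `flag_bad` returns is either a redirect to fix at the source or a dead URL to drop from the sitemap.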


Submit Your Sitemap XML to Google (and Help Crawlers Find It)

1) Submit in Google Search Console

  • Go to Sitemaps
  • Enter your sitemap URL (e.g., /sitemap.xml or your sitemap index)
  • Submit and monitor “Success” vs “Has errors”

Use Google’s official submission guidance: Build and submit a sitemap.

2) Reference the Sitemap in robots.txt

Add a line like this to your robots.txt file:

Sitemap: https://example.com/sitemap.xml

This helps search engines discover it even without GSC access.


[Chart: "Indexation improvement after sitemap cleanup" (sample data) — Week 1: 62% indexed; Week 2: 68%; Week 3: 74%; Week 4: 79%; Week 5: 83%. Cleanup includes removing redirects/404s, enforcing canonicals, and splitting sitemaps by section.]


How GroMach Helps You Generate Sitemap XML for Content at Scale

When you publish dozens (or thousands) of pages, the sitemap becomes operational—not just technical. In GroMach-based workflows, I’ve found the biggest wins come from coordinating three things: what gets created, what gets published, and what gets surfaced for crawling.

A practical “at scale” approach:

  • Generate SEO pages from keyword clusters (so URLs align to intent).
  • Auto-publish to WordPress/Shopify with consistent taxonomy.
  • Enforce rules so only canonical, indexable URLs land in the sitemap.
  • Track whether new URLs move from “Discovered” → “Crawled” → “Indexed” inside your SEO dashboard.

If you’re building programmatic content, the sitemap is your crawl throughput lever—done right, it reduces the lag between publishing and performance.

[Video: How to Add a Sitemap to Google Search Console (Submit an XML Sitemap the Easy Way)]


Conclusion: Your Sitemap XML Is a Crawl Contract—Keep It Clean

A sitemap is like a friendly concierge for search engines: it points crawlers to the doors you actually want opened. When you generate sitemap xml thoughtfully—only 200-status canonical pages, updated structure, correct splitting—you make crawling faster, indexing more predictable, and SEO easier to scale.

If you want, share your CMS (WordPress, Shopify, Next.js, custom) and approximate URL count in the comments, and I’ll suggest the cleanest sitemap setup for your situation.


FAQ: Generate Sitemap XML

1) How often should I regenerate a sitemap XML?

If your site changes frequently, daily is fine; otherwise weekly is enough. The key is keeping it accurate—stale sitemaps reduce trust.

2) Should I include priority and changefreq?

Usually no. Google largely ignores them; lastmod is more useful when it’s accurate.

3) Can a sitemap help index pages faster?

It can speed up discovery and crawling, especially for new or poorly linked pages, but it doesn’t guarantee indexing.

4) What URLs should be excluded from a sitemap?

Exclude redirects, 404/410, non-canonical duplicates, noindex pages, internal search results, and most parameter/filter URLs.

5) Where should the sitemap.xml file live?

Typically at the site root: https://example.com/sitemap.xml. For multiple files, use a sitemap index at the root.

6) What’s the difference between a sitemap and robots.txt?

A sitemap lists URLs you want crawled; robots.txt controls where crawlers are allowed to go. They work together.

7) Do I need separate sitemaps for images or videos?

Only if image/video discovery is a core goal (e.g., strong Image Search or Video SEO strategy). Otherwise, start with a clean URL sitemap first.