How to Generate a Sitemap XML (Manually or Automatically) for Faster Crawling and Indexing
Learn to generate sitemap xml manually or automatically, validate it, follow key rules, and submit in Google Search Console for faster crawling.
You’ve published new pages, fixed internal links, and refreshed your categories—then you wait. If search engines don’t find those URLs quickly (or at all), rankings stall and revenue follows. A generate sitemap xml workflow gives crawlers a clean “map” of what exists, what matters, and what changed, so discovery and indexing are more reliable.
In this guide, I’ll show you how to generate sitemap xml files manually and automatically, how to validate them, and how to submit them the right way so Google can crawl with less friction.

What a Sitemap XML Is (and What It Isn’t)
A sitemap XML is a machine-readable file that lists indexable URLs you want search engines to crawl, often with metadata like lastmod. Google treats it as a discovery hint—not a guarantee—so it works best when paired with strong internal linking and correct status codes.
A sitemap XML is not:
- A replacement for internal links (crawlers still follow your site architecture).
- A place to include noindex pages, redirects, or 404s.
- A magic indexing button (it helps discovery and prioritization).
For the official guidelines, rely on Google’s sitemap documentation.
When You Should Generate (or Regenerate) a Sitemap XML
If you’re unsure whether it’s worth the effort, use this rule: the more URLs and the more frequently they change, the more important it is to generate sitemap xml files properly.
Common triggers:
- You launched a new site section (blog, collections, location pages).
- You run e-commerce with frequently changing inventory.
- You migrated URLs (redirects, canonical changes).
- You publish content at scale (programmatic SEO, agency workflows).
In GroMach deployments, I’ve seen indexing become noticeably steadier after we stopped “letting the CMS guess” and started enforcing sitemap hygiene: only canonical 200 URLs, consistent lastmod, and clean segmentation.
The Sitemap XML Rules You Must Follow (So Crawlers Trust It)
Before you generate anything, align with hard constraints:
- Use absolute URLs (include the scheme, e.g., https://).
- Include only canonical, indexable pages returning 200.
- Keep each sitemap ≤ 50,000 URLs and ≤ 50MB uncompressed.
- Use UTF-8 and valid XML syntax.
- If you exceed limits, use a sitemap index file (a sitemap of sitemaps).
Reference protocol details: Sitemaps.org protocol.
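These constraints are easy to script against. Below is a minimal sketch (Python standard library only; the function name is mine, the limits come from the protocol) that flags common violations before you ever submit a file:

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB uncompressed

def check_sitemap(xml_bytes: bytes) -> list[str]:
    """Return a list of rule violations found in a sitemap document."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file exceeds 50 MB uncompressed")
    # ET.fromstring raises ParseError if the XML syntax is invalid
    root = ET.fromstring(xml_bytes)
    locs = [el.text.strip() for el in root.iter(f"{NS}loc") if el.text]
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs exceeds the 50,000 limit")
    for loc in locs:
        if not loc.startswith("https://") and not loc.startswith("http://"):
            problems.append(f"not an absolute URL: {loc}")
    return problems
```

An empty list back means the file passes these structural rules; checking that every URL actually returns 200 still requires fetching them.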
How to Generate a Sitemap XML Automatically (Fastest Option)
Automatic generation is best for most sites because it stays updated as URLs change.
Option A: Use Your CMS (WordPress / Shopify / SaaS CMS)
Most modern CMS platforms can generate a sitemap automatically, either built-in or via plugins/apps. Your job is to verify what it includes and excludes.
Checklist (do this even if “it’s automatic”):
- Confirm the sitemap URL (often /sitemap.xml).
- Ensure tag pages, internal search pages, and filtered URLs are excluded.
- Ensure image/video sitemaps are enabled only if you benefit from them.
- Confirm lastmod updates when content changes (not just when it’s published).
If you’re on WordPress, you can also use established sitemap generators from the plugin ecosystem (example reference: XML Sitemap Generator for Google (WordPress plugin)).
Option B: Use a Crawler to Generate It (Great for Audits)
Tools that crawl your site like a bot are excellent when:
- Your CMS sitemap is bloated.
- You want to include only URLs matching strict rules.
- You need to split sitemaps by section (e.g., /blog/, /products/).
A common industry option is Screaming Frog; their walkthrough is here: Screaming Frog XML Sitemap Generator guide.
Option C: Use an Online Generator (Good for Small Static Sites)
If you have a small brochure site and no CMS, an online generator can work. Be cautious with very large sites, authentication walls, or JavaScript-heavy rendering.
How to Generate a Sitemap XML Manually (Best for Small or Highly Curated Sites)
Manual sitemaps are useful when you have:
- Fewer than ~100–300 URLs
- A need to tightly control what’s indexed
- A site that rarely changes
Step-by-step manual sitemap (basic example)
- Open a text editor and create sitemap.xml.
- Add the XML header and <urlset> container.
- Add one <url> entry per canonical URL.
- Host it at https://example.com/sitemap.xml.
Minimal valid example:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/</loc>
<lastmod>2026-02-11</lastmod>
</url>
<url>
<loc>https://example.com/blog/</loc>
<lastmod>2026-02-01</lastmod>
</url>
</urlset>
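If hand-editing the XML feels error-prone, the same file can be emitted from a short script. Here’s a sketch (Python standard library; the helper name and URL list are illustrative) that produces output equivalent to the example above:

```python
from xml.sax.saxutils import escape

def build_sitemap(entries: list[tuple[str, str]]) -> str:
    """Render (url, lastmod) pairs as a minimal sitemap XML string."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for loc, lastmod in entries:
        lines.append("  <url>")
        lines.append(f"    <loc>{escape(loc)}</loc>")  # escape &, <, > in URLs
        lines.append(f"    <lastmod>{lastmod}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)

# Example: the two URLs from the snippet above
xml = build_sitemap([
    ("https://example.com/", "2026-02-11"),
    ("https://example.com/blog/", "2026-02-01"),
])
```

Generating the file from a single source of truth (your URL inventory) is exactly what prevents the stale-entry problem described next.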
What I’ve learned the hard way: manual sitemaps fail quietly when teams forget to update lastmod or accidentally list redirected URLs after a migration. If you go manual, make it someone’s explicit responsibility.
Special Cases: Large Sites, Multi-Sitemaps, and Sitemap Index Files
If you have many URLs, split your sitemap by type or section:
- sitemap-pages.xml
- sitemap-blog.xml
- sitemap-products.xml
Then create a sitemap index:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://example.com/sitemap-pages.xml</loc>
<lastmod>2026-02-11</lastmod>
</sitemap>
<sitemap>
<loc>https://example.com/sitemap-products.xml</loc>
<lastmod>2026-02-11</lastmod>
</sitemap>
</sitemapindex>
This is the cleanest way to generate sitemap xml coverage at scale without bloating a single file.
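The splitting itself is mechanical. A sketch (Python; the chunking and naming scheme are my own convention, not a standard) that partitions a large URL inventory into compliant files plus the index entries to reference them:

```python
def split_into_sitemaps(urls: list[str], base: str, chunk_size: int = 50_000):
    """Split a URL inventory into <=50,000-URL sitemap files plus an index.

    Returns (files, index): files maps a filename like 'sitemap-1.xml'
    to its slice of URLs; index is the list of sitemap URLs to place
    in the <sitemapindex> file.
    """
    files = {}
    for i in range(0, len(urls), chunk_size):
        name = f"sitemap-{i // chunk_size + 1}.xml"
        files[name] = urls[i:i + chunk_size]
    index = [f"{base.rstrip('/')}/{name}" for name in files]
    return files, index
```

Splitting by section (pages, blog, products) instead of by count, as shown in the filenames above, has the added benefit that Google Search Console reports indexing coverage per file.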
| Problem | Likely Cause | Fix | Quick Test |
|---|---|---|---|
| Submitted URL blocked by robots.txt | robots.txt disallows key paths (e.g., /products/) or blocks user-agent Googlebot | Update robots.txt to allow the URLs in the sitemap; ensure no conflicting Disallow rules | Paste a sample URL into Google Search Console → URL Inspection; also run robots.txt tester |
| Sitemap contains redirects | CMS outputs non-canonical URLs (http, trailing slash differences, old paths) that 301/302 | Update sitemap generator to output final 200 URLs only (canonical destinations); fix internal URL source | Curl a few sitemap URLs: curl -I https://example.com/page and confirm 200 (no Location:) |
| Sitemap has 404/500 URLs | Stale URL inventory, deleted products/posts, unstable backend | Regenerate sitemap from live routes only; remove/deprecate dead URLs; stabilize server errors | Spot-check via crawl or batch HEAD requests; verify status codes are 200 |
| Wrong canonical URLs | Canonical tags point elsewhere (parameterized URLs, alternate host, wrong language/region) | Fix canonical tag logic; ensure sitemap URLs match declared canonicals and preferred host | View source for a sample URL and compare <link rel="canonical"> to sitemap entry |
| lastmod not updating | lastmod tied to publish date only, caching prevents refresh, DB timestamps not updated | Use true “last updated” timestamp; invalidate cache on content change; regenerate on schedule | Edit a page, regenerate sitemap, and confirm lastmod changed for that URL |
| Sitemap too large (50k/50MB limit) | Single sitemap includes too many URLs or uncompressed file exceeds size limit | Split into multiple sitemaps (e.g., /sitemap-pages.xml, /sitemap-products.xml, /sitemap-blog.xml) and reference via sitemap index; gzip if supported | Count URLs and file size; ensure each sitemap < 50,000 URLs and < 50MB uncompressed |
| Non-HTTPS URLs included | Mixed environment config, hardcoded base URL, legacy http links | Force HTTPS in generator; set preferred site URL; add/verify 301 http→https redirects | Check sitemap entries for http://; curl a sample http:// URL and confirm it 301s to https:// |
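Several of the “Quick Test” checks above can be automated. As one example, here’s a sketch (Python standard library; the function name is mine) that catches the non-HTTPS problem from the last row by extracting every insecure <loc> entry:

```python
import xml.etree.ElementTree as ET

def insecure_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> entry that is not served over HTTPS."""
    ns = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
    root = ET.fromstring(sitemap_xml)
    return [
        el.text.strip()
        for el in root.iter(f"{ns}loc")
        if el.text and not el.text.strip().startswith("https://")
    ]
```

Any URLs it returns should either be fixed in the generator’s base-URL config or removed before the next submission.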
Validate Your Sitemap XML (Don’t Skip This)
Validation prevents wasted crawl budget and confusing signals.
Run these checks:
- Open the sitemap in a browser: it should load without errors.
- Spot-check random URLs: they should return 200 and match the canonical.
- Ensure you’re not listing:
- parameter URLs you don’t want indexed
- pagination duplicates (unless intentionally canonical)
- staging domains
If you’re using an automated workflow, add validation as a recurring task (weekly for most sites; daily for fast-moving e-commerce).
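For the recurring task, the exclusion rules above can be expressed as a small filter. A sketch (Python; the hostname patterns are illustrative examples, so tune them to your environment):

```python
from urllib.parse import urlparse

STAGING_HINTS = ("staging.", "dev.", "localhost")  # illustrative patterns

def flag_suspect_urls(urls: list[str]) -> dict[str, list[str]]:
    """Bucket sitemap URLs that usually should not be listed."""
    flagged = {"parameters": [], "staging": []}
    for url in urls:
        parsed = urlparse(url)
        if parsed.query:
            flagged["parameters"].append(url)  # e.g., ?color=red filters
        if any(hint in parsed.netloc for hint in STAGING_HINTS):
            flagged["staging"].append(url)
    return flagged
```

Pagination is deliberately left out of the script: whether page-2+ URLs belong in the sitemap depends on your canonical strategy, so that stays a human judgment call.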
Submit Your Sitemap XML to Google (and Help Crawlers Find It)
1) Submit in Google Search Console
- Go to Sitemaps.
- Enter your sitemap URL (e.g., /sitemap.xml or your sitemap index).
- Submit and monitor “Success” vs. “Has errors.”
Use Google’s official submission guidance: Build and submit a sitemap.
2) Add it to robots.txt (recommended)
Add a line like:
Sitemap: https://example.com/sitemap.xml
This helps search engines discover it even without GSC access.
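If you manage many sites, it’s worth verifying that the directive is actually present. A sketch (Python; function name is mine) that pulls every Sitemap: line out of a robots.txt body:

```python
def sitemap_directives(robots_txt: str) -> list[str]:
    """Extract Sitemap: directive values from robots.txt (key is case-insensitive)."""
    found = []
    for line in robots_txt.splitlines():
        line = line.strip()
        if line.lower().startswith("sitemap:"):
            # split on the first colon only, so the URL's own "://" survives
            found.append(line.split(":", 1)[1].strip())
    return found
```

An empty result on a production site means crawlers without GSC access have one less discovery path.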

How GroMach Helps You Generate Sitemap XML for Content at Scale
When you publish dozens (or thousands) of pages, the sitemap becomes operational—not just technical. In GroMach-based workflows, I’ve found the biggest wins come from coordinating three things: what gets created, what gets published, and what gets surfaced for crawling.
A practical “at scale” approach:
- Generate SEO pages from keyword clusters (so URLs align to intent).
- Auto-publish to WordPress/Shopify with consistent taxonomy.
- Enforce rules so only canonical, indexable URLs land in the sitemap.
- Track whether new URLs move from “Discovered” → “Crawled” → “Indexed” inside your SEO dashboard.
If you’re building programmatic content, the sitemap is your crawl throughput lever—done right, it reduces the lag between publishing and performance.
Conclusion: Your Sitemap XML Is a Crawl Contract—Keep It Clean
A sitemap is like a friendly concierge for search engines: it points crawlers to the doors you actually want opened. When you generate sitemap xml thoughtfully—only 200-status canonical pages, updated structure, correct splitting—you make crawling faster, indexing more predictable, and SEO easier to scale.
If you want, share your CMS (WordPress, Shopify, Next.js, custom) and approximate URL count in the comments, and I’ll suggest the cleanest sitemap setup for your situation.
FAQ: Generate Sitemap XML
1) How often should I regenerate a sitemap XML?
If your site changes frequently, daily is fine; otherwise weekly is enough. The key is keeping it accurate—stale sitemaps reduce trust.
2) Should I include priority and changefreq?
Usually no. Google largely ignores them; lastmod is more useful when it’s accurate.
3) Can a sitemap help index pages faster?
It can speed up discovery and crawling, especially for new or poorly linked pages, but it doesn’t guarantee indexing.
4) What URLs should be excluded from a sitemap?
Exclude redirects, 404/410, non-canonical duplicates, noindex pages, internal search results, and most parameter/filter URLs.
5) Where should the sitemap.xml file live?
Typically at the site root: https://example.com/sitemap.xml. For multiple files, use a sitemap index at the root.
6) What’s the difference between a sitemap and robots.txt?
A sitemap lists URLs you want crawled; robots.txt controls where crawlers are allowed to go. They work together.
7) Do I need separate sitemaps for images or videos?
Only if image/video discovery is a core goal (e.g., strong Image Search or Video SEO strategy). Otherwise, start with a clean URL sitemap first.