Extract a clean list of page URLs from any XML sitemap in seconds. Paste your sitemap URL, click extract, then copy all URLs for audits, migrations, and indexing checks.
What this tool does
- Fetches and parses a public XML sitemap URL you provide
- Extracts only page URLs (filters out common non-page entries like image URLs)
- Outputs a plain URL list you can copy to your clipboard
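For readers curious about the mechanics, here is a minimal Python sketch of the parsing step. It assumes the sitemap follows the sitemaps.org schema (namespace `http://www.sitemaps.org/schemas/sitemap/0.9`). It reads only the `<loc>` that is a direct child of each `<url>` entry, so image URLs nested in extensions such as `<image:loc>` are skipped. The function name `extract_page_urls` is hypothetical, and this is not the tool's actual code.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_page_urls(xml_text):
    """Return the page URLs declared in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    urls = []
    # Only a <loc> that is a direct child of <url> is a page URL; image and
    # video extension tags such as <image:loc> live in other namespaces.
    for url_el in root.findall(f"{SM_NS}url"):
        loc = url_el.find(f"{SM_NS}loc")
        if loc is not None and loc.text:
            urls.append(loc.text.strip())
    return urls


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/</loc>
    <image:image><image:loc>https://example.com/logo.png</image:loc></image:image>
  </url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""
print(extract_page_urls(sample))  # ['https://example.com/', 'https://example.com/about']
```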
How to use the Sitemap URL Extractor
- Paste your XML sitemap URL (example: https://example.com/sitemap.xml)
- Click Extract URLs
- Review the extracted list
- Click Copy All URLs to copy everything
- Paste anywhere (Google Sheets, Excel, Notion, a text editor, Screaming Frog “List Mode”, etc.)
Tip: If you’re doing a content inventory, paste into a sheet and add columns for:
- Index status (Indexed / Not indexed)
- Organic traffic (from analytics)
- Last updated date
- Target keyword/topic cluster
- Redirect destination (if you’re migrating)
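If you prefer to build that sheet with a script, the sketch below writes extracted URLs to a CSV file with one empty column per item in the tip, ready to import into Google Sheets or Excel. The column names and the `write_inventory_csv` helper are illustrative assumptions, not an export format the tool provides.

```python
import csv
import io

# Columns mirror the content-inventory tip; rename them to suit your workflow.
COLUMNS = [
    "URL",
    "Index status",
    "Organic traffic",
    "Last updated",
    "Target keyword/topic cluster",
    "Redirect destination",
]


def write_inventory_csv(urls, out):
    """Write a header row, then one row per URL with the audit columns left blank."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for url in urls:
        writer.writerow([url] + [""] * (len(COLUMNS) - 1))


# Usage: an in-memory buffer here; use open("inventory.csv", "w", newline="") for a file.
buf = io.StringIO()
write_inventory_csv(["https://example.com/", "https://example.com/about"], buf)
print(buf.getvalue())
```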
How to find your sitemap URL
If you’re searching “what is my sitemap URL”, use these fast checks.
1) Try the most common sitemap locations
Replace example.com with your domain:
- https://example.com/sitemap.xml
- https://example.com/sitemap_index.xml
- https://example.com/sitemap.xml.gz
- https://example.com/sitemap-index.xml
Many CMSs and platforms automatically publish a sitemap, often in one of the locations above.
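To try these locations from a script, the sketch below builds the four candidate URLs for a bare domain and returns the first one that answers with HTTP 200. The path list simply mirrors the locations above. The helper names `candidate_sitemap_urls` and `first_reachable` are hypothetical, and the second function needs network access.

```python
import urllib.request

# The common locations listed above
COMMON_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap.xml.gz", "/sitemap-index.xml"]


def candidate_sitemap_urls(domain):
    """Build the common sitemap URLs for a bare domain such as 'example.com'."""
    base = "https://" + domain.strip().rstrip("/")
    return [base + path for path in COMMON_PATHS]


def first_reachable(urls, timeout=10.0):
    """Return the first URL that responds with HTTP 200, or None."""
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return url
        except OSError:
            # HTTP errors (404, 403), DNS failures and timeouts all subclass OSError
            continue
    return None


# Usage (requires network): print(first_reachable(candidate_sitemap_urls("example.com")))
```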
2) Check your robots.txt (often the fastest)
Open: https://example.com/robots.txt
Look for a line like: Sitemap: https://example.com/sitemap.xml
If it’s there, that’s the URL you want.
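To script this check, Python's standard-library `urllib.robotparser` already collects `Sitemap:` lines (its `site_maps()` method requires Python 3.8 or newer). A minimal sketch, assuming you already have the robots.txt text; the `sitemaps_from_robots` wrapper name is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return every Sitemap: URL declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap: lines
    return parser.site_maps() or []


robots = """User-agent: *
Disallow: /search
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
"""
print(sitemaps_from_robots(robots))
```

To fetch a live file instead of pasting its text, `RobotFileParser` also provides `set_url()` and `read()`.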
3) Check Google Search Console (if you have access)
In Search Console, go to Sitemaps and look for submitted sitemap URLs. This is especially helpful if the sitemap lives at a non-standard path.
4) If all else fails: search operators
Try searching Google for:
- site:example.com sitemap.xml
- site:example.com sitemap_index.xml
Sitemap vs. sitemap index (important for larger sites)
XML sitemap
A standard sitemap contains a list of URLs (pages) in <urlset>.
Sitemap index
A sitemap index is a “directory” of sitemap files (often used on large sites). It typically lists multiple sitemap URLs (for example: blog sitemap, product sitemap, category sitemap).
If your site uses a sitemap index:
Open the index URL in your browser, copy a child sitemap URL (like .../sitemap-posts.xml), and run this extractor on each child sitemap you care about. That keeps your URL lists focused and easier to work with.
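If you would rather list the child sitemaps programmatically than read them in the browser, this sketch checks the root element and, for a `<sitemapindex>`, returns each child's `<loc>`. It assumes the sitemaps.org namespace; `list_child_sitemaps` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def list_child_sitemaps(xml_text):
    """Return child sitemap URLs from a sitemap index, or [] for a regular <urlset>."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{SM_NS}sitemapindex":
        return []  # a plain <urlset>: extract its page URLs directly instead
    return [
        loc.text.strip()
        for loc in root.findall(f"{SM_NS}sitemap/{SM_NS}loc")
        if loc.text
    ]


index_xml = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>"""
print(list_child_sitemaps(index_xml))
```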
Why extracting sitemap URLs matters for SEO
Having a clean URL list from your sitemap helps you work faster and spot issues sooner:
Content inventory & audits
- Find thin or outdated pages
- Identify pages missing metadata or internal links
- Prioritize optimization work by URL group (blog, docs, categories)
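Grouping by the first path segment is one simple way to get those URL groups. A sketch, assuming your site's sections map to top-level directories (many sites are organized this way, but not all); `group_by_section` is a hypothetical helper.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_section(urls):
    """Group URLs by their first path segment, e.g. '/blog/post-1' -> 'blog'."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        section = segments[0] if segments else "(root)"
        groups[section].append(url)
    return dict(groups)


urls = [
    "https://example.com/",
    "https://example.com/blog/a",
    "https://example.com/blog/b",
    "https://example.com/docs/setup",
]
for section, members in group_by_section(urls).items():
    print(section, len(members))
```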
Indexing & crawl diagnostics
- Compare “URLs in sitemap” vs. “URLs indexed”
- Spot patterns where Google ignores certain URL types
- Validate canonical decisions (are you listing the canonical version?)
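The comparison itself is a set difference. A sketch, assuming you have exported a list of indexed URLs from whatever indexing report you use. Treating `/page` and `/page/` as the same URL is an assumption; drop the trailing-slash rule if your site serves different content on the two. `compare_sets` is a hypothetical name.

```python
def compare_sets(sitemap_urls, indexed_urls):
    """Split URLs into 'in sitemap but not indexed' and 'indexed but not in sitemap'."""

    def norm(url):
        # Assumption: a trailing slash is not significant
        return url.strip().rstrip("/")

    in_sitemap = {norm(u) for u in sitemap_urls}
    indexed = {norm(u) for u in indexed_urls}
    return {
        "not_indexed": sorted(in_sitemap - indexed),
        "indexed_not_in_sitemap": sorted(indexed - in_sitemap),
    }


result = compare_sets(
    ["https://example.com/a/", "https://example.com/b"],
    ["https://example.com/a", "https://example.com/c"],
)
print(result)
```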
Site migrations & redirects
- Build a “source URLs” list for your redirect map
- Catch legacy URLs still present in sitemaps
- Verify the sitemap after launch (only new URLs should remain)
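One useful migration check is to find which sitemap URLs are still sources in your redirect map, since those are legacy URLs that should not remain listed after launch. A sketch, assuming the redirect map is a two-column CSV of source and destination URLs with no header row; `load_redirect_map` and `find_legacy_urls` are hypothetical helpers.

```python
import csv
import io


def load_redirect_map(csv_text):
    """Read 'source,destination' rows (no header row) into a dict."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


def find_legacy_urls(sitemap_urls, redirect_map):
    """Return sitemap URLs that are redirect sources, mapped to their destinations."""
    return {url: redirect_map[url] for url in sitemap_urls if url in redirect_map}


redirects = load_redirect_map("https://example.com/old-page,https://example.com/new-page\n")
sitemap = ["https://example.com/old-page", "https://example.com/new-page"]
print(find_legacy_urls(sitemap, redirects))
```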
Competitor or market research (public sitemaps)
- Understand how a site structures categories, collections, and content hubs
- Discover “hidden” pages that aren’t well-linked in navigation
Common problems (and quick fixes)
“Nothing extracted” or the sitemap won’t load
- Confirm the sitemap URL opens in your browser
- Make sure it’s an XML sitemap (not an HTML sitemap page)
- If the sitemap requires a login, it won’t be accessible for extraction
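When extraction returns nothing, it helps to see what the URL actually served. The sketch below classifies a downloaded response body as a URL set, a sitemap index, an HTML page (for example an HTML sitemap or a login page), or something else, and decompresses gzip files such as `sitemap.xml.gz` first. The labels and the `classify_sitemap_body` name are assumptions for illustration.

```python
import gzip
import xml.etree.ElementTree as ET

SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def classify_sitemap_body(body):
    """Guess what a sitemap URL returned: 'urlset', 'sitemapindex', 'html' or 'unknown'."""
    if body[:2] == b"\x1f\x8b":  # gzip magic bytes
        body = gzip.decompress(body)
    body = body.lstrip()  # an XML declaration must be the very first thing in the document
    head = body[:200].lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "unknown"
    if root.tag == f"{SM_NS}urlset":
        return "urlset"
    if root.tag == f"{SM_NS}sitemapindex":
        return "sitemapindex"
    return "unknown"
```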
The sitemap is huge
Large sitemaps can take a moment to process in the browser. If it’s extremely large, look for a sitemap index and extract child sitemaps one by one.
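If you process very large sitemaps yourself, a streaming parser avoids building the whole document tree at once. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse`, assuming the sitemaps.org namespace: it yields each page URL as soon as its `<url>` element closes, then clears that element's children. For scale, the sitemaps.org protocol limits a single sitemap file to 50,000 URLs and 50 MB uncompressed, which is why large sites split into an index. `iter_page_urls` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET

SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def iter_page_urls(source):
    """Yield page URLs one at a time from a file path or binary file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == f"{SM_NS}url":
            loc = elem.find(f"{SM_NS}loc")
            if loc is not None and loc.text:
                yield loc.text.strip()
            elem.clear()  # drop the finished <url> element's children


data = (
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://example.com/a</loc></url>"
    b"<url><loc>https://example.com/b</loc></url>"
    b"</urlset>"
)
for url in iter_page_urls(io.BytesIO(data)):
    print(url)
```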
I’m seeing URLs I didn’t expect
This is usually a sitemap configuration issue:
- Your CMS/plugin is including tag pages, internal search pages, or parameter URLs
- Your sitemap lists URLs on staging or alternate hostnames
- Old URLs remain because the sitemap cache hasn’t refreshed
Fix at the source: adjust your sitemap generator settings, then re-extract.
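Before changing settings, it can help to flag suspicious URLs in bulk. The sketch below marks URLs on an unexpected hostname, URLs with query parameters, and URLs whose path contains patterns you list. The default patterns (`/tag/`, `/search`) and the `flag_unexpected` name are assumptions; swap in whatever your CMS actually generates.

```python
from urllib.parse import urlsplit


def flag_unexpected(urls, expected_host, path_patterns=("/tag/", "/search")):
    """Return {url: [reasons]} for URLs that suggest a sitemap misconfiguration."""
    flagged = {}
    for url in urls:
        parts = urlsplit(url)
        reasons = []
        if parts.hostname != expected_host:
            reasons.append(f"host {parts.hostname}")
        if parts.query:
            reasons.append("query parameters")
        if any(p in parts.path for p in path_patterns):
            reasons.append("matches path pattern")
        if reasons:
            flagged[url] = reasons
    return flagged


urls = [
    "https://example.com/blog/post",
    "https://staging.example.com/blog/post",
    "https://example.com/shop?color=red",
    "https://example.com/tag/seo/",
]
print(flag_unexpected(urls, "example.com"))
```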
Best practices for what to include in your sitemap
If you want better indexing results, your sitemap should generally list:
- Canonical URLs you want indexed
- Clean, stable URLs (avoid junk parameters)
- Pages returning 200 OK (not redirects or 404s)
If you’re unsure, extract the URLs first—then sample-check:
- status codes
- canonicals
- indexability (noindex, robots, blocked resources)
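For the canonical and noindex checks, a small HTML parser is enough once you have a page's HTML. The sketch below uses Python's built-in `html.parser` and reads only `<link rel="canonical">` and `<meta name="robots">` tags. It will not see an `X-Robots-Tag` HTTP header or robots.txt blocking, and status codes still need a separate request. `PageSignals` and `inspect_page` are hypothetical names, and the self-canonical test is an exact string match.

```python
from html.parser import HTMLParser


class PageSignals(HTMLParser):
    """Collect the canonical URL and robots meta directives from one HTML page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = ""

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}  # valueless attributes arrive as None
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href")
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots = a.get("content", "").lower()


def inspect_page(page_url, html):
    """Return (canonical, is_self_canonical, has_noindex) for one fetched page."""
    parser = PageSignals()
    parser.feed(html)
    canonical = parser.canonical
    return canonical, canonical == page_url, "noindex" in parser.robots


html = (
    '<html><head><link rel="canonical" href="https://example.com/a">'
    '<meta name="robots" content="noindex, follow"></head></html>'
)
print(inspect_page("https://example.com/a", html))
```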
FAQs
Does this tool store my sitemap data?
No. The extractor is designed to run in your browser and doesn’t store your sitemap content.
Can it automatically extract URLs from an XML sitemap?
Yes. Paste the sitemap URL, click Extract URLs, then copy the results.
Can it handle sitemaps with thousands of URLs?
Most sitemaps process quickly. For very large sitemaps, extraction may take longer to display. If your site uses a sitemap index, extracting child sitemaps individually is usually faster and cleaner.
Is this the same as a crawler?
No. A crawler discovers URLs by following links. This tool lists the URLs your site declares in its XML sitemap.
Next steps after you extract your sitemap URLs
If your goal is rankings and indexing health, here are high-impact next actions:
- Group URLs by directory (e.g., /blog/, /docs/, /products/)
- Spot-check indexability (noindex, canonical, status code)
- Compare extracted URLs to:
  - pages receiving impressions/clicks
  - pages indexed
  - pages with internal links
- Update the sitemap generator settings if you find low-value or duplicate URLs
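To spot duplicate URLs, one approach is to normalize each URL and look for collisions. The sketch below treats the scheme, port, a leading `www.`, host case, and a trailing slash as insignificant. Those rules are assumptions for a first pass; a site that serves distinct content on `/page` and `/page/` would need a stricter key. `normalize` and `find_near_duplicates` are hypothetical names.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url):
    """Comparison key that ignores scheme, port, 'www.', host case and trailing slash."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").removeprefix("www.")  # hostname is already lowercased
    path = parts.path.rstrip("/") or "/"
    return host + path + ("?" + parts.query if parts.query else "")


def find_near_duplicates(urls):
    """Return groups of sitemap URLs that share the same normalized key."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return [members for members in groups.values() if len(members) > 1]


urls = [
    "https://example.com/blog/",
    "http://www.example.com/blog",
    "https://example.com/about",
]
print(find_near_duplicates(urls))
```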