XML Sitemap Strategy: Index, Partition, News and Image Sitemap Guide
A single sitemap.xml file isn't enough for large sites. How do you get Google to crawl your content faster with index sitemap, partition, news and image sitemap structures?

When a site exceeds 10,000 pages, a single sitemap.xml file makes Google's job harder. I experienced this firsthand at FUTIA when creating 618 recipe pages for italyanmutfagi.com: listing all URLs in a single sitemap file both inflated the file size and made it impossible to track which content was crawled when in Google Search Console. Index sitemap, partition, news sitemap and image sitemap structures exist to solve this problem. In this article, I'll explain step by step how to build a scalable XML sitemap architecture for sites with 50,000+ pages, which content type should go into which sitemap type, and how to optimize Google's crawl budget.
Why isn't a single sitemap.xml enough?
The sitemap protocol Google follows caps a single file at 50,000 URLs and 50 MB (uncompressed), but in practice problems start much earlier. When I was building the sitemap strategy for doktorbul.com, with its 79,000 doctor profiles, my first attempt put every URL into a single file, and the share of pages in "Discovered - currently not indexed" status in the Google Search Console Coverage report reached 60%. Why? Because Google has to parse the entire file in every crawl session and can't prioritize.
A single sitemap has three major problems:
1. Crawl inefficiency: Google reads a 40,000-URL file from start to finish every time. Even for 10 newly added pages, it parses the entire file.
2. Tracking difficulty: You see the "Last read" date in Search Console, but you can't tell which URLs were crawled and which were skipped.
3. No category prioritization: Blog posts, product pages and category pages all sit in the same file. Which should Google prioritize?
To solve this problem on italyanmutfagi.com, I split the sitemap by content type: recipes.xml, categories.xml, blog.xml. Result? Google crawls new recipes in an average of 4 hours, because recipes.xml contains only 618 URLs and is updated daily. Blog posts have their own file, updated weekly and with lower priority.
What is an index sitemap and how do you set it up?
An index sitemap is a top layer that lists other sitemap files. Its structure is simple:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://futia.io/sitemap-posts.xml</loc>
    <lastmod>2025-01-15T08:30:00+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://futia.io/sitemap-pages.xml</loc>
    <lastmod>2025-01-10T12:00:00+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://futia.io/sitemap-images.xml</loc>
    <lastmod>2025-01-14T18:45:00+00:00</lastmod>
  </sitemap>
</sitemapindex>
Then reference this file in robots.txt:
Sitemap: https://futia.io/sitemap-index.xml
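If you generate the index yourself rather than through a plugin, a minimal standard-library sketch could look like the one below. The function name build_sitemap_index and the (loc, lastmod) input shape are assumptions for illustration; lastmod values are expected to be timezone-aware datetimes so the output carries an explicit offset.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape

def build_sitemap_index(children):
    """Render a sitemap index from (loc, lastmod) pairs.

    lastmod values should be timezone-aware datetimes so the output
    carries an explicit offset (W3C Datetime / ISO 8601).
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for loc, lastmod in children:
        lines.append("  <sitemap>")
        lines.append(f"    <loc>{escape(loc)}</loc>")  # &, <, > become entities
        lines.append(f"    <lastmod>{lastmod.isoformat(timespec='seconds')}</lastmod>")
        lines.append("  </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"

# Usage: two of the sub-sitemaps from the example above.
index_xml = build_sitemap_index([
    ("https://futia.io/sitemap-posts.xml", datetime(2025, 1, 15, 8, 30, tzinfo=timezone.utc)),
    ("https://futia.io/sitemap-pages.xml", datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)),
])
```

Writing the result to a static file, rather than rendering it on every request, also avoids the "Couldn't fetch" timeouts discussed later.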
When Google reads the index sitemap, it discovers all the sub-sitemaps within it and crawls each one independently. At FUTIA, I used this structure for kamupersonelhaber.com:
- sitemap-index.xml (main)
- sitemap-announcements-2024.xml (annual announcements)
- sitemap-announcements-2025.xml (current announcements)
- sitemap-news.xml (last 2 days, news sitemap format)
- sitemap-categories.xml (static pages)
Thanks to this structure, the daily 50+ announcements pulled from the ilan.gov.tr API go into sitemap-announcements-2025.xml and Google checks this file every day. 2024 announcements are crawled once a week because the lastmod date is old.
The lastmod date in the index sitemap is critical
Google identifies which sub-sitemap has been updated by looking at its lastmod date in the index sitemap. I tested this on italyanmutfagi.com: I updated the lastmod date of recipes.xml every time a new recipe was added and kept the one for categories.xml unchanged. Result: Google crawls recipes.xml 3-4 times a day and categories.xml once a week.
But be careful: don't bump lastmod without a real change. Google compares lastmod with the actual content, and if lastmod keeps changing while the URLs stay the same, it crawls less often. I saw this on diolivo.com.tr: I updated the lastmod of the cart recovery pages sitemap every day even though its URLs didn't change, and after 3 weeks Google reduced its crawl frequency.
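One way to avoid fake lastmod updates is to move a sub-sitemap's lastmod only when its content fingerprint changes. The sketch below is an assumption about how that could be done, not the exact FUTIA code: fingerprint and next_lastmod are hypothetical helpers, and content_version stands for whatever your CMS changes on a real edit (an updated_at timestamp or a revision number).

```python
import hashlib
from datetime import datetime, timezone

def fingerprint(entries):
    """Stable hash of the (url, content_version) pairs in one sub-sitemap."""
    digest = hashlib.sha256()
    for url, version in sorted(entries):  # sort so reordering alone doesn't count
        digest.update(f"{url}\t{version}\n".encode("utf-8"))
    return digest.hexdigest()

def next_lastmod(entries, previous_hash, previous_lastmod, now=None):
    """Return (lastmod, hash); lastmod only moves when the fingerprint changes."""
    current = fingerprint(entries)
    if current == previous_hash:
        return previous_lastmod, current  # nothing really changed: keep the old date
    return (now or datetime.now(timezone.utc)), current
```

Store the returned hash next to the sitemap and pass it back on the next run; a regeneration that produces the same URLs and versions then leaves lastmod untouched.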
Partition sitemap: splitting large content sets
Partitioning means splitting a single content type across multiple files. For example, if you have 50,000 product pages, you can split them into 10 sitemap files of 5,000 URLs each:
- sitemap-products-1.xml (1-5,000)
- sitemap-products-2.xml (5,001-10,000)
- ...
- sitemap-products-10.xml (45,001-50,000)
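The fixed-size split above takes only a few lines of code. This is a minimal sketch; the partition helper, the 5,000-URL default and the sitemap-products prefix simply mirror the example file names.

```python
def partition(urls, size=5000, prefix="sitemap-products"):
    """Split a URL list into consecutive files of at most `size` URLs.

    Returns (filename, urls) pairs: sitemap-products-1.xml holds URLs 1-5,000,
    sitemap-products-2.xml holds 5,001-10,000, and so on.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [
        (f"{prefix}-{n}.xml", urls[start:start + size])
        for n, start in enumerate(range(0, len(urls), size), start=1)
    ]

# Usage: 50,000 product URLs become 10 files of 5,000.
product_urls = [f"https://example.com/product/{i}" for i in range(1, 50_001)]
parts = partition(product_urls)
```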
On doktorbul.com, I split 79,000 doctor profiles by city:
- sitemap-doctors-istanbul.xml (18,400 URLs)
- sitemap-doctors-ankara.xml (9,200 URLs)
- sitemap-doctors-izmir.xml (6,800 URLs)
- sitemap-doctors-other.xml (44,600 URLs, split into 10 alphabetical files)
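A key-based split like this one can be sketched as follows. The (city, slug, url) tuple shape and the partition_by_city name are hypothetical; cities in the dedicated set get their own file, and the remaining profiles are sorted by slug and cut into at most other_parts alphabetical files.

```python
from collections import defaultdict

def partition_by_city(profiles, dedicated, other_parts=10):
    """Group (city, slug, url) tuples into per-city sitemap files.

    Cities in `dedicated` get their own file. All other profiles are
    sorted by slug and cut into at most `other_parts` alphabetical files.
    """
    files = defaultdict(list)
    others = []
    for city, slug, url in profiles:
        if city in dedicated:
            files[f"sitemap-doctors-{city}.xml"].append(url)
        else:
            others.append((slug, url))
    others.sort()
    # Ceiling division: at most `other_parts` files, none of them empty.
    chunk = max(1, -(-len(others) // other_parts))
    for n, start in enumerate(range(0, len(others), chunk), start=1):
        files[f"sitemap-doctors-other-{n}.xml"] = [url for _, url in others[start:start + chunk]]
    return dict(files)
```

With 44,600 "other" profiles and other_parts=10, each alphabetical file would hold 4,460 URLs.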
This structure has two advantages:
1. Geographic crawling: Google fetches the Istanbul file and the Ankara file independently, at different times. Result: parallel crawling and faster indexing.
2. Error isolation: If there's a problem in the Istanbul sitemap (e.g., URLs returning 404), crawling of the other cities isn't affected.
In a partition strategy, I aim to keep each file around 10-20 MB at most. The protocol's hard limit is 50 MB uncompressed, and in my experience smaller files are processed faster. On italyanmutfagi.com, I used a single file for 618 recipes (file size 42 KB), but on memuratamalari.com, I used 8 different partitions for 40,400 pages.
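Before publishing a partition, it can help to check it against the hard limits. This sketch counts <url> tags with a plain string search (good enough for files you generate yourself, not a full XML parse) and measures the UTF-8 byte size; the 20 MB soft target is an assumption taken from the range above, and check_sitemap is a hypothetical name.

```python
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB uncompressed (52,428,800 bytes)

def check_sitemap(xml_text, soft_limit_bytes=20 * 1024 * 1024):
    """Return a list of problems found in one rendered sitemap file."""
    problems = []
    size = len(xml_text.encode("utf-8"))
    url_count = xml_text.count("<url>")  # crude, but fine for generated files
    if url_count > MAX_URLS:
        problems.append(f"{url_count} URLs exceeds the {MAX_URLS:,} URL limit")
    if size > MAX_BYTES:
        problems.append(f"{size:,} bytes exceeds the 50 MB limit")
    elif size > soft_limit_bytes:
        problems.append(f"{size:,} bytes is above the soft target; consider splitting")
    return problems
```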
Date-based partition: for time series content
For news sites and blog-heavy sites, date-based partitioning makes sense:
- sitemap-posts-2025-01.xml
- sitemap-posts-2024-12.xml
- sitemap-posts-2024-11.xml
This structure ensures fast crawling of new content. On kamupersonelhaber.com, I put 2025 announcements in a separate file, 2024 and earlier in archive partitions. Google crawls the 2025 file daily, archive files monthly.
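Date-based partitioning boils down to grouping URLs by publication month. A minimal sketch, assuming each post is a (url, published_date) pair; partition_by_month is a hypothetical name.

```python
from collections import defaultdict
from datetime import date

def partition_by_month(posts):
    """Group (url, published_date) pairs into sitemap-posts-YYYY-MM.xml buckets.

    The result is ordered newest month first.
    """
    buckets = defaultdict(list)
    for url, published in posts:
        buckets[f"sitemap-posts-{published:%Y-%m}.xml"].append(url)
    # YYYY-MM names sort chronologically, so reverse order means newest first.
    return dict(sorted(buckets.items(), reverse=True))

# Usage
buckets = partition_by_month([
    ("https://example.com/a", date(2025, 1, 3)),
    ("https://example.com/b", date(2024, 12, 30)),
    ("https://example.com/c", date(2025, 1, 20)),
])
```

Only the newest bucket normally changes, so only its lastmod in the index moves, while the archive files keep their old dates.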
Note: keep the number of partitions below 50. An index sitemap can list up to 50,000 sub-sitemaps, but in my experience crawl ordering becomes unpredictable once there are more than 50 files. If you have 100,000+ pages, first split by content type (products, categories, blog), then partition each type internally.
News sitemap: special format for news content
To get news content discovered and indexed quickly, including in Google News, use a news sitemap. It differs from a normal sitemap in three ways:
1. It only lists content published within the last 2 days.
2. It uses a special XML namespace and tags.
3. It carries metadata such as the publication name, publication language, publication date and article title.
Example news sitemap:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://kamupersonelhaber.com/2025-subat-ogretmen-atamalari</loc>
    <news:news>
      <news:publication>
        <news:name>Kamu Personel Haber</news:name>
        <news:language>tr</news:language>
      </news:publication>
      <news:publication_date>2025-01-15T09:30:00+03:00</news:publication_date>
      <news:title>2025 Şubat Öğretmen Atamaları Başvuru Tarihleri Açıklandı</news:title>
    </news:news>
  </url>
</urlset>
I use a news sitemap on kamupersonelhaber.com because 50+ announcements are published daily and users search Google for things like "civil servant appointments announced today". Thanks to the news sitemap, a new announcement appears on Google News 2-3 hours after publication.
Things to watch out for with a news sitemap:
- Only include content from the last 2 days. Remove older articles from the file (or strip their news: tags); leaving 3-day-old content in can trigger warnings.
- Write publication_date in ISO 8601 (W3C Datetime) format, including the timezone.
- In news:title, use the article title as it appears on the page, not the meta title.
- A news sitemap holds at most 1,000 URLs. If you publish more than 1,000 articles within the two-day window, split them across several news sitemaps (for example, by hour) and list them all in the sitemap index.
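These rules (two-day window, 1,000-URL cap, ISO 8601 dates with timezone) can be enforced in the generator itself. This is a hedged sketch: news_sitemaps and the article dict shape (url, title, published) are assumptions, and splitting into consecutive 1,000-URL files stands in for whatever partitioning scheme you prefer.

```python
from datetime import datetime, timedelta, timezone
from xml.sax.saxutils import escape

NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"

def news_sitemaps(articles, publication, language, now, per_file=1000):
    """Build news sitemap XML strings for articles from the last two days.

    `articles` is a list of dicts with "url", "title" and a timezone-aware
    "published" datetime. Older articles are dropped, and each returned
    file holds at most `per_file` URLs.
    """
    cutoff = now - timedelta(days=2)
    recent = sorted((a for a in articles if a["published"] >= cutoff),
                    key=lambda a: a["published"], reverse=True)
    files = []
    for start in range(0, len(recent), per_file):
        parts = ['<?xml version="1.0" encoding="UTF-8"?>',
                 '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" '
                 f'xmlns:news="{NEWS_NS}">']
        for a in recent[start:start + per_file]:
            published = a["published"].isoformat(timespec="seconds")
            parts.append(
                f"<url><loc>{escape(a['url'])}</loc><news:news>"
                f"<news:publication><news:name>{escape(publication)}</news:name>"
                f"<news:language>{language}</news:language></news:publication>"
                f"<news:publication_date>{published}</news:publication_date>"
                f"<news:title>{escape(a['title'])}</news:title>"
                "</news:news></url>")
        parts.append("</urlset>")
        files.append("\n".join(parts))
    return files

# Usage with the article from the example above (Istanbul time, UTC+3).
istanbul = timezone(timedelta(hours=3))
news_files = news_sitemaps(
    [{"url": "https://kamupersonelhaber.com/2025-subat-ogretmen-atamalari",
      "title": "2025 Şubat Öğretmen Atamaları Başvuru Tarihleri Açıklandı",
      "published": datetime(2025, 1, 15, 9, 30, tzinfo=istanbul)}],
    "Kamu Personel Haber", "tr", now=datetime(2025, 1, 15, 12, 0, tzinfo=istanbul))
```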
At FUTIA, I don't use a news sitemap for futia.net because its video content doesn't fit the news format. On kamupersonelhaber.com, however, it is critical for organic traffic: traffic from Google News increased 18% in the last 3 months.
Image sitemap: separate structure for visual content
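If your site has a lot of images (e-commerce, recipe sites, gallery sites), you should use an image sitemap. You can add image tags to a normal sitemap, as long as its <urlset> declares the image namespace (xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"):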
<url>
  <loc>https://italyanmutfagi.com/tiramisu-tarifi</loc>
  <image:image>
    <image:loc>https://italyanmutfagi.com/images/tiramisu-hero.jpg</image:loc>
    <image:title>Tiramisu Tarifi</image:title>
    <image:caption>Klasik İtalyan tiramisusu, mascarpone peyniri ve ladyfinger ile</image:caption>
  </image:image>
  <image:image>
    <image:loc>https://italyanmutfagi.com/images/tiramisu-step1.jpg</image:loc>
    <image:title>Tiramisu yapım aşaması 1</image:title>
  </image:image>
</url>
But on italyanmutfagi.com, I used a separate image sitemap because each recipe page has an average of 6-8 images. 618 recipes x 7 images = 4,326 images. If I added them to recipes.xml, the file size would reach 2.1 MB. I created a separate sitemap-images.xml:
- File size: 890 KB
- Indexing rate on Google Image Search: 87% (was 62% before separate sitemap)
- Traffic from images: 54% increase in 3 months
Tips for image sitemap:
1. You can add a maximum of 1,000 images per URL. If a page has more than 1,000 images (gallery sites), paginate it.
2. Don't rely on image:caption or image:title. Google deprecated both tags in 2022 (together with image:geo_location and image:license) and now ignores them; put descriptive alt text and captions on the page itself instead.
3. Use the CDN URL of images, not the origin server. For example, on italyanmutfagi.com the images are served from Cloudflare CDN, so the sitemap lists the CDN URLs.
4. Include WebP images too. Google indexes WebP.
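A minimal image sitemap generator might look like the sketch below. It writes only <image:loc>, the one image child tag Google still reads after its 2022 cleanup, caps each page at 1,000 images, and takes a hypothetical {page_url: [image_url, ...]} mapping; the cdn.example.com host in the usage is a placeholder.

```python
from xml.sax.saxutils import escape

IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
MAX_IMAGES_PER_URL = 1000

def image_sitemap(pages):
    """Render an image sitemap from {page_url: [image_url, ...]}.

    Only <image:loc> is written. Images past the 1,000-per-URL limit are
    cut off here; a page that has more should be paginated instead.
    """
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" '
             f'xmlns:image="{IMAGE_NS}">']
    for page, images in pages.items():
        lines.append(f"  <url>\n    <loc>{escape(page)}</loc>")
        for image in images[:MAX_IMAGES_PER_URL]:
            lines.append(f"    <image:image><image:loc>{escape(image)}</image:loc></image:image>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)

# Usage: one recipe page with two CDN-hosted WebP images.
image_xml = image_sitemap({
    "https://italyanmutfagi.com/tiramisu-tarifi": [
        "https://cdn.example.com/images/tiramisu-hero.webp",
        "https://cdn.example.com/images/tiramisu-step1.webp",
    ],
})
```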
On diolivo.com.tr, I didn't use an image sitemap for product images because Shopify automatically adds image tags to the product sitemap. But I created a separate sitemap for images I used on custom landing pages and the indexing rate in the "Enhancements > Images" report in Google Search Console rose to 91%.
Sitemap update frequency and the truth about the changefreq tag
The XML sitemap specification has a changefreq tag:
<url>
  <loc>https://futia.io/blog/xml-sitemap</loc>
  <changefreq>weekly</changefreq>
  <priority>0.8</priority>
</url>
But Google has been ignoring the changefreq and priority tags since 2019, and its sitemap documentation now states explicitly that it ignores both. John Mueller put it plainly on the Google Search Central podcast: "We ignore changefreq and priority. We determine crawl frequency based on actual content changes."
I tested this on doktorbul.com: I added changefreq="daily" to some doctor profiles, "monthly" to others. After 3 months, no difference in crawl frequency. Google checks whether the page actually changed.
So what should you do?
1. Update the lastmod date of a sitemap file in the index whenever any URL in it changes.
2. Set each URL's lastmod to that page's actual last update date.
3. Drop the changefreq and priority tags; they only inflate the file size.
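Steps 1 and 2 can be tied together: take each URL's lastmod from the database and let each sub-sitemap's lastmod be the newest of its URLs. A sketch with hypothetical names (url_entry, sitemap_lastmods), assuming updated_at values are timezone-aware datetimes; it deliberately writes no changefreq or priority.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape

def url_entry(url, updated_at):
    """One <url> entry: only <loc> and <lastmod>, no changefreq or priority."""
    return (f"<url><loc>{escape(url)}</loc>"
            f"<lastmod>{updated_at.isoformat(timespec='seconds')}</lastmod></url>")

def sitemap_lastmods(sitemaps):
    """Each sub-sitemap's lastmod is the newest lastmod among its URLs.

    `sitemaps` maps a filename to {url: updated_at}, where updated_at is the
    page's real last-change time from your database. Empty files are skipped.
    """
    return {name: max(urls.values()) for name, urls in sitemaps.items() if urls}

# Usage
jan_10 = datetime(2025, 1, 10, tzinfo=timezone.utc)
jan_14 = datetime(2025, 1, 14, tzinfo=timezone.utc)
index_dates = sitemap_lastmods({
    "recipes.xml": {"https://example.com/a": jan_10, "https://example.com/b": jan_14},
    "categories.xml": {"https://example.com/c": jan_10},
})
```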
On italyanmutfagi.com, I automatically update recipes using the Claude Haiku API (nutritional values, alternative ingredients). After each update, I change that recipe's lastmod date and regenerate the sitemap. Result: Google re-crawls updated recipes in an average of 6 hours.
Sitemap ping: manual notification to Google
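Until 2023, Google also accepted a manual ping after a sitemap update:
GET http://www.google.com/ping?sitemap=https://futia.io/sitemap-index.xml
I set this up on kamupersonelhaber.com to run every time a new announcement is added. Python code:
import requests
from urllib.parse import quote

def ping_google(sitemap_url):
    # Google retired this endpoint in 2023; it now responds with 404.
    ping_url = "http://www.google.com/ping?sitemap=" + quote(sitemap_url, safe="")
    response = requests.get(ping_url, timeout=10)
    return response.status_code == 200
Be aware, though, that Google deprecated the sitemap ping endpoint in June 2023, and requests to it now return a 404, so this code no longer notifies Google. Keep the Sitemap: line in robots.txt, submit the index once in Search Console (or through the Search Console API), and keep lastmod accurate; Google rereads submitted sitemaps on its own schedule.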
Sitemap errors and Search Console reports
The Sitemaps section in Google Search Console shows all errors related to your sitemap. Most common errors:
1. "Couldn't fetch": Sitemap file returns 404 or server times out. I experienced this on doktorbul.com: sitemap files were dynamically generated, when server load increased, timeouts started. Solution: I cached sitemaps as static files.
2. "Sitemap is an HTML page": Common in WordPress. The sitemap URL redirects with a 301 to another page, and that page returns HTML. Solution: check the sitemap URL listed in robots.txt; it should resolve directly, without a redirect.
3. "Parsing error": The XML is malformed. The most common cause is unescaped special characters in URLs (&, <, >). I ran into this on italyanmutfagi.com, where some recipe URLs contained "&" (e.g., "makarna-&-sos"). Solution: escape these characters as entities (&amp;, &lt;, &gt;) when writing the sitemap.
4. "URL not allowed": The sitemap has URLs that are disallowed in robots.txt. I did this on kamupersonelhaber.com: I accidentally added admin pages to the sitemap. Solution: check robots.txt rules in the sitemap generation script.
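Errors 3 and 4 can both be caught at generation time. This sketch uses Python's urllib.robotparser to drop URLs that robots.txt disallows and xml.sax.saxutils.escape to turn &, < and > into entities. The function name and the choice of Googlebot as the user agent are assumptions; it parses robots.txt text you pass in rather than fetching it.

```python
from urllib.robotparser import RobotFileParser
from xml.sax.saxutils import escape

def sitemap_safe_urls(urls, robots_txt, user_agent="Googlebot"):
    """Keep only URLs that robots.txt allows, XML-escaped for <loc>.

    `robots_txt` is the text of the file, parsed locally (no network fetch).
    Note: urllib.robotparser does plain prefix matching and does not
    understand Google's * and $ wildcards, so treat this as a first pass.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [escape(url) for url in urls if parser.can_fetch(user_agent, url)]
```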
Regularly check the "Coverage" report in Search Console (now called the "Page indexing" report). URLs with the "Discovered - currently not indexed" status are pages Google knows about but hasn't crawled yet, even though they are in the sitemap. If this share exceeds 20%, review your sitemap strategy. On doktorbul.com, I reduced it from 60% to 8% by combining an index sitemap, partitioning and lastmod optimization.
How does sitemap automation work at FUTIA?
At FUTIA, I automatically generate sitemaps for all client projects. The process:
1. Content types in the database are analyzed (posts, pages, custom post types).
2. A separate sitemap file is created for each content type.
3. If a type has 10,000+ URLs, the partition strategy kicks in.
4. Sitemap files are stored as static files on Cloudflare R2 (for CDN speed).
5. When content is updated, the relevant sitemap file is regenerated and Google is pinged.
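Steps 1 to 4 can be sketched roughly as below. This is not the FUTIA pipeline itself: generate and write_static are hypothetical names, records are assumed to be (content_type, url) pairs, files go to a local directory instead of Cloudflare R2, lastmod handling is left out for brevity, and no ping is sent.

```python
from collections import defaultdict
from pathlib import Path
from xml.sax.saxutils import escape

PARTITION_THRESHOLD = 10_000  # above this many URLs, a content type is split
HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def render_urlset(urls):
    body = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return f'{HEADER}<urlset xmlns="{NS}">\n{body}\n</urlset>\n'

def generate(records, base_url):
    """Return {filename: xml} for every content type plus sitemap-index.xml."""
    by_type = defaultdict(list)
    for content_type, url in records:  # step 1: group by content type
        by_type[content_type].append(url)
    files = {}
    for content_type, urls in sorted(by_type.items()):  # step 2: one file per type
        if len(urls) <= PARTITION_THRESHOLD:
            files[f"sitemap-{content_type}.xml"] = render_urlset(urls)
            continue
        # step 3: large types are partitioned into numbered files
        for n, start in enumerate(range(0, len(urls), PARTITION_THRESHOLD), start=1):
            chunk = urls[start:start + PARTITION_THRESHOLD]
            files[f"sitemap-{content_type}-{n}.xml"] = render_urlset(chunk)
    entries = "\n".join(f"  <sitemap><loc>{escape(base_url + name)}</loc></sitemap>" for name in files)
    files["sitemap-index.xml"] = f'{HEADER}<sitemapindex xmlns="{NS}">\n{entries}\n</sitemapindex>\n'
    return files

def write_static(files, out_dir):
    """Step 4 stand-in: write the files to a local directory instead of R2."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, xml_text in files.items():
        (out / name).write_text(xml_text, encoding="utf-8")
```

Returning a dict before writing anything keeps generation easy to test and lets the storage step be swapped for an upload to a CDN bucket.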
For example, on italyanmutfagi.com, when a new recipe is added:
- Recipe content is generated with Claude Haiku API
- Recipe is saved to database
- recipes.xml file is recreated (618 URLs)
- The lastmod date of recipes.xml in sitemap-index.xml is updated
- Google is pinged
The entire process takes 4-6 seconds. I built it with Python and FastAPI; if you're using WordPress, the Yoast SEO or Rank Math plugins provide similar functionality (with less flexibility).
If your site is growing and your sitemap strategy is insufficient, you can contact me. WhatsApp: +90 532 491 17 05 or info@futia.net. At FUTIA, we build sitemap architecture especially for programmatic SEO projects: index + partition + news sitemap structures for sites with 50,000+ pages, automatic updates, Google Search Console integration. I work from the Netherlands but provide Turkish support to Turkish brands.
Frequently Asked Questions
What is the difference between an index sitemap and a normal sitemap?
A normal sitemap lists URLs directly. An index sitemap is a top layer that lists other sitemap files. For example, sitemap-index.xml contains sub-sitemaps like sitemap-posts.xml, sitemap-pages.xml and sitemap-images.xml. When Google reads the index sitemap, it discovers all the sub-sitemaps within it and crawls each one independently. An index becomes mandatory once a single file would exceed 50,000 URLs or 50 MB, but it pays off much earlier; I recommend one for any site with more than 10,000 pages. With an index sitemap, you can separate content types and optimize crawl priorities.
How is a news sitemap different from a normal sitemap?
A news sitemap is a special XML format for Google News and only covers content published within the last 2 days. Differences from a normal sitemap: special XML namespace (xmlns:news), publication name and language tags, publication_date tag (in ISO 8601 format with timezone), and a maximum limit of 1,000 URLs. Google can crawl content in the news sitemap and publish it on Google News within 2-3 hours. It's a critical SEO tool for news sites and blogs producing current content. I use it on kamupersonelhaber.com and daily announcements appear on Google News within 2-3 hours.
Should I use changefreq and priority tags in the sitemap?
No. Google has been ignoring changefreq and priority tags since 2019. John Mueller stated clearly: Google determines crawl frequency based on actual content changes, not the changefreq tag in the sitemap. Using these tags only inflates file size. Instead, use the lastmod (last modified date) tag correctly for each URL. Google understands which pages have been updated by looking at the lastmod date and adjusts crawl priority accordingly. I don't use changefreq and priority tags at all in FUTIA projects.
How many sitemap files should there be for large sites?
There's no single rule, but fewer than 50 sitemap files is ideal. An index sitemap can list up to 50,000 sub-sitemaps, but in my experience crawl ordering becomes unpredictable once there are more than 50 files. Strategy: first split by content type (blog posts, products, categories, images), then partition each type internally. For example, with 100,000 product pages: sitemap-products-1.xml (1-10,000), sitemap-products-2.xml (10,001-20,000), and so on, for 10 partitions. Each file should be between 10-20 MB. On doktorbul.com, I use 15 sitemap files for 79,000 URLs: city-based partitions plus content-type separation.
When is an image sitemap necessary?
If your site has more than 3 images per page or more than 1,000 images in total, you should use an image sitemap. You can add image tags to a normal sitemap, but in my projects a separate image sitemap raised the indexing rate on Google Image Search. On italyanmutfagi.com, I used a separate sitemap for 618 recipes x 7 images = 4,326 images, and traffic from images increased 54% in 3 months. In the image sitemap, list an image:loc for each image; Google deprecated the title and caption tags in 2022, so put that text on the page instead. Prefer CDN URLs, and include WebP images too. It's a critical SEO tool for e-commerce, recipe and gallery sites.