XML Sitemap Strategy: Index, Partition, and Custom Sitemap Types
A single XML sitemap isn't enough for large sites. How do you properly present your content to Google with index sitemap, partition, news, and image sitemap strategies?

Have you partitioned your sitemap because of Google's 50,000 URL limit? Or are you still trying to cram all your pages into a single XML file?
I first learned this reality in 2019 when organizing the sitemap structure for an e-commerce site with 100,000+ pages. The single sitemap file kept giving timeout errors in Google Search Console. Since then, I've tested dozens of different sitemap strategies. At FUTIA, I've set up different sitemap architectures for 79,000 doctor profiles on doktorbul.com, 618 recipe pages on italyanmutfagi.com, and daily updated listings on kamupersonelhaber.com. A different approach worked for each one.
XML sitemaps aren't just URL lists. The right partition strategy, index sitemap architecture, and custom sitemap types (news, image, video) directly affect how Google crawls your site. In this article, I'll explain which sitemap structure you should use in which situation, with examples from real cases.
XML Sitemap Fundamentals and Google's Limits
Google has two basic limits for sitemaps: maximum 50,000 URLs and maximum 50MB uncompressed file size. These numbers aren't arbitrary. Google's crawler uses a specific time window for each sitemap fetch operation. Files that are too large cause timeouts, making crawling inefficient.
Most small sites never push these limits. A single sitemap.xml file is sufficient for a 500-page corporate site. But the situation is different for e-commerce, content portals, directories, and listing sites. The doktorbul.com project had 79,000 doctor profiles. If I had put them all in a single sitemap, Google would have spent hours crawling each update.
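As a rough illustration, the two limits can be checked before a file is ever published. This is a minimal sketch, not Google's own logic; the function names are hypothetical and the 50MB figure is taken as 50 × 1024 × 1024 bytes.

```python
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB, measured on the uncompressed file


def sitemap_within_limits(url_count: int, uncompressed_bytes: int) -> bool:
    """Return True when a single sitemap file respects both limits."""
    return url_count <= MAX_URLS and uncompressed_bytes <= MAX_BYTES


def partitions_needed(url_count: int, per_file: int = MAX_URLS) -> int:
    """Smallest number of files needed if each holds at most per_file URLs."""
    return max(1, -(-url_count // per_file))  # ceiling division
```

For the 79,000 doctor profiles mentioned above, `partitions_needed(79_000)` gives 2 as the bare minimum; in practice a logical partition scheme produces many more, smaller files.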
This is where partition strategy comes in. By dividing content into logical groups, you tell Google which sections change and how often. For example, on doktorbul.com we used city-based partitioning: doctors in Istanbul have one sitemap, those in Ankara have another. Google crawls the Istanbul sitemap more frequently because its profiles are updated more often.
What Is an Index Sitemap?
An index sitemap is a top layer that lists other sitemap files. In sitemap_index.xml format, it contains multiple sitemap URLs. Google first reads the index, then crawls the sub-sitemaps sequentially.
Example structure:
- sitemap_index.xml (main index)
  - sitemap_doctors_istanbul.xml (15,000 URLs)
  - sitemap_doctors_ankara.xml (8,000 URLs)
  - sitemap_doctors_izmir.xml (6,000 URLs)
  - sitemap_blog.xml (450 URLs)
  - sitemap_static.xml (25 URLs)
This structure provides flexibility to Google. The blog sitemap can be updated daily, static pages monthly. Google tracks each independently, not unnecessarily re-crawling the entire site.
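An index file like this can be generated with Python's standard library. The sketch below is one possible way to do it, assuming a hypothetical `build_sitemap_index` helper and example.com URLs; it follows the sitemaps.org `sitemapindex` format with an optional `lastmod` per child sitemap.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(entries):
    """Build a sitemap index from (sitemap_url, lastmod) tuples.

    lastmod is a 'YYYY-MM-DD' string, or None to omit the tag.
    Returns the XML document as UTF-8 bytes.
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        node = ET.SubElement(root, "sitemap")
        ET.SubElement(node, "loc").text = loc
        if lastmod:
            ET.SubElement(node, "lastmod").text = lastmod
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


index_xml = build_sitemap_index([
    ("https://example.com/sitemap_blog.xml", "2024-05-20"),
    ("https://example.com/sitemap_static.xml", None),
])
```

Keeping `lastmod` accurate for each child sitemap is what lets the index express that the blog changes daily while static pages rarely do.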
Partition Strategy: How to Divide Content?
Partition selection depends on your site's structure. I use three basic approaches: content type, update frequency, and URL volume.
Content Type-Based Partitioning
Different content types create different crawling needs. On an e-commerce site, product pages, category pages, blog posts, and static pages should be in separate sitemaps. We set up exactly this structure on the diolivo.com.tr project:
- sitemap_products.xml: 2,400 product pages, daily updates
- sitemap_categories.xml: 180 categories, weekly updates
- sitemap_blog.xml: 85 posts, monthly updates
- sitemap_pages.xml: 12 static pages, yearly updates
The product sitemap updates every day when new products are added. Google notices this and increases crawl frequency. But it doesn't waste resources unnecessarily on static pages.
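The grouping itself can be as simple as routing each URL by its first path segment. The sketch below assumes a hypothetical URL scheme (`/product/`, `/category/`, `/blog/`) and example.com URLs; the real site's paths may differ, so treat the mapping as a placeholder.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Assumed mapping from first path segment to sitemap file (hypothetical paths).
PREFIX_TO_SITEMAP = {
    "product": "sitemap_products.xml",
    "category": "sitemap_categories.xml",
    "blog": "sitemap_blog.xml",
}
DEFAULT_SITEMAP = "sitemap_pages.xml"  # everything else counts as a static page


def partition_by_content_type(urls):
    """Group URLs into sitemap files based on their first path segment."""
    groups = defaultdict(list)
    for url in urls:
        first_segment = urlparse(url).path.strip("/").split("/")[0]
        groups[PREFIX_TO_SITEMAP.get(first_segment, DEFAULT_SITEMAP)].append(url)
    return dict(groups)


groups = partition_by_content_type([
    "https://example.com/product/olive-oil-500ml",
    "https://example.com/blog/how-to-store-olive-oil",
    "https://example.com/about",
])
```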
Update Frequency-Based Partitioning
On kamupersonelhaber.com, 50+ new listings are published daily. Old listings are archived but URLs remain. Here we used time-based partitioning:
- sitemap_current_month.xml: This month's listings, daily updates
- sitemap_last_3_months.xml: Last 3 months, weekly updates
- sitemap_archive.xml: Old listings, monthly updates
Google finds current content quickly and doesn't waste bandwidth on old content. Judging by the "last crawled" dates in Search Console, the current_month sitemap is crawled daily and the archive monthly.
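A time-based router only needs the publication date and today's date. The cutoffs in this sketch (calendar month for "current", up to three calendar months back for "last 3 months") are my assumption about how the buckets are defined, and `bucket_for_listing` is a hypothetical name.

```python
from datetime import date


def bucket_for_listing(published: date, today: date) -> str:
    """Pick the sitemap file for a listing based on how many months old it is."""
    months_old = (today.year - published.year) * 12 + (today.month - published.month)
    if months_old <= 0:
        return "sitemap_current_month.xml"
    if months_old <= 3:
        return "sitemap_last_3_months.xml"
    return "sitemap_archive.xml"
```

Because the bucket depends on `today`, the same listing moves from one file to the next as it ages, so the sitemaps need to be regenerated at least when the month rolls over.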
URL Volume-Based Partitioning
On italyanmutfagi.com there are 618 recipe pages, all of the same content type. That is far below Google's 50,000 limit, but we used alphabetical partitioning so the structure keeps working as the archive grows:
- sitemap_recipes_a_to_d.xml
- sitemap_recipes_e_to_k.xml
- sitemap_recipes_l_to_r.xml
- sitemap_recipes_s_to_z.xml
This structure is scalable. Even with 10,000 recipes, the same logic works, only the number of partitions increases.
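Alphabetical routing can be sketched by looking at the first letter of the URL slug. The letter ranges come from the file names above; the `/recipes/<slug>` URL shape, the `sitemap_for_recipe` name, and the fallback for slugs that start with a digit are assumptions.

```python
from urllib.parse import urlparse

# (first letter, last letter, file) taken from the partition names above.
RANGES = [
    ("a", "d", "sitemap_recipes_a_to_d.xml"),
    ("e", "k", "sitemap_recipes_e_to_k.xml"),
    ("l", "r", "sitemap_recipes_l_to_r.xml"),
    ("s", "z", "sitemap_recipes_s_to_z.xml"),
]


def sitemap_for_recipe(url: str) -> str:
    """Return the partition file for a recipe URL based on its slug's first letter."""
    slug = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1].lower()
    first = slug[:1]
    for start, end, filename in RANGES:
        if start <= first <= end:
            return filename
    # Assumed fallback: slugs starting with a digit or symbol go to the first file.
    return RANGES[0][2]
```

If one letter range grows much faster than the others, the ranges can be split further without touching the rest of the logic.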
News Sitemap: Special Strategy for News Sites
News sitemap is a different protocol from standard XML sitemap. It only covers content published in the last 2 days and contains tags specific to Google News. On kamupersonelhaber.com we use both standard sitemap and news sitemap.
Advantages of news sitemap:
- Fast indexing in Google News (sometimes 5-10 minutes)
- Publication date and title tags (news:publication_date, news:title)
- Publication language tag (news:language, tr for Turkish)
For example, on kamupersonelhaber.com, every new listing is automatically added to the news sitemap when it is published. After 48 hours it drops out of the news sitemap and remains only in the standard sitemap. This way the Google News crawler always finds fresh content and the news sitemap stays small.
Things to watch for with news sitemap:
- Maximum 1,000 URLs per news sitemap (Google's limit)
- Content from the last 2 days
- publication_date tag mandatory (ISO 8601 format)
- You must be registered with Google News as a news site
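The rules above translate into a short generator: keep only articles from the last 48 hours, cap the list at 1,000, and write the publication name, language, date, and title for each. This is a sketch under assumptions: `build_news_sitemap`, the article dictionary keys, and the example.com URLs are hypothetical, and the publication name must match what is registered with Google.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles, now, publication_name, language="tr"):
    """Build a news sitemap from dicts with 'loc', 'title', 'published' (aware datetime)."""
    cutoff = now - timedelta(hours=48)
    recent = [a for a in articles if a["published"] >= cutoff][:1000]
    root = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:news": NEWS_NS})
    for article in recent:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = article["loc"]
        news = ET.SubElement(url, "news:news")
        publication = ET.SubElement(news, "news:publication")
        ET.SubElement(publication, "news:name").text = publication_name
        ET.SubElement(publication, "news:language").text = language
        # isoformat() on a timezone-aware datetime yields an ISO 8601 timestamp.
        ET.SubElement(news, "news:publication_date").text = article["published"].isoformat()
        ET.SubElement(news, "news:title").text = article["title"]
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


now = datetime(2024, 5, 20, 12, 0, tzinfo=timezone.utc)
news_xml = build_news_sitemap(
    [
        {"loc": "https://example.com/ilan-1", "title": "New listing", "published": now - timedelta(hours=4)},
        {"loc": "https://example.com/ilan-0", "title": "Old listing", "published": now - timedelta(days=5)},
    ],
    now,
    "Example News",
)
```

Regenerating the file on every publish is what makes older entries fall out automatically, since the 48-hour filter is applied each time.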
At FUTIA, I automatically update news sitemaps using Claude Haiku API. When new content is published, a webhook triggers, the sitemap regenerates, and a ping is sent to Google. No manual process.
Image Sitemap: Separate Strategy for Visual Content
Image sitemap is used to introduce images within pages to Google Images. It's critical especially for e-commerce and content sites. On diolivo.com.tr we used a separate image sitemap for product images, and Google Images traffic increased 180% in 6 months.
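In an image sitemap, multiple images can be defined for each URL. Note that Google deprecated the caption, title, and license tags in 2022 and now documents only image:loc as supported, so treat the other tags as optional metadata that Google may ignore: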
- image:loc (image URL)
- image:caption (image description)
- image:title (image title)
- image:license (copyright information)
On italyanmutfagi.com, each recipe has an average of 4-5 images. If we had put them in the standard sitemap, the file would have bloated. By using a separate image sitemap, we both kept file size under control and provided rich information to Google Images.
Image sitemap strategy:
1. Separate image sitemap for product/content pages
2. Add descriptive captions to each image
3. Keep alt text and caption consistent
4. If images are on CDN, use CDN URLs
5. Add fallback JPG for WebP format
I create image sitemaps programmatically. If you're using WordPress, Yoast SEO or RankMath does it automatically. In custom systems, I parse HTML with Python Beautiful Soup, extract images, and write them to the sitemap.
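The extraction step can be sketched as follows. To keep the example dependency-free it uses the standard library's `html.parser` instead of Beautiful Soup, so it is a stand-in for the approach described above rather than the actual script. The class and function names are hypothetical, and only `image:loc` is written.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


class ImageCollector(HTMLParser):
    """Collect absolute URLs of every <img src=...> found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(urljoin(self.base_url, src))


def build_image_sitemap(pages):
    """pages maps page URL -> HTML string; returns the image sitemap as bytes."""
    root = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:image": IMAGE_NS})
    for page_url, html in pages.items():
        collector = ImageCollector(page_url)
        collector.feed(html)
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = page_url
        for src in collector.images:
            image = ET.SubElement(url, "image:image")
            ET.SubElement(image, "image:loc").text = src
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


image_xml = build_image_sitemap({
    "https://example.com/recipes/lasagna": (
        '<p><img src="/img/lasagna-1.jpg" alt="Lasagna">'
        '<img src="https://cdn.example.com/lasagna-2.webp"/></p>'
    ),
})
```

Relative `src` values are resolved against the page URL, while absolute CDN URLs pass through unchanged.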
Video Sitemap: YouTube Integration and Rich Results
A video sitemap introduces the videos within your pages to Google. Even if the video is a YouTube embed, you should use a video sitemap, because it increases your chances of appearing in video rich results.
On futia.net we produced 2,000+ short videos in 3 months. We created a separate page for each video and added it to the video sitemap. Video rich snippets appear in Google searches for "artificial intelligence automation," and click-through rate increased 40%.
Video sitemap tags:
- video:title (video title)
- video:description (video description)
- video:thumbnail_loc (thumbnail URL)
- video:duration (duration in seconds)
- video:publication_date (publication date)
Even if you're using YouTube embed, add a video sitemap. Google evaluates the video on your page independently from YouTube and shows it separately in search results.
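The tags above map onto a small generator. This sketch adds `video:player_loc`, since Google requires either a content or a player location for each video; for a YouTube embed the player URL is the natural choice. The function name, dictionary keys, and all URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"


def build_video_sitemap(videos):
    """Build a video sitemap from a list of dicts describing one video per page."""
    root = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:video": VIDEO_NS})
    for v in videos:
        # Google documents a valid duration range of 1 to 28800 seconds.
        if not 1 <= v["duration_seconds"] <= 28800:
            raise ValueError(f"duration out of range for {v['page_url']}")
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = v["page_url"]
        video = ET.SubElement(url, "video:video")
        ET.SubElement(video, "video:thumbnail_loc").text = v["thumbnail_url"]
        ET.SubElement(video, "video:title").text = v["title"]
        ET.SubElement(video, "video:description").text = v["description"]
        ET.SubElement(video, "video:player_loc").text = v["player_url"]
        ET.SubElement(video, "video:duration").text = str(v["duration_seconds"])
        ET.SubElement(video, "video:publication_date").text = v["published"]
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


video_xml = build_video_sitemap([{
    "page_url": "https://example.com/videos/ai-automation-intro",
    "thumbnail_url": "https://example.com/thumbs/ai-automation-intro.jpg",
    "title": "AI automation intro",
    "description": "A short introduction to AI automation.",
    "player_url": "https://www.youtube.com/embed/VIDEO_ID",
    "duration_seconds": 95,
    "published": "2024-05-20",
}])
```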
Video Sitemap Automation
At FUTIA, I've completely automated video sitemap. When a new video is uploaded:
1. Claude Haiku API generates SEO-friendly description from video title
2. I extract video duration with FFmpeg
3. I save the first frame as thumbnail
4. New entry is added to video sitemap
5. Ping is sent to Google
The entire process takes 30 seconds, no manual work. Without this automation, managing 2,000+ videos manually would have been impossible.
Index Sitemap Architecture: Blueprint for Large Sites
For large sites, index sitemap architecture should be set up as follows:
Level 1: Main Index (sitemap_index.xml)
- Lists all sub-sitemaps
- This file is defined in robots.txt
- This file is submitted to Google Search Console
Level 2: Content Type Indexes
- sitemap_products_index.xml
- sitemap_blog_index.xml
- sitemap_news_index.xml
Level 3: Partition Sitemaps
- sitemap_products_electronics.xml
- sitemap_products_clothing.xml
- sitemap_blog_2024.xml
- sitemap_blog_2023.xml
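We used exactly this structure on doktorbul.com. The main index connects to 5 sub-indexes, and each sub-index to 10-15 partition sitemaps, for a total of 79,000 URLs in 70+ sitemap files. Google crawls each partition independently, dynamically adjusting crawl frequency. One caveat: the sitemaps.org protocol says an index file may list only sitemap files, not other index files, so if nested indexes are not picked up, list each Level 2 index directly in robots.txt and Search Console.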
This architecture provides these advantages:
- You don't break the existing structure when adding new content types
- You balance load between partitions
- You use Google's crawl budget efficiently
- You can do partition-based analysis in Search Console
Sitemap Update Frequency and Ping Strategy
How often should you update your sitemap? The answer depends on your content production speed. I use three different strategies:
Real-Time Updates (News Sites)
On kamupersonelhaber.com, the sitemap updates every time a new listing is published, and a ping is sent to Google. There are 50+ updates per day. Google bot notices this and increases crawl frequency. New listings are indexed within 2-3 hours.
Daily Batch Updates (E-commerce)
On diolivo.com.tr, all sitemaps are regenerated once a day at 3:00 AM. Products added during the day, updated stock information, new blog posts are added to the sitemap in bulk. Single ping, efficient crawling.
Weekly Updates (Content Sites)
On italyanmutfagi.com, 2-3 new recipes are added per week. Sitemap updates weekly, ping sent to Google. More frequent updates are unnecessary, Google crawls weekly anyway.
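The ping in these workflows is a GET request to Google's sitemap ping endpoint:
http://www.google.com/ping?sitemap=https://yoursite.com/sitemap.xml
Google queues the sitemap when it receives the request. This never guaranteed indexing; it only increased the chance of a crawl. Be aware that Google announced the deprecation of this endpoint in June 2023 and it has since stopped working. Today the equivalent signal is an accurate lastmod value in the sitemap, combined with submitting the sitemap in Search Console or referencing it in robots.txt.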
Sitemap Errors and Solutions
The most common sitemap errors I encounter in Google Search Console:
1. "Sitemap could not be read" Error
Usually the XML format is broken. Check it with the W3C XML validator. I automatically validate every sitemap after generating it, so broken files don't get published.
2. "Submitted URL not found (404)" Error
The URL is in the sitemap but not on the site. On doktorbul.com we got this error a lot initially: when a doctor profile was deleted, we forgot to remove it from the sitemap. Now the deletion process triggers a sitemap regeneration, so it updates automatically.
3. "Sitemap contains URLs blocked by robots.txt" Error
The sitemap has URLs that robots.txt blocks. On italyanmutfagi.com we mistakenly added pages under /wp-admin/ to the sitemap. We fixed the sitemap generation script so that only public URLs are added.
4. Timeout Errors
The sitemap is too large and Google times out. Partition strategy is the solution: split a 50,000 URL sitemap into 5 parts and the problem is solved.
5. "Parsing error" Error
This is usually a Turkish character encoding issue. Save the file as UTF-8 without BOM. When generating sitemaps in Python, I always pass the encoding='utf-8' parameter.
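Several of these checks can run automatically before a sitemap is published. The sketch below covers the BOM and encoding check, XML well-formedness, the URL count, and robots.txt blocking; it does not check for 404s because that needs network access. `validate_sitemap` is a hypothetical helper, and using "Googlebot" as the user agent is my assumption.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def validate_sitemap(xml_bytes, robots_txt, max_urls=50_000):
    """Return a list of human-readable problems; an empty list means the file passed."""
    problems = []
    payload = xml_bytes
    if payload.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 BOM")
        payload = payload[3:]
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems
    try:
        root = ET.fromstring(payload)
    except ET.ParseError as exc:
        problems.append(f"XML is not well-formed: {exc}")
        return problems

    locs = [el.text.strip() for el in root.iter(LOC_TAG) if el.text]
    if len(locs) > max_urls:
        problems.append(f"{len(locs)} URLs exceeds the limit of {max_urls}")

    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    for loc in locs:
        if not robots.can_fetch("Googlebot", loc):
            problems.append(f"blocked by robots.txt: {loc}")
    return problems
```

A publish step can then refuse to upload any file for which the returned list is not empty.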
How Does Sitemap Automation Work at FUTIA?
At FUTIA, I've automated all sitemap processes. Manual sitemap management isn't scalable, error margin is high. Automation works like this:
1. Content Change Detection
Every content add/update/delete operation triggers a webhook. Whether the source is WordPress, a custom CMS, or a headless system doesn't matter, because the webhook is standardized.
2. Sitemap Generation
A Python script pulls URLs from the database, groups them according to the partition logic, and creates the XML. Meta descriptions are enriched with Claude Haiku API.
3. Validation
Generated XML files are automatically validated. If there's an error, a notification goes to Slack and the broken file doesn't get published.
4. Deployment
Validated sitemaps are uploaded to the CDN and the cache is invalidated. The new sitemap is live within 30 seconds.
5. Ping and Monitoring
A ping is sent to Google and crawl status is tracked via the Search Console API. Alerts fire for abnormal situations (crawl drop, error increase).
Thanks to this automation, 50+ listings per day on kamupersonelhaber.com, 79,000 profiles on doktorbul.com, 618 recipes on italyanmutfagi.com are managed without issues. Zero manual work.
Testing Your Sitemap Strategy
How do you know if your sitemap strategy is working? I track three metrics:
1. Crawl Rate
Google Search Console > Settings > Crawl Stats. If the daily crawl count is increasing, the sitemap strategy is working. On diolivo.com.tr, daily crawling increased 65% after we introduced the partition strategy.
2. Indexing Speed
How many hours or days after publishing does new content get indexed? On kamupersonelhaber.com it took 24-48 hours before we used a news sitemap; now it takes 2-3 hours.
3. Sitemap Coverage
Search Console > Sitemaps. Compare "Discovered URLs" with "Indexed URLs": the share of discovered URLs that are indexed should be 80% or higher. On doktorbul.com it's 87%, which is very healthy.
I track these metrics weekly, if there's an abnormal drop I review the sitemap structure.
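The coverage check is simple arithmetic, indexed URLs divided by discovered URLs, compared against the 80% threshold. A minimal sketch with hypothetical function names; the counts would come from Search Console.

```python
def coverage_ratio(discovered: int, indexed: int) -> float:
    """Share of discovered URLs that Google has indexed (0.0 to 1.0)."""
    if discovered <= 0:
        raise ValueError("discovered must be a positive count")
    return indexed / discovered


def coverage_is_healthy(discovered: int, indexed: int, threshold: float = 0.80) -> bool:
    """True when coverage meets the threshold used in this article."""
    return coverage_ratio(discovered, indexed) >= threshold
```

With 87 indexed URLs out of every 100 discovered, `coverage_is_healthy(100, 87)` passes, while `coverage_is_healthy(100, 70)` would flag the sitemap for review.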
Sitemap strategy isn't set-it-and-forget-it, it requires continuous optimization. As content volume grows and content types diversify, update your partition strategy. I review the sitemap architecture of all sites every 3 months, restructuring if necessary.
If you need support with sitemaps, at FUTIA we set up programmatic sitemap generation and automation. You can reach out to have your current sitemap structure analyzed, or send your site URL to info@futia.net and I'll do a free sitemap audit.
Frequently Asked Questions
What is the difference between an index sitemap and a normal sitemap?
An index sitemap is a top layer that lists other sitemap files. While a normal sitemap directly contains URLs, an index sitemap only contains the URLs of sitemap files. On large sites (50,000+ URLs), when you divide content into partitions, you create a separate sitemap for each partition and combine them with an index sitemap. Google first reads the index, then crawls the sub-sitemaps sequentially. This structure both allows you to use Google's crawl budget efficiently and enables you to set different update frequencies for content types.
Is it mandatory to be registered with Google News to use news sitemap?
Technically not mandatory, you can use the news sitemap format. However, if you're not registered with Google News, it's not possible for your news sitemap to be prioritized by Google News bot. Standard Googlebot will still crawl it but you lose the fast indexing advantage. If you're producing news/current content, first apply to Google News Publisher Center, use news sitemap after approval. The approval process can take 2-4 weeks. At FUTIA, I use both standard sitemap and news sitemap for kamupersonelhaber.com, it has Google News registration.
How often should you update your sitemap?
Update frequency depends on your content production speed. If you're publishing 10+ pieces of content per day, do real-time or daily updates. If you're publishing 2-3 pieces of content per week, weekly is sufficient. What's important is consistency. Google crawls regularly updated sitemaps more frequently. For example, on kamupersonelhaber.com where 50+ listings are published daily, the sitemap updates with each new content and a ping is sent to Google. On italyanmutfagi.com where 2-3 recipes are added per week, I do weekly updates. Unnecessarily frequent updates waste Google's crawl budget.
What is the difference between image sitemap and images in normal sitemap?
In a normal sitemap you can provide image information on a URL basis but it's limited. Image sitemap provides detailed metadata for each image: caption, title, license, geo_location, etc. Google Images uses this information to better categorize images and show rich snippets in search results. Image sitemap is critical especially for product images on e-commerce sites. After starting to use a separate image sitemap on diolivo.com.tr, Google Images traffic increased 180% in 6 months. We defined 4-5 images for each product with caption and title, and ranked high in image searches.
When does sitemap partition strategy become necessary?
Partition strategy is mandatory in two situations: when your URL count exceeds 50,000 or when your different content types have different update frequencies. For example, if products are updated daily on an e-commerce site while blog posts are updated monthly, use separate sitemaps. Google evaluates each partition independently, adjusting crawl frequency according to content. On doktorbul.com we used city-based partitioning for 79,000 doctor profiles. The Istanbul sitemap is crawled daily because updates are frequent, small cities are crawled weekly. This structure optimizes Google's crawl budget, it doesn't unnecessarily re-crawl the entire site.