
79,000 Doctor Profiles Programmatic SEO: Pipeline and Technical Details

How did I build the pipeline that automatically generates 79,000 doctor profiles for doktorbul.com? Data collection, template design, bulk publishing, and SEO optimization.

Miraç Eroğlu
April 29, 2026

Adding 500 products to an e-commerce site is already tedious. But what about 79,000 doctor profiles, each with its own URL, meta description, and structured data? Entering them by hand is out of the question, and a flat bulk import from Excel doesn't work either, because each doctor has a different specialty, city, and hospital. This is where programmatic SEO comes in. For the doktorbul.com project I built a pipeline that automatically generates, SEO-optimizes, and continuously updates 79,000 doctor profiles. In this article, I'll share the technical details of that pipeline, the problems I ran into, and how I solved them.

Programmatic SEO is the practical way to put data sets that are too large to manage by hand in front of search engines. Most projects go wrong at the same point: they treat templates as copy-paste. The real challenge is making each page unique, maintaining data quality, and automating the publishing process. doktorbul.com had not only profile pages but also city-based listing pages, specialty-based pages, and hospital pages: over 100,000 URLs in total. If I had tried to manage this manually, the project would never have finished.

Data Source and Normalization

The foundation of every programmatic SEO project is data quality. For doktorbul.com, the data source was the Turkish Medical Association's open database and some private APIs. Each raw record contained: doctor name, national ID number (masked), specialty, institution, city, district. The data had three problems. A doctor who worked at multiple hospitals appeared once per hospital. Some specialty names were inconsistent (e.g., "Göz Hastalıkları" vs "Göz Hast."). City and district names were sometimes written in uppercase, sometimes in lowercase.

The first step was data cleaning. I wrote a script in Python:

  • Converted doctor names to title case (İsmail Yılmaz, not ISMAIL YILMAZ)
  • Matched specialty names with a standard dictionary (e.g., "Göz Hast." → "Göz Hastalıkları")
  • Normalized city and district names with the Turkish Statistical Institute's official list
  • Merged multiple records of the same doctor, collected their institutions in an array
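The cleaning steps above could be sketched roughly as follows. This is not the original script: the record fields (`id`, `name`, `specialty`, `institution`), the one-entry `SPECIALTY_MAP`, and all function names are assumptions for illustration. One detail worth showing is that Python's built-in `str.title()` does not apply Turkish casing rules for I/ı and İ/i, so the sketch handles those letters explicitly.

```python
from collections import OrderedDict

# Hypothetical lookup table; a real one would list every known variant.
SPECIALTY_MAP = {"Göz Hast.": "Göz Hastalıkları"}


def turkish_lower(text):
    # Map dotless and dotted I explicitly before the generic lower().
    return text.replace("I", "ı").replace("İ", "i").lower()


def turkish_upper(text):
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_title(name):
    words = name.split()
    return " ".join(turkish_upper(w[0]) + turkish_lower(w[1:]) for w in words)


def normalize(record):
    specialty = record["specialty"].strip()
    return {
        "id": record["id"],
        "name": turkish_title(record["name"]),
        "specialty": SPECIALTY_MAP.get(specialty, specialty),
        "institution": record["institution"].strip(),
    }


def merge_duplicates(records):
    # One entry per doctor id; institutions are collected into a list.
    merged = OrderedDict()
    for rec in map(normalize, records):
        doctor = merged.setdefault(rec["id"], {
            "id": rec["id"],
            "name": rec["name"],
            "specialty": rec["specialty"],
            "institutions": [],
        })
        if rec["institution"] not in doctor["institutions"]:
            doctor["institutions"].append(rec["institution"])
    return list(merged.values())
```

City and district normalization against an official list would follow the same dictionary-lookup pattern as the specialty names.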

After this process, 79,000 unique doctor profiles remained. Each doctor had a unique ID (a hash of the national ID number). I loaded the data into a PostgreSQL database, which I found faster than MySQL for querying a dataset of this size.
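A hashed ID of this kind might look like the sketch below. The article does not say which hash function was used; SHA-256, the salt, and the truncation to 16 hex characters are all assumptions made for the example.

```python
import hashlib


def doctor_id(national_id, salt):
    # Assumption: salted SHA-256, shortened to 16 hex characters.
    digest = hashlib.sha256((salt + national_id).encode("utf-8")).hexdigest()
    return digest[:16]
```

The same input always yields the same ID, so repeated imports of the same doctor map to the same row.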

Data Model and Relationships

The database schema was as follows:

  • doctors table: id, name, slug, specialty_id, bio (auto-generated), phone, email
  • specialties table: id, name, slug, description
  • cities table: id, name, slug
  • hospitals table: id, name, slug, city_id
  • doctor_hospital pivot table: doctor_id, hospital_id (many-to-many relationship)
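To make the relationships concrete, here is a sketch of the schema using Python's built-in `sqlite3` as a stand-in for PostgreSQL. The column types, constraints, and the `hospitals_for` helper are assumptions; only the table and column names come from the list above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE specialties (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    slug TEXT UNIQUE NOT NULL, description TEXT);
CREATE TABLE cities (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, slug TEXT UNIQUE NOT NULL);
CREATE TABLE hospitals (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, slug TEXT UNIQUE NOT NULL,
    city_id INTEGER NOT NULL REFERENCES cities(id));
CREATE TABLE doctors (
    id TEXT PRIMARY KEY, name TEXT NOT NULL, slug TEXT UNIQUE NOT NULL,
    specialty_id INTEGER NOT NULL REFERENCES specialties(id),
    bio TEXT, phone TEXT, email TEXT);
CREATE TABLE doctor_hospital (
    doctor_id TEXT NOT NULL REFERENCES doctors(id),
    hospital_id INTEGER NOT NULL REFERENCES hospitals(id),
    PRIMARY KEY (doctor_id, hospital_id));
"""


def create_schema(conn):
    conn.executescript(SCHEMA)


def hospitals_for(conn, doctor_id):
    # Resolve the many-to-many relationship through the pivot table.
    rows = conn.execute(
        "SELECT h.name FROM hospitals h "
        "JOIN doctor_hospital dh ON dh.hospital_id = h.id "
        "WHERE dh.doctor_id = ? ORDER BY h.name",
        (doctor_id,),
    )
    return [row[0] for row in rows]
```

The composite primary key on the pivot table prevents the same doctor from being linked to the same hospital twice.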

Slug fields were important because they determined the URL structure. For example, Dr. Ahmet Yılmaz's slug would be dr-ahmet-yilmaz-kardiyoloji-ankara. Points I paid attention to when generating slugs:

  • Converting Turkish characters to ASCII (ş → s, ğ → g)
  • Replacing spaces with hyphens
  • Adding specialty and city information to the slug (for SEO)
  • Adding a number at the end if there are doctors with the same name
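The slug rules above can be sketched in a few lines. The function names and the choice to start numbering duplicates at 2 are assumptions; the character mapping is done before lowercasing so that "İ" and "I" are handled predictably.

```python
import re

# Turkish letters mapped to their closest ASCII equivalents.
TR_TO_ASCII = str.maketrans({
    "ş": "s", "Ş": "s", "ğ": "g", "Ğ": "g", "ı": "i", "İ": "i",
    "ö": "o", "Ö": "o", "ü": "u", "Ü": "u", "ç": "c", "Ç": "c",
})


def slugify(text):
    ascii_text = text.translate(TR_TO_ASCII).lower()
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text).strip("-")


def doctor_slug(name, specialty, city, taken):
    # `taken` is the set of slugs already in use; it is updated in place.
    base = slugify(f"dr {name} {specialty} {city}")
    slug, n = base, 2
    while slug in taken:
        slug = f"{base}-{n}"
        n += 1
    taken.add(slug)
    return slug
```

With an empty `taken` set, `doctor_slug("Ahmet Yılmaz", "Kardiyoloji", "Ankara", taken)` gives `dr-ahmet-yilmaz-kardiyoloji-ankara`, matching the example above.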

Template Design and Dynamic Content

The most critical point in programmatic SEO: each page must be unique, but use the same template. I used WordPress for doktorbul.com because the client would take over site management and was familiar with WordPress. But the standard WordPress editor isn't suitable for 79,000 pages. I created a custom post type: doctor_profile.

The template file single-doctor_profile.php worked like this:

  1. Get the doctor slug from the URL
  2. Fetch doctor information from the database
  3. Generate dynamic meta title and description
  4. Add Schema.org Person structured data
  5. Fill the related doctors section (same specialty, same city)
  6. Create breadcrumb navigation

Meta title format: Dr. [Name Surname] - [Specialty] | [City] | doktorbul.com

Meta description format: Dr. [Name Surname] provides services in the field of [Specialty] in [City]. Contact information, hospitals where they work, and appointment options.
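The two formats are simple string templates. The real template is a PHP file; the Python version below only illustrates the substitution, and the function names are hypothetical.

```python
def meta_title(name, specialty, city):
    return f"Dr. {name} - {specialty} | {city} | doktorbul.com"


def meta_description(name, specialty, city):
    return (
        f"Dr. {name} provides services in the field of {specialty} in {city}. "
        "Contact information, hospitals where they work, and appointment options."
    )
```

Because only three fields vary, two doctors with the same name, specialty, and city would get identical titles, which is one reason the unique biography matters.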

But there was a problem here: Google could see thousands of pages generated from the same template as "thin content". Solution: generate a unique 150-200 word biography for each doctor. Writing this manually was impossible, so I used the Claude Haiku API.

Biography Automation

The prompt for each doctor was:

"Dr. [Name Surname] is a specialist working in the field of [Specialty] in [City]. They provide services at institutions such as [Hospital names]. Write a 150-word professional biography. Give general information, don't speculate."

The Claude Haiku API accepted 5 requests per second, so it took approximately 4.5 hours for 79,000 biographies. Cost: 79,000 × $0.00025 = $19.75. I saved the generated biographies to the database, so I didn't make new API requests each time a page loaded.
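The arithmetic and the throttling can be sketched as below. The code does not call any real API client: `generate` is a placeholder for whatever function sends a prompt to the model and returns the text, and the doctor fields and function names are assumptions.

```python
import time

PROMPT = (
    "Dr. {name} is a specialist working in the field of {specialty} in {city}. "
    "They provide services at institutions such as {hospitals}. "
    "Write a 150-word professional biography. "
    "Give general information, don't speculate."
)


def build_prompt(doctor):
    return PROMPT.format(
        name=doctor["name"],
        specialty=doctor["specialty"],
        city=doctor["city"],
        hospitals=", ".join(doctor["hospitals"]),
    )


def estimate(total, per_second, cost_each):
    # Returns (hours of runtime, total cost in dollars).
    hours = total / per_second / 3600
    return round(hours, 2), round(total * cost_each, 2)


def generate_all(doctors, generate, per_second=5,
                 sleep=time.sleep, clock=time.monotonic):
    # Calls generate(prompt) once per doctor, pacing the calls so that
    # no more than `per_second` are started each second.
    interval = 1.0 / per_second
    bios = {}
    for doctor in doctors:
        started = clock()
        bios[doctor["id"]] = generate(build_prompt(doctor))
        remaining = interval - (clock() - started)
        if remaining > 0:
            sleep(remaining)
    return bios
```

`estimate(79_000, 5, 0.00025)` works out to about 4.39 hours and $19.75, consistent with the figures above. The returned dictionary is what would be written to the database so that pages never trigger an API call at load time.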

Were the biographies truly unique? I checked in Google Search Console after 3 months: I received no duplicate content warnings. Average page quality score was 7.2/10 (Google PageSpeed Insights).

Bulk Publishing and WordPress REST API

Adding 79,000 pages from the WordPress admin panel is impossible. Directly INSERTing into the database is also risky because WordPress has its own meta tables and relationships. Solution: WordPress REST API.

The Python script worked like this:

  1. Fetch 1,000 doctor records from PostgreSQL (batch processing)
  2. Send a POST request for each doctor: /wp-json/wp/v2/doctor_profile
  3. In the request body: title, content (biography), slug, meta fields (phone, email, specialty ID, city ID)
  4. Log the response, retry mechanism if there's an error
  5. Move to the next batch after 1,000 doctors are done

I used JWT (JSON Web Token) for REST API authentication. I installed the jwt-authentication-for-wp-rest-api plugin on WordPress and sent an Authorization: Bearer [token] header with each request.
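A stripped-down version of such a publishing loop, using only the standard library, might look like this. It is a sketch under several assumptions: the doctor fields and meta keys are illustrative, the custom post type must be registered with REST support for the route to exist, meta fields must be registered for the REST API before WordPress accepts them, and the retry logic here has no backoff delay.

```python
import json
import urllib.request


def batches(items, size=1000):
    # Yield consecutive slices of at most `size` items.
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_request(base_url, token, doctor):
    payload = {
        "title": doctor["name"],
        "content": doctor["bio"],
        "slug": doctor["slug"],
        "status": "publish",
        # Hypothetical meta keys for illustration.
        "meta": {"phone": doctor["phone"], "email": doctor["email"]},
    }
    return urllib.request.Request(
        f"{base_url}/wp-json/wp/v2/doctor_profile",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def publish(request, send=urllib.request.urlopen, retries=3):
    # Try the request up to `retries` times; network and HTTP errors
    # raised by urlopen are subclasses of OSError.
    for _ in range(retries):
        try:
            with send(request, timeout=30) as response:
                return 200 <= response.status < 300
        except OSError:
            continue
    return False
```

The outer loop would then be `for batch in batches(doctors): ...`, calling `publish(build_request(...))` for each doctor and logging every `False` result for later manual review.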

One batch of 1,000 doctors took an average of 12 minutes. In total: 79 batches × 12 minutes = 948 minutes, or approximately 16 hours. I ran the script overnight and checked it in the morning. The error rate was 0.8% (623 doctors failed); I fixed those manually.

Problems Encountered During Bulk Publishing

The first 10,000 doctors were published without issues. But then the WordPress database started slowing down. Query times increased from 200ms to 1.5 seconds. The cause was missing indexes on the wp_postmeta table. I was adding 8-10 meta fields for each doctor profile (phone, email, specialty ID, etc.), so the table grew quickly, and on this installation it had no indexes on post_id or meta_key. (A stock WordPress schema normally defines both, so it is worth running SHOW INDEX FROM wp_postmeta before adding anything.)

Solution: I manually added indexes to wp_postmeta:

CREATE INDEX idx_postmeta_post_id ON wp_postmeta(post_id);
CREATE INDEX idx_postmeta_meta_key ON wp_postmeta(meta_key);

After this operation, query times dropped back to 200ms.

Second problem: the WordPress object cache. By default, WordPress caches every query, but RAM wasn't enough for 79,000 pages. I installed Redis and enabled the wp-redis plugin to use it as the object cache. The cache hit rate rose to 87%.

SEO Optimization and Structured Data

I added Schema.org Person structured data on each doctor profile page. In JSON-LD format, inside <head>:

{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Dr. Ahmet Yılmaz",
  "jobTitle": "Kardiyoloji Uzmanı",
  "worksFor": {
    "@type": "Organization",
    "name": "Ankara Şehir Hastanesi"
  },
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Ankara",
    "addressCountry": "TR"
  },
  "url": "https://doktorbul.com/dr-ahmet-yilmaz-kardiyoloji-ankara"
}
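On the site this markup is produced by the PHP template; the Python sketch below only illustrates how the same object could be generated from a doctor record. The input field names (`job_title`, `hospital`, and so on) are assumptions.

```python
import json


def person_jsonld(doctor):
    data = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": f"Dr. {doctor['name']}",
        "jobTitle": doctor["job_title"],
        "worksFor": {"@type": "Organization", "name": doctor["hospital"]},
        "address": {
            "@type": "PostalAddress",
            "addressLocality": doctor["city"],
            "addressCountry": "TR",
        },
        "url": f"https://doktorbul.com/{doctor['slug']}",
    }
    # ensure_ascii=False keeps Turkish characters readable; escaping "</"
    # stops a stray closing tag in the data from ending the script element.
    body = json.dumps(data, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Generating the object with a JSON serializer rather than string concatenation means quotes in names or hospital titles cannot break the markup.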

Thanks to this structured data, Google started showing doctor profiles in rich results. For example, in a "kardiyolog ankara" search, the doctor's name, hospital where they work, and contact button appeared directly in search results.

I also specified a canonical URL for each page. Since some doctors worked in multiple cities, there could be multiple URLs for the same doctor. The canonical URL pointed to the city where the doctor primarily worked.

Internal Linking Strategy

Internal linking is very important in programmatic SEO. Each doctor profile page had 3 types of links:

  1. Breadcrumb: Home > [City] > [Specialty] > Dr. [Name Surname]
  2. Related doctors: 5 doctors working in the same specialty and city
  3. Specialty page link: "View all [Specialty] specialists in [City]"
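The selection logic behind the first two link types is small enough to show in full. This is an illustrative sketch with assumed field names; how the five related doctors were actually chosen (ordering, randomization) is not specified, so the example simply takes the first five matches.

```python
def breadcrumb(doctor):
    return ["Home", doctor["city"], doctor["specialty"], f"Dr. {doctor['name']}"]


def related_doctors(doctor, all_doctors, limit=5):
    # Same specialty and same city, excluding the doctor being viewed.
    matches = [
        d for d in all_doctors
        if d["id"] != doctor["id"]
        and d["specialty"] == doctor["specialty"]
        and d["city"] == doctor["city"]
    ]
    return matches[:limit]
```

In production this filter would be a database query rather than a scan over all 79,000 records.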

Thanks to these links, the site architecture was clear to Google. Orphan page rate was below 2%.

Performance and Indexing

If 79,000 pages are submitted to Google all at once, the crawl load could overwhelm the server. So I split the XML sitemap into parts, with a maximum of 10,000 URLs per file (well under the sitemap protocol's limit of 50,000 URLs per file). That gave 8 sitemap files and one sitemap index file.
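Splitting a URL list into sitemap files and building the index can be sketched with the standard library. The file naming pattern `sitemap-doctors-N.xml` is hypothetical, and the output omits the XML declaration, which would be added when the files are written to disk.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk(urls, size=10_000):
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap(urls):
    root = ET.Element("urlset", xmlns=NS)
    for url in urls:
        entry = ET.SubElement(root, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")


def build_index(base_url, count):
    # One <sitemap> entry per generated file.
    root = ET.Element("sitemapindex", xmlns=NS)
    for n in range(1, count + 1):
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/sitemap-doctors-{n}.xml"
    return ET.tostring(root, encoding="unicode")
```

For 79,000 URLs, `chunk` returns 8 lists: seven with 10,000 URLs and a last one with 9,000.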

I submitted the sitemap index to Google Search Console. In the first week, only 12,000 pages were indexed. By the second week the total was 34,000, and by the third week 58,000. After 6 weeks, all 79,000 pages were indexed. To increase indexing speed:

  • I reduced server response time below 150ms (Cloudflare CDN)
  • I optimized robots.txt, blocked unnecessary URLs
  • I published 100 new pages every day (gradual, not bulk)

Result: After 6 months, doktorbul.com entered the top 10 for the "doktor ara" keyword. Monthly organic traffic reached 180,000 visits. Pages with the most traffic: cardiologists in Istanbul, pediatricians in Ankara, ophthalmologists in İzmir.

Updates and Maintenance

Programmatic SEO doesn't end at launch; it requires continuous updates. The monthly maintenance process for doktorbul.com:

  1. Add new doctors to the database (fetch from Medical Association API)
  2. Remove retired or deceased doctors
  3. Update hospital changes
  4. Detect and fix broken links (with Screaming Frog)
  5. Check page speed, optimize slow pages

I set up a cron job for these operations. It runs every Sunday night and applies the changes automatically. I'm notified by email only when a critical error occurs.
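The core of such a sync job is a comparison between what is on the site and what the source currently reports. A minimal sketch, assuming both sides can be loaded as a mapping from doctor ID to a set of hospital names (the function name and data shape are hypothetical):

```python
def plan_sync(current, source):
    # `current` and `source` map doctor id -> set of hospital names.
    current_ids, source_ids = set(current), set(source)
    return {
        "add": sorted(source_ids - current_ids),
        "remove": sorted(current_ids - source_ids),
        "update": sorted(
            doctor_id for doctor_id in current_ids & source_ids
            if current[doctor_id] != source[doctor_id]
        ),
    }
```

Computing the plan first and applying it in a second step makes it easy to abort and send an alert when the plan looks wrong, for example when an unexpectedly large share of doctors is marked for removal.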

I also monitor Google Analytics and Search Console data. Which doctor profiles get the most traffic? Which keywords provide the most conversions? Based on this data, I expand the content of some profiles, add videos or images to others.

Conclusion and Contact

Publishing 79,000 doctor profiles with programmatic SEO was a technically challenging but highly valuable project. The most important lessons: data quality is more important than anything, batch processing is essential for bulk operations, WordPress REST API is a powerful tool for large projects. doktorbul.com is now one of Turkey's largest doctor search platforms.

If you're looking for support for a similar project, you can contact me. As FUTIA, I manage the entire process from database design to automatic content generation, from bulk publishing to SEO optimization. You can email info@futia.net.

Frequently Asked Questions

How long does it take to publish 79,000 pages with programmatic SEO?

In the doktorbul.com project, data cleaning and normalization took 1 week, template design and testing 3 days, biography automation (Claude Haiku API) 4.5 hours, bulk publishing (WordPress REST API) 16 hours. Total project duration: 2 weeks. But indexing took 6 weeks, because Google crawls 79,000 pages gradually.

Is it necessary to generate unique content for each page?

Yes, otherwise Google can treat the pages as 'thin content' and drop them in rankings. On doktorbul.com, I generated a unique 150-200 word biography for each doctor. I used the Claude Haiku API, and the cost was only $19.75. Other models such as GPT-4, or manual writing, are alternatives, but for 79,000 pages an API is the most practical solution.

Is WordPress sufficient for 79,000 pages, will there be performance issues?

The default WordPress installation is not sufficient. On doktorbul.com, I made these optimizations: manually added indexes to the wp_postmeta table, installed Redis object cache, used Cloudflare CDN, removed unnecessary plugins. After these operations, server response time dropped below 150ms. With proper configuration, WordPress is sufficient even for 100,000+ pages.

How should internal linking be done in programmatic SEO?

Each page should have at least 3 types of links: breadcrumb (hierarchy), related content (same category), category page link. On doktorbul.com, I linked 5 doctors working in the same specialty and city on each doctor profile. I also created a hierarchy with breadcrumb: Home > City > Specialty > Doctor. This way, the orphan page rate stayed below 2%.

Is Schema.org structured data really effective?

Yes, after using Schema.org Person on doktorbul.com, Google started showing doctor information in rich results. For example, in a 'kardiyolog ankara' search, the doctor's name, hospital where they work, and contact button appear directly in search results. This increased click-through rate (CTR) by 18%. In JSON-LD format, adding it inside the head is sufficient.

ABOUT THE AUTHOR
Miraç Eroğlu

Hacettepe graduate; 6 years in social media, 2 years in AI automation.
