79,000 Doctor Profiles Programmatic SEO: Pipeline and Technical Details

How we built 79,000 doctor profiles with programmatic SEO for doktorbul.com: the WordPress, Python, and OpenAI API pipeline in detail.

Miraç Eroğlu
May 4, 2026

How many years would it take to manually write 79,000 doctor profiles for a healthcare platform? At an average of 15 minutes per profile, that's 19,750 hours, or about 2,468 eight-hour working days: roughly 10 years. For doktorbul.com, we solved this with a 6-week pipeline: one person, Python, the WordPress REST API, and OpenAI. In this article, I'll share every step of that pipeline, the technical challenges I faced, and the results. My goal isn't to give you a theoretical guide, but to show you how programmatic SEO is applied in a real project.

Programmatic SEO rests on three things: a data source, a template, and automation. In the doktorbul.com case, the data source is the TKHK (Turkish Ministry of Health) doctor database, the template is a WordPress custom post type, and the automation is a Python script plus the OpenAI API. At the start of the project, I made critical decisions in three areas: content quality, technical architecture, and indexing strategy. These decisions affected every stage of the pipeline.

Data Source and Normalization Process

The raw data we received from TKHK was a CSV file with 79,000 rows. Each row contained the doctor's name, specialty, city, and hospital. But the data quality was a disaster. Specialty names appeared in 47 different variations (for example "Çocuk Sağlığı ve Hastalıkları", "Pediatri", and "Çocuk Doktoru" for the same specialty), city names mixed uppercase and lowercase, and hospital information was often missing or incorrect.

The first step was normalization. I performed the following operations with Python Pandas:

  • Reduced specialty names to 32 standard categories (with mapping dictionary)
  • Converted city names to title case, fixed Turkish character issues
  • Filtered empty or erroneous rows (6,200 rows eliminated)
  • Duplicate check (if the same doctor works at multiple hospitals, kept the most recent record)
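
The clean-up steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it uses plain Python instead of Pandas to stay dependency-free, and the column names (name, city, specialty, updated), the three-entry mapping, and the choice of name plus specialty as the duplicate key are all assumptions.

```python
# Hypothetical excerpt of the mapping dictionary (keys are Turkish-lowercased).
SPECIALTY_MAP = {
    "çocuk sağlığı ve hastalıkları": "Pediatri",
    "pediatri": "Pediatri",
    "çocuk doktoru": "Pediatri",
}


def tr_lower(text):
    """Lowercase with Turkish rules: I -> ı and İ -> i."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def tr_upper(text):
    """Uppercase with Turkish rules: i -> İ and ı -> I."""
    return text.replace("i", "İ").replace("ı", "I").upper()


def tr_title(text):
    """Title-case each word using the Turkish casing helpers."""
    return " ".join(tr_upper(w[:1]) + tr_lower(w[1:]) for w in text.split())


def normalize_rows(rows):
    """Map specialties, fix city casing, drop bad rows, keep the newest duplicate."""
    latest = {}
    for row in rows:
        name = row.get("name", "").strip()
        city = tr_title(row.get("city", "").strip())
        specialty = SPECIALTY_MAP.get(tr_lower(row.get("specialty", "").strip()))
        if not name or not city or specialty is None:
            continue  # empty or unmappable row is filtered out
        clean = {**row, "name": name, "city": city, "specialty": specialty}
        key = (tr_lower(name), specialty)  # assumed identity of a doctor
        # "updated" is an assumed ISO date column, so string comparison works.
        if key not in latest or clean.get("updated", "") > latest[key].get("updated", ""):
            latest[key] = clean
    return list(latest.values())
```

The Turkish-aware casing helpers matter because Python's built-in lower() and title() do not apply the dotted and dotless "i" rules, which is one common source of the character issues mentioned in the list.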

After normalization, 72,800 clean rows remained. I created a unique URL slug for each row, for example "dr-mehmet-yilmaz-istanbul-kardiyoloji". This format is SEO-friendly and carries a low risk of collisions.
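
A slug in this format could be generated along these lines. This is a sketch under assumptions: the function names are hypothetical, and appending a numeric suffix on collision is one possible way to guarantee uniqueness, not necessarily what the project did.

```python
import re

# Map Turkish letters to ASCII equivalents before lowercasing.
_TR_TO_ASCII = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")


def slugify(*parts):
    """Join the parts and reduce them to lowercase ASCII words separated by hyphens."""
    text = " ".join(parts).translate(_TR_TO_ASCII).lower()
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


def doctor_slug(name, city, specialty):
    """Build a slug of the form dr-<name>-<city>-<specialty>."""
    return slugify("dr", name, city, specialty)


def unique_slug(base, seen):
    """Return base, or base-2, base-3, ... if base was already used."""
    slug, counter = base, 2
    while slug in seen:
        slug = f"{base}-{counter}"
        counter += 1
    seen.add(slug)
    return slug


seen = set()
first = unique_slug(doctor_slug("Mehmet Yılmaz", "İstanbul", "Kardiyoloji"), seen)
second = unique_slug(doctor_slug("Mehmet Yılmaz", "İstanbul", "Kardiyoloji"), seen)
print(first)   # dr-mehmet-yilmaz-istanbul-kardiyoloji
print(second)  # dr-mehmet-yilmaz-istanbul-kardiyoloji-2
```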

Data Enrichment

The raw data only contained basic information. Not enough for SEO. I created the following extra fields for each profile:

  • Meta description (120-150 characters, city + specialty + doctor name)
  • FAQ section (3 specialty-specific questions and answers)
  • Related specialties list (for internal linking)
  • Schema.org Physician markup data
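
As an illustration of the last item, Schema.org Physician data can be emitted as JSON-LD. The sketch below is an assumption about which properties to include (name, medicalSpecialty, address, hospitalAffiliation); the profile keys and the function name are hypothetical, and the markup actually used on doktorbul.com may differ.

```python
import json


def physician_jsonld(profile):
    """Build Schema.org Physician markup as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Physician",
        "name": f"Dr. {profile['name']}",
        "medicalSpecialty": profile["specialty"],
        "address": {
            "@type": "PostalAddress",
            "addressLocality": profile["city"],
            "addressCountry": "TR",
        },
    }
    # Only add the affiliation when the hospital field survived normalization.
    if profile.get("hospital"):
        data["hospitalAffiliation"] = {"@type": "Hospital", "name": profile["hospital"]}
    # ensure_ascii=False keeps Turkish characters readable in the output.
    return json.dumps(data, ensure_ascii=False, indent=2)
```

The resulting string would typically be placed in a script tag of type application/ld+json in the page head.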

I ran this enrichment through the OpenAI API. Using GPT-4 for 72,800 profiles would have made the cost explode, so I used GPT-3.5-turbo with batch processing: each batch holds 100 profiles, and 10 batches run simultaneously as parallel requests. Total cost was $340, and the full run took about 18 hours.
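
The batching scheme can be sketched with asyncio. This is a minimal sketch under assumptions: enrich_one is a hypothetical coroutine standing in for the real API call, and the profiles inside a batch are processed one after another while up to 10 batches run at the same time, which is one reading of the setup described above.

```python
import asyncio

BATCH_SIZE = 100            # profiles per batch, as described above
MAX_PARALLEL_BATCHES = 10   # batches allowed to run at the same time


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def enrich_all(profiles, enrich_one):
    """Run enrich_one over all profiles, at most MAX_PARALLEL_BATCHES batches at once."""
    semaphore = asyncio.Semaphore(MAX_PARALLEL_BATCHES)

    async def run_batch(batch):
        async with semaphore:
            # Inside a batch, profiles are processed sequentially.
            return [await enrich_one(profile) for profile in batch]

    batches = chunked(profiles, BATCH_SIZE)
    # gather() returns results in the same order as the batches were passed in.
    results = await asyncio.gather(*(run_batch(b) for b in batches))
    return [item for batch in results for item in batch]
```

It would be called as asyncio.run(enrich_all(profiles, enrich_one)), where enrich_one wraps the actual request and returns the enriched profile.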

WordPress Technical Architecture

Doktorbul.com runs on WordPress. I created a custom post type for programmatic SEO: "doctor_profile". This post type's special fields:

  • Doctor name (post title)
  • Specialty (taxonomy: "medical_specialty")
  • City (taxonomy: "city")
  • Hospital (custom field)
  • Years of experience (custom field)
  • Education information (custom field)
  • FAQ (repeater field, with ACF)

Taxonomy structure is critical. I created 32 terms for "medical_specialty", 81 terms for "city". These taxonomies automatically generate archive pages: "/kardiyoloji-doktorlari/", "/istanbul-doktorlari/". Each archive page is a separate SEO entity.

I uploaded the content in bulk via the WordPress REST API. The standard wp-json endpoint wasn't sufficient, so I wrote a custom endpoint. Why? Because I needed to send ACF fields, taxonomy terms, and meta data in a single request. With the standard endpoint I would have needed separate requests for these, which would have multiplied the upload time for 72,800 profiles.

Custom endpoint code (PHP):

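// Routes must be registered on the rest_api_init hook.
add_action('rest_api_init', function () {
  register_rest_route('futia/v1', '/bulk-doctor', array(
    'methods' => 'POST',
    'callback' => 'futia_bulk_insert_doctor',       // creates the post, terms, and fields (defined elsewhere)
    'permission_callback' => 'futia_check_api_key'  // rejects requests without a valid API key (defined elsewhere)
  ));
});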

This endpoint does the following in a single request:

  • Create post
  • Assign taxonomy terms
  • Fill ACF fields
  • Add schema markup
  • Update Yoast SEO meta data

Average response time was 280ms. At that rate, uploading all 72,800 profiles one request at a time took about 5.6 hours (72,800 × 0.28 s ≈ 20,400 s).
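
On the Python side, a client for this endpoint might look like the sketch below. Only the route (futia/v1, /bulk-doctor) comes from the PHP snippet; the payload field names, the X-API-Key header, and the function names are assumptions made for illustration, since the article does not show the request format.

```python
import json
import urllib.request


def build_payload(profile):
    """Bundle post, taxonomy, and custom-field data into one request body."""
    return {
        "title": profile["name"],
        "slug": profile["slug"],
        "terms": {
            "medical_specialty": [profile["specialty"]],
            "city": [profile["city"]],
        },
        "acf": {
            "hospital": profile.get("hospital", ""),
            "faq": profile.get("faq", []),
        },
        "meta_description": profile.get("meta_description", ""),
    }


def post_doctor(base_url, api_key, profile, timeout=30):
    """POST one profile to the custom endpoint and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/wp-json/futia/v1/bulk-doctor",
        data=json.dumps(build_payload(profile)).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Sending everything in one body is what lets the server create the post, assign terms, and fill the fields in a single round trip.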

Content Production Pipeline

In programmatic SEO, content quality is everything. To avoid Google's "thin content" detection, each profile must be unique and valuable. My pipeline consists of 3 layers:

Layer 1: Template-Based Content

There's a fixed structure for each profile:

  • Introduction paragraph (doctor name + specialty + city)
  • General information about the specialty (150 words)
  • Doctor's areas of expertise (list)
  • Hospital information
  • Appointment process

This layer is entirely template-based. Filled with Python string formatting. Zero cost, maximum speed.
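
A template of this kind can be filled with str.format. The wording of the template, the section headings, and the profile keys below are hypothetical; the sketch only shows the mechanism.

```python
PROFILE_TEMPLATE = """\
Dr. {name} is a {specialty} specialist practicing in {city}.

About {specialty}
{specialty_info}

Areas of expertise
{expertise_list}

Hospital
{hospital}

Appointments
{appointment_info}
"""


def render_profile(profile, specialty_info, appointment_info):
    """Fill the fixed profile structure with one doctor's data."""
    expertise_list = "\n".join(f"  • {item}" for item in profile.get("expertise", []))
    return PROFILE_TEMPLATE.format(
        name=profile["name"],
        specialty=profile["specialty"],
        city=profile["city"],
        specialty_info=specialty_info,
        expertise_list=expertise_list,
        hospital=profile.get("hospital", ""),
        appointment_info=appointment_info,
    )
```

The specialty text is written once per specialty and reused, so only 32 such texts are needed for all profiles.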

Layer 2: AI Enrichment

On top of the template content, I add unique sections with OpenAI:

  • 3 specialty-specific FAQs
  • "Why should you choose this doctor?" section
  • Related health advice (150 words)

Prompt example:

"You are a health content writer. Dr. {name} is a {specialty} specialist in {city}. Write 3 frequently asked questions and answers for this doctor. Each answer should be 60-80 words. Questions should be the kind actually asked by patients."

GPT-3.5-turbo responds to this prompt in an average of 12 seconds, so 100 profiles would take 20 minutes one at a time. With parallel requests, the time for 100 profiles drops to about 2 minutes.

Layer 3: Internal Linking

Each profile has 5-8 internal links:

  • Other doctors in the same specialty (3 links)
  • Other doctors in the same city (2 links)
  • Related specialties (2 links)
  • Main category page (1 link)

A Python script creates these links automatically. The algorithm is simple: compare profile data, find the closest matches, add links. Result: 400,000+ internal links among 72,800 profiles, which gives Google a densely connected site structure to crawl.
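
One way to implement this selection is sketched below. It is an illustration under assumptions: the profile keys, the URL patterns ("/<slug>/" for profiles and "/<specialty>-doktorlari/" for archives), and the related-specialty mapping are hypothetical, and "closest match" is simplified to "first candidates found".

```python
from collections import defaultdict


def build_indexes(profiles):
    """Group profiles by specialty and by city so lookups stay cheap."""
    by_specialty, by_city = defaultdict(list), defaultdict(list)
    for profile in profiles:
        by_specialty[profile["specialty"]].append(profile)
        by_city[profile["city"]].append(profile)
    return by_specialty, by_city


def pick_links(profile, by_specialty, by_city, related):
    """Return up to 8 internal links: 3 same specialty, 2 same city, 2 related, 1 category."""
    used = {profile["slug"]}  # never link a profile to itself or twice
    links = []

    def take(candidates, limit):
        taken = 0
        for candidate in candidates:
            if taken == limit:
                break
            if candidate["slug"] not in used:
                used.add(candidate["slug"])
                links.append(f"/{candidate['slug']}/")
                taken += 1

    take(by_specialty[profile["specialty"]], 3)
    take(by_city[profile["city"]], 2)
    for specialty in related.get(profile["specialty"], [])[:2]:
        links.append(f"/{specialty}-doktorlari/")
    links.append(f"/{profile['specialty']}-doktorlari/")
    return links
```

Building the two indexes once keeps the work per profile small, which matters when the loop runs 72,800 times.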

Indexing and Performance Optimization

Submitting 72,800 pages to Google at once is suicide. It blows through the crawl budget and the site becomes hard to crawl. My strategy is phased indexing:

Phase 1: Priority Pages (First Week)

  • Istanbul, Ankara, Izmir doctors (18,000 profiles)
  • Popular specialties: Cardiology, Orthopedics, Ophthalmology (12,000 profiles)
  • Taxonomy archive pages (113 pages)

I created an XML sitemap for these pages and submitted it manually in Google Search Console. I also set a high <priority> value for them in the sitemap, although Google says it ignores that field.

Phase 2: Secondary Pages (Weeks 2-4)

The remaining 42,800 profiles were added to the sitemap at 2,000 pages per day over 3 weeks. I monitored Google's crawl rate (Search Console > Settings > Crawl Stats) and paused adding new pages whenever server load exceeded 70%.
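
The daily release could be scripted roughly like this. The function names and the way server load is passed in are assumptions; how the load figure was measured in the project is not described, so it is treated here as a plain number between 0 and 1.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def daily_batches(urls, per_day=2000):
    """Split the remaining URLs into one batch per day."""
    return [urls[i:i + per_day] for i in range(0, len(urls), per_day)]


def may_release(server_load, threshold=0.70):
    """Release the next batch only while server load is at or below the threshold."""
    return server_load <= threshold


def build_sitemap(urls):
    """Serialize URLs as a sitemap document (one file holds at most 50,000 URLs)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# 42,800 URLs at 2,000 per day need 22 daily batches (21 full, 1 partial).
example_urls = [f"https://example.com/dr-{i}/" for i in range(42_800)]
print(len(daily_batches(example_urls)))  # 22
```

Each day, the script would append the next batch to the list of released URLs and regenerate the sitemap file, skipping the day if may_release returns False.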

Performance Metrics

WordPress + 72,800 posts = potential performance disaster. Solutions:

  • Redis object cache (post queries are cached)
  • Cloudflare CDN (static assets at the edge)
  • Lazy loading (images load on scroll)
  • Database index optimization (custom indexes on wp_postmeta table)

Result: average page load time 1.8 seconds. 2.3 seconds on mobile. Core Web Vitals completely green.

Real Results and Traffic Growth

The project took 6 weeks. Organic traffic developed as follows over the first 3 months:

  • Month 1: 4,200 monthly visits
  • Month 2: 18,600 monthly visits
  • Month 3: 41,000 monthly visits

By month 6, we reached 79,000 monthly organic visits. Average CTR 3.2%, average position 8.4. Pages bringing the most traffic:

  • "istanbul kardiyoloji doktorları" (2,100 monthly)
  • "ankara göz doktoru" (1,800 monthly)
  • "izmir ortopedi uzmanı" (1,400 monthly)

Individual doctor profiles also attract traffic. A page like "dr-ahmet-kaya-istanbul-kardiyoloji" gets 40-60 visits per month. 72,800 pages × 50 visits = 3.6 million potential monthly visits. That figure is a theoretical ceiling: not all pages are indexed, and actual traffic at month 6 was 79,000 visits. But the potential is there.

Technical Challenges I Faced

Programmatic SEO is simple in theory, chaotic in practice. What happened to me:

Problem 1: WordPress Memory Limit

While loading 72,800 posts, WordPress hit the PHP memory limit. The 256MB limit wasn't enough. Solution: I increased memory_limit to 512MB in php.ini and reduced the batch size from 100 to 50.

Problem 2: OpenAI Rate Limit

The rate limit for GPT-3.5-turbo is 3,500 requests per minute. Parallel batch processing exceeds this limit. Solution: exponential backoff. If a request fails, wait 2 seconds and try again. If it still fails, wait 4 seconds, doubling the wait each time, for a maximum of 5 attempts.
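
The retry logic can be sketched as a small wrapper. The function name and parameters are hypothetical; in real use, retry_on would be narrowed to the rate-limit error raised by the API client rather than every exception.

```python
import time


def call_with_backoff(func, max_attempts=5, base_delay=2.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on failure wait 2s, 4s, 8s, ... and retry up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

With the defaults, the waits between the five attempts are 2, 4, 8, and 16 seconds. The sleep parameter exists so the delay can be replaced in tests.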

Problem 3: Duplicate Content Risk

Content for the same specialty + city combination comes out very similar. For example, there are 240 cardiologists in Istanbul, and their content is almost identical. Solution: I added a unique "snippet" to each profile. Variables like the doctor's years of experience, the university where they studied, and their areas of expertise differentiate the content.

Problem 4: Taxonomy Pagination

The "istanbul-doktorlari" archive page contains 18,000 doctors. The page can't load without pagination, and WordPress's default pagination is slow (a full query for each page). Solution: a custom pagination query that fetches only post IDs first, then retrieves the post data in batches. Page load time dropped from 8 seconds to 2.1 seconds.
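
The ID-first idea is independent of WordPress, so here is a language-neutral illustration in Python with sqlite3. The doctors table and its columns are hypothetical and do not reflect the WordPress schema; in WordPress itself, WP_Query accepts 'fields' => 'ids' for the first step.

```python
import sqlite3


def fetch_page(conn, city, page, per_page=20):
    """Two-step pagination: fetch only IDs for the page, then load those rows."""
    offset = (page - 1) * per_page
    # Step 1: a cheap query that returns nothing but the IDs for this page.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM doctors WHERE city = ? ORDER BY id LIMIT ? OFFSET ?",
        (city, per_page, offset),
    )]
    if not ids:
        return []
    # Step 2: load the full rows for just those IDs in one batch.
    placeholders = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT id, name, city FROM doctors WHERE id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
```

The first query touches only the index columns, so the expensive row loading is limited to the 20 or so records that are actually displayed.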

Tool Selection for Programmatic SEO

I used WordPress for doktorbul.com. But it's not the right tool for every project. Alternatives:

  • Next.js + Headless CMS: Ideal for 100,000+ pages, generates static HTML with SSG, maximum speed. But setup is complex.
  • Django + PostgreSQL: Powerful for data-intensive projects, admin panel ready. But frontend development is extra work.
  • WordPress + Custom Plugin: Balanced for medium scale (10,000-100,000 pages). Wide ecosystem, easy hosting.

I chose WordPress for these reasons:

  • Client knows WordPress (can make their own updates)
  • SEO plugins ready (Yoast, RankMath)
  • Hosting cheap (even shared hosting handles 50,000 pages)
  • Custom field management easy with tools like ACF

If your project contains 500,000+ pages, I recommend the Next.js + Vercel + headless CMS combination. At doktorbul.com scale, WordPress is sufficient.

Cost Analysis

Is programmatic SEO cheap? Total cost for doktorbul.com:

  • OpenAI API: $340 (GPT-3.5-turbo for 72,800 profiles)
  • Hosting: $25/month (Cloudways, 4GB RAM, 80GB SSD)
  • Domain + SSL: $15/year
  • Development time: 6 weeks (cost varies for freelance or in-house)

Compare this with manual content production: 72,800 profiles × 15 minutes = 18,200 hours. At a writing fee of $50 per hour, that totals $910,000. With programmatic SEO, the content generation cost dropped to $340 in API fees, before development time. The ROI is clear.
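
The comparison is simple arithmetic, reproduced here with the figures from this section (the variable names are just for illustration):

```python
PROFILES = 72_800
MINUTES_PER_PROFILE = 15
HOURLY_RATE_USD = 50
API_COST_USD = 340

manual_hours = PROFILES * MINUTES_PER_PROFILE / 60
manual_cost = manual_hours * HOURLY_RATE_USD
api_cost_per_profile = API_COST_USD / PROFILES

print(f"Manual: {manual_hours:,.0f} hours, ${manual_cost:,.0f}")
# Manual: 18,200 hours, $910,000
print(f"API: ${API_COST_USD} total, ${api_cost_per_profile:.4f} per profile")
# API: $340 total, $0.0047 per profile
```

The API figure works out to under half a cent per profile, which is why development time, not generation cost, dominates the budget.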

One caveat: this cost covers only the initial production. Sustainability requires ongoing updates. For doktorbul.com, 500-800 new doctors are added every month and 200-300 profiles are updated. The Python script runs once a week to handle this, at a cost of $15-20 per month.

Situations Where You Shouldn't Do Programmatic SEO

Not every project is suitable for programmatic SEO. Don't do it in these situations:

  • Your data source is low quality or not current
  • Niche is too narrow (fewer than 500 pages can be generated in total)
  • Competition is too high (programmatic content can't reach top positions)
  • Brand-focused content is needed (programmatic content remains generic)

Doktorbul.com is an ideal case because:

  • Data source is official (TKHK)
  • Niche is wide (79,000 doctors)
  • Competition is medium level (local searches)
  • Information-focused content (no brand story needed)

If your project meets these criteria, programmatic SEO works.

The doktorbul.com case was a turning point for me. With a 6-week pipeline we turned 79,000 raw records into 72,800 published profiles and reached 79,000 monthly organic visits by month 6. In this article I shared the technical details, the challenges I faced, and the solutions. If you have a similar project or want to talk about programmatic SEO, you can write to me on WhatsApp: +90 532 491 17 05. If you prefer email: info@futia.net. At FUTIA, we provide site, automation, and monthly maintenance services to Turkish brands from the Netherlands.

Frequently Asked Questions

What data sources can be used for programmatic SEO?

The most reliable data sources are official databases (government agencies, APIs), industry reports, and licensed datasets. We used the TKHK database for doktorbul.com. Alternatively, e-commerce sites can use product catalogs, real estate sites can use listing databases, and job sites can use position lists. What matters is data quality and currency. Low-quality data produces low-quality content, which Google is likely to rank poorly or penalize.

Is WordPress sufficient for 100,000+ pages?

WordPress can handle 100,000 pages if properly optimized, but performance optimization is critical. Redis object cache, a CDN, database index optimization, and lazy loading are necessary. Above 100,000 pages, static generation with a framework like Next.js performs better. We run WordPress with 72,800 pages on doktorbul.com, with an average load time of 1.8 seconds. Hosting quality also matters: prefer VPS or cloud hosting over shared hosting.

Is content produced with programmatic SEO considered duplicate content?

If each page contains unique data, it generally isn't treated as duplicate content. Template-based content is not a problem in itself; what matters is that each page differs in substance, not just in a swapped name. On doktorbul.com, each profile has a unique doctor name, city, and specialty combination, plus profile-specific details such as years of experience, education, and areas of expertise. We also add FAQ and description sections with OpenAI, which further differentiates the content. Google's duplicate content handling focuses on the same content repeated across different URLs, so pages within the same specialty and city still need these unique elements to stand apart.

How long does programmatic SEO take to show results?

First results are seen within 4-8 weeks, but reaching full potential takes 6-12 months. On doktorbul.com, we achieved 4,200 in month 1, 41,000 in month 3, and 79,000 monthly visits in month 6. Speed depends on niche competition, site authority, and indexing strategy. It takes longer in highly competitive niches. Phased indexing (1,000-2,000 pages per day) optimizes Google's crawl budget and delivers faster results.

Which AI model should be used for programmatic SEO?

GPT-3.5-turbo offers a good cost-quality balance. We spent $340 for 72,800 profiles on doktorbul.com. GPT-4 produces higher quality but costs 10-15 times more, which is not sustainable for large projects. Claude Haiku and Gemini Flash are also alternatives with speed and cost advantages. What matters most is prompt quality: whatever model you use, a good prompt produces more distinctive content. Batch processing and parallel requests help you cut processing time.

ABOUT THE AUTHOR
Miraç Eroğlu

Hacettepe graduate. 6 years in social media, 2 years in AI automation.
