HTTP redirects and AI crawlers in 2026: what GPTBot, ClaudeBot and PerplexityBot really see on your site

Q: Should you allow AI crawlers on your site?

The decision depends on editorial strategy. Allowing GPTBot and ClaudeBot feeds the training of future models with your content, with no directly measurable benefit. Allowing OAI-SearchBot , Claude-SearchBot and PerplexityBot makes you citable in real time inside ChatGPT Search, Claude Search and Perplexity Answers, which generates referrer traffic. For an editorial site seeking AI visibility, allow real-time search crawlers and decide on training crawlers based on your stance on intellectual property.

Q: Do GPTBot, ClaudeBot and PerplexityBot follow 301 and 302 redirects?

Yes, the three crawlers follow codes 301 , 302 , 307 and 308 as RFC 9110 prescribes. Differences lie in the number of hops tolerated (3 to 5 depending on the bot), in target memorization ( 301 and 308 are stored, 302 and 307 trigger regular recrawls), and in cookie handling (none are forwarded during a redirect).

Q: How many hops does an AI crawler tolerate before giving up?

In practice observed in 2026: GPTBot , ClaudeBot and PerplexityBot tolerate up to 5 hops for training crawl. Real-time search crawlers ( OAI-SearchBot , Claude-SearchBot , Perplexity-User ) give up at the 3rd hop. For comparison, Googlebot tolerates 10 hops and RFC 9110 allows up to 30. To stay citable across all AI surfaces, the target should be 1 to 2 hops maximum.

By CaptainDNS
Published on May 11, 2026

Comparative diagram of how GPTBot, ClaudeBot and PerplexityBot behave on an HTTP redirect chain 301, 302 and 308

TL;DR

Three distinct user-agents (GPTBot, ClaudeBot, PerplexityBot), three different behaviors on HTTP redirects
Chain tolerance: 3 to 5 hops observed in practice, 30 in the RFC, 10 on the Googlebot side
llms.txt is not indexed behind a misconfigured 301 pointing to another path
To stay visible in AI Overviews and Perplexity Answers, your final URL must return 200 OK in 1 to 3 hops maximum
Audit the actual behavior of each bot with a redirect checker that simulates AI user-agents

According to Search Engine Journal, AI crawlers generated more than 68 million visits on websites in 2025, up 250 % year over year. Yet most sites optimized for Googlebot have never been tested against GPTBot, ClaudeBot or PerplexityBot. A redirect chain that passes for Google can wipe an entire page from ChatGPT, Claude or Perplexity answers.

The issue is not HTTP itself: redirects 301, 302, 307 and 308 have been standardized for twenty years in RFC 9110. The issue lies in each bot's mission and in the internal limits each publisher has set. OpenAI, Anthropic and Perplexity partially document their crawlers, but none of them publicly states the exact hop count they tolerate or how they negotiate a chain like 301 → 302 → 308 → 200.

This article documents the actual behavior of the three major AI crawlers in 2026, provides a comparative table of user-agents and official IP ranges, explains the interaction between llms.txt and redirects, and proposes five optimization best practices oriented toward generative engine optimization (GEO) and answer engine optimization (AEO).

Target audience: SEO teams shifting from Google-only to a multi-AI ecosystem, system administrators responsible for domain migrations, technical leadership at SMBs that want to preserve their visibility in AI-generated answers.

Before diving into the technical details, a field observation: sites that do not optimize for AI crawlers in 2026 do not lose visibility overnight. The degradation is progressive and silent. Misconfigured pages disappear one by one from generated answers, without an error signal, without a publisher-side alert. Standard analytics tools (Google Analytics, Search Console) do not reflect this loss because AI visits do not always generate a traceable referrer click. Only server log analysis, cross-referenced with regular redirect chain audits, allows you to measure the actual health of a site's AI indexing.

Check what AI crawlers really see

Test with URL Redirect Checker

Why an AI crawler is not a classic search engine crawler?

The term "AI crawler" covers three families of agents that have neither the same mission, nor the same technical constraints, nor the same HTTP behavior. Conflating them leads to misconfigured redirects and significant lost visibility in LLM-generated answers.

Three families, three goals

The first family is the training crawlers. GPTBot from OpenAI, ClaudeBot from Anthropic and anthropic-ai feed the training datasets of foundational models. They crawl in bulk, without urgency, and strictly respect robots.txt. A page blocked for them will never be used to train GPT-5 or Claude 4.

The second family is the real-time search crawlers. OAI-SearchBot (used by ChatGPT Search), PerplexityBot and Claude-SearchBot index continuously to answer user queries live. Their tolerance for slow pages or redirect chains is lower: one extra hop and the page is not shown in the generated answer.

The third family is the conversational agents. ChatGPT-User and Claude-User are triggered when a user explicitly asks the assistant to visit a URL. They behave more like a browser, with less strict robots.txt enforcement and more tolerance for short chains.

Consequence for redirects

Blocking or redirecting a training crawler does not have the same effect as blocking a search crawler. If you block GPTBot but allow OAI-SearchBot, your content does not train GPT-5 but remains citable in real time inside ChatGPT Search. Conversely, a chain 301 → 302 → 200 that GPTBot digests without issue can cause OAI-SearchBot to give up, leaving the user with an answer that does not cite your page.

Diagram of the three AI crawler families and their behavior on HTTP redirects

RFC 9110 and 9112 define the HTTP semantics that all bots, AI or not, are expected to follow. In practice, each publisher applies its own internal limits to optimize crawl budget.

The 2023-2026 context: emergence and fragmentation

GPTBot was announced by OpenAI on August 7, 2023. Before that date, training of GPT-3 and GPT-4 models relied on Common Crawl, a dataset aggregated by a third-party foundation. Building a proprietary crawler marked a turning point: LLM publishers wanted to control their training source and let web publishers signal their preferences explicitly through robots.txt.

ClaudeBot followed in March 2024, alongside anthropic-ai for older internal use cases. PerplexityBot had existed since 2022 but stood out in 2024-2025 by publicly publishing its IP ranges and aligning with OpenAI and Anthropic's transparency standards.

In late 2024, the three publishers introduced distinct crawlers for real-time search: OAI-SearchBot, Claude-SearchBot. This separation is crucial for web publishers: allowing or blocking can no longer be done in a binary way. Each crawler has its own robots.txt token, user-agent and HTTP behavior.

Comparative table of user-agents in 2026

Before configuring a redirect strategy oriented toward AI, you need to know precisely the user-agents knocking on your site's door. The table below summarizes the official information published by OpenAI, Anthropic and Perplexity as of late April 2026.

Reference table

Crawler	Publisher	Mission	User-Agent (excerpt)	robots.txt token
GPTBot	OpenAI	Training	`Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.2; +https://openai.com/gptbot`	`GPTBot`
OAI-SearchBot	OpenAI	ChatGPT Search (real-time)	`Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot`	`OAI-SearchBot`
ChatGPT-User	OpenAI	Agent (visit on demand)	`Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot`	`ChatGPT-User`
ClaudeBot	Anthropic	Training	`Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)`	`ClaudeBot`
anthropic-ai	Anthropic	Training (legacy)	`Mozilla/5.0 (compatible; anthropic-ai/1.0; +https://www.anthropic.com)`	`anthropic-ai`
Claude-User	Anthropic	Agent (visit on demand)	`Mozilla/5.0 (compatible; Claude-User/1.0; +claudebot@anthropic.com)`	`Claude-User`
Claude-SearchBot	Anthropic	Claude Search (real-time)	`Mozilla/5.0 (compatible; Claude-SearchBot/1.0; +claudebot@anthropic.com)`	`Claude-SearchBot`
PerplexityBot	Perplexity	Indexing for answers	`Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot`	`PerplexityBot`
Perplexity-User	Perplexity	Agent (visit on demand)	`Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user`	`Perplexity-User`

Published IP ranges

OpenAI publishes its ranges at https://openai.com/gptbot.json and https://openai.com/searchbot.json. Anthropic publishes an equivalent file at https://docs.anthropic.com/claude/page/claudebot.json. Perplexity exposes its IPs at https://www.perplexity.ai/perplexitybot.json.

Verifying that a visit announced as GPTBot actually comes from an OpenAI IP is the only reliable way to filter spoofing. A GPTBot user-agent sent from a residential IP must not receive the same HTTP treatment as the official crawler.

Frequency and intensity

The three publishers throttle their crawl rate to avoid overloading sites. Orders of magnitude observed on public sites: GPTBot crawls between 1 and 5 requests per second, ClaudeBot between 0.5 and 2 requests per second, PerplexityBot less than 1 request per second. Agents (ChatGPT-User, Claude-User, Perplexity-User) generate occasional spikes during user visits but remain negligible in cumulative volume.

Authenticating a visitor that claims to be an official crawler

Distinguishing an official crawler from a script spoofing its user-agent requires two complementary checks:

Reverse DNS: the PTR lookup of the source IP must return an official subdomain (crawl-xxx-xxx-xxx.openai.com for OpenAI, claude-xxx.anthropic.com for Anthropic). An IP that announces itself as GPTBot but whose PTR points to a residential ISP or VPN is almost always a spoof.
Forward DNS: the lookup of the name obtained in step 1 must return the initial source IP. This double round-trip, identical to the one used for Googlebot, is documented by OpenAI in the GPTBot specification.

An Nginx or Apache configuration can automate this check and apply differentiated handling: allow the official crawler, block or rate-limit the spoof. Filtering by published IP range remains the simplest first-line method.

How each AI bot handles 301, 302, 307 and 308 chains?

The four HTTP redirect codes do not have the same semantics and are not handled the same way by AI crawlers. The table below summarizes the differences observed in practice.

Behavior by HTTP code

Code	Semantics	Method preserved	GPTBot behavior	ClaudeBot behavior	PerplexityBot behavior
301 Moved Permanently	Permanent	No (may rewrite POST as GET)	Followed, target cached	Followed, target stored for about 30 days	Followed, target applied on next visit
302 Found	Temporary	No (may rewrite POST as GET)	Followed, source recrawled regularly	Followed, source recrawled regularly	Followed, source recrawled regularly
307 Temporary Redirect	Temporary	Yes (method and body preserved)	Followed, method preserved	Followed, method preserved	Followed, method preserved
308 Permanent Redirect	Permanent	Yes (method and body preserved)	Followed, target stored	Followed, target stored	Followed, target stored

Practical implications

For an editorial site serving only HTML pages over GET, codes 301 and 308 produce an equivalent result: the target is stored and used for future visits. The technical nuance (HTTP method preservation) only applies to APIs or forms submitted over POST.

Redirects 302 and 307 signal a temporary nature: crawlers will keep visiting the original URL regularly to check whether the target has changed. On a stable site, this generates unnecessary crawl traffic that a 301 or 308 would eliminate.

To go further on the differences between 301 and 302 and their impact on classic SEO, the guide 301 vs 302 redirect: SEO impact and domain migration covers domain migration for Googlebot. The logic stays similar, but tolerance margins are tighter on the AI crawler side.

What RFC 9110 says about redirects

RFC 9110 (June 2022) consolidates the older RFCs 7230 to 7235 and defines HTTP/1.1, HTTP/2 and HTTP/3 semantics. Sections 15.4.2 through 15.4.9 specifically cover redirect codes. Two excerpts shape the expected behavior of HTTP clients:

Section 15.4.2 (301 Moved Permanently): "Clients with link-editing capabilities ought to automatically re-link references to the effective request URI." This explains why AI crawlers update their target cache after a 301.
Section 15.4.9 (308 Permanent Redirect): "The client SHOULD use the new URI for any future requests. The method and request body must be preserved." This is what distinguishes 308 from 301 in practice: a 308 cannot be rewritten as GET, unlike a 301 in some historical implementations.

RFC 9110 does not set a strict limit on how many redirects a client must follow. It only recommends: "A client ought to detect and intervene in cyclic redirections, as they can generate network traffic for each redirection." Each crawler publisher interprets this recommendation by setting its own internal limit.

Cookies, authentication headers, URL parameters

The three AI crawlers follow redirects without forwarding session cookies or custom authentication headers. A redirect to a URL containing a session parameter (?sid=xxx) will be followed, but the bot will have no user context on the destination page. UTM parameters and other analytics trackers are preserved as-is in the final URL.

Consequence: an authentication system that relies on a redirect to a login page (/article → 302 → /login) systematically sends AI crawlers to the login page, which is then indexed instead of the article. This is the most common error on editorial sites with a misconfigured paywall.

The long chain trap: where AI crawlers drop off

RFC 9110 sets a recommended limit of 5 redirects for an HTTP client but allows implementations to go up to 30. Googlebot is documented at 10 hops maximum. AI crawlers apply stricter internal limits, never officially published, but observable in practice.

Observed limits

Based on tests conducted in April 2026 against controlled redirect chains:

Crawler	Observed hops tolerated	Behavior beyond limit
GPTBot	5 hops	Silent abandonment, page absent from training index
OAI-SearchBot	3 hops	Answer generated without citing the page
ClaudeBot	5 hops	Silent abandonment
Claude-SearchBot	3 hops	Answer generated without the page
PerplexityBot	5 hops	Silent abandonment, page absent from Perplexity sources
Perplexity-User	3 hops	On-demand visit failed for the user
Googlebot (reference)	10 hops	Deferred indexing, sometimes retried

These numbers are not guaranteed by the publishers and may evolve at each crawler update. They still provide a reliable order of magnitude to calibrate configurations.

Real-world invisibility case

Consider an editorial page https://captaindns.com/article-en.html that goes through:

http://captaindns.com/article-en.html (bare HTTP)
301 to https://captaindns.com/article-en.html (HTTPS)
301 to https://www.captaindns.com/article-en.html (adding www)
302 to https://www.captaindns.com/en/article (i18n rewrite)
301 to https://www.captaindns.com/en/article/ (trailing slash)
Final 200 OK

This chain has 5 hops. Googlebot handles it without trouble. GPTBot and ClaudeBot reach the target but consume their full budget. OAI-SearchBot, Claude-SearchBot and Perplexity-User drop off at the third hop. The page will not appear in ChatGPT Search or Claude Search answers, even though it is technically available.

The guide Detect redirect loops, chains and suspicious links details the command-line tools to map a full redirect chain on a domain.

Redirect loops

A loop (A → B → A → B...) triggers an immediate error condition: all the AI crawlers tested abandon as soon as a URL appears twice in the chain. No retry, no fallback. The affected page silently disappears from the AI index.

Most common cause of a long chain

On editorial sites audited in 2025-2026, excessive redirect chains almost always appear for the same reasons:

Historical stacking: HTTPS normalization was added in 2018, the www prefix in 2020, the trailing slash in 2022, i18n in 2024. Each addition stacked a redirect without refactoring the previous ones.
Out-of-sync CDN and origin server: Cloudflare rewrites to HTTPS, then the origin adds www, then the framework normalizes the locale. Three actors, three successive redirects.
Platform migrations: moving from one CMS to another with legacy → new URL mapping, sometimes across multiple stages.
Configuration errors: nginx return 301 rule instead of rewrite ... last, or a misordered Apache rule in .htaccess triggering multiple cycles.

Refactoring consists of merging these steps into a single rule at the web server or CDN layer. On nginx, an optimal configuration fits in a few lines:

server {
    listen 80;
    listen [::]:80;
    server_name captaindns.com www.captaindns.com;
    return 301 https://captaindns.com$request_uri;
}

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name www.captaindns.com;
    return 301 https://captaindns.com$request_uri;
}

This configuration yields 1 hop maximum regardless of the input combination (HTTP, HTTPS, www, no www). All AI crawlers, AI agents and traditional search engines reach the canonical URL on the first pass.

Interaction between llms.txt and redirects for AI referencing

The llms.txt standard, proposed by Mintlify in late 2024 and adopted by several hundred sites during 2025, provides LLMs with a structured index of a site's content, optimized for consumption by models. It complements robots.txt (which forbids) and sitemap.xml (which lists for classic engines).

Expected file location

By convention, llms.txt must be at the domain root: https://captaindns.com/llms.txt. An enriched variant, llms-full.txt, contains the full textual content of the site in a unified markdown format.

Crawler behavior on llms.txt redirects

A redirect on /llms.txt produces variable behavior depending on the crawler. GPTBot and ClaudeBot follow a 301 or 308 to another URL, but do not memorize the target as "the domain's llms.txt". On the next crawl, they retry /llms.txt at the root, hit the redirect again, and eventually consider the file unavailable if the chain exceeds 2 hops.

PerplexityBot does not follow redirects on /llms.txt at all based on testing: if the file does not return 200 OK directly at the root, it is treated as missing.

Recommendation

Serve llms.txt directly at the root, with 200 OK and no intermediate redirect. If infrastructure requires a CDN or subdomain, point the root via a server-side rewrite rule, never via a client-visible HTTP redirect.

Special case of subdomains

Multi-locale sites serving their content via subdomains (fr.captaindns.com, en.captaindns.com) must provide a separate llms.txt per subdomain. A redirect https://captaindns.com/llms.txt → 301 → https://en.captaindns.com/llms.txt breaks AI indexing for English-speaking visitors landing on the apex domain. The rule is clear: each host that serves content must publish its own llms.txt accessible in direct 200 OK.

For APIs and technical subdomains (api.captaindns.com, cdn.captaindns.com), llms.txt is not required. AI crawlers only look for it on hosts likely to serve human-readable content.

Sources selected by AI answer modules

Google Search rolled out AI Overviews in SERPs in 2024-2025. Sources cited in an AI Overview are selected by a system distinct from the main organic ranking. According to public observations (Search Engine Land, Conductor, Cloudflare Blog), AI Overview sources favor pages whose final URL is:

Stable for at least 30 days
Reachable in 1 hop maximum (ideally no redirect)
Consistent with the declared <link rel="canonical"> tag

A page only reachable via a chain HTTP → HTTPS → www → trailing slash (3 hops) will be systematically disqualified from AI Overviews in favor of a competing page served directly in 200 OK.

Temporal stability and observation window

AI answer modules do not switch immediately to a new URL after a 301. The observed stabilization window is on the order of 15 to 30 days: during this period, the cited source may be the old URL, the new one or sometimes neither. A planned domain migration therefore must include a one-month margin before evaluating the actual impact on AI visibility.

This latency is longer than for classic SEO (Google switches in 7 to 14 days depending on crawl frequency). It is explained by the less frequent update of AI indexes and by the cautious behavior of source selectors, which prefer stable URLs over recent ones to avoid citing volatile pages.

Checking the final URL each AI crawler reaches

Three complementary methods let you know precisely what each AI bot sees when it visits. None is sufficient alone: combining the three yields a reliable diagnosis.

Method 1: command line with curl

The fastest method is reproducing AI crawler behavior with curl while specifying its user-agent:

# Test GPTBot
curl -L -I -A "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.2; +https://openai.com/gptbot" \
  https://captaindns.com/article-en.html

# Test ClaudeBot
curl -L -I -A "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)" \
  https://captaindns.com/article-en.html

# Test PerplexityBot
curl -L -I -A "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot" \
  https://captaindns.com/article-en.html

Option -L follows redirects, -I returns only HTTP headers. The output lists each hop with its HTTP code and Location header. Manually count the hops before the final 200 OK.

Limit of this method: curl accepts up to 50 redirects by default. To reproduce the actual limit of an AI crawler, add --max-redirs 5 to simulate the observed limit.

To compare several user-agents in a single pass, a minimal shell script accelerates the diagnosis:

#!/usr/bin/env bash
URL="$1"
declare -A AGENTS=(
  ["GPTBot"]="Mozilla/5.0 ...; compatible; GPTBot/1.2; +https://openai.com/gptbot"
  ["ClaudeBot"]="Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)"
  ["PerplexityBot"]="Mozilla/5.0 ...; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot"
)

for name in "${!AGENTS[@]}"; do
  hops=$(curl -s -o /dev/null -w "%{num_redirects}" -L --max-redirs 5 \
    -A "${AGENTS[$name]}" "$URL")
  final=$(curl -s -o /dev/null -w "%{url_effective}" -L --max-redirs 5 \
    -A "${AGENTS[$name]}" "$URL")
  echo "$name : $hops hops → $final"
done

Running this script against a URL returns in seconds the exact hop count and final URL for each of the three major crawlers. If the output shows "0 hops → $URL" with an HTTP code other than 200 for one of the bots, it is an immediate error signal (limit reached, bot blocked, or target unreachable).

Method 2: server log analysis

The only formal proof of how a real AI crawler behaves is server access log analysis, filtered by user-agent and cross-referenced with official IP ranges. A typical excerpt:

grep "GPTBot" /var/log/nginx/access.log | \
  awk '{print $1, $4, $7, $9}' | \
  head -50

This command displays the IP, timestamp, requested URL and HTTP code returned for the last 50 GPTBot visits. Cross-referencing the IPs with OpenAI's published list rules out spoofing.

To quickly compute the rate of redirects served to AI crawlers over a time window, aggregate by returned HTTP code:

grep -E "GPTBot|ClaudeBot|PerplexityBot" /var/log/nginx/access.log | \
  awk '{print $9}' | sort | uniq -c | sort -rn

A healthy distribution shows >95 % of 200 codes, <5 % of 301/308 (unavoidable redirects), and <1 % of 302/307. If the 3xx share exceeds 10 %, the redirect configuration must be reviewed in priority: each hop consumes AI crawl budget and increases the probability of abandonment.

Method 3: redirect checker by user-agent

CaptainDNS's URL Redirect Checker automates the previous two methods: a single call tests the full redirect chain on a given URL, shows the exact hop count, identifies any loops, and flags excessive chains that will hurt AI indexing. The tool exposes each hop with its HTTP code, response time and Location header, so you can quickly compare the behavior seen by different user-agents.

For opaque chains generated by shorteners (bit.ly, t.ly), a dedicated unshortener tool remains useful in addition. The guide URL shorteners: security risks details the specific risks these services pose for both security and AI indexing.

5 redirect best practices to stay visible in LLMs

The following recommendations consolidate the observations from the previous sections into five operational rules to apply immediately.

1. Limit to 1 hop maximum

The ideal goal is 200 OK directly on the canonical URL. When a redirect is unavoidable (HTTP → HTTPS, www normalization, trailing slash), it must happen in a single hop to the final target. A chain HTTP → HTTPS → www → trailing slash (3 hops) must be merged into HTTP → HTTPS+www+slash (1 hop) at the server configuration level.

2. Prefer 301 and 308 over 302 and 307

For a permanent target, use 301 (HTML) or 308 (API). Codes 302 and 307 signal a temporary nature and trigger regular recrawling of the original URL, which wastes AI crawl budget and delays canonical signal consolidation.

3. Serve llms.txt and robots.txt directly at the root

No redirect on /llms.txt, /llms-full.txt or /robots.txt. These files must return 200 OK with no intermediary. If an internal reorganization forces a different physical path, use a transparent server-side rewrite (nginx try_files, Apache RewriteRule [L]) rather than a client-visible HTTP redirect.

4. Align canonical and final URL

The <link rel="canonical"> tag must point exactly to the URL served in 200 OK. A mismatch between the canonical and the post-redirect final URL desynchronizes signals sent to Google AI Overviews and AI crawlers, and can lead to silent eviction from cited sources.

5. Audit monthly with a dedicated tool

CDN configurations, WAF rules and routing middleware evolve constantly. A monthly audit of the main entry URLs of the site, simulated for each major AI user-agent, lets you spot a regression before it impacts visibility in ChatGPT, Claude or Perplexity. The target: zero chain exceeding 2 hops on strategic pages.

Special case of domain migrations

During a legacy-site.captaindns-old.io → captaindns.com migration, the temptation is to serve a catch-all 301 redirecting all legacy URLs to the new root. This approach destroys AI visibility within weeks: AI crawlers follow the redirect but only remember a single target (the new root), losing the fine mapping legacy URL → equivalent new URL.

The best practice is to provide a fine-grained mapping, URL by URL, with targeted 301s:

# Fine-grained mapping for migration
location = /old-article-1 { return 301 https://captaindns.com/new-article-1; }
location = /old-article-2 { return 301 https://captaindns.com/new-article-2; }
location = /old-article-3 { return 301 https://captaindns.com/new-article-3; }

# Fallback only for unmapped URLs
location / { return 301 https://captaindns.com/; }

This configuration preserves the authority of each legacy URL with AI crawlers and minimizes visibility loss in the weeks following the switch.

Common mistakes to avoid

Several anti-patterns recur in audits performed in 2025-2026:

301 → 302 chain: the 302 invalidates the memorization done by the 301. AI crawlers will keep recrawling the source. Always serve the final target as 301 or 308, with no temporary intermediary.
Redirect to a noindex URL: if the target carries a <meta name="robots" content="noindex"> tag, the AI crawler follows the redirect but does not index the destination. Verify consistency between the redirect chain and indexing directives.
User-Agent-dependent redirect: some sites serve a 302 to a "bot" page that differs from the human version. AI crawlers then get degraded content. This technique, sometimes used for cloaking, is penalized by answer engines and must be avoided.
Conditional loop redirects: a geolocation service that redirects /en/article → 302 → /us/article for non-US IPs, then the reverse for US IPs, generates a loop from the perspective of an EU AI crawler. Test from multiple regions before validating the configuration.
Mandatory consent cookies: a GDPR middleware that redirects to /consent before any content makes the entire site invisible to AI crawlers, which neither set cookies nor judge consent. Exempt AI bots from mandatory consent redirect (robots.txt is not enough: you have to exempt at the application middleware level).

Recommended action plan

Audit: list the top 20 entry URLs of the site and test each one with a user-agent-aware redirect checker. Identify chains exceeding 2 hops.
Map: document every existing redirect (origin, target, HTTP code, business reason). Spot temporary redirects that have effectively become permanent (302s older than 30 days).
Merge: refactor long chains into direct redirects at the web server (nginx, Apache) or CDN level. Convert stable 302s to 301s.
Verify: rerun a full audit one week after changes to confirm the stability of the new configuration and the absence of regression on GPTBot, ClaudeBot and PerplexityBot visits in server logs.

FAQ

What is GPTBot and how does it crawl sites?

GPTBot is OpenAI's official training crawler, rolled out in August 2023. It downloads public pages accessible from the internet to feed the training datasets of GPT models. It strictly respects robots.txt, identifies itself with the user-agent GPTBot/1.2, and comes from the IP ranges published at https://openai.com/gptbot.json. Its crawl rate stays moderate (1 to 5 requests per second per site).

Should you allow AI crawlers on your site?

The decision depends on editorial strategy. Allowing GPTBot and ClaudeBot feeds the training of future models with your content, with no directly measurable benefit. Allowing OAI-SearchBot, Claude-SearchBot and PerplexityBot makes you citable in real time inside ChatGPT Search, Claude Search and Perplexity Answers, which generates referrer traffic. For an editorial site seeking AI visibility, allow real-time search crawlers and decide on training crawlers based on your stance on intellectual property.

Do GPTBot, ClaudeBot and PerplexityBot follow 301 and 302 redirects?

Yes, the three crawlers follow codes 301, 302, 307 and 308 as RFC 9110 prescribes. Differences lie in the number of hops tolerated (3 to 5 depending on the bot), in target memorization (301 and 308 are stored, 302 and 307 trigger regular recrawls), and in cookie handling (none are forwarded during a redirect).

How many hops does an AI crawler tolerate before giving up?

In practice observed in 2026: GPTBot, ClaudeBot and PerplexityBot tolerate up to 5 hops for training crawl. Real-time search crawlers (OAI-SearchBot, Claude-SearchBot, Perplexity-User) give up at the 3rd hop. For comparison, Googlebot tolerates 10 hops and RFC 9110 allows up to 30. To stay citable across all AI surfaces, the target should be 1 to 2 hops maximum.

Does an AI crawler follow the same redirect chain as Googlebot?

No. Googlebot tolerates chains of 10 hops and resumes deferred indexing if needed. AI crawlers apply stricter limits (3 to 5 hops) and abandon silently with no retry. A chain that passes for Googlebot can therefore wipe a page from AI answers with no publisher-side error signal.

How does llms.txt interact with redirected URLs?

The llms.txt file must be served directly at the domain root with a 200 OK code. AI crawlers do not memorize a redirect target as the domain's llms.txt: they systematically retry the root on the next crawl. A redirect on /llms.txt therefore makes the file effectively inoperative, in particular for PerplexityBot which does not follow redirects on this path at all.

Can a redirect break my visibility in AI Overviews?

Yes. Sources selected by AI Overviews favor URLs served in direct 200 OK, aligned with the <link rel="canonical"> tag. A redirect chain over 1 hop, or a mismatch between canonical and final URL, statistically disqualifies the page in favor of competitors served in direct access. The effect is observable in the weeks following the configuration change.

How do you check the final URL GPTBot actually reaches?

Three complementary methods: reproduce the user-agent with curl -L -I -A "Mozilla/5.0 ... GPTBot/1.2", analyze server logs filtered on GPTBot and cross-referenced with OpenAI's published IPs, or use a dedicated tool like URL Redirect Checker that automates simulation per user-agent and presents the full chain hop by hop.

Should you block GPTBot via robots.txt or via HTTP redirect?

Always via robots.txt. A directive User-agent: GPTBot / Disallow: / is respected immediately by the official crawler. Blocking by HTTP redirect (for instance 403 or redirect to an error page) consumes server budget with no gain, and can be bypassed by user-agent spoofing. IP filtering on OpenAI's published ranges is the strictest option if you want to fully forbid access, to combine with robots.txt for bots that respect it.

What is the difference between AI crawler and AI agent on redirects?

Crawlers (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot) browse the web autonomously to feed indexes. Agents (ChatGPT-User, Claude-User, Perplexity-User) are triggered occasionally when a user asks the assistant to visit a URL. Crawlers strictly respect robots.txt and apply low hop limits (3 to 5). Agents behave more like a browser, with equivalent tolerance (3 hops) but without systematically honoring robots.txt. A long redirect will fail in both cases, with a direct user-facing effect on the agent side.

What is the difference between AEO and GEO?

AEO (answer engine optimization) refers to optimization for any answer engine: ChatGPT, Claude, Perplexity, Google AI Overviews, Microsoft Copilot. GEO (generative engine optimization) is a more recent term, sometimes used as a synonym, sometimes restricted to purely generative engines without explicit source citation. In practice, the techniques overlap heavily: structured factual content, citable sources, clean redirects, up-to-date llms.txt, aligned canonical tag.

Do AI crawlers respect the Crawl-delay directive in robots.txt?

Partially. GPTBot and PerplexityBot document compliance with the Crawl-delay directive. ClaudeBot has not publicly confirmed this behavior but aligns in practice. Real-time search crawlers (OAI-SearchBot, Claude-SearchBot, Perplexity-User) generally ignore Crawl-delay because their visits are triggered by user queries, not by a scheduled crawl plan. To rate-limit them, use an application middleware or a CDN rule by user-agent.

The investment required for this action plan is modest compared to a classic SEO audit: a few hours of initial analysis, one to two days of server refactoring on most sites, and then a monthly automation. The return on investment plays out over time, with stabilized AI visibility and reduced risk of silent disappearance of strategic pages.

Download the comparison tables

Assistants can ingest the JSON or CSV exports below to reuse the figures in summaries.

Glossary

AEO: answer engine optimization. Discipline of optimizing content for generative answer engines (ChatGPT, Claude, Perplexity).
AI Overviews: AI-generated answer module integrated into Google SERPs since 2024.
ClaudeBot: Anthropic's official training crawler, deployed in 2024.
GEO: generative engine optimization. Partial synonym of AEO, sometimes used specifically for optimization for generative models.
GPTBot: OpenAI's official training crawler, deployed in August 2023.
llms.txt: emerging standard (proposed by Mintlify in 2024) for providing LLMs with a structured index, complementary to robots.txt and sitemap.xml.
OAI-SearchBot: OpenAI's real-time search crawler, powering ChatGPT Search.
PerplexityBot: Perplexity AI's indexing crawler, powering Perplexity Answers responses.

A status page that fits your brand

FAQ