HTTP tools

Analyze your web pages: crawl, weight, headers, and link security

Two tools to diagnose HTTP issues on your web pages. The Page Crawl Checker analyzes HTML weight, headers, crawl budget, and sub-resources. The Phishing URL Checker checks a suspicious link against 4 threat intelligence databases.

Page Crawl Checker

Analyze HTML weight, headers, crawl budget, sub-resources, and client-side redirects. Check compliance with Googlebot's 2 MB limit.

Phishing URL Checker

Check if a URL is flagged as phishing or malware by URLhaus, Google Safe Browsing, PhishTank, and VirusTotal. Risk score and verdict in seconds.

Why use HTTP tools?

The HTTP protocol is the foundation of every web page. An HTTP issue can mean poorly indexed content, a slow site, or wasted resources. Without HTTP analysis, you cannot tell what Googlebot actually sees on your pages, or whether the links your visitors click are safe.

Four situations where these tools are essential:

  • Page too heavy → Googlebot truncates HTML beyond 2 MB, so internal links and FAQ content past that point disappear from the index
  • Incorrect headers → A misconfigured Content-Type or a forgotten X-Robots-Tag: noindex can deindex an entire page
  • Wasted crawl budget → Excessive sub-resources and lack of compression reduce the number of pages Google crawls
  • Suspicious link received → An email or text message contains a dubious link, and you need to check whether it is flagged as phishing or malware before clicking
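
To make the "incorrect headers" situation concrete, here is a minimal Python sketch of the kind of check involved. It is not the Page Crawl Checker's implementation: the function name `find_header_issues` and the exact warnings are assumptions chosen for illustration, and the check only covers the three headers discussed above.

```python
def find_header_issues(headers: dict[str, str]) -> list[str]:
    """Return human-readable warnings for a dict of HTTP response headers."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    issues = []

    # X-Robots-Tag directives are comma-separated and may carry a
    # user-agent prefix, for example "googlebot: noindex, nofollow".
    robots = normalized.get("x-robots-tag", "").lower()
    directives = {part.split(":")[-1].strip() for part in robots.split(",")}
    if "noindex" in directives or "none" in directives:
        issues.append("X-Robots-Tag blocks indexing")

    content_type = normalized.get("content-type", "").lower()
    if not content_type:
        issues.append("Content-Type header is missing")
    elif not content_type.startswith("text/html"):
        issues.append(f"Content-Type is {content_type!r}, expected text/html")

    if "content-encoding" not in normalized:
        issues.append("No Content-Encoding: response is not compressed")
    return issues


# Example with headers copied by hand from a response:
print(find_header_issues({
    "Content-Type": "text/html; charset=utf-8",
    "X-Robots-Tag": "noindex",
    "Content-Encoding": "gzip",
}))
```

The sketch expects the headers of an HTML page; for other resource types the Content-Type rule would need to be relaxed.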

How to use the HTTP tools

Step 1: Choose the tool

| Need | Tool to use |
| --- | --- |
| Analyze the weight, headers, and crawl of a page | Page Crawl Checker |
| Check if a link is phishing or malware | Phishing URL Checker |

Step 2: Enter the URL

Enter the full URL in the input field. Both tools accept any public URL:

https://www.captaindns.com/en/blog

For the Page Crawl Checker, prioritize testing your longest pages (categories, product pages, articles with many images). For the Phishing URL Checker, paste the suspicious link received by email or text message directly.
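
If you script your checks, it can help to classify the input before submitting it. The tool details list full URLs, bare domain names, and IP addresses as accepted formats for the Phishing URL Checker. The sketch below is one possible way to tell them apart with the Python standard library; `classify_input` is a hypothetical helper, not part of either tool, and its rules are deliberately simple.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_input(value: str) -> str:
    """Classify user input as 'url', 'ip', 'domain', or 'invalid'."""
    value = value.strip()
    if not value or " " in value:
        return "invalid"

    # A full URL needs an http(s) scheme and a host.
    try:
        parts = urlsplit(value)
        if parts.scheme in ("http", "https") and parts.hostname:
            return "url"
    except ValueError:
        return "invalid"  # e.g. malformed bracketed IPv6 host

    # IPv4 or IPv6 literal.
    try:
        ipaddress.ip_address(value)
        return "ip"
    except ValueError:
        pass

    # Rough bare-domain test: contains a dot, no path, no port.
    if "." in value and "/" not in value and ":" not in value:
        return "domain"
    return "invalid"


for sample in ("https://www.captaindns.com/en/blog", "example.com", "192.0.2.10"):
    print(sample, "->", classify_input(sample))
```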

Step 3: Analyze the results

Each tool provides a detailed report:

  • Page Crawl Checker: HTML size, crawl budget score (0-100), sub-resource inventory, robots.txt check, HTTP headers, client-side redirect detection, SHA-256 fingerprint
  • Phishing URL Checker: overall verdict (clean, suspicious, malicious), risk score (0-100), details by threat intelligence source, coverage diagnostics
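
The SHA-256 fingerprint in the report is a content hash used to detect changes between analyses. A minimal sketch of the idea with Python's `hashlib` follows; the helper name `fingerprint` is illustrative, and the report may hash a different representation of the page than raw bytes.

```python
import hashlib


def fingerprint(html: bytes) -> str:
    """Return the SHA-256 hex digest of the page content."""
    return hashlib.sha256(html).hexdigest()


# Two snapshots of the same page taken at different times.
previous = fingerprint(b"<html><body>Version 1</body></html>")
current = fingerprint(b"<html><body>Version 2</body></html>")

# Any change in the bytes, however small, produces a different digest.
print("changed" if previous != current else "unchanged")
```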

Tool details

Page Crawl Checker

Complete crawl analysis of a web page from Googlebot's perspective:

| Feature | Description |
| --- | --- |
| Size analysis | Raw and decompressed HTML weight, ratio against the 2 MB limit (or 64 MB for PDFs) |
| Crawl budget score | Score out of 100 evaluating the page's crawl efficiency, with factor breakdown |
| Sub-resources | Complete inventory of scripts, CSS, images, fonts, and iframes with size and status |
| robots.txt check | Googlebot access allowed or blocked, crawl-delay, declared sitemaps |
| HTTP headers | Content-Type, Content-Encoding, Cache-Control, X-Robots-Tag, HSTS, Server |
| Client-side redirects | Detection of meta refresh and JavaScript redirects invisible to Googlebot |
| Mobile/desktop comparison | Size, header, and content differences between smartphone and desktop versions |
| SHA-256 fingerprint | Content hash to detect changes between analyses |
| WAF detection | Web application firewall identification with multi-User-Agent fallback |
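
The robots.txt check in the table above (access, crawl-delay, declared sitemaps) can be approximated offline with Python's standard `urllib.robotparser`. This is a sketch under assumptions: the robots.txt content and the `example.com` URLs are made up, and the standard-library parser does not replicate every detail of Googlebot's own matching rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Is Googlebot allowed to fetch these URLs?
print(parser.can_fetch("Googlebot", "https://example.com/blog"))
print(parser.can_fetch("Googlebot", "https://example.com/private/report"))

# Crawl-delay declared for Googlebot, and the sitemaps listed in the file.
print(parser.crawl_delay("Googlebot"))
print(parser.site_maps())
```

To test a live site instead, `set_url()` followed by `read()` fetches the file over the network before the same queries are made.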

Use case: Diagnose size and crawl issues impacting Google indexation, optimize crawl budget, and detect JavaScript redirects that Googlebot does not follow.


Phishing URL Checker

URL verification against 4 threat intelligence databases:

| Feature | Description |
| --- | --- |
| 4 sources queried | URLhaus (malware), Google Safe Browsing (phishing), PhishTank (community phishing), VirusTotal (70+ antivirus engines) |
| Risk score | Weighted score from 0 to 100 based on each source's reliability |
| Overall verdict | Clean, suspicious, malicious, or indeterminate |
| Details by source | Individual status, detected threat types, and response time |
| Accepted formats | Full URL, bare domain name, or IP address |
| Diagnostics | Information on unavailable sources, timeouts, and limited coverage |

Use case: Check a suspicious link before clicking, protect your organization against phishing campaigns, and verify that your own domain is not falsely flagged (false positive).
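
To illustrate how a weighted risk score from several sources could work, here is a small sketch. The weights, the thresholds, and the function names are all assumptions invented for this example; the Phishing URL Checker's actual weights and formula are not documented here, and this sketch will not reproduce its scores.

```python
from typing import Optional

# Hypothetical reliability weights per source (assumption, not the tool's values).
WEIGHTS = {
    "urlhaus": 25,
    "google_safe_browsing": 35,
    "phishtank": 15,
    "virustotal": 25,
}


def risk_score(flags: dict[str, Optional[bool]]) -> Optional[int]:
    """Compute a 0-100 score from per-source results.

    flags maps a source name to True (flagged), False (clean),
    or None (source unavailable or timed out).
    """
    available = {s: f for s, f in flags.items() if f is not None and s in WEIGHTS}
    if not available:
        return None  # no source answered: the score cannot be computed
    total = sum(WEIGHTS[s] for s in available)
    flagged = sum(WEIGHTS[s] for s, f in available.items() if f)
    return round(100 * flagged / total)


def verdict(score: Optional[int]) -> str:
    """Map a score to a verdict using hypothetical thresholds."""
    if score is None:
        return "indeterminate"
    if score == 0:
        return "clean"
    return "suspicious" if score < 50 else "malicious"


result = risk_score({
    "urlhaus": False,
    "google_safe_browsing": True,
    "phishtank": True,
    "virustotal": None,  # unavailable, so excluded from the weighting
})
print(result, verdict(result))
```

Excluding unavailable sources from the denominator keeps a timeout from silently lowering the score, which is one reasonable design choice among several.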


Real-world use cases

Case 1: E-commerce page truncated by Googlebot

Symptom: The FAQ and navigation links at the bottom of your category page do not appear in Google results.

Diagnosis: The Page Crawl Checker reveals the page is 3.2 MB of HTML. Googlebot truncates at 2 MB and loses the last 200 products, the FAQ, and the footer's internal linking.

Action: Limit the initial listing, use pagination with lazy loading, and move the FAQ to the top of the page.


Case 2: Bank phishing email

Symptom: You receive an urgent email from your "bank" with an account verification link.

Diagnosis: The Phishing URL Checker returns a score of 75 (high). Google Safe Browsing and PhishTank flag the URL as social engineering phishing.

Action: Do not click. Report the email as phishing. Access your bank's website by typing the address directly in the browser.


Case 3: Low crawl budget score

Symptom: Google crawls few pages on your site despite regularly updated content.

Diagnosis: The Page Crawl Checker shows a score of 35/100. The page loads 85 sub-resources including 40 third-party scripts (analytics, widgets, A/B testing).

Action: Load third-party scripts with defer/async, remove unused scripts, enable gzip/brotli compression.
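
As a rough way to find the scripts this action targets, the following sketch lists external scripts from another host that carry neither `defer` nor `async`. It uses only the standard-library HTML parser; the class name `ScriptAudit` and the sample markup are illustrative assumptions, and `type="module"` scripts (deferred by default) are not special-cased.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ScriptAudit(HTMLParser):
    """Collect third-party <script src> tags that block HTML parsing."""

    def __init__(self, page_host: str):
        super().__init__()
        self.page_host = page_host
        self.blocking_third_party: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attributes = dict(attrs)  # boolean attributes map to None
        src = attributes.get("src")
        if not src:
            return  # inline script, no sub-resource to fetch
        host = urlsplit(src).hostname  # None for relative URLs
        is_third_party = host is not None and host != self.page_host
        is_deferred = "defer" in attributes or "async" in attributes
        if is_third_party and not is_deferred:
            self.blocking_third_party.append(src)


SAMPLE = """
<script src="https://cdn.example.net/widget.js"></script>
<script async src="https://stats.example.org/analytics.js"></script>
<script src="/js/app.js"></script>
"""

audit = ScriptAudit("www.example.com")
audit.feed(SAMPLE)
print(audit.blocking_third_party)
```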


Case 4: Shortened link in a text message

Symptom: A text message contains a bit.ly link asking you to "update your package delivery."

Diagnosis: After expanding the shortened link, the Phishing URL Checker flags the final URL. URLhaus lists it as malware distribution.

Action: Delete the text message and block the number. Legitimate delivery services never request payment via text message.
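
Expanding a shortened link, as in the diagnosis above, means following its redirects hop by hop until a final URL is reached. The sketch below shows one possible approach; `expand_url` and `head_location` are hypothetical helpers, not the tool's code. Note that `head_location` contacts the server, which reveals your IP address to it, so run it from a disposable environment when the link is suspicious.

```python
import http.client
from typing import Callable, Optional
from urllib.parse import urljoin, urlsplit


def expand_url(
    url: str,
    get_location: Callable[[str], Optional[str]],
    max_hops: int = 10,
) -> list[str]:
    """Follow redirects and return every URL visited, final URL last."""
    chain = [url]
    for _ in range(max_hops):
        location = get_location(chain[-1])
        if location is None:
            break  # no further redirect
        next_url = urljoin(chain[-1], location)  # Location may be relative
        if next_url in chain:
            break  # redirect loop
        chain.append(next_url)
    return chain


def head_location(url: str) -> Optional[str]:
    """Send one HEAD request and return the Location header on a redirect."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=5)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=5)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        if response.status in (301, 302, 303, 307, 308):
            return response.getheader("Location")
        return None
    finally:
        conn.close()


# Offline demonstration with a made-up redirect table instead of the network.
fake_hops = {
    "https://short.example/abc": "https://tracker.example/r?id=1",
    "https://tracker.example/r?id=1": "/landing",
}
print(expand_url("https://short.example/abc", fake_hops.get))
```

For a live check, pass `head_location` as the second argument and submit the last URL in the returned chain to the Phishing URL Checker.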


❓ FAQ - Frequently asked questions

Q: Why analyze web pages with HTTP tools?

A: HTTP tools detect issues that are otherwise invisible: pages too heavy for Googlebot (truncation beyond 2 MB), misconfigured headers (a forgotten X-Robots-Tag: noindex), and JavaScript redirects that Googlebot does not follow. These issues directly impact your SEO rankings, often with no warning in Search Console.
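
As an illustration of client-side redirect detection, the sketch below scans raw HTML for a meta refresh tag and for common JavaScript redirect patterns. It is a heuristic written for this page, not the Page Crawl Checker's detection logic: the regular expressions are assumptions, and they can produce false positives (for example a variable that happens to be named `location`) or miss obfuscated code.

```python
import re

# <meta http-equiv="refresh" ...>, with or without quotes around the value.
META_REFRESH = re.compile(
    r"<meta[^>]+http-equiv\s*=\s*[\"']?refresh[\"']?[^>]*>",
    re.IGNORECASE,
)

# location = "...", window.location.href = "...", location.replace(...), etc.
JS_REDIRECT = re.compile(
    r"(?:window\.|document\.)?location(?:\.href)?\s*=\s*[\"']"
    r"|location\.(?:replace|assign)\s*\(",
    re.IGNORECASE,
)


def find_client_side_redirects(html: str) -> list[str]:
    """Return the kinds of client-side redirect found in the HTML source."""
    found = []
    if META_REFRESH.search(html):
        found.append("meta refresh")
    if JS_REDIRECT.search(html):
        found.append("javascript")
    return found


page = """
<head><meta http-equiv="refresh" content="0; url=/new-page"></head>
<body><script>window.location.href = "/new-page";</script></body>
"""
print(find_client_side_redirects(page))
```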


Q: How to check if a link is phishing?

A: Paste the URL into the Phishing URL Checker. The tool queries 4 threat intelligence databases in parallel (URLhaus, Google Safe Browsing, PhishTank, VirusTotal) and returns a verdict with a risk score from 0 to 100.


Q: What is Googlebot's 2 MB limit?

A: Googlebot downloads and indexes only the first 2,097,152 bytes (2 MB) of a page's HTML. Beyond that, content is truncated. The limit applies to decompressed HTML, so gzip or brotli compression does not bypass it.
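
The arithmetic behind this limit is simple enough to sketch. The function below takes the decompressed HTML as bytes and reports how it compares with the 2,097,152-byte limit stated above; the name `truncation_report` and the shape of the returned dictionary are illustrative choices.

```python
GOOGLEBOT_HTML_LIMIT = 2 * 1024 * 1024  # 2,097,152 bytes


def truncation_report(html: bytes, limit: int = GOOGLEBOT_HTML_LIMIT) -> dict:
    """Compare decompressed HTML size with the limit."""
    size = len(html)
    return {
        "size_bytes": size,
        "percent_of_limit": round(100 * size / limit, 1),
        # Bytes past the limit are the ones that would be cut off.
        "truncated_bytes": max(0, size - limit),
    }


# A synthetic 3 MiB page: everything after the first 2 MiB would be lost.
oversized = b"x" * (3 * 1024 * 1024)
print(truncation_report(oversized))
```

Because the limit applies after decompression, measure the decoded response body, not the number of bytes transferred over the network.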


Q: What is crawl budget?

A: Crawl budget is the number of pages Googlebot can crawl within a given time frame. Heavy pages with many sub-resources consume more resources. The Page Crawl Checker calculates a score out of 100 to evaluate each page's efficiency.


Q: What is the difference between phishing and malware?

A: A phishing URL impersonates a legitimate service to steal credentials. A malware URL distributes malicious software (viruses, ransomware, trojans). A URL can be flagged for both. The Phishing URL Checker distinguishes these categories in the results.


Q: Is the Phishing URL Checker result 100% reliable?

A: No tool guarantees 100% detection. The average lifespan of a phishing URL is under 24 hours, so a newly created one may not be listed by any source yet. A "clean" result means no source flags it at the time of the check, not that it is permanently safe.


Q: How to reduce a web page's weight?

A: Remove unnecessary inline CSS and JavaScript, enable gzip or brotli compression, externalize SVGs, minify HTML, and use lazy loading. The Page Crawl Checker identifies specific improvement points.
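
To estimate what compression alone would save on transfer size, you can compress a page locally. This is a minimal sketch using Python's `gzip` module; `compression_savings` is a hypothetical helper and the sample markup is synthetic. Keep in mind that compression reduces bytes on the wire only: the 2 MB limit is measured on decompressed HTML.

```python
import gzip


def compression_savings(html: bytes) -> float:
    """Return the percentage of bytes saved by gzip compression."""
    if not html:
        return 0.0
    compressed = gzip.compress(html)
    return round(100 * (1 - len(compressed) / len(html)), 1)


# Repetitive markup, such as long product listings, compresses very well.
listing = b'<li class="product"><a href="/p/1">Item</a></li>\n' * 1000
print(f"{len(listing)} bytes, {compression_savings(listing)}% saved with gzip")
```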


Complementary tools

| Tool | Purpose |
| --- | --- |
| DNS Lookup | Check the DNS records for your domain |
| Email deliverability audit | Analyze MX, SPF, DKIM, and DMARC for your domain |
| DNS Propagation Checker | Confirm your DNS changes have propagated |
| IP Blacklist Checker | Check if an IP is listed on email blacklists |
| Domain Blacklist Checker | Check if a domain is blacklisted for spam or phishing |

Useful resources