
How Accurate Are Surmado Reports?


Surmado reports are highly accurate because they test real systems in real-time (not simulated). AI Visibility tests actual AI platforms via API, Site Audit runs 150+ automated checks on live websites, and Strategy uses production AI models for strategic analysis.

Reading time: 15 minutes

What you’ll learn:

  • Why AI Visibility has ±3% statistical margin (AI platforms are non-deterministic by design, same query yields slightly different responses)
  • How Site Audit achieves 98% issue detection rate with 1% false positive rate through cross-validation with Lighthouse and manual developer reviews
  • The methodology behind Strategy’s 85% agreement rate with human MBA consultants (tested on 50 real business challenges)
  • Specific accuracy validation processes: monthly re-calibration, platform-specific tuning, and benchmark updates
  • What affects accuracy and what doesn’t (known limitations like JavaScript-heavy sites, paywalled content, and novel business models)

Accuracy benchmarks: AI Visibility ±3% statistical margin, Site Audit 98% issue detection rate, Strategy validated against consultant recommendations in 85% of cases.


AI Visibility Accuracy

How AI Visibility Testing Works

Real-time API testing (not cached or simulated):

What AI Visibility does:

  1. Submits 50+ persona queries to ChatGPT, Claude, Perplexity, Gemini, Meta AI, Grok, DeepSeek
  2. Receives actual responses from live AI platforms
  3. Analyzes responses (mentions, ranking, sentiment)
  4. Calculates metrics (Presence Rate, Authority Score)
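The core metric in step 4 can be sketched as a simple ratio. This is an illustrative example only; the field names and response records below are hypothetical, not Surmado's actual schema:

```python
# Hypothetical response records; field names are illustrative, not Surmado's schema
responses = [
    {"platform": "ChatGPT",    "mentions_business": True},
    {"platform": "Claude",     "mentions_business": False},
    {"platform": "Perplexity", "mentions_business": True},
    {"platform": "Gemini",     "mentions_business": True},
]

# Presence Rate = share of responses that mention the business
presence_rate = sum(r["mentions_business"] for r in responses) / len(responses)
print(f"Presence Rate: {presence_rate:.0%}")  # Presence Rate: 75%
```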

Not simulated:

  • AI Visibility doesn’t “guess” what AI might say
  • AI Visibility tests actual platforms via official APIs
  • Responses are fresh (not cached from weeks ago)

Example:

  • Bad (simulated): “We think ChatGPT would recommend these 5 competitors based on historical data”
  • Good (AI Visibility): “We just asked ChatGPT right now, here’s the exact response with timestamp”

AI Visibility Accuracy Metrics

Presence Rate accuracy: ±3% statistical margin

Why ±3%:

  • Based on sample size (50+ queries)
  • AI platforms have variance (same query, different response each time)
  • Statistical confidence: 95%

Example:

  • AI Visibility reports: 58% Presence Rate
  • Actual range: 55-61% Presence Rate
  • If you re-ran AI Visibility with same personas: Likely 55-61%, not exactly 58%

Why variance exists:

  • AI platforms are non-deterministic (randomness built-in)
  • Same persona query asked twice → slightly different responses
  • Example: Query 1 mentions you, Query 2 doesn’t (probabilistic)

Is ±3% acceptable?: Yes. Too small to affect business decisions (58% vs 61% = same strategic priority).
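A ±3% figure like this is consistent with the standard confidence interval for an observed proportion. The sketch below uses the textbook Wald formula for illustration; the effective sample size behind Surmado's specific ±3% margin (queries × platforms × repeats) is not stated here, so the `n` values are assumptions:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% confidence margin for an observed proportion p over n responses
    (standard Wald interval; z = 1.96 for 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)

# At p = 0.5 (worst case), roughly 1,070 total responses give a ±3% margin
print(round(margin_of_error(0.5, 1067), 3))   # 0.03
# Smaller effective samples give wider margins
print(round(margin_of_error(0.58, 350), 3))   # 0.052
```

The general point stands regardless of the exact sample size: the margin shrinks with the square root of the number of responses, so more queries means tighter estimates.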


Platform-Specific Accuracy

Accuracy varies by platform:

ChatGPT: Very accurate (±2%)

  • Stable API responses
  • Consistent recommendation patterns
  • Low variance across queries

Perplexity: Very accurate (±2%)

  • Citations-based (sources cited)
  • Responses deterministic when citing same sources
  • Low variance

Claude: Accurate (±3%)

  • More variance than ChatGPT
  • Temperature setting affects randomness
  • Medium variance

Gemini: Accurate (±3%)

  • Google search integration (real-time data)
  • Some variance based on search results
  • Medium variance

Meta AI, Grok, DeepSeek: Moderate accuracy (±5%)

  • Less mature platforms
  • Higher response variance
  • Fewer queries tested (10-15 vs 50+)

Overall AI Visibility accuracy: Weighted average across platforms = ±3%


How We Validate AI Visibility Accuracy

Method 1: Manual verification

We periodically run manual tests:

  1. Take 10 AI Visibility reports
  2. Manually submit same personas to AI platforms
  3. Compare manual results vs AI Visibility automated results
  4. Calculate accuracy (% of queries where results match)

Latest validation (Nov 2024):

  • 92% exact match (same competitors mentioned)
  • 95% ranking match (within ±1 position)
  • 98% presence detection (mentioned vs not mentioned)

Method 2: Re-run testing

We re-run same businesses weekly:

  1. Run AI Visibility Monday with same personas
  2. Run AI Visibility Friday with same personas
  3. Compare results (should be similar, not identical)
  4. Variance should be within ±3%

Latest results:

  • Presence Rate variance: ±2.8% average
  • Authority Score variance: ±4 points average
  • Competitor list: 90% overlap (same competitors mentioned)
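The re-run comparison above can be sketched with two small checks: the absolute Presence Rate difference between runs, and the overlap of the competitor lists. The competitor names and the overlap metric (intersection over union) are illustrative assumptions, not Surmado's published methodology:

```python
def run_variance(presence_a: float, presence_b: float) -> float:
    """Absolute Presence Rate difference between two runs of the same personas."""
    return abs(presence_a - presence_b)

def competitor_overlap(run_a: set[str], run_b: set[str]) -> float:
    """Share of competitors appearing in both runs (intersection over union)."""
    return len(run_a & run_b) / len(run_a | run_b)

# Hypothetical Monday vs Friday results for the same personas
monday = {"Acme HVAC", "CoolPro", "DesertAir", "PhxClimate"}
friday = {"Acme HVAC", "CoolPro", "DesertAir", "SunValley"}

print(round(run_variance(0.58, 0.55), 2))            # 0.03 -> within the ±3% band
print(round(competitor_overlap(monday, friday), 2))  # 0.6
```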

Conclusion: AI Visibility accurately reflects AI platform behavior within statistical margins.


Site Audit Accuracy

How Site Audit Testing Works

Automated checks on live website (not estimates):

What Site Audit does:

  1. Crawls website (up to 100 pages for Site Audit)
  2. Runs 150+ automated checks (technical SEO, performance, accessibility, security)
  3. Tests real devices (iOS Safari, Android Chrome, desktop browsers)
  4. Measures actual Core Web Vitals (not simulated)

Real measurements:

  • Page load times: Actual seconds (not “probably slow”)
  • Broken links: Actual 404 errors (not guesses)
  • Accessibility violations: Actual WCAG failures (pa11y validation)
  • Core Web Vitals: Real LCP, CLS, INP metrics

Site Audit Accuracy Metrics

Issue detection rate: 98%

What this means:

  • If 100 issues exist on site, Site Audit finds 98 of them
  • 2% false negative rate (issues Site Audit misses)
  • 1% false positive rate (issues Site Audit flags incorrectly)

Why not 100%:

  • Some issues require manual review (e.g., content quality)
  • JavaScript-heavy sites may have dynamic content Site Audit doesn’t catch
  • Edge cases in browser rendering

False positive rate: 1%

Example:

  • Site Audit flags: “Alt text missing on 890 images”
  • Manual verification: 881 images actually missing alt text; the other 9 are inline SVG elements, which don’t use the alt attribute
  • False positive: 9 of 890 (1%)

False negative rate: 2%

Example:

  • Site Audit misses: Broken link on page loaded via AJAX
  • Why: Site Audit tests initial page load, not user interactions
  • Limitation acknowledged (Site Audit focuses on static content)

Core Web Vitals Accuracy

How Site Audit measures Core Web Vitals:

Method: Real User Monitoring (RUM) simulation

  1. Load page on real devices (iOS, Android, desktop)
  2. Measure LCP, CLS, INP with browser APIs
  3. Run 5 tests per device
  4. Report the median values
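Reporting the median rather than the mean keeps one unusually slow or fast run from skewing the result. A minimal sketch, with hypothetical LCP samples:

```python
from statistics import median

# Hypothetical LCP samples (seconds) from 5 runs on one device profile
lcp_runs = [3.1, 3.4, 3.2, 3.0, 3.3]

print(median(lcp_runs))        # 3.2
print(median(lcp_runs) > 2.5)  # True -> fails Google's 2.5s LCP threshold
```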

Accuracy compared to Google PageSpeed Insights:

  • LCP: ±0.2s variance
  • CLS: ±0.02 variance
  • INP: ±20ms variance

Example:

  • Site Audit reports: LCP 3.2s
  • PageSpeed Insights: LCP 3.0s
  • Variance: 0.2s (within acceptable range)

Why variance exists:

  • Network conditions (Site Audit uses consistent test environment, PSI varies by location)
  • Test timing (Site Audit runs now, PSI may use cached data)
  • Device differences (iOS 17 vs iOS 16 Safari)

Is variance acceptable?: Yes. Both tests agree on PASS/FAIL (LCP >2.5s = fail, Site Audit and PSI both detect this).


How We Validate Site Audit Accuracy

Method 1: Cross-validation with Lighthouse

We compare Site Audit results to Google Lighthouse:

  1. Run Site Audit on 50 websites
  2. Run Lighthouse on same 50 websites
  3. Compare issue counts

Latest results:

  • Technical SEO issues: 94% agreement (Site Audit finds 94% of what Lighthouse finds, plus extras)
  • Performance issues: 91% agreement
  • Accessibility issues: 96% agreement (Site Audit uses pa11y, more thorough with 150+ checks than Lighthouse)
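An agreement rate like the ones above can be computed as the fraction of one tool's findings that the other tool also detects. The issue labels below are made up for illustration, and this is one plausible way to define "agreement", not necessarily Surmado's exact formula:

```python
def agreement_rate(site_audit_issues: set[str], lighthouse_issues: set[str]) -> float:
    """Fraction of Lighthouse-detected issues that Site Audit also detects."""
    if not lighthouse_issues:
        return 1.0
    return len(site_audit_issues & lighthouse_issues) / len(lighthouse_issues)

lighthouse = {"missing-meta-description", "unsized-images", "render-blocking-css"}
site_audit = {"missing-meta-description", "unsized-images", "render-blocking-css",
              "missing-alt-text"}  # same set, plus an extra finding

print(agreement_rate(site_audit, lighthouse))  # 1.0
```

Note this metric is asymmetric by design: extra issues that only Site Audit finds ("plus extras" above) don't lower the agreement score.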

Method 2: Manual developer review

We hire developers to manually audit 10 sites:

  1. Site Audit generates report
  2. Developer manually checks every issue
  3. Calculate true positive rate

Latest results:

  • 98% true positive rate (issues Site Audit flags are real)
  • 2% false negative rate (issues Site Audit misses)

Conclusion: Site Audit accurately identifies technical issues with 98% detection rate.


Strategy Accuracy

How Strategy Analysis Works

6-AI adversarial debate (not single perspective):

What Strategy does:

  1. Submit strategic question to 6 specialized AI models
  2. Each model analyzes independently (CFO, COO, Market Realist, Game Theorist, Chief Strategist, Wildcard)
  3. Models debate (challenge each other’s assumptions)
  4. Synthesis engine combines perspectives into recommendations

Not single AI opinion:

  • Strategy doesn’t just ask ChatGPT for advice
  • 6 different models with different training data and perspectives
  • Adversarial debate surfaces blind spots

Strategy Accuracy Metrics

Validation against human consultants: 85% agreement

Study methodology (Sept 2024):

  1. Collected 50 real business challenges from customers
  2. Ran Strategy on all 50
  3. Hired 3 MBA consultants to independently analyze same 50 challenges
  4. Compared Strategy recommendations vs consultant recommendations

Results:

  • 85% substantial agreement (Strategy and consultants reached similar conclusions)
  • 12% partial agreement (Strategy identified same issues but different priorities)
  • 3% disagreement (Strategy missed something consultants caught, or vice versa)

What “substantial agreement” means:

  • Example: Strategy says “Don’t launch freemium” (high churn risk)
  • Consultant says “Freemium is risky due to support costs and churn”
  • Same conclusion, different framing

What “partial agreement” means:

  • Example: Strategy says “Hire AE before engineer” (revenue priority)
  • Consultant says “Hire engineer before AE” (product quality priority)
  • Both identify hiring need, differ on sequencing

Benchmark Accuracy (Industry Standards)

Strategy applies correct benchmarks:

Validation test (Oct 2024):

  1. Collected industry benchmark data (SaaS churn rates, conversion rates, CAC, LTV)
  2. Fed hypothetical scenarios to Strategy
  3. Checked if Strategy applied correct benchmarks

Results:

  • 92% benchmark accuracy (Strategy cited correct industry standards)
  • 8% outdated or overly generalized benchmarks

Example of correct benchmark:

  • Scenario: “B2B SaaS targeting SMBs, freemium model”
  • Strategy: “Industry benchmark: 2-4% free → paid conversion, 8% monthly churn”
  • Actual industry data: 2.5% conversion, 7.9% churn (Strategy within range)

Example of outdated benchmark:

  • Scenario: “AI-powered SaaS pricing”
  • Strategy: “SaaS pricing typically $50-200/month”
  • Actual: AI tools price $20-100/month (lower than traditional SaaS)
  • Strategy benchmark slightly high (based on 2022 data, not 2024)

Overall: Strategy is accurate on benchmarks 92% of time, occasionally lags on emerging categories.


Accuracy Limitations

What Can Affect Accuracy

AI Visibility limitations:

  • AI platforms change recommendation algorithms (accuracy decreases temporarily until we adjust)
  • Very new businesses (under 30 days online) have limited data
  • Very niche categories (fewer than 10 competitors) have smaller sample sizes

Site Audit limitations:

  • JavaScript-heavy sites may have issues Site Audit doesn’t detect (requires user interaction)
  • Paywalled content (Site Audit can’t test pages behind login)
  • Dynamic content (AJAX-loaded pages tested incompletely)

Strategy limitations:

  • Industry benchmarks lag 6-12 months (AI trained on older data)
  • Company-specific context missing (AI doesn’t know your team, culture, constraints)
  • Novel business models (no industry data for comparison)

How We Improve Accuracy

AI Visibility improvements:

  • Monthly re-calibration against manual tests
  • Platform-specific tuning (adjusting for each AI’s response patterns)
  • Expanding query sets (50+ persona queries for statistical confidence)

Site Audit improvements:

  • Adding checks quarterly (accessibility, security, performance)
  • Cross-validating with Lighthouse, pa11y, WebPageTest
  • Real device testing (iOS, Android updates)

Strategy improvements:

  • Updating industry benchmarks quarterly
  • Training models on recent data
  • Incorporating user feedback (when recommendations were wrong, why?)

The Bottom Line

Surmado reports are accurate because they test real systems in real-time:

AI Visibility: ±3% statistical margin (95% confidence)

  • Tests actual AI platforms via API
  • Fresh responses (not cached)
  • Validated against manual testing (92% match rate)

Site Audit: 98% issue detection rate

  • Real checks on live website
  • Actual Core Web Vitals measurements
  • Validated against Lighthouse (94% agreement)

Strategy: 85% agreement with human consultants

  • 6-AI adversarial debate (not single opinion)
  • Correct industry benchmarks 92% of time
  • Validated against MBA consultant analysis

No tool is 100% accurate. But Surmado’s accuracy is high enough to make confident business decisions.


Frequently Asked Questions

Why isn’t AI Visibility 100% accurate?

Because AI platforms aren’t 100% deterministic.

AI platforms like ChatGPT have built-in randomness (temperature setting). Ask same question twice, get slightly different answers.

Example:

  • Query 1: “Best HVAC companies in Phoenix”
  • Response 1: Mentions Competitor A, B, C, D, E
  • Query 2 (same question, 5 minutes later): Mentions Competitor A, B, C, F, G

D and E dropped, F and G added. That’s normal AI behavior, not AI Visibility error.

AI Visibility accounts for this by testing 50+ queries (statistical sample size smooths out variance).
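The smoothing effect of a larger sample can be shown with a small simulation: treat each query as a coin flip that mentions the business with some fixed probability. This models the non-determinism described above; the probability value and query counts are illustrative assumptions:

```python
import random

random.seed(42)  # fixed seed so the simulation is reproducible

def simulated_presence_rate(true_p: float, n_queries: int) -> float:
    """Simulate non-deterministic AI responses: each query mentions the
    business with probability true_p, independently of the others."""
    mentions = sum(random.random() < true_p for _ in range(n_queries))
    return mentions / n_queries

# With only a few queries, observed rates swing widely run to run;
# with many queries, they settle near the true rate
print(simulated_presence_rate(0.58, 5))    # anywhere from 0.0 to 1.0
print(simulated_presence_rate(0.58, 500))  # close to 0.58
```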

Can I verify AI Visibility results myself?

Yes. You can manually test:

Process:

  1. Take persona queries from AI Visibility report
  2. Submit to ChatGPT, Claude, Perplexity yourself
  3. Compare responses to AI Visibility findings

Expected result: 90-95% match (AI Visibility and manual results similar, not identical due to AI variance).

Common finding: “AI Visibility said ChatGPT mentioned me 8 of 10 times. I tested 10 times manually, got mentioned 7 times. Close enough.”

Why isn’t Site Audit 100% accurate?

Because some issues require human judgment.

Example: Content quality

  • Site Audit flags: “This page has only 250 words (thin content)”
  • Human judgment: “But it’s a landing page with video. 250 words is fine.”
  • Site Audit can’t evaluate context (just flags word count)

Another example: Alt text quality

  • Site Audit flags: “Image missing alt text”
  • Reality: Alt text exists but is poor quality (“image1.jpg”)
  • Site Audit detects presence/absence, not quality

Trade-off: 100% automation = some false positives/negatives. Manual review adds context.

Does Strategy give better advice than human consultants?

Different, not better.

Strategy strengths:

  • Fast (15 minutes vs 2-week consultant engagement)
  • Cheap ($50 vs $5K-15K consultant)
  • Objective (no ego, politics, biases)
  • Covers 6 perspectives (CFO, COO, Market Realist, etc.)

Human consultant strengths:

  • Company-specific context (knows your team, culture, history)
  • Relationship (ongoing advice, not one-time report)
  • Novel situations (can handle unique edge cases)
  • Execution support (implements recommendations with you)

Best use: Strategy for quick validation. Consultants for deep engagements.

What happens if AI Visibility results seem wrong?

Contact us: hi@surmado.com with:

  • Intelligence Token (so we can review)
  • What seems wrong (expected X, got Y)
  • Manual verification results (if you tested yourself)

We’ll investigate:

  • Re-run queries manually
  • Check for platform changes (AI updated algorithm)
  • Validate against fresh data

If actual error: Refund + corrected report

If variance within normal range: Explanation of why results differ

Do Site Audit results match Google Search Console?

Mostly, but not always.

Why differences:

Google Search Console (GSC):

  • Shows Google’s view of your site
  • Data may lag 3-7 days
  • Focuses on Google-specific issues

Site Audit:

  • Tests site right now (real-time)
  • Broader scope (accessibility, security, not just SEO)
  • Cross-browser testing (not just Google)

Example difference:

  • GSC: “12 pages not indexed”
  • Site Audit: “15 pages not in sitemap”
  • Difference: Site Audit found 3 more pages GSC hasn’t discovered yet

Use both: GSC for Google’s perspective, Site Audit for a broader audit covering performance, accessibility, SEO, security, and mobile.

Are Strategy recommendations guaranteed to work?

No. Strategy provides analysis, not guarantees.

Why:

  • Execution matters (best strategy fails if poorly implemented)
  • Market changes (recommendations based on current data)
  • Unknown factors (AI doesn’t know your team skills, cash position, etc.)

Strategy tells you: “Based on industry benchmarks, here’s the likely outcome”

Reality: Your execution, timing, and luck affect the actual outcome

Use Strategy to de-risk decisions (better than guessing), but don’t expect 100% success rate.


Questions about accuracy? Email hi@surmado.com with your Intelligence Token and we’ll review your specific report.

Help Us Improve This Article

Know a better way to explain this? Have a real-world example or tip to share?

Contribute and earn jobs:

  • Submit: Get 1 free job (AI Visibility, Site Audit, or Strategy)
  • If accepted: Get an additional free job (2 total)
  • Plus: Byline credit on this article