
How Accurate Are Surmado Reports?


Surmado reports are highly accurate because they test real systems in real-time (not simulated). Signal tests actual AI platforms via API, Scan runs 150+ automated checks on live websites, and Solutions uses production AI models for strategic analysis.

Reading time: 15 minutes

What you’ll learn:

  • Why Signal has ±3% statistical margin (AI platforms are non-deterministic by design; the same query yields slightly different responses)
  • How Scan achieves 98% issue detection rate with 1% false positive rate through cross-validation with Lighthouse and manual developer reviews
  • The methodology behind Solutions’ 85% agreement rate with human MBA consultants (tested on 50 real business challenges)
  • Specific accuracy validation processes: monthly re-calibration, platform-specific tuning, and benchmark updates
  • What affects accuracy and what doesn’t (known limitations like JavaScript-heavy sites, paywalled content, and novel business models)

Accuracy benchmarks: Signal ±3% statistical margin, Scan 98% issue detection rate, Solutions validated against consultant recommendations in 85% of cases.


Signal Accuracy

How Signal Testing Works

Real-time API testing (not cached or simulated):

What Signal does:

  1. Submits 50+ persona queries to ChatGPT, Claude, Perplexity, Gemini, Meta AI, Grok, DeepSeek
  2. Receives actual responses from live AI platforms
  3. Analyzes responses (mentions, ranking, sentiment)
  4. Calculates metrics (Presence Rate, Authority Score)

Not simulated:

  • Signal doesn’t “guess” what AI might say
  • Signal tests actual platforms via official APIs
  • Responses are fresh (not cached from weeks ago)

Example:

  • Bad (simulated): “We think ChatGPT would recommend these 5 competitors based on historical data”
  • Good (Signal): “We just asked ChatGPT right now, here’s the exact response with timestamp”

Signal Accuracy Metrics

Presence Rate accuracy: ±3% statistical margin

Why ±3%:

  • Based on sample size (50+ queries)
  • AI platforms have variance (same query, different response each time)
  • Statistical confidence: 95%

Example:

  • Signal reports: 58% Presence Rate
  • Actual range: 55-61% Presence Rate
  • If you re-ran Signal with same personas: Likely 55-61%, not exactly 58%

Why variance exists:

  • AI platforms are non-deterministic (randomness built-in)
  • Same persona query asked twice → slightly different responses
  • Example: Query 1 mentions you, Query 2 doesn’t (probabilistic)

Is ±3% acceptable?: Yes. Too small to affect business decisions (58% vs 61% = same strategic priority).
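The reasoning above follows the standard margin-of-error formula for a proportion. A minimal sketch (the query counts and the 58% rate here are illustrative, not Surmado's actual numbers; the headline ±3% reflects Surmado's full query mix across platforms):

```python
import math

def presence_margin(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for an observed presence rate p over n queries."""
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative only: the margin shrinks as the total query count grows.
print(round(presence_margin(0.58, 50), 3))    # 0.137
print(round(presence_margin(0.58, 1000), 3))  # 0.031
```

The takeaway is directional: more queries means a tighter margin, which is why sample size matters more than any single response.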


Platform-Specific Accuracy

Accuracy varies by platform:

ChatGPT: Very accurate (±2%)

  • Stable API responses
  • Consistent recommendation patterns
  • Low variance across queries

Perplexity: Very accurate (±2%)

  • Citations-based (sources cited)
  • Responses deterministic when citing same sources
  • Low variance

Claude: Accurate (±3%)

  • More variance than ChatGPT
  • Temperature setting affects randomness
  • Medium variance

Gemini: Accurate (±3%)

  • Google search integration (real-time data)
  • Some variance based on search results
  • Medium variance

Meta AI, Grok, DeepSeek: Moderate accuracy (±5%)

  • Less mature platforms
  • Higher response variance
  • Fewer queries tested (10-15 vs 50+)

Overall Signal accuracy: Weighted average across platforms = ±3%
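The weighted average works like the sketch below. The per-platform query counts are assumptions for illustration (the article says 10-15 queries for the less mature platforms and 50+ for the rest):

```python
# Hypothetical per-platform margins and query counts (assumptions, not actual weights).
platforms = {
    "ChatGPT":    {"margin": 0.02, "queries": 50},
    "Perplexity": {"margin": 0.02, "queries": 50},
    "Claude":     {"margin": 0.03, "queries": 50},
    "Gemini":     {"margin": 0.03, "queries": 50},
    "Meta AI":    {"margin": 0.05, "queries": 12},
    "Grok":       {"margin": 0.05, "queries": 12},
    "DeepSeek":   {"margin": 0.05, "queries": 12},
}

# Weight each platform's margin by how many queries it contributes.
total = sum(p["queries"] for p in platforms.values())
weighted = sum(p["margin"] * p["queries"] for p in platforms.values()) / total
print(round(weighted, 3))  # 0.029, i.e. roughly ±3%
```

Because the higher-variance platforms contribute fewer queries, they pull the overall margin up only slightly.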


How We Validate Signal Accuracy

Method 1: Manual verification

We periodically run manual tests:

  1. Take 10 Signal reports
  2. Manually submit same personas to AI platforms
  3. Compare manual results vs Signal automated results
  4. Calculate accuracy (% of queries where results match)

Latest validation (Nov 2024):

  • 92% exact match (same competitors mentioned)
  • 95% ranking match (within ±1 position)
  • 98% presence detection (mentioned vs not mentioned)

Method 2: Re-run testing

We re-run same businesses weekly:

  1. Run Signal Monday with same personas
  2. Run Signal Friday with same personas
  3. Compare results (should be similar, not identical)
  4. Variance should be within ±3%

Latest results:

  • Presence Rate variance: ±2.8% average
  • Authority Score variance: ±4 points average
  • Competitor list: 90% overlap (same competitors mentioned)
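The competitor-list comparison reduces to set arithmetic. A sketch with hypothetical competitor names (the exact overlap definition Surmado uses is an assumption):

```python
# Hypothetical competitor lists from a Monday run and a Friday run.
monday = {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J"}
friday = {"A", "B", "C", "D", "E", "F", "G", "H", "I", "K"}

# Share of Monday's competitors that also appear in Friday's run.
overlap = len(monday & friday) / len(monday)
print(f"{overlap:.0%}")  # 90%
```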

Conclusion: Signal accurately reflects AI platform behavior within statistical margins.


Scan Accuracy

How Scan Testing Works

Automated checks on live website (not estimates):

What Scan does:

  1. Crawls website (up to 100 pages for Scan Pro)
  2. Runs 150+ automated checks (technical SEO, performance, accessibility, security)
  3. Tests real devices (iOS Safari, Android Chrome, desktop browsers)
  4. Measures actual Core Web Vitals (not simulated)

Real measurements:

  • Page load times: Actual seconds (not “probably slow”)
  • Broken links: Actual 404 errors (not guesses)
  • Accessibility violations: Actual WCAG failures (pa11y validation)
  • Core Web Vitals: Real LCP, CLS, INP metrics
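The broken-link portion of those checks can be sketched in a few lines. This is a simplified stand-in, not Surmado's crawler (which also handles redirects, timeouts, and the 150+ other checks):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href targets from anchor tags on a fetched page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def broken(status_by_url: dict) -> list:
    """A link counts as broken only on a hard 404/410, not a redirect."""
    return [url for url, status in status_by_url.items() if status in (404, 410)]

parser = LinkExtractor()
parser.feed('<p><a href="/pricing">Pricing</a> <a href="/old">Old</a></p>')
print(parser.links)                                          # ['/pricing', '/old']
print(broken({"/pricing": 200, "/old": 404, "/blog": 301}))  # ['/old']
```

In practice, each extracted link would be fetched and its HTTP status recorded before classification.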

Scan Accuracy Metrics

Issue detection rate: 98%

What this means:

  • If 100 issues exist on site, Scan finds 98 of them
  • 2% false negative rate (issues Scan misses)
  • 1% false positive rate (issues Scan flags incorrectly)

Why not 100%:

  • Some issues require manual review (e.g., content quality)
  • JavaScript-heavy sites may have dynamic content Scan doesn’t catch
  • Edge cases in browser rendering

False positive rate: 1%

Example:

  • Scan flags: “Alt text missing on 890 images”
  • Manual verification: 881 actually missing alt text, 9 are SVGs (not images)
  • False positive: 9 of 890 (1%)

False negative rate: 2%

Example:

  • Scan misses: Broken link on page loaded via AJAX
  • Why: Scan tests initial page load, not user interactions
  • Limitation acknowledged (Scan focuses on static content)

Core Web Vitals Accuracy

How Scan measures Core Web Vitals:

Method: Lab-based simulation of Real User Monitoring (RUM)

  1. Load page on real devices (iOS, Android, desktop)
  2. Measure LCP, CLS, INP with browser APIs
  3. Average across 5 test runs per device
  4. Report median values
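Steps 3-4 reduce to taking the median of repeated samples. A minimal sketch with made-up LCP values (the 2.5s threshold is Google's real LCP pass/fail cutoff):

```python
from statistics import median

# Five hypothetical LCP samples (seconds) from one device profile.
lcp_runs = [3.1, 3.4, 3.2, 3.0, 3.3]
lcp = median(lcp_runs)

print(lcp)                               # 3.2
print("FAIL" if lcp > 2.5 else "PASS")   # FAIL (Google's LCP threshold is 2.5s)
```

Reporting the median rather than the mean keeps one slow outlier run from skewing the result.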

Accuracy compared to Google PageSpeed Insights:

  • LCP: ±0.2s variance
  • CLS: ±0.02 variance
  • INP: ±20ms variance

Example:

  • Scan reports: LCP 3.2s
  • PageSpeed Insights: LCP 3.0s
  • Variance: 0.2s (within acceptable range)

Why variance exists:

  • Network conditions (Scan uses consistent test environment, PSI varies by location)
  • Test timing (Scan runs now, PSI may use cached data)
  • Device differences (iOS 17 vs iOS 16 Safari)

Is variance acceptable?: Yes. Both tests agree on PASS/FAIL (LCP >2.5s = fail, Scan and PSI both detect this).


How We Validate Scan Accuracy

Method 1: Cross-validation with Lighthouse

We compare Scan results to Google Lighthouse:

  1. Run Scan on 50 websites
  2. Run Lighthouse on same 50 websites
  3. Compare issue counts

Latest results:

  • Technical SEO issues: 94% agreement (Scan finds 94% of what Lighthouse finds, plus extras)
  • Performance issues: 91% agreement
  • Accessibility issues: 96% agreement (Scan uses pa11y and runs 150+ checks, making it more thorough than Lighthouse)

Method 2: Manual developer review

We hire developers to manually audit 10 sites:

  1. Scan generates report
  2. Developer manually checks every issue
  3. Calculate true positive rate

Latest results:

  • 98% true positive rate (issues Scan flags are real)
  • 2% false negative rate (issues Scan misses)

Conclusion: Scan accurately identifies technical issues with 98% detection rate.


Solutions Accuracy

How Solutions Analysis Works

6-AI adversarial debate (not single perspective):

What Solutions does:

  1. Submit strategic question to 6 specialized AI models
  2. Each model analyzes independently (CFO, COO, Market Realist, Game Theorist, Chief Strategist, Wildcard)
  3. Models debate (challenge each other’s assumptions)
  4. Synthesis engine combines perspectives into recommendations

Not single AI opinion:

  • Solutions doesn’t just ask ChatGPT for advice
  • 6 different models with different training data and perspectives
  • Adversarial debate surfaces blind spots

Solutions Accuracy Metrics

Validation against human consultants: 85% agreement

Study methodology (Sept 2024):

  1. Collected 50 real business challenges from customers
  2. Ran Solutions on all 50
  3. Hired 3 MBA consultants to independently analyze same 50 challenges
  4. Compared Solutions recommendations vs consultant recommendations

Results:

  • 85% substantial agreement (Solutions and consultants reached similar conclusions)
  • 12% partial agreement (Solutions identified same issues but different priorities)
  • 3% disagreement (Solutions missed something consultants caught, or vice versa)

What “substantial agreement” means:

  • Example: Solutions says “Don’t launch freemium” (high churn risk)
  • Consultant says “Freemium is risky due to support costs and churn”
  • Same conclusion, different framing

What “partial agreement” means:

  • Example: Solutions says “Hire AE before engineer” (revenue priority)
  • Consultant says “Hire engineer before AE” (product quality priority)
  • Both identify hiring need, differ on sequencing

Benchmark Accuracy (Industry Standards)

Solutions applies correct benchmarks:

Validation test (Oct 2024):

  1. Collected industry benchmark data (SaaS churn rates, conversion rates, CAC, LTV)
  2. Fed hypothetical scenarios to Solutions
  3. Checked if Solutions applied correct benchmarks

Results:

  • 92% benchmark accuracy (Solutions cited correct industry standards)
  • 8% outdated or overly generalized benchmarks

Example of correct benchmark:

  • Scenario: “B2B SaaS targeting SMBs, freemium model”
  • Solutions: “Industry benchmark: 2-4% free → paid conversion, 8% monthly churn”
  • Actual industry data: 2.5% conversion, 7.9% churn (Solutions within range)

Example of outdated benchmark:

  • Scenario: “AI-powered SaaS pricing”
  • Solutions: “SaaS pricing typically $50-200/month”
  • Actual: AI tools price $20-100/month (lower than traditional SaaS)
  • Solutions benchmark slightly high (based on 2022 data, not 2024)

Overall: Solutions is accurate on benchmarks 92% of the time, occasionally lagging on emerging categories.


Accuracy Limitations

What Can Affect Accuracy

Signal limitations:

  • AI platforms change recommendation algorithms (accuracy decreases temporarily until we adjust)
  • Very new businesses (under 30 days online) have limited data
  • Very niche categories (fewer than 10 competitors) have smaller sample sizes

Scan limitations:

  • JavaScript-heavy sites may have issues Scan doesn’t detect (requires user interaction)
  • Paywalled content (Scan can’t test pages behind login)
  • Dynamic content (AJAX-loaded pages tested incompletely)

Solutions limitations:

  • Industry benchmarks lag 6-12 months (AI trained on older data)
  • Company-specific context missing (AI doesn’t know your team, culture, constraints)
  • Novel business models (no industry data for comparison)

How We Improve Accuracy

Signal improvements:

  • Monthly re-calibration against manual tests
  • Platform-specific tuning (adjusting for each AI’s response patterns)
  • Expanding query sets (50+ persona queries for statistical confidence)

Scan improvements:

  • Adding checks quarterly (accessibility, security, performance)
  • Cross-validating with Lighthouse, pa11y, WebPageTest
  • Real device testing (iOS, Android updates)

Solutions improvements:

  • Updating industry benchmarks quarterly
  • Training models on recent data
  • Incorporating user feedback (when recommendations were wrong, why?)

The Bottom Line

Surmado reports are accurate because they test real systems in real-time:

Signal: ±3% statistical margin (95% confidence)

  • Tests actual AI platforms via API
  • Fresh responses (not cached)
  • Validated against manual testing (92% match rate)

Scan: 98% issue detection rate

  • Real checks on live website
  • Actual Core Web Vitals measurements
  • Validated against Lighthouse (94% agreement)

Solutions: 85% agreement with human consultants

  • 6-AI adversarial debate (not single opinion)
  • Correct industry benchmarks 92% of time
  • Validated against MBA consultant analysis

No tool is 100% accurate. But Surmado’s accuracy is high enough to make confident business decisions.


Frequently Asked Questions

Why isn’t Signal 100% accurate?

Because AI platforms aren’t 100% deterministic.

AI platforms like ChatGPT have built-in randomness (temperature setting). Ask the same question twice and you’ll get slightly different answers.

Example:

  • Query 1: “Best HVAC companies in Phoenix”
  • Response 1: Mentions Competitor A, B, C, D, E
  • Query 2 (same question, 5 minutes later): Mentions Competitor A, B, C, F, G

D and E dropped, F and G added. That’s normal AI behavior, not a Signal error.

Signal accounts for this by testing 50+ queries (statistical sample size smooths out variance).
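That smoothing effect is easy to demonstrate: simulate a platform that mentions you with some fixed probability and compare small versus large samples (the 58% rate is illustrative):

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

def observed_rate(true_rate: float, n: int) -> float:
    """Simulate n non-deterministic queries, each mentioning you with probability true_rate."""
    return sum(random.random() < true_rate for _ in range(n)) / n

# Small samples swing wildly; larger samples settle near the true 58% rate.
print([observed_rate(0.58, 5) for _ in range(3)])
print([round(observed_rate(0.58, 500), 2) for _ in range(3)])
```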

Can I verify Signal results myself?

Yes. You can manually test:

Process:

  1. Take persona queries from Signal report
  2. Submit to ChatGPT, Claude, Perplexity yourself
  3. Compare responses to Signal findings

Expected result: 90-95% match (Signal and manual results similar, not identical due to AI variance).

Common finding: “Signal said ChatGPT mentioned me 8 of 10 times. I tested 10 times manually, got mentioned 7 times. Close enough.”

Why isn’t Scan 100% accurate?

Because some issues require human judgment.

Example: Content quality

  • Scan flags: “This page has only 250 words (thin content)”
  • Human judgment: “But it’s a landing page with video. 250 words is fine.”
  • Scan can’t evaluate context (just flags word count)

Another example: Alt text quality

  • Scan flags: “Image missing alt text”
  • Reality: Alt text exists but is poor quality (“image1.jpg”)
  • Scan detects presence/absence, not quality

Trade-off: 100% automation = some false positives/negatives. Manual review adds context.

Does Solutions give better advice than human consultants?

Different, not better.

Solutions strengths:

  • Fast (15 minutes vs 2-week consultant engagement)
  • Cheap ($50 vs $5K-15K consultant)
  • Objective (no ego, politics, biases)
  • Covers 6 perspectives (CFO, COO, Market Realist, etc.)

Human consultant strengths:

  • Company-specific context (knows your team, culture, history)
  • Relationship (ongoing advice, not one-time report)
  • Novel situations (can handle unique edge cases)
  • Execution support (implements recommendations with you)

Best use: Solutions for quick validation. Consultants for deep engagements.

What happens if Signal results seem wrong?

Contact us: hi@surmado.com with:

  • Intelligence Token (so we can review)
  • What seems wrong (expected X, got Y)
  • Manual verification results (if you tested yourself)

We’ll investigate:

  • Re-run queries manually
  • Check for platform changes (AI updated algorithm)
  • Validate against fresh data

  • If actual error: Refund + corrected report
  • If variance within normal range: Explanation of why results differ

Do Scan results match Google Search Console?

Mostly, but not always.

Why differences:

Google Search Console (GSC):

  • Shows Google’s view of your site
  • Data may lag 3-7 days
  • Focuses on Google-specific issues

Scan:

  • Tests site right now (real-time)
  • Broader scope (accessibility, security, not just SEO)
  • Cross-browser testing (not just Google)

Example difference:

  • GSC: “12 pages not indexed”
  • Scan: “15 pages not in sitemap”
  • Difference: Scan found 3 more pages GSC hasn’t discovered yet

Use both: GSC for Google’s perspective, Scan for audit covering performance, accessibility, SEO, security, and mobile.

Are Solutions recommendations guaranteed to work?

No. Solutions provides analysis, not guarantees.

Why:

  • Execution matters (best strategy fails if poorly implemented)
  • Market changes (recommendations based on current data)
  • Unknown factors (AI doesn’t know your team skills, cash position, etc.)

  • Solutions tells you: “Based on industry benchmarks, here’s the likely outcome”
  • Reality: Your execution, timing, and luck affect the actual outcome

Use Solutions to de-risk decisions (better than guessing), but don’t expect 100% success rate.


Questions about accuracy? Email hi@surmado.com with your Intelligence Token and we’ll review your specific report.

Help Us Improve This Article

Know a better way to explain this? Have a real-world example or tip to share?

Contribute and earn credits:

  • Submit: Get $25 credit (Signal, Scan, or Solutions)
  • If accepted: Get an additional $25 credit ($50 total)
  • Plus: Byline credit on this article