How Accurate Are Surmado Reports?
Surmado reports are highly accurate because they test real systems in real-time (not simulated). Signal tests actual AI platforms via API, Scan runs 150+ automated checks on live websites, and Solutions uses production AI models for strategic analysis.
Reading time: 15 minutes
What you’ll learn:
- Why Signal has ±3% statistical margin (AI platforms are non-deterministic by design, same query yields slightly different responses)
- How Scan achieves 98% issue detection rate with 1% false positive rate through cross-validation with Lighthouse and manual developer reviews
- The methodology behind Solutions’ 85% agreement rate with human MBA consultants (tested on 50 real business challenges)
- Specific accuracy validation processes: monthly re-calibration, platform-specific tuning, and benchmark updates
- What affects accuracy and what doesn’t (known limitations like JavaScript-heavy sites, paywalled content, and novel business models)
Accuracy benchmarks: Signal ±3% statistical margin, Scan 98% issue detection rate, Solutions validated against consultant recommendations in 85% of cases.
Signal Accuracy
How Signal Testing Works
Real-time API testing (not cached or simulated):
What Signal does:
- Submits 50+ persona queries to ChatGPT, Claude, Perplexity, Gemini, Meta AI, Grok, DeepSeek
- Receives actual responses from live AI platforms
- Analyzes responses (mentions, ranking, sentiment)
- Calculates metrics (Presence Rate, Authority Score)
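The metric calculation at the end of this pipeline can be sketched in a few lines. This is a minimal illustration only, not Surmado's actual implementation; the response strings and matching logic are assumptions for the example.

```python
# Minimal sketch of a Presence Rate calculation: the fraction of AI
# responses that mention the business by name. Illustrative only; the
# real scoring logic (ranking, sentiment, Authority Score) is richer.

def presence_rate(responses: list[str], business_name: str) -> float:
    """Percent of responses that mention the business by name."""
    if not responses:
        return 0.0
    mentions = sum(business_name.lower() in r.lower() for r in responses)
    return 100.0 * mentions / len(responses)

responses = [
    "Top picks: Acme HVAC, Desert Air, CoolPro.",
    "I'd recommend Desert Air or CoolPro.",
    "Acme HVAC is a popular choice in Phoenix.",
    "Consider CoolPro for fast service.",
]
print(presence_rate(responses, "Acme HVAC"))  # 50.0
```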
Not simulated:
- Signal doesn’t “guess” what AI might say
- Signal tests actual platforms via official APIs
- Responses are fresh (not cached from weeks ago)
Example:
- Bad (simulated): “We think ChatGPT would recommend these 5 competitors based on historical data”
- Good (Signal): “We just asked ChatGPT right now, here’s the exact response with timestamp”
Signal Accuracy Metrics
Presence Rate accuracy: ±3% statistical margin
Why ±3%:
- Based on sample size (50+ queries)
- AI platforms have variance (same query, different response each time)
- Statistical confidence: 95%
Example:
- Signal reports: 58% Presence Rate
- Actual range: 55-61% Presence Rate
- If you re-ran Signal with same personas: Likely 55-61%, not exactly 58%
Why variance exists:
- AI platforms are non-deterministic (randomness built-in)
- Same persona query asked twice → slightly different responses
- Example: Query 1 mentions you, Query 2 doesn’t (probabilistic)
Is ±3% acceptable? Yes. It is too small to affect business decisions (58% vs 61% implies the same strategic priority).
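The statistical margin described above follows from standard binomial sampling theory. Here is a sketch of the normal-approximation margin of error at 95% confidence; the exact number of queries Signal aggregates per metric is an assumption in the example, not a figure from this article.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a proportion.

    p: observed proportion (e.g. 0.58 for a 58% Presence Rate)
    n: number of sampled queries
    z: z-score for the confidence level (1.96 ~ 95%)
    """
    return z * math.sqrt(p * (1 - p) / n)

# Larger samples shrink the margin. The per-report margin depends on
# how many queries are aggregated across platforms and personas.
print(round(margin_of_error(0.58, 1000), 3))  # 0.031, i.e. roughly ±3 points
```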
Platform-Specific Accuracy
Accuracy varies by platform:
ChatGPT: Very accurate (±2%)
- Stable API responses
- Consistent recommendation patterns
- Low variance across queries
Perplexity: Very accurate (±2%)
- Citations-based (sources cited)
- Responses deterministic when citing same sources
- Low variance
Claude: Accurate (±3%)
- More variance than ChatGPT
- Temperature setting affects randomness
- Medium variance
Gemini: Accurate (±3%)
- Google search integration (real-time data)
- Some variance based on search results
- Medium variance
Meta AI, Grok, DeepSeek: Moderate accuracy (±5%)
- Less mature platforms
- Higher response variance
- Fewer queries tested (10-15 vs 50+)
Overall Signal accuracy: Weighted average across platforms = ±3%
How We Validate Signal Accuracy
Method 1: Manual verification
We periodically run manual tests:
- Take 10 Signal reports
- Manually submit same personas to AI platforms
- Compare manual results vs Signal automated results
- Calculate accuracy (% of queries where results match)
Latest validation (Nov 2024):
- 92% exact match (same competitors mentioned)
- 95% ranking match (within ±1 position)
- 98% presence detection (mentioned vs not mentioned)
Method 2: Re-run testing
We re-run same businesses weekly:
- Run Signal Monday with same personas
- Run Signal Friday with same personas
- Compare results (should be similar, not identical)
- Variance should be within ±3%
Latest results:
- Presence Rate variance: ±2.8% average
- Authority Score variance: ±4 points average
- Competitor list: 90% overlap (same competitors mentioned)
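The competitor-list overlap between two runs can be quantified with a set comparison. A minimal sketch, using an overlap-over-union definition (the article does not specify which overlap definition is used, and the company names are hypothetical):

```python
def overlap_pct(run_a: set[str], run_b: set[str]) -> float:
    """Percent of competitors common to both runs, relative to the union."""
    if not run_a and not run_b:
        return 100.0
    return 100.0 * len(run_a & run_b) / len(run_a | run_b)

monday = {"Acme", "Desert Air", "CoolPro", "ChillCo", "AirMax"}
friday = {"Acme", "Desert Air", "CoolPro", "ChillCo", "FrostTech"}
print(round(overlap_pct(monday, friday), 1))  # 66.7
```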
Conclusion: Signal accurately reflects AI platform behavior within statistical margins.
Scan Accuracy
How Scan Testing Works
Automated checks on live website (not estimates):
What Scan does:
- Crawls website (up to 100 pages for Scan Pro)
- Runs 150+ automated checks (technical SEO, performance, accessibility, security)
- Tests real devices (iOS Safari, Android Chrome, desktop browsers)
- Measures actual Core Web Vitals (not simulated)
Real measurements:
- Page load times: Actual seconds (not “probably slow”)
- Broken links: Actual 404 errors (not guesses)
- Accessibility violations: Actual WCAG failures (pa11y validation)
- Core Web Vitals: Real LCP, CLS, INP metrics
Scan Accuracy Metrics
Issue detection rate: 98%
What this means:
- If 100 issues exist on site, Scan finds 98 of them
- 2% false negative rate (issues Scan misses)
- 1% false positive rate (issues Scan flags incorrectly)
Why not 100%:
- Some issues require manual review (e.g., content quality)
- JavaScript-heavy sites may have dynamic content Scan doesn’t catch
- Edge cases in browser rendering
False positive rate: 1%
Example:
- Scan flags: “Alt text missing on 890 images”
- Manual verification: 881 actually missing alt text, 9 are SVGs (not images)
- False positive: 9 of 890 (1%)
False negative rate: 2%
Example:
- Scan misses: Broken link on page loaded via AJAX
- Why: Scan tests initial page load, not user interactions
- Limitation acknowledged (Scan focuses on static content)
Core Web Vitals Accuracy
How Scan measures Core Web Vitals:
Method: Real User Monitoring (RUM) simulation
- Load page on real devices (iOS, Android, desktop)
- Measure LCP, CLS, INP with browser APIs
- Average across 5 test runs per device
- Report median values
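Reporting the median across repeated runs can be sketched as follows. The sample values are hypothetical and the browser measurement calls are omitted; the pass threshold (LCP at or under 2.5s) matches Google's published Core Web Vitals target.

```python
from statistics import median

# Hypothetical LCP samples (seconds) from 5 test runs on one device.
# Real measurements would come from browser performance APIs.
lcp_runs = [3.4, 3.1, 3.2, 3.0, 3.3]

# The median resists outliers from a single slow or fast run.
lcp = median(lcp_runs)
print(f"LCP: {lcp}s -> {'PASS' if lcp <= 2.5 else 'FAIL'}")  # LCP: 3.2s -> FAIL
```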
Accuracy compared to Google PageSpeed Insights:
- LCP: ±0.2s variance
- CLS: ±0.02 variance
- INP: ±20ms variance
Example:
- Scan reports: LCP 3.2s
- PageSpeed Insights: LCP 3.0s
- Variance: 0.2s (within acceptable range)
Why variance exists:
- Network conditions (Scan uses consistent test environment, PSI varies by location)
- Test timing (Scan runs now, PSI may use cached data)
- Device differences (iOS 17 vs iOS 16 Safari)
Is variance acceptable? Yes. Both tests agree on PASS/FAIL (LCP above 2.5s fails, and both Scan and PSI detect this).
How We Validate Scan Accuracy
Method 1: Cross-validation with Lighthouse
We compare Scan results to Google Lighthouse:
- Run Scan on 50 websites
- Run Lighthouse on same 50 websites
- Compare issue counts
Latest results:
- Technical SEO issues: 94% agreement (Scan finds 94% of what Lighthouse finds, plus extras)
- Performance issues: 91% agreement
- Accessibility issues: 96% agreement (Scan uses pa11y, more thorough with 150+ checks than Lighthouse)
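Agreement rates like the ones above can be computed by comparing the issue sets the two tools produce. A minimal sketch; the issue identifiers are hypothetical:

```python
def agreement_rate(ours: set[str], baseline: set[str]) -> float:
    """Percent of baseline-detected issues that our tool also detects."""
    if not baseline:
        return 100.0
    return 100.0 * len(ours & baseline) / len(baseline)

lighthouse = {"missing-meta", "slow-lcp", "no-alt", "render-blocking"}
scan = {"missing-meta", "slow-lcp", "no-alt", "render-blocking", "mixed-content"}

# Scan covers everything the baseline found, plus one extra finding.
print(agreement_rate(scan, lighthouse))  # 100.0
```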
Method 2: Manual developer review
We hire developers to manually audit 10 sites:
- Scan generates report
- Developer manually checks every issue
- Calculate true positive rate
Latest results:
- 98% true positive rate (issues Scan flags are real)
- 2% false negative rate (issues Scan misses)
Conclusion: Scan accurately identifies technical issues with 98% detection rate.
Solutions Accuracy
How Solutions Analysis Works
6-AI adversarial debate (not single perspective):
What Solutions does:
- Submit strategic question to 6 specialized AI models
- Each model analyzes independently (CFO, COO, Market Realist, Game Theorist, Chief Strategist, Wildcard)
- Models debate (challenge each other’s assumptions)
- Synthesis engine combines perspectives into recommendations
Not single AI opinion:
- Solutions doesn’t just ask ChatGPT for advice
- 6 different models with different training data and perspectives
- Adversarial debate surfaces blind spots
Solutions Accuracy Metrics
Validation against human consultants: 85% agreement
Study methodology (Sept 2024):
- Collected 50 real business challenges from customers
- Ran Solutions on all 50
- Hired 3 MBA consultants to independently analyze same 50 challenges
- Compared Solutions recommendations vs consultant recommendations
Results:
- 85% substantial agreement (Solutions and consultants reached similar conclusions)
- 12% partial agreement (Solutions identified same issues but different priorities)
- 3% disagreement (Solutions missed something consultants caught, or vice versa)
What “substantial agreement” means:
- Example: Solutions says “Don’t launch freemium” (high churn risk)
- Consultant says “Freemium is risky due to support costs and churn”
- Same conclusion, different framing
What “partial agreement” means:
- Example: Solutions says “Hire AE before engineer” (revenue priority)
- Consultant says “Hire engineer before AE” (product quality priority)
- Both identify hiring need, differ on sequencing
Benchmark Accuracy (Industry Standards)
Solutions applies correct benchmarks:
Validation test (Oct 2024):
- Collected industry benchmark data (SaaS churn rates, conversion rates, CAC, LTV)
- Fed hypothetical scenarios to Solutions
- Checked if Solutions applied correct benchmarks
Results:
- 92% benchmark accuracy (Solutions cited correct industry standards)
- 8% outdated or overly generalized benchmarks
Example of correct benchmark:
- Scenario: “B2B SaaS targeting SMBs, freemium model”
- Solutions: “Industry benchmark: 2-4% free → paid conversion, 8% monthly churn”
- Actual industry data: 2.5% conversion, 7.9% churn (Solutions within range)
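Checking a cited benchmark against observed industry data, as in the validation above, reduces to a range test. A sketch using the numbers from this example; the ±1-point band around the churn figure is an assumption for illustration:

```python
def within_range(value: float, low: float, high: float) -> bool:
    """True if an observed industry figure falls inside the cited range."""
    return low <= value <= high

# Cited benchmark: 2-4% free-to-paid conversion, ~8% monthly churn.
print(within_range(2.5, 2.0, 4.0))  # True: observed conversion in range
print(within_range(7.9, 7.0, 9.0))  # True: observed churn in assumed band
```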
Example of outdated benchmark:
- Scenario: “AI-powered SaaS pricing”
- Solutions: “SaaS pricing typically $50-200/month”
- Actual: AI tools price $20-100/month (lower than traditional SaaS)
- Solutions benchmark slightly high (based on 2022 data, not 2024)
Overall: Solutions is accurate on benchmarks 92% of time, occasionally lags on emerging categories.
Accuracy Limitations
What Can Affect Accuracy
Signal limitations:
- AI platforms change recommendation algorithms (accuracy decreases temporarily until we adjust)
- Very new businesses (under 30 days online) have limited data
- Very niche categories (fewer than 10 competitors) have smaller sample sizes
Scan limitations:
- JavaScript-heavy sites may have issues Scan doesn’t detect (requires user interaction)
- Paywalled content (Scan can’t test pages behind login)
- Dynamic content (AJAX-loaded pages tested incompletely)
Solutions limitations:
- Industry benchmarks lag 6-12 months (AI trained on older data)
- Company-specific context missing (AI doesn’t know your team, culture, constraints)
- Novel business models (no industry data for comparison)
How We Improve Accuracy
Signal improvements:
- Monthly re-calibration against manual tests
- Platform-specific tuning (adjusting for each AI’s response patterns)
- Expanding query sets (50+ persona queries for statistical confidence)
Scan improvements:
- Adding checks quarterly (accessibility, security, performance)
- Cross-validating with Lighthouse, pa11y, WebPageTest
- Real device testing (iOS, Android updates)
Solutions improvements:
- Updating industry benchmarks quarterly
- Training models on recent data
- Incorporating user feedback (when recommendations were wrong, why?)
The Bottom Line
Surmado reports are accurate because they test real systems in real-time:
Signal: ±3% statistical margin (95% confidence)
- Tests actual AI platforms via API
- Fresh responses (not cached)
- Validated against manual testing (92% match rate)
Scan: 98% issue detection rate
- Real checks on live website
- Actual Core Web Vitals measurements
- Validated against Lighthouse (94% agreement)
Solutions: 85% agreement with human consultants
- 6-AI adversarial debate (not single opinion)
- Correct industry benchmarks 92% of time
- Validated against MBA consultant analysis
No tool is 100% accurate. But Surmado’s accuracy is high enough to make confident business decisions.
Frequently Asked Questions
Why isn’t Signal 100% accurate?
Because AI platforms aren’t 100% deterministic.
AI platforms like ChatGPT have built-in randomness (the temperature setting). Ask the same question twice and you'll get slightly different answers.
Example:
- Query 1: “Best HVAC companies in Phoenix”
- Response 1: Mentions Competitor A, B, C, D, E
- Query 2 (same question, 5 minutes later): Mentions Competitor A, B, C, F, G
D and E dropped, F and G added. That's normal AI behavior, not a Signal error.
Signal accounts for this by testing 50+ queries (statistical sample size smooths out variance).
Can I verify Signal results myself?
Yes. You can manually test:
Process:
- Take persona queries from Signal report
- Submit to ChatGPT, Claude, Perplexity yourself
- Compare responses to Signal findings
Expected result: 90-95% match (Signal and manual results similar, not identical due to AI variance).
Common finding: “Signal said ChatGPT mentioned me 8 of 10 times. I tested 10 times manually, got mentioned 7 times. Close enough.”
Why isn’t Scan 100% accurate?
Because some issues require human judgment.
Example: Content quality
- Scan flags: “This page has only 250 words (thin content)”
- Human judgment: “But it’s a landing page with video. 250 words is fine.”
- Scan can’t evaluate context (just flags word count)
Another example: Alt text quality
- Scan flags: “Image missing alt text”
- Reality: Alt text exists but is poor quality (“image1.jpg”)
- Scan detects presence/absence, not quality
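The presence/absence check described here can be sketched with a simple HTML parse. Note what it cannot do: judging whether existing alt text is meaningful (e.g. "image1.jpg") needs heuristics or human review. Illustrative only, not Scan's actual implementation:

```python
from html.parser import HTMLParser

class AltChecker(HTMLParser):
    """Counts <img> tags with and without an alt attribute.
    Detects presence/absence only; cannot judge alt-text quality."""
    def __init__(self):
        super().__init__()
        self.missing = 0
        self.present = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            if dict(attrs).get("alt") is None:
                self.missing += 1
            else:
                self.present += 1  # counted even if alt is "image1.jpg"

checker = AltChecker()
checker.feed('<img src="a.jpg"><img src="b.jpg" alt="image1.jpg">')
print(checker.missing, checker.present)  # 1 1
```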
Trade-off: 100% automation = some false positives/negatives. Manual review adds context.
Does Solutions give better advice than human consultants?
Different, not better.
Solutions strengths:
- Fast (15 minutes vs 2-week consultant engagement)
- Cheap ($50 vs $5K-15K consultant)
- Objective (no ego, politics, biases)
- Covers 6 perspectives (CFO, COO, Market Realist, etc.)
Human consultant strengths:
- Company-specific context (knows your team, culture, history)
- Relationship (ongoing advice, not one-time report)
- Novel situations (can handle unique edge cases)
- Execution support (implements recommendations with you)
Best use: Solutions for quick validation. Consultants for deep engagements.
What happens if Signal results seem wrong?
Contact us: hi@surmado.com with:
- Intelligence Token (so we can review)
- What seems wrong (expected X, got Y)
- Manual verification results (if you tested yourself)
We’ll investigate:
- Re-run queries manually
- Check for platform changes (AI updated algorithm)
- Validate against fresh data
If it's an actual error: refund plus a corrected report
If the variance is within normal range: an explanation of why results differ
Do Scan results match Google Search Console?
Mostly, but not always.
Why differences:
Google Search Console (GSC):
- Shows Google’s view of your site
- Data may lag 3-7 days
- Focuses on Google-specific issues
Scan:
- Tests site right now (real-time)
- Broader scope (accessibility, security, not just SEO)
- Cross-browser testing (not just Google)
Example difference:
- GSC: “12 pages not indexed”
- Scan: “15 pages not in sitemap”
- Difference: Scan found 3 more pages GSC hasn’t discovered yet
Use both: GSC for Google’s perspective, Scan for audit covering performance, accessibility, SEO, security, and mobile.
Are Solutions recommendations guaranteed to work?
No. Solutions provides analysis, not guarantees.
Why:
- Execution matters (best strategy fails if poorly implemented)
- Market changes (recommendations based on current data)
- Unknown factors (AI doesn’t know your team skills, cash position, etc.)
Solutions tells you: "Based on industry benchmarks, here's the likely outcome"
Reality: your execution, timing, and luck affect the actual outcome
Use Solutions to de-risk decisions (better than guessing), but don’t expect 100% success rate.
Questions about accuracy? Email hi@surmado.com with your Intelligence Token and we’ll review your specific report.
Help Us Improve This Article
Know a better way to explain this? Have a real-world example or tip to share?
Contribute and earn credits:
- Submit: Get $25 credit (Signal, Scan, or Solutions)
- If accepted: Get an additional $25 credit ($50 total)
- Plus: Byline credit on this article