Methodology · How we score
Every vendor on CallTreo is scored against the same nine-dimension weighted rubric. This page documents exactly how that score is computed, what we refuse to score on, and how often we update. If you find a vendor we've scored wrong — email hello@calltreo.com and we'll re-review.
The rubric
Each dimension is scored 0–10 by an editor; the overall score is the weighted average. The weights below are what live in our scoring code today: not aspirational, not a future plan. A minimal sketch of the computation follows the table.
| Dimension | Weight | What it means |
|---|---|---|
| Call handling | 20% | Conversation quality, routing reliability, edge-case handling. Does the AI gracefully handle off-script inputs, accents, and noisy lines? Does it sound natural or robotic? This is the largest single weight because it's the most common reason a deployment fails post-launch. |
| Integrations | 15% | Native CRM and calendar fit, API flexibility, webhook robustness. We weight native > webhook > Zapier > none. Specific integration depth gets evaluated on the relevant integration pages. |
| Automation | 15% | Booking, qualification, follow-up triggers. Can the AI close the loop without a human? We score actual workflow completion, not feature checkboxes. |
| Ease of setup | 10% | Time to launch, technical complexity, ongoing tuning effort. A vendor that takes 6 weeks to launch is scored lower than one that ships in 1 week, even if the 6-week vendor is more flexible. |
| Pricing value | 10% | Not lowest price — best value for the segment. Predictable pricing scores higher than opaque pricing. Hidden setup fees and aggressive overage rates get penalized. |
| Vertical fit | 10% | Templates and signals for specific industries (legal, medical, home services). A vendor optimized for SaaS sales won't fit an HVAC dispatch use case; we score against the segment, not the vendor's marketing. |
| Human handoff | 10% | Quality of escalation: live transfer, callback workflow, urgent routing. Vendors that hand off cleanly to your team beat vendors that pretend AI can handle everything. |
| Reporting | 5% | Call summaries, transcripts, dashboards. Useful for both ops and rep coaching. Lower weight because it's table-stakes for most vendors at this point. |
| Support | 5% | Onboarding quality, response time, ongoing partnership. Important when something breaks; less differentiated than the rest of the rubric in steady-state. |
| Total | 100% | Maps to a 0–10 overall score. |
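For readers who want it spelled out, here's a minimal sketch of that weighted average in Python. The weights mirror the rubric table exactly; the function and field names are illustrative, not our production scoring code.

```python
# Minimal sketch of the weighted-average scoring described above.
# Weights mirror the rubric table; names are illustrative, not the
# actual CallTreo scoring code.

WEIGHTS = {
    "call_handling": 0.20,
    "integrations": 0.15,
    "automation": 0.15,
    "ease_of_setup": 0.10,
    "pricing_value": 0.10,
    "vertical_fit": 0.10,
    "human_handoff": 0.10,
    "reporting": 0.05,
    "support": 0.05,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must total 100%

def overall_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of 0-10 dimension scores, rounded to one decimal."""
    if set(dimension_scores) != set(WEIGHTS):
        raise ValueError("every dimension must be scored exactly once")
    return round(sum(WEIGHTS[d] * s for d, s in dimension_scores.items()), 1)

# Example: strong call handling can't fully offset a weak handoff.
print(overall_score({
    "call_handling": 9, "integrations": 7, "automation": 8,
    "ease_of_setup": 6, "pricing_value": 7, "vertical_fit": 8,
    "human_handoff": 4, "reporting": 6, "support": 7,
}))  # -> 7.2
```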
What we refuse to do
We don't score on logo prestige.
A vendor's Series C funding round doesn't make their AI better. We've kept under-the-radar vendors high in the rankings when the product warranted it, and dropped well-funded names when it didn't.
We don't fabricate what we can't verify.
Pricing, HIPAA status, specific integration support: when we can't confirm something publicly, we render "On request" or "—". We'd rather show a gap than invent a fact.
We don't let referral fees move ranks.
Some vendors pay us a referral fee. None of them have moved a rank because of it. The rubric is the rubric. See our disclosure for full detail.
We don't write sponsored reviews.
No vendor pays for placement in editor verdicts or comparison pages. Pricing-page snapshots, integration pages, and vendor reviews are all editorially independent.
How we gather data
When sources conflict, hands-on testing beats reviews beats docs beats marketing (see the sketch after the source list below). Anything we can't verify is rendered as a gap, not invented.
Vendor materials
Feature claims, integration lists, and posted pricing. Treated as marketing material; we verify when we can.
Hands-on testing
Where possible, we sit through a real demo and test the integration the vendor claims. Live tests carry the most weight in our scoring.
Third-party reviews
G2, Capterra, Reddit, industry-specific forums. Useful for spotting patterns; we don't take any single review as gospel.
Integration testing
For the integration pages (HubSpot, Salesforce, Calendly, etc.), we test the integration where we have access, and we clearly flag where we're relying on the vendor's documentation.
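To make the precedence rule concrete, here's a minimal sketch of how a single conflicting fact might resolve. The source labels and the resolve() helper are hypothetical; this mirrors the hierarchy described above, not our actual pipeline.

```python
# Sketch of the source-precedence rule: hands-on testing beats reviews
# beats docs beats marketing, and unverified facts render as a gap.
# The resolve() helper is hypothetical, not CallTreo's pipeline code.

PRECEDENCE = ["hands_on_test", "reviews", "docs", "marketing"]  # high -> low trust

def resolve(claims: dict[str, str]) -> str:
    """Return the value from the most trusted source that has one,
    or a gap marker if nothing could be verified."""
    for source in PRECEDENCE:
        if source in claims:
            return claims[source]
    return "—"  # show a gap rather than invent a fact

# Marketing claims native sync; our live test found webhook-only.
# The live test wins.
print(resolve({
    "marketing": "native Salesforce sync",
    "hands_on_test": "webhook-only Salesforce sync",
}))  # -> webhook-only Salesforce sync
print(resolve({}))  # -> — (nothing verified)
```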
How we update
Quarterly editor pass
Every published vendor gets re-reviewed at least once per quarter — scores updated, new product changes incorporated, deprecated features dropped.
Ad-hoc updates on major changes
Vendor ships a major feature, pricing change, or new integration → we update sooner. Same goes for de-listing if a vendor folds or pivots away.
Versioned rubric
We've kept the nine-dimension weighting stable since launch. If we materially change the weights, we'll publish a versioned rubric note explaining why and how scores shift.
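As a rough sketch of what that versioning could look like (the RUBRIC_VERSIONS structure and score_under() helper are hypothetical, not our actual data model):

```python
# Hypothetical shape for a versioned rubric: each version pins its own
# weights, so any published score can be traced to the rubric that
# produced it. Illustrative only, not CallTreo's actual data model.

RUBRIC_VERSIONS = {
    "v1": {  # live since launch; matches the table above
        "call_handling": 0.20, "integrations": 0.15, "automation": 0.15,
        "ease_of_setup": 0.10, "pricing_value": 0.10, "vertical_fit": 0.10,
        "human_handoff": 0.10, "reporting": 0.05, "support": 0.05,
    },
    # A future "v2" would land here, alongside a note on why the
    # weights changed and how existing scores shift.
}

def score_under(version: str, scores: dict[str, float]) -> float:
    """Recompute an overall score under a specific rubric version."""
    weights = RUBRIC_VERSIONS[version]
    return round(sum(weights[d] * s for d, s in scores.items()), 1)
```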
Editorial
Scores and editor verdicts on this site are written by humans who've been on the buying side of receptionist software. We push every page through a buyer-perspective read before publishing, asking "is this what we'd want to know on the demo call?" If a page fails that bar, it doesn't ship.
See something wrong? Email hello@calltreo.com. Vendor corrections, factual disputes, or coverage requests are all welcome — we'll respond and update where the correction is valid.