The Estate Planning Face-Off: What We Learned Running Generic AI and Luminary Side by Side

We use ChatGPT and Claude every day at Luminary, and we're fans. We also built Luminary as a purpose-built platform for estate planning, because in this profession “really good” is not good enough. Last week we ran an honest, head-to-head comparison between the generic tools and Luminary on the same estate documents. Here is what we found, and how to think about which tool belongs where in your practice.
Horizontal Power, Vertical Stakes
Software falls into two broad camps. Horizontal tools are built for breadth, like a CRM that serves an HVAC company and a fashion brand equally well. Vertical tools are built for industries defined by deep nuance, high stakes, and heavy regulation, such as healthcare, finance, and education.
ChatGPT, Claude, and Gemini are horizontal by design. They offer extraordinary raw intelligence, but like crude oil, raw intelligence has to be refined before it becomes the gasoline that runs an advisory practice. That refinement happens through choosing the right model, prompting, structured data, validation, and human review.
The question for any advisor or attorney is whether they want to be the refinery, or use a platform that has already built one.
Where Generic AI Holds Up
For our face-off, we ran the same sample documents through ChatGPT, Claude, and Luminary, and selected the strongest output from each generic tool. On simple document summaries, ChatGPT performed well. The summaries were complete and well-structured, grouping concepts sensibly around incapacity, death, distributions, and fiduciaries.
The trade-off: clicking a citation downloaded the entire trust document. For anyone who has not already read the trust front-to-back, “the answer is somewhere in this 60-page PDF” doesn’t really verify anything. That’s an acceptable compromise for low-stakes work, but for estate planning, it’s where the cracks start to show.
Where It Started to Break
Things got more interesting when we asked Claude (using its Cowork feature) to build a full client-ready presentation, complete with a balance sheet, waterfall, and tax projection, from the same household. We ran the exact same prompts against the exact same documents, twice. We got two entirely different presentations.
One was attractive, but it missed a critical update: the sunset rollback. It still treated the federal exemption as roughly $7M per person, because that is what its training data remembered.
Every AI model is trained on a fixed corpus of data that ends at a specific date the model provider publishes. After that point, the model has no native awareness of new tax law, IRS guidance, court rulings, or any other changes in the world unless it reaches for an external tool like web search mid-response.
Claude Opus 4.6, the model behind Claude Cowork today, has a training data cutoff of August 2025. Anything that became law or guidance after that date sits outside what the model natively knows. That is why advisors can find the same model returning current numbers one minute and outdated ones the next, depending on whether it decides to look something up.
That is exactly how a polished, confident-looking presentation ends up with an exemption number that is wrong by years. There is no faster way to lose credibility with a client.
The second run caught the sunset, but produced a different estate tax number. The two runs landed on totals of $18.3M and $18.2M respectively, neither of which we could reconcile against the correct figure of $17.3M, even after considerable time spent trying to retrace the math.
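To see why a stale exemption assumption moves the bottom line by millions, here is a deliberately simplified sketch of the federal estate tax arithmetic. It assumes a flat 40% top rate with no deductions, credits, or state tax, and the estate value and exemption figures are hypothetical illustrations, not Luminary's tax engine or the numbers from the household above.

```python
def federal_estate_tax(gross_estate: float, exemption: float, rate: float = 0.40) -> float:
    """Tax on the portion of the estate above the exemption (simplified)."""
    taxable = max(0.0, gross_estate - exemption)
    return taxable * rate

estate = 50_000_000  # hypothetical household

# A stale exemption assumption (~$7M per person) versus a higher current
# one shifts the projected tax by millions. This is the kind of silent
# error a fixed training-data cutoff can introduce.
stale_view = federal_estate_tax(estate, exemption=7_000_000)    # 17,200,000.0
current_view = federal_estate_tax(estate, exemption=15_000_000)  # 14,000,000.0
print(stale_view - current_view)  # 3200000.0
```

Nothing about the model's reasoning changes between the two runs; only the single exemption input does, and it is enough to swing the projection by $3.2M.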
The problem is not that the models are bad. It is that on a one-shot deliverable, the output is plausibly correct, aesthetically polished, and inscrutable. That combination is dangerous in a profession where the plan kicks in on the worst day of a family's life.
How Luminary Refines AI Into Something You Can Stake Your Practice On
Luminary is the refinery. Our knowledge graph is the data model underneath the platform, an extraction process that reads every document in a household, understands how a codicil amends a will or how an amendment ties back to a revocable trust, and builds a structured, version-controlled estate from those relationships. Assets, fiduciaries, sub-trusts, and conditions are all codified and connected.
On top of that data, a few things consistently separate Luminary from a one-shot prompt:
- Extract, critique, refine. Every piece of extracted data is reviewed by multiple models checking for both accuracy and completeness, sometimes up to five times. This is how our core data extraction reaches an F1 score (a combined measure of precision and recall) of around 97%, compared to roughly 87% when we turn that loop off.
- Citations that actually verify. Every summary bullet, every balance sheet line, and every fiduciary role links directly to the paragraph in the underlying document where it came from. With one click and low friction, you get real auditability.
- Editable, auditable AI output. Summaries come in both long-form and client-facing versions. Memos generate cover letters, executive summaries, document comparisons, and fiduciary overviews from the same trusted data. Diagrams surface nuances like disclaimer trusts that the generic outputs missed entirely.
- Purpose-built tax engine. Quantify state and federal estate taxes alongside wealth transfer outcomes so advisors can model trade-offs and clients can make decisions grounded in real numbers.
- Repeatable templates. Reports run from firm-branded templates so your colors, disclosures, and structure stay consistent across every household and every advisor on your team.
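The extract-critique-refine loop above can be sketched in a few lines. This is an illustrative skeleton only: the model calls are stubbed with placeholder functions, and Luminary's actual prompts, reviewer models, and data schema are not public.

```python
MAX_PASSES = 5  # reviews run "sometimes up to five times"

def extract(document: str) -> dict:
    # Placeholder for the initial structured extraction pass.
    return {"fiduciaries": [], "assets": [], "complete": False}

def critique(document: str, data: dict) -> list[str]:
    # Placeholder: reviewer models return a list of issues (missing
    # fields, inaccurate values). An empty list means the data passed.
    return [] if data.get("complete") else ["extraction incomplete"]

def refine(document: str, data: dict, issues: list[str]) -> dict:
    # Placeholder: a follow-up pass repairs the flagged issues.
    return {**data, "complete": True}

def extract_with_review(document: str) -> dict:
    data = extract(document)
    for _ in range(MAX_PASSES):
        issues = critique(document, data)
        if not issues:
            break  # reviewers found nothing left to fix
        data = refine(document, data, issues)
    return data
```

The design point is that no single model's output is trusted as final: every extraction must survive an independent critique before it reaches a summary, diagram, or report.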
Set up a demo to see for yourself
Vertical or Horizontal? Three Questions to Ask Yourself
If you are weighing where generic AI fits in your practice and where it does not, three questions cut through the noise:
- What happens if the AI is wrong, and how would you know if it is right?
- Is this a one-time task, or part of an ongoing, repeatable workflow?
- Does the information need to be shared across colleagues, third parties, or auditors?
Three Zones, One Clear Picture
The honest answer is that AI lives on a spectrum. We think about it in three zones of increasing complexity and risk:
- Generic AI sweet spot. Drafting emails, brainstorming, general Q&A. Low nuance, low risk, low consequence if a sentence comes back imperfect. Use the off-the-shelf tools and move on.
- Augmented AI zone. Summarizing legal documents, reviewing statements, basic one-time deliverable creation. Generic AI gets you most of the way, and a human still has to refine and verify the output.
- Vertical AI required. Estate planning, trust structuring, wealth transfer strategies. High stakes, high nuance, real downstream impact on real families. This is where domain expertise, structured workflows, and auditability are not optional, and where purpose-built platforms earn their place.
Generic AI will keep getting better, and a rising tide lifts all boats. Because Luminary sits on top of those same models, our platform gets better right alongside them, with the added benefit of the structured data, validation loops, and governance controls that legacy planning actually requires.
Schedule a demo to see what purpose-built AI looks like on your own client documents: withluminary.com/contact.