Self-Improving GTM Systems
The complete architecture for building auto-research loops that make your campaigns, copy, and targeting smarter every single week.
The Auto-Research Pattern
One loop. Any business metric. Infinite iteration.
Karpathy built auto-research to improve AI models overnight. The same pattern works for anything you can measure: booked calls, conversions, engagement, revenue.
The brilliance isn't the loop itself. Marketers have been A/B testing for decades. The brilliance is that AI can now run the loop autonomously, log what it learns, and make each iteration smarter than the last.
But there's a catch. And it's the thing that separates systems that actually compound from systems that waste compute.
The Metric Problem
If you optimize the wrong thing, you build a machine that gets better at being wrong. Faster.
Example: Cold Email Metrics
Same campaign. Two completely different stories. A system optimizing for reply rate will write increasingly provocative subject lines that generate reactions. A system optimizing for booking rate will write emails that resonate with people who are ready to buy.
The Metric Cheat Sheet
| Domain | Vanity Metric | Objective Metric |
|---|---|---|
| Cold Email | Reply rate | Contacts per booked call |
| Landing Pages | Page views | Conversion rate |
| YouTube | Views | Average view duration |
| Newsletters | Open rate | Revenue per subscriber |
| Ads | CTR | Cost per acquisition |
| Sales Calls | Meetings booked | Meeting-to-close rate |
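To make the first row of the table concrete, here is a minimal sketch with hypothetical numbers (no real API calls) showing how the two metrics tell opposite stories about the same pair of campaigns:

```python
def reply_rate(stats: dict) -> float:
    """Vanity metric: replies per contact emailed."""
    return stats["replies"] / stats["contacts"]

def contacts_per_booked_call(stats: dict) -> float:
    """Objective metric: how many contacts it costs to book one call."""
    return stats["contacts"] / stats["booked_calls"]

# Hypothetical campaigns: A provokes replies, B books calls.
campaign_a = {"contacts": 1000, "replies": 80, "booked_calls": 2}
campaign_b = {"contacts": 1000, "replies": 40, "booked_calls": 8}

print(reply_rate(campaign_a), contacts_per_booked_call(campaign_a))  # → 0.08 500.0
print(reply_rate(campaign_b), contacts_per_booked_call(campaign_b))  # → 0.04 125.0
```

Campaign A "wins" on reply rate while costing four times as many contacts per booked call. An optimizer pointed at the first function would keep choosing A.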
Loop 1: The Research Process
Every campaign ends smarter than it started. The next one starts where the last one left off.
Most people treat market research as a one-time activity. Pull some data, build a list, write emails, launch. If it doesn't work, blame the copy.
The research loop treats every campaign as a controlled experiment that produces validated knowledge. Not just metrics. Knowledge.
The Cross-Reference Analysis
After a campaign runs, pull every interested reply and cross-reference it across six dimensions: geography, company size, messaging angle, sequence step, reply intensity, and segment.
This produces patterns invisible to single-dimension analysis: “SaaS companies in North America, 50-200 employees, respond to ROI messaging at Step 1 with HOT intensity.”
That's not a guess. That's a validated finding. And it changes everything about the next campaign.
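A minimal sketch of the cross-referencing itself, assuming each interested reply has been tagged with the six dimensions (the field names mirror the pseudocode later in this section; none of this is the EmailBison schema):

```python
from collections import Counter
from itertools import combinations

DIMS = ["geo", "size", "messaging", "step", "intensity", "segment"]

def cross_reference(replies: list[dict], dims: list[str] = DIMS) -> Counter:
    """Count how often each combination of dimension values co-occurs
    among interested replies, from pairs up to the full six-way tuple."""
    matrix = Counter()
    for reply in replies:
        for r in range(2, len(dims) + 1):
            for combo in combinations(dims, r):
                matrix[tuple((d, reply[d]) for d in combo)] += 1
    return matrix

# Illustrative replies, not real data
replies = [
    {"geo": "NA", "size": "50-200", "messaging": "ROI", "step": 1,
     "intensity": "HOT", "segment": "SaaS"},
    {"geo": "NA", "size": "50-200", "messaging": "ROI", "step": 1,
     "intensity": "HOT", "segment": "SaaS"},
]
matrix = cross_reference(replies)
```

Multi-dimensional counting is what surfaces findings like the SaaS example above: the six-way tuple recurring is the pattern, and no single-dimension report would show it.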
Finding Classification
Not all findings are equal. Without classification, you can't tell signal from noise.
```python
# 1. Pull every interested reply from the campaign
replies = emailbison.list_replies(campaign_id, status="interested")

# 2. Cross-reference across 6 dimensions
matrix = cross_reference(replies, dims=["geo", "size", "messaging", "step", "intensity", "segment"])

# 3. Classify findings by evidence strength
for pattern in matrix.significant_patterns():
    if pattern.booked_calls >= 2:
        pattern.status = "VALIDATED"
    elif pattern.booked_calls == 1:
        pattern.status = "EMERGENT"
    else:
        pattern.status = "RAW"

# 4. Propagate validated findings to the system
update_master_file(client, validated_findings)
update_icp_definition(client, validated_findings)

# 5. Next campaign starts from updated knowledge
```
Loop 2: Email Copy Optimization
AI writes the challenger. Data picks the winner. Learnings compound.
This is the loop most people think of first. But there's a difference between basic A/B testing and a self-improving copy system.
**Manual A/B testing.** You write two emails. Send to the same list. Check which got more opens. Repeat manually. Learnings live in your head.

**Basic AI loop.** AI writes challengers. Measures reply rate. Keeps the winner. Better, but still optimizing the wrong metric with no knowledge accumulation.

**Self-improving system.** AI writes challengers from a growing learnings file. Measures contacts per booked call. Findings get classified. Patterns get codified.
The Copy Optimization Loop
```python
# Step 1: Pull baseline stats for the current champion
baseline = emailbison.campaign_stats(campaign_id)

# Step 2: Extract winning copy patterns
winning_copy = emailbison.view_sequence_steps(campaign_id)

# Step 3: Load the learnings file (grows every cycle)
learnings = load_learnings("learnings.md")
# Cycle 1: empty
# Cycle 10: 15 validated patterns
# Cycle 50: a complete playbook

# Step 4: Generate challenger based on learnings + baseline
challenger = ai.generate_variant(
    baseline=winning_copy,
    learnings=learnings,
    metric="contacts_per_booked_call",
    constraints=copy_skill  # brand voice, word count, formatting rules
)

# Step 5: Deploy as A/B variant
emailbison.add_sequence_step(
    campaign_id,
    variant_of=baseline_step_id,
    email_subject=challenger.subject,
    email_body=challenger.body
)

# Step 6: After test period, log results
results = emailbison.campaign_split_test_stats(campaign_id)
append_learnings("learnings.md", results)
```
What the Learnings File Looks Like After 20 Experiments
```
VALIDATED: Emails under 60 words book 1.4x more calls in SaaS
VALIDATED: Questions in subject lines reduce C-suite booking rate by 23%
VALIDATED: Step 2 follow-ups citing a specific metric from Step 1 increase bookings 40%
EMERGENT: "Specifically" line after company reference increases relevance
EMERGENT: Ghost pipeline angle outperforms ROI angle for PE-backed companies
RAW: Hypothesis: Including competitor name may increase booking rate for displacement campaigns
```
These aren't opinions. They're findings from controlled experiments. By Cycle 50, this file is a playbook no human could have compiled manually.
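The evidence ladder those lines encode can be sketched as a small helper; the function names and the list-of-dicts shape are illustrative, but the thresholds match the classification rule used earlier in this section:

```python
def classify(booked_calls: int) -> str:
    """Evidence ladder from the learnings file: RAW -> EMERGENT -> VALIDATED."""
    if booked_calls >= 2:
        return "VALIDATED"
    if booked_calls == 1:
        return "EMERGENT"
    return "RAW"

def format_findings(findings: list[dict]) -> list[str]:
    """Render findings as learnings-file lines, classified by evidence."""
    return [f"{classify(f['booked_calls'])}: {f['text']}" for f in findings]

def append_learnings(path: str, findings: list[dict]) -> None:
    """Append classified findings to the learnings file."""
    with open(path, "a") as fh:
        fh.write("\n".join(format_findings(findings)) + "\n")

# Illustrative findings, not real results
findings = [
    {"text": "Emails under 60 words book 1.4x more calls in SaaS", "booked_calls": 3},
    {"text": "Ghost pipeline angle outperforms ROI for PE-backed", "booked_calls": 1},
    {"text": "Competitor name may lift displacement bookings", "booked_calls": 0},
]
lines = format_findings(findings)
```

Because classification is recomputed on every cycle, an EMERGENT finding is automatically promoted to VALIDATED the moment a second booked call confirms it.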
Loop 3: Reply Intelligence
Everyone optimizes what they send. Almost nobody optimizes what they learn from what comes back.
When someone replies to your cold email, that reply is training data. But most people just read it, respond if positive, and move on. The reply disappears into the inbox.
What Replies Actually Tell You
**Competitor mention.** They're in-market. You know their current vendor. You know their language. Next campaign: build a displacement angle for Competitor X users specifically.

**Wrong-person referral.** You're hitting the wrong title. The system now knows. Next campaign: adjust the title targeting. Also, you just got a warm referral.

**"Not right now."** Timing signal. They're interested but blocked. Next action: add to a Q2 nurture sequence. Don't waste a slot on the next campaign.
The 8-Phase Reply Analysis Pipeline
Reply intelligence feeds directly back into the research loop. Objection patterns reveal targeting gaps. Pain point language gets recycled into copy. Persona conversion rates sharpen the ICP. Every reply makes the next campaign smarter.
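The routing the section describes can be sketched with naive keyword heuristics standing in for a real classifier; the categories and actions are illustrative, and a production system would use an LLM or trained model, but the structure of the triage is the same:

```python
def classify_reply(text: str) -> str:
    """Naive keyword triage of an inbound reply (placeholder for a real classifier)."""
    t = text.lower()
    if "we use" in t or "already have" in t:
        return "competitor_mention"  # in-market, displacement angle
    if "not the right person" in t or "forward" in t:
        return "wrong_title"         # fix targeting, chase the referral
    if "next quarter" in t or "budget" in t:
        return "timing"              # nurture, don't re-target now
    return "other"

# Hypothetical routing table feeding findings back into the research loop
ROUTES = {
    "competitor_mention": "build displacement campaign segment",
    "wrong_title": "adjust title targeting + follow referral",
    "timing": "add to nurture sequence",
    "other": "manual review",
}

def route(reply_text: str) -> str:
    """Map a raw reply to the next campaign action."""
    return ROUTES[classify_reply(reply_text)]
```

For example, `route("We already have Competitor X for this")` lands in the displacement bucket, which is exactly the targeting signal the research loop consumes.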
The Compounding Flywheel
Three loops. Each one makes the other two better.
Research → Copy
Better targeting means the copy optimizer is testing against the right audience. Experiments produce cleaner signal.
Copy → Replies
Better copy produces more replies. More replies means more training data for the reply intelligence system.
Replies → Research
Reply intelligence reveals targeting gaps, competitive intel, and persona patterns that refine the research loop's inputs.
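The three loops compose into one weekly cycle. A structural sketch, with each loop stubbed out so only the wiring shows (every function name here is illustrative):

```python
def research_loop(campaign_id: str, knowledge: dict) -> dict:
    knowledge["log"].append("research")  # stub: cross-reference replies, update ICP
    return knowledge

def copy_loop(campaign_id: str, knowledge: dict) -> dict:
    knowledge["log"].append("copy")      # stub: deploy challenger, log split-test results
    return knowledge

def reply_loop(campaign_id: str, knowledge: dict) -> dict:
    knowledge["log"].append("replies")   # stub: classify replies, feed findings back
    return knowledge

def run_flywheel(campaign_id: str, knowledge: dict) -> dict:
    """One weekly pass: each loop reads and extends the shared knowledge."""
    for loop in (research_loop, copy_loop, reply_loop):
        knowledge = loop(campaign_id, knowledge)
    return knowledge

knowledge = {"log": []}
for week in range(4):  # a month of weekly cycles
    knowledge = run_flywheel("campaign-1", knowledge)
```

The point of the shared `knowledge` dict is the compounding: each loop starts from whatever the previous loops, and previous weeks, already learned.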
Run this for a month and your campaigns are noticeably better. Run it for a quarter and you're operating on a level that teams resetting to zero every campaign simply cannot match.
Beyond Cold Email
The pattern works anywhere you have a clear metric and a way to test.
**YouTube.** Analyze every video against viral benchmarks. Score titles, hooks, structure. Generate ideas from what works.

**Landing pages.** Auto-modify headlines, CTAs, social proof. Test against baseline. The page improves weekly without you touching it.

**Ads.** Generate variations. Test across audiences. Learnings from cold email copy often transfer directly to ad copy.

**Newsletters.** Two subjects per send. Log the winner. After 20 sends, you have a validated playbook for your audience.

**Pricing.** Test anchoring, tier structure, guarantee framing. Small changes here compound into serious revenue differences.

**Sales calls.** Test discovery frameworks, objection handling, close sequences. Each call produces learnings for the next.
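All six applications share one skeleton: measure a champion, generate a challenger, keep the winner, log the finding. A generic sketch, parameterized by metric and generator (everything here is illustrative):

```python
from typing import Callable

def optimization_loop(
    baseline: str,
    measure: Callable[[str], float],
    generate_challenger: Callable[[str, list[str]], str],
    cycles: int,
    lower_is_better: bool = True,  # e.g. contacts per booked call
) -> tuple[str, list[str]]:
    """Run challenger-vs-champion cycles and accumulate a learnings list."""
    learnings: list[str] = []
    champion, champ_score = baseline, measure(baseline)
    for _ in range(cycles):
        challenger = generate_challenger(champion, learnings)
        score = measure(challenger)
        won = score < champ_score if lower_is_better else score > champ_score
        if won:
            learnings.append(f"VALIDATED: '{challenger}' beat '{champion}'")
            champion, champ_score = challenger, score
        else:
            learnings.append(f"RAW: '{challenger}' lost to '{champion}'")
    return champion, learnings
```

Swap in a real `measure` (booking rate, view duration, close rate) and a real `generate_challenger` (an AI writer reading the learnings list) and the same loop drives any of the six domains above.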
What You Need to Build This
Three non-negotiable requirements.
Want This Built for Your Business?
We build self-improving GTM systems for B2B companies. Research loops, copy optimization, reply intelligence. The whole flywheel.
Talk to LeadGrow →