Email experiment gone wrong: the four-lens framework for measuring campaign success

At Unspam 2026, Maggie Glasco recounted an experiment that hit every KPI. A parent called it bullying. Dashboards had missed it entirely.

Author

RGE Team

May 4, 2026


You know that feeling when a campaign hits its KPIs, the dashboard is all green, your boss is happy, and then someone sends you a screenshot of a parent calling your email "plain old bullying" in front of 1,700 likes?

Yeah. That happened.

Maggie Glasco, Senior Lifecycle Marketing Manager at Buffer, opened her Unspam 2026 session with that story. A decade in email, most of it as a team of one. ("Any lone email wolves in the room?" she asked. A lot of hands went up.) No one to bounce ideas off of. No one to say "hey, maybe reconsider that subject line." Just her, a dashboard full of green, and an owl with a bullying problem.

Here is what happened, why the metrics never saw it coming, and the framework she uses now.

The email experiment that worked (until it didn't)

Glasco was the first and only dedicated email hire at Duolingo. Hundreds of millions of users. A notoriously sassy owl mascot. And a homegrown email platform where even a subject line test required engineering resources. (Yes, really. The crowd at Unspam reacted accordingly.)

When experiments cost engineering time, neutral results are not just disappointing. They are expensive. Glasco was coming off an unsuccessful experiment and was, in her words, desperate for a win. The metric she needed to move: DAU. Daily active users.

So she looked at what was already working. Duolingo's algorithm ranked push notification templates by their effectiveness at getting users to complete a lesson. One ranked consistently at the top: a day-seven passive-aggressive reminder that called out users for going a whole week without practicing. Glasco's plan was to replicate that for email and take it further.

She designed a series of escalating messages targeting at-risk weekly active users. Each day of inactivity, the tone got a little more unhinged, matching the chaotic energy Duo the Owl had been building on social. One subject line gave her pause when she wrote it.

She sent it anyway. This is the part, she said, where having someone to bounce ideas off of would have mattered. Working alone means no one asks the uncomfortable question before launch. The only check is your own gut. And when you are desperate for a win, the gut gets overruled.

DAU climbed. The algorithm ranked her new template the highest of the entire set. Everything was green. Then a colleague flagged a post from an angry parent. Their 9-year-old had received that subject line. The post called the email "a new low" and "plain old bullying." 1,700 likes. 200 comments. 60+ reposts.

What the dashboard could not see

Dashboards are very good at telling you what people did. They are completely useless at telling you how something made people feel, or what people did about those feelings outside your product. Like posting about it.

The aggregate numbers were still up. By every quantitative measure, the experiment was a success. But Glasco had missed two things entirely.

The first was audience definition. She was targeting at-risk weekly active users. Age had not crossed her mind. She was not thinking about the segment within the segment. Duolingo attracts many younger learners. They were on the list.

The second was channel context. Duolingo's social presence is deliberately unhinged and it has driven real user acquisition. But email is not social. Users did not sign up to receive TikTok energy when they handed over their email address. The inbox is more intimate. People expect to be spoken to differently in it, even if they find the owl twerking hilarious. Glasco had blurred that line, and it did not land.

The downstream risk made it worse. That post reached potential and lapsed users with zero context about the experiment or the brand love. 1,700 reactions. Hard to measure, harder to undo.

If this sounds familiar, it is because it is a structural tension every email marketer lives with. You are on the front lines of customer feedback. But defending qualitative concerns is genuinely hard when leadership is looking at a metric that is moving in the right direction. Dashboards have more authority in a meeting than your gut. The only way to compete with that is to give qualitative concerns a structure.

That is what the framework does.

The four-lens framework for email campaign measurement

The problem, Glasco argued, is structural. You have North Star metrics you are constantly chasing. But not everyone on your team is weighing the risks of a growth-at-all-costs mindset against the long-term effects of short-term gains. That tension lands on email marketers. You are the ones who have to keep a finger on the pulse of user sentiment, whether you like it or not.

So she built a framework. Not a hack, she was clear about that. Four lenses to run every experiment through before calling it a win or a loss. Same experiment, different light. Looks different every time.

Lens 1: Performance

The one we all already use. Did it lift the metric, by how much, and for whom? (Yes, it is "whom." She looked it up.)

Performance is the starting point, not the finish line.

Monitor by cohort from the start, not after something goes wrong. If you only look at aggregate lift, you flatten the insight. Her own experiment proves it. Aggregate DAU was up. But that day-seven passive-aggressive template, the highest performer overall, was actually low-performing for the Vietnamese language UI. Hot tip if you work with localization: look at email performance by key markets. Tone does not always translate.
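To make the cohort point concrete, here is a minimal sketch of a per-cohort lift check. Everything in it is invented for illustration: the cohort labels, the counts, and the `lift_by_cohort` helper are assumptions, not Duolingo data or tooling. It only shows how an aggregate lift can stay positive while one cohort moves the other way.

```python
from dataclasses import dataclass


@dataclass
class CohortResult:
    cohort: str            # hypothetical label, e.g. UI language
    control_users: int
    control_active: int
    treatment_users: int
    treatment_active: int


def rate(active: int, users: int) -> float:
    """Share of users who were active; 0.0 for an empty group."""
    return active / users if users else 0.0


def relative_lift(control_rate: float, treatment_rate: float) -> float:
    """Relative change of treatment over control."""
    if control_rate == 0:
        return float("nan")
    return (treatment_rate - control_rate) / control_rate


def lift_by_cohort(results):
    """Relative lift per cohort, plus an 'ALL' entry for the aggregate."""
    lifts = {}
    for r in results:
        lifts[r.cohort] = relative_lift(
            rate(r.control_active, r.control_users),
            rate(r.treatment_active, r.treatment_users),
        )
    lifts["ALL"] = relative_lift(
        rate(sum(r.control_active for r in results),
             sum(r.control_users for r in results)),
        rate(sum(r.treatment_active for r in results),
             sum(r.treatment_users for r in results)),
    )
    return lifts


# Made-up numbers: two cohorts improve, one gets worse.
results = [
    CohortResult("en", 10_000, 2_000, 10_000, 2_400),
    CohortResult("es", 5_000, 1_000, 5_000, 1_150),
    CohortResult("vi", 1_000, 200, 1_000, 170),
]

lifts = lift_by_cohort(results)
for cohort, lift in lifts.items():
    print(f"{cohort}: {lift:+.1%}")
```

With these numbers the aggregate lift is positive while the "vi" cohort is negative, which is the pattern an aggregate-only dashboard hides.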

Lens 2: Segment

Humbling by design. It requires admitting your target audience is probably not as well-defined as you think.

The question is not just "who am I trying to reach?" It is "who is actually going to receive this?" Those are not the same list. Glasco was targeting at-risk weekly active users. She was not thinking about children. They were there.

Harm is not only a viral post. Unsubscribes, support tickets, a gut feeling that the copy went a little too far. All of it is worth tracking.
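One way to ask "who is actually going to receive this?" before launch is to break the send list down by attributes you were not targeting on. The sketch below assumes a hypothetical recipient record with an optional `age` field; the band boundaries and the `audit_recipients` helper are illustrative choices, and real lists may not carry age at all.

```python
from collections import Counter


def age_band(age):
    """Bucket an age into a coarse band; None means the age is not known."""
    if age is None:
        return "unknown"
    if age < 13:
        return "under_13"
    if age < 18:
        return "13_17"
    return "18_plus"


def audit_recipients(recipients):
    """Count recipients per age band for a planned send."""
    return Counter(age_band(r.get("age")) for r in recipients)


# Hypothetical send list for an "at-risk weekly active users" segment.
recipients = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": 9},
    {"user_id": 3, "age": None},
    {"user_id": 4, "age": 16},
    {"user_id": 5, "age": 41},
]

breakdown = audit_recipients(recipients)
for band, count in breakdown.items():
    print(band, count)
```

The "unknown" band is worth reading as carefully as the others. A segment you cannot describe is still a segment that receives the email.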

Lens 3: Brand

One gut check: would we send this if it only moved the metric by half?

If the answer is no, sit with that.

Email is an extension of the in-product experience, not an extension of your social channels. Your users did not follow you on TikTok when they gave you their email address. The inbox is more intimate. Someone can love the owl twerking and still expect something different in their inbox. Duolingo's social presence had earned the chaos. The inbox had not.

Lens 4: Long-term

No column for this one in any dashboard. Glasco argued it is probably the most important lens anyway.

Her question: if this were the only exposure someone would ever have to our brand, would we be okay with that?

Silent churn is real. People who do not complain, do not post, do not even unsubscribe. They just quietly stop, or quietly clock you as a brand they want nothing to do with. You will never attribute it to a single email. But the cumulative effect of borrowing from trust shows up eventually. That Duolingo post reached potential and lapsed users with no context, no nuance, no brand love. Just the owl, calling a child a quitter.

The "what's in it for you": no single lens gives you the full picture. The goal is not a perfect score on all four. It is a more defensible case for what you call a win, and a cleaner argument for when to pull something even when the numbers say not to.
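If it helps to give the four lenses a fixed shape, a review record can be as small as the sketch below. The class and method names are assumptions made for illustration, not part of the talk. It deliberately computes no score: it only lists which lenses have not been looked at yet and which ones raised a concern, so the call stays with a person.

```python
from dataclasses import dataclass, field

LENSES = ("performance", "segment", "brand", "long_term")


@dataclass
class LensNote:
    concern: bool   # did this lens raise a concern?
    note: str = ""


@dataclass
class ExperimentReview:
    name: str
    notes: dict = field(default_factory=dict)   # lens name -> LensNote

    def record(self, lens: str, concern: bool, note: str = "") -> None:
        if lens not in LENSES:
            raise ValueError(f"unknown lens: {lens}")
        self.notes[lens] = LensNote(concern, note)

    def missing(self):
        """Lenses nobody has looked through yet."""
        return [lens for lens in LENSES if lens not in self.notes]

    def concerns(self):
        """Lenses that were reviewed and raised a concern."""
        return [lens for lens in LENSES
                if lens in self.notes and self.notes[lens].concern]


review = ExperimentReview("escalating reminder emails")
review.record("performance", False, "DAU up in aggregate")
review.record("segment", True, "younger learners were on the list")
review.record("brand", True, "social tone carried into the inbox")

print("not yet reviewed:", review.missing())
print("concerns raised:", review.concerns())
```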

Setting guardrails before you test

The framework only works before launch. Not after the post goes viral.

Use your brand's mission statement as a filter. Under pressure to ship quickly, it is the most objective filter you have. Duolingo's mission is to create the best education in the world and make it universally available. Lightly nagging a 9-year-old is not doing that.
Define your floor before you launch. What would make you pull this, regardless of what the numbers show? For tone-based experiments, agree in advance on what triggers a review. One press inquiry? Ten social mentions? Write it down before you are staring at green metrics and someone is asking why you would possibly want to stop.
Weigh trade-offs explicitly. Leadership will default to the visible metric. Your job is to make the invisible cost visible. Quantify what you can: social mentions, support ticket volume, reach of a negative post. It is not as clean as DAU. It is still something on the scale.
Trust your gut. Glasco knew that the subject line was pushing it when she wrote it. She even tried to compensate with apologetic body copy, then acknowledged out loud that this was a fairly stupid plan, since you have to open the email to read it. She overrode her instinct because the numbers were good and she wanted them to stay good.
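"Write it down" can be taken literally. The sketch below records review triggers before launch and checks observed counts against them. The press and social thresholds echo the examples in the list above; the support ticket threshold, the field names, and the `triggered` helper are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Floor:
    """Review triggers agreed before launch. Reaching any one prompts a review."""
    press_inquiries: int = 1     # "One press inquiry?"
    social_mentions: int = 10    # "Ten social mentions?"
    support_tickets: int = 25    # hypothetical threshold


def triggered(floor: Floor, observed: dict) -> list:
    """Return the names of the triggers that the observed counts have reached."""
    reasons = []
    for name in ("press_inquiries", "social_mentions", "support_tickets"):
        if observed.get(name, 0) >= getattr(floor, name):
            reasons.append(name)
    return reasons


floor = Floor()

# Hypothetical counts a few days into the experiment.
observed = {"press_inquiries": 0, "social_mentions": 12, "support_tickets": 3}

reasons = triggered(floor, observed)
if reasons:
    print("Review needed:", ", ".join(reasons))
else:
    print("No review triggers reached")
```

The check is independent of the primary metric on purpose. Green numbers do not cancel a trigger.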

You are the expert in your channel. If something feels off, it is worth voicing, even without numbers to back it up. Especially without numbers behind it. And as Glasco noted, riffing off a conversation with another Unspam speaker: AI might generate the unhinged copy. You are the one who decides it goes too far. That judgment is yours. Do not outsource it.

And, if you want a place to put that judgment to work, Beefree is where email teams build without the shortcuts.

What changed after

Three things, concretely. Glasco age-gated the entire experiment. The edgier set of practice reminders was no longer eligible to reach anyone under 13. She retired the specific template entirely. And she set a tonal ceiling for all email and push going forward: if that subject line was a ten on the spicy scale, everything was capped at a seven.
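The age gate and the tonal ceiling are both simple eligibility rules, and a minimal sketch might look like the following. The template names, the spice scores, the `edgy` flag, and the choice to treat an unknown age as ineligible for the edgier set are all assumptions made for the example. Only the under-13 gate and the cap of seven come from the talk.

```python
TONE_CEILING = 7        # nothing above a seven on the spicy scale
MIN_AGE_FOR_EDGY = 13   # the edgier set cannot reach anyone under 13


def eligible_templates(templates, age):
    """Return the names of templates this recipient may receive.

    Assumption: an unknown age (None) is treated like under 13 for the edgier set.
    """
    allowed = []
    for t in templates:
        if t["spice"] > TONE_CEILING:
            continue  # over the tonal ceiling for every recipient
        if t["edgy"] and (age is None or age < MIN_AGE_FOR_EDGY):
            continue  # age gate on the edgier reminders
        allowed.append(t["name"])
    return allowed


# Hypothetical template set with hand-assigned spice scores.
templates = [
    {"name": "friendly_nudge", "spice": 2, "edgy": False},
    {"name": "sassy_reminder", "spice": 6, "edgy": True},
    {"name": "retired_subject_line", "spice": 10, "edgy": True},
]

print(eligible_templates(templates, age=34))
print(eligible_templates(templates, age=9))
print(eligible_templates(templates, age=None))
```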

That guardrail did not come from a policy document. It came from learning the hard way. Her last point on feedback: not all of it is equal, but all of it is worth a look. One complaint is a data point. A post with 1,700 reactions from people who may never have used your product is a signal.

The dashboard will never tell you which is which. That is not a gap you can automate around. It is a judgment call, and it is yours. TikTok energy does not belong in the inbox. Knowing that, and protecting it, is the job.

