Welcome to the Sub Club podcast, a show dedicated to the best practices for building and growing app businesses. We sit down with the entrepreneurs, investors, and builders behind the most successful apps in the world to learn from their successes and failures. Sub Club is brought to you by RevenueCat. Thousands of the world's best apps trust RevenueCat to power in-app purchases, manage customers, and grow revenue across iOS, Android, and the web. You can learn more at RevenueCat.com. Let's get into the show.
Hello, I'm your host, David Barnard, and with me today, RevenueCat CEO Jacob Eiting. Our guest today is Osmond Monser, a product manager on the retention team at Duolingo. On the podcast, we talk with Osmond about Duolingo's culture of experimentation, data and testing as a moat, and why passive-aggressive push notifications can actually work in the right context.
Hey, Osmond, thank you so much for joining us on the podcast today.
David, thanks so much for having me. It's great to be here. Super excited to chat with you, Jacob. Good to see you. Happy Friday. I'm here. I'm ready to talk about subscriptions. Let's do it.
Hey, so to kick things off: we've got a lot of really cool testing that Duolingo has done. Your blog is amazing, by the way; I hadn't seen it until recently. Tweets, tweetstorms, blog posts, there are so many, and we'll link to all of these in the show notes. We'll talk about the individual tests in a minute, because I think there are some really great learnings, but I wanted to kick off just talking about how Duolingo thinks about testing and what the process is. So I have a few topics I want to hit, but let's just start with ideation. Or, I don't know, maybe there's a broader overview you wanted to kick off with.
I can start with a deeper cut than that. I saw that in one of the blog posts it was like, at Duolingo, we test everything. Why? Yeah, that's a great question. One thing we keep in mind as a company is that data, the data we collect and analyze and use to improve our product, is one of our best moats. As a company we have millions of learners; I think we're now close to 60 million monthly active users. That allows us to collect a lot of data that makes us better at teaching languages and better at making our product more engaging.
And ever since the founding of Duolingo, and I only joined a couple of years ago, A/B testing and metrics-driven product development has always been in our DNA. And it's worked really, really well. Even ten years after launch, we're still maintaining something like 50% year-over-year growth. So we just have a formula that works, and it's something we're committed to for the long run.
Yeah, and, I mean, testing is not always a silver bullet. Sometimes the infrastructure you need, the effort it takes, and the expertise to not test poorly are not easy to come by. So I almost think that by making a bold statement like that, that testing is a part of our culture, you kind of have to do that, because otherwise, if you're just like, oh, we're going to try to A/B test some things, it's probably not going to go well. You have to make it a big deal.
And it sounds like that's what you all do. And the moat thing is also very interesting, because, you know, as Duolingo you have to be thinking, why won't there be a next Duolingo? Why won't someone move faster than us? Because typically a smaller company can move faster in some respects, right? They can experiment, they don't have anything to lose, they can try stuff. And so you think about defensibility in moats, right?
Exactly. And in consumer, brand is a big moat, which Duolingo has as well. But data, especially in consumer when you have that many users, is something a small startup can't replicate, right? They can't test right away. So it makes strategic sense, especially at scale.
Exactly. Yeah. Well, that's part of why I wanted to go through this process, because while you can't test at scale as a small company, I think there's so much that a company of any size can learn from Duolingo about the process and the thinking behind it. And that was a great way to start. Jacob, thanks for the save there on asking for the overview. Yeah, start with the why, you know, start with the why.
So now let's dive into the details, because this is where I do think a lot of ideas get short shrift in the tech community. Ideas are a dime a dozen, yadda yadda, but as part of a testing culture it is super important to come up with the ideas to test. And whether you're a big company or a small company, whether you're testing just by releasing an update for a week and then changing it a week later and looking at the results without any kind of sophisticated infrastructure, you still need to come up with ideas, try ideas, and have bold ideas. So let's just start with that.
What's the ideation process like? How do you come up with ideas to test, and what does that look like internally? Yeah, it's a great question. I would say our ideas and our ideation process come from a few different places. Much like other consumer apps, we do a lot of UX research, and we have a great design team, so a lot of ideas come from exploration by those functions. But a lot of the ideas are also driven by product managers through a bottom-up approach.
So I joined Duolingo right out of undergrad as an associate product manager, and even someone like myself at the junior level was empowered to drive team roadmaps, own parts of the roadmap, and ideate on features. So there's a lot of pure ideation from the bottom level of the company all the way up to the executive level, where the ideas are eventually proposed.
But more tactically, I think a lot of our ideation comes from the cycle of iterating and running experiments. Every time we run an A/B test, we analyze the data, and from that data we'll learn five or six different things that we can turn around and immediately test again. So we keep a really tight loop of iterating and learning from experiments.
Another thing I think is really unique to Duolingo is that it's an app every employee can use. We have a really strong dogfooding culture where all of our 600-plus employees are using Duolingo. We have a big Slack channel called feedback-product where anyone at the company can just dump in feedback about anything related to the app. As PMs, we watch that channel, and whatever feedback gets dumped in there, we can use to glean insights and get ideas. And I think that's really unique to Duolingo, because at a lot of other subscription apps, maybe not every employee uses the product regularly. But Duolingo is an app designed to be used daily. It's a daily app. So you can use it every day and get insights to iterate quickly.
When you have an idea, do you always go, we have to design a test around it? And I ask because with some ideas, well, there are two cases. One is where, as the visionary founder, I'm like, I don't need to test this, I know it's going to be great. And the second is that sometimes ideas are hard to test, right? Sometimes they don't map cleanly to, I mean, I guess you have your KPIs and things like that.
But do you have to be judicious? And also, you can't run infinite tests, right? There's only so much bandwidth in a product and a user experience; there are limitations to what you can test together at the same time. So do you always test everything before you hit go, or can you just use your judgment on what's an important test or not? How do you set the bar?
Yeah, it's a great question. As crazy as it sounds, we basically do test everything. All of the time? Everything goes into a feature flag, so anything you're building gets flagged, and then even if you don't expect some groundbreaking result, you're going to run it on and off just to at least measure baselines and stuff? Correct. Even if it's just a backend change that doesn't change the UX experience at all, we'll run it as an A/B test just to make sure there are no regressions where it impacts something else in the app.
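To make "everything goes through as an A/B test" concrete: Duolingo's in-house tooling isn't public, but the core of any such system is deterministic bucketing, roughly like this minimal sketch (the function and experiment names here are hypothetical).

```python
import hashlib

def assign_arm(user_id: str, experiment: str, arms=("control", "experiment")) -> str:
    """Deterministically bucket a user into an arm. Hashing the experiment
    name together with the user ID keeps assignment stable across sessions
    and independent across experiments."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return arms[int(digest, 16) % len(arms)]

# Even a pure backend change ships behind a flag and gets compared
# against the old path, just to catch regressions elsewhere in the app.
if assign_arm("user_123", "backend_sync_rewrite") == "experiment":
    result = "new code path"
else:
    result = "existing behavior"
```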
So literally every tiny change will go through as an A/B test. That's just how we do things. Interesting. And that has to take a lot of infrastructure, right? Not just the computers, but the people to maintain those computers and the people to maintain the testing tooling. And as a PM, I assume there's a team with that as its mission. Is all of that infrastructure easy to use, and are you guys basically its clients? What does that look like?
Yeah. So all of our A/B testing infrastructure is built in-house. We have a really strong data infrastructure and experimentation team that builds these tools, which allow engineers to set up A/B tests once they build a feature, for example. And within that same tool, every day we generate reports on the new data that comes in from experiments. As a PM, that's the first thing I do when I wake up: check the experiments dashboard. Right now my team is probably running 20 experiments on different parts of the app.
So monitoring those experiments is done with this in-house tool. And generally, in terms of limiting how many A/B tests run at once, the one principle we try to abide by is not running multiple tests on the same real estate at the same time. And if we do, then we'll look at the intersection of those two experiments to make sure there are no weird effects from the experiences being combined.
The other thing I'll mention is that because we're on both iOS and Android, oftentimes we'll run different experiments on the same real estate on the different platforms, with the eventual goal of aligning for platform parity depending on what wins. Interesting. How do you think about the funnel dynamics of confounding variables? Say one PM is testing an onboarding change that shifts the focus in one direction, and you're way deep in the retention funnel testing something that goes a different direction.
How do you think about those early, middle, or late tests being confounding variables against each other? Because they're not the same real estate necessarily, but they are the same real estate in the sense that they're all in the app; they're all part of the experience. Yeah, it's a great question. From a collaboration perspective, the product team is very collaborative in sharing what each team is working on and what different PMs are doing on different real estate.
So whenever a product change is approved by our product review council, the CEO and others, the PM sends an email out to the product org, and others can be on that list too, explaining: we're about to test this, this is what was approved, this was the conversation in product review. So there's at least awareness of, oh, this is what's going on. And if a team is especially affected by a certain change, there will be conversations even before that to make sure the PMs and teams are aligned. And then the same process happens at the end of the experiment, when you're about to launch it.
They send out an email with the data and the results saying, hey, this is what we learned. And that allows PMs on other teams working on different areas of the app to sort of look at, oh, hey, this might impact my team's features or this might impact my metrics. So there's sort of that collaborative style from the practical level. And then in terms of the data level, I think we also do a lot of cross analysis of experiments. So if we know another team is running an experiment that might impact one of our experiments, again, we'll look at the intersection of those experiments to see if there's any differences in the data. So yeah, we do make sure that we're not stepping on each other's toes or bumping elbows. But overall, we're able to do it pretty seamlessly, I'd say.
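A minimal sketch of the intersection analysis described above, with made-up data and column names, since the internal tooling isn't public:

```python
import pandas as pd

# Hypothetical per-user log: each user's arm in two concurrent experiments,
# plus the shared metric both teams watch (day-1 retention).
log = pd.DataFrame({
    "exp_onboarding": ["control", "experiment"] * 500,
    "exp_leaderboard": ["control"] * 500 + ["experiment"] * 500,
    "d1_retained": [1, 0, 1, 1] * 250,
})

# The intersection check: break the metric out by every combination of
# arms to see whether the two changes interact badly when combined.
print(log.groupby(["exp_onboarding", "exp_leaderboard"])["d1_retained"]
         .agg(["mean", "count"]))
```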
I guess the natural next question is results, and understanding that stuff. Do you always follow the data? Do you ever say, oh, this hurt the data, but the visionary CEO wants this, so we're going to do it anyway? There's a big critique out there, and if we're trying to apply this broadly: there are very few companies in the world, certainly none I've worked at, RevenueCat included, that have an A/B testing system as dialed, as on-rails and powerful, as what you all have. It's a big investment to get there. And even at a smaller scale, you're constantly trying to decide, what can I test with the resources I have? And sometimes you just don't want to know the results, right? And sometimes, if you get a result that isn't the result you want, you might ignore it anyway. So I'm curious: there's a big critique of test-driven design, a lot of people saying you're just going to optimize into some local minimum or maximum, right? How do you balance that with PM intuition, feedback, and qualitative stuff as well? How does that go?
Yeah, totally. And as a company we're totally aware of the shortfalls of a very A/B-testing-driven framework. So there's a lot of qualitative input into the decision to call an experiment. Even if an experiment is a winner on our top-line metrics, we'll still have a discussion about whether it fits into our roadmap in the long term, or whether it makes sense for the UX. A very common situation for a winning test at Duolingo is that if the feature or change complicates the UX, or makes the app feel or look busier, then there's always a conversation of, is the metrics win here worth making the app more complicated? And the words we hear most from our CEO, Luis, are: only launch this if it's a big win. Whenever we propose something that complicates the UX, there's a threshold that isn't always clearly defined, but it comes out when we discuss whether this metrics win is worth it. So there is that qualitative discussion with leadership, and within the product and design orgs.
Yeah. In general, I think we do a good job of making sure we're not only following the numbers; we're also data informed, right? Yeah, data informed rather than letting the data decide by itself. How do you all think about significance? Because you can run any test and there's going to be noise, right? Although, I guess, maybe not at that scale, 60 million MAU or whatever, DAU or whatever; that's an amazing number, and you get a lot of significance out of that. But I'm sure you still get a lot of results that show no difference. Yeah, it's a great question. You're right that at our scale, with the user base we have, a majority of the changes we make can reach significance, which is a really big advantage, especially on teams like mine, the user retention team, which works on features that impact basically the entire user base. We can roll out an experiment today, and tomorrow morning we'll have statistically significant results on our top-line metrics. Maybe like day one retention or something? Yeah, exactly. We're basically always looking at DAU, D1 retention, things like that.
Generally, we make sure we run experiments for at least three or four weeks on average, just so we're ruling out novelty effects and making sure there's strong significance there. Other teams at the company work on more specific parts, or even new initiatives. For example, we have a product called Duolingo for Schools, which is basically a dashboard for teachers that lets their students use Duolingo. Since they have a smaller user base of teachers, they need to do more calculation around, okay, how long do we need to run this? Can we actually reach significance before the heat death of the universe? Exactly, yeah. But for most teams, which are shipping features to millions of users who use the product every day, it's really easy to get significance in the data.
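The calculation a smaller-audience team has to do is a standard power analysis. A rough sketch using the usual two-proportion formula (the numbers are purely illustrative):

```python
from math import ceil
from statistics import NormalDist

def users_per_arm(p_base: float, lift: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Rough two-proportion sample size: users needed in each arm to detect
    an absolute `lift` over a baseline rate `p_base`."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_b = NormalDist().inv_cdf(power)          # desired power
    p_new = p_base + lift
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return ceil((z_a + z_b) ** 2 * variance / lift ** 2)

# Detecting a half-point lift on a 50% D1-retention baseline:
print(users_per_arm(0.50, 0.005))  # ~157,000 per arm: a day at Duolingo's
                                   # scale, weeks or months for a small app
```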
How do you all think about look-backs? So you have a win, you roll it out, it's statistically significant, it seems like it's going to be great. Do you revisit some of these results six months, nine months from the original test (two years is a really long time) to see if long-term retention was sacrificed for short-term retention, or monetization was sacrificed over the long haul for DAU, things like that? How do you think about those long-term impacts? Yeah, that's a great question. I think we do two things.
One thing is, especially for our bigger features, we'll run holdout experiments. So after we launch an experiment, we'll keep maybe one or two percent of users in the control group and leave that running for three or four months. That lets us get more insight into the long-term impact of an experiment. This is especially useful for teams that want to assess the impact of changes that might not move D1 retention immediately. For example, we have a team that works on social features, basically connecting learners with other learners on the platform. Sometimes adding those features doesn't immediately impact top-line metrics like DAU, but in the long term, a more social experience can move top-line metrics through network effects. So I know that team runs a ton of holdout experiments on a lot of their features, and we do as well.
The other thing we do to track long-term impact is that we'll track feature-level metrics for every feature we launch over the long term. For every feature we build, we'll have a feature dashboard tracking the more ad hoc, feature-engagement-level metrics we might be interested in. Over time we can see, oh, when we initially launched this, feature engagement was really high, but it seems to be dropping a little bit; that gives us an indication that there might be a novelty effect at play. So those are the two approaches we take to make sure we're looking at the data over the long term as well.
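A toy sketch of the trend check a feature dashboard like that might automate; the data is made up purely to show the shape of a novelty effect:

```python
import pandas as pd

# Hypothetical feature dashboard: daily engagement since launch,
# aggregated weekly. A persistent decline after launch week is the
# novelty-effect signature described above.
daily = pd.DataFrame({
    "day": range(28),
    "engagement_rate": [0.30 - 0.004 * d for d in range(28)],
})
weekly = (daily.assign(week=daily["day"] // 7)
               .groupby("week")["engagement_rate"].mean())
print(weekly.pct_change().round(3))  # steady negative drift: investigate
```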
So I do want to move on to the specific tests you've run, but one more shot at anything else you want to summarize about the way y'all think about testing, or anything we left out that you think is particularly important about how Duolingo thinks about and actually does testing. Yeah, the one last thing I'll mention is that because we collect so much data, we can get really granular in the analysis we do. I'm sure we're about to talk about notifications a bit more, but one example I'll give is that notification copy obviously performs differently in different UI languages and geographies.
Because we collect so much data, we can drill down and look at the click-through rates on our notifications by language, and that lets us say, oh, in Romanian this notification isn't doing as well; we should talk to our Romanian localization team and see if there's anything we can optimize. So just by the sheer amount of data we collect, we're really able to drill down and optimize a lot of things in the app, and it keeps us busy as a product team, because there's so much stuff we know we can improve. I have one more question on general strategy for choosing ideas: how does knowing the testability of a feature influence your decision to build it, if it ever does?
I guess, like you're saying, since you're looking at top-line metrics, you always have something to measure, right? But do you find, and you can maybe just validate my assumption here, that because you're thinking, well, this actually has to produce a measurable result, it pushes your product teams to focus only on things that will actually show up in the data?
You know what I mean? Yeah, it's a good question. I think it varies by team. At Duolingo, for example, we have product teams that work explicitly on improving how the app teaches. This is the learning R&D area, which has multiple teams building features to teach better, and for them, their product changes don't really impact top-line metrics, because... The tragedy of all education apps, right? Exactly, you got it. Oftentimes, when we want to teach better, we have to make content harder, and harder content is often bad for engagement. Same in fitness apps too, right? You have to maintain that sweet spot.
Yeah, exactly. So those teams obviously aren't really incentivized by a top-line metric they're optimizing for. A lot of their work is more stakeholder alignment, with learning scientists, with leadership, with designers, things like that. So I think that keeps them from just looking at the data.
And then, more generally, we make sure we're not only doing things that are metrics-driven; we always set OKRs for things like reducing tech debt and improving product quality. So, sorry, last question.
Do you A/B test bug fixes and things? If something's just flat-out broken, or maybe not even a bug, you're just like, oh, this is wrong. I guess it's another version of my original question of, do you test everything? But would you A/B test a bug fix if it was a lingering bug that's not ruinous, not crashing the app or something, right?
Yeah, if it's a huge bug fix, then we'll patch it and then maybe run a holdout experiment, like giving the bug back to one percent of users just to assess the effect. Yeah, it's crazy. But that's great. I mean, this is why it's a cultural thing, right? And I think as our listeners are thinking through how they're building their apps and how they want to do product development...
I imagine this came from the founders, right, or from very early on, and was pushed, right? This is something the founders felt was the way they wanted to do product, and they instilled it in the culture and made it a thing. And if you're listening to this and you're going, oh God, I don't want to set up all that data infrastructure and all that stuff...
You don't have to, right? There are lots of ways to do this. But I do think you should be opinionated in how you set it up. If you want to be the company that says, we don't run any tests, everything is just, like, user comments, that's fine. I think a lot of good companies are built that way. You have to think about who you are as a founder and as an app builder, and then apply the system you think is the best fit for you to get the best results. Because I don't know of another company...
At least in our space, that's taking it to quite the extreme that you all are. Yeah, it's pretty crazy. But yeah, we're very data driven, or data informed, as he said. Yeah, and it's probably no coincidence that it's one of the larger subscription apps, and publicly traded.
I think it's one of the great success stories of mobile in general. It's crazy. So I did want to move on to specific results, because you've built all this infrastructure, you have all this alignment internally, you do all these tests, but what matters is what comes out of it.
And for our listeners, as we've talked about, very few of them are going to operate at this scale, but there are learnings that Duolingo has been sharing, again on the blog and Twitter and other places, that are worth testing, or, for their products, just implementing whole hog based on Duolingo's success with them.
So the first experiment I wanted to talk about was with notifications. You've done a ton of testing around notifications, and there's a specific tweet thread, which we'll link to in the show notes, talking about how you test notifications and some of the top learnings.
So I'd love for you to share a little background on it, since the tweet thread didn't go super in depth about the testing setup and things like that, and then let's talk through the results and what folks can learn.
Yeah, totally. So notifications are a big engine of growth for Duolingo, so much so that notifications from Duolingo are sort of a meme. I'm sure you've maybe seen them. Yeah, notifications from the green owl. Yeah, I see the green owl becoming like the Slender Man now, this harassing mythical character. Exactly.
But yeah, notifications are a big driver for our retention and our user growth and it has been for many years. And we're now at a very sophisticated point where we send a lot of notifications. The main one being the daily practice reminder. So we send you a reminder to practice every day. And if you've been inactive for seven days, then we'll stop sending it to you. But within that daily practice reminder, we have call it 250 different copy templates that are eligible to be sent to a user. Wow. And we have a machine learning algorithm that optimizes the best ones to send to a user.
What we do as PMs is iterate on those 250 copy templates, adding more, changing existing ones, and testing different messaging, tone, and content within those notifications to squeeze out more retention gains. We've learned a lot about what works and what doesn't with notification copy and messaging, so I'll dive into a few of the learnings from that.
So this is a case where, it seems like, this is another system you built. This is not just another A/B test. You guys have a separate multi-armed bandit model that will take these in and optimize on an individual basis.
Which brings up my first question, and maybe I'm jumping ahead too far, but what goes into that individual choice? What are the inputs? You say, oh, we're going to try this one: what do we know about this person that says we should try it? It would be really interesting to hear what you actually optimize on. Yeah, let's dive into the AI behind it.
I mean, it's a few points ahead in our show notes, but yeah, it's a pretty incredible system you've built. Yeah, the first thing I'll mention, and this might also sound crazy, is that even within this multi-armed bandit algorithm, we still do A/B tests.
Why not? So yeah, why not, exactly. So how it works is that we have the multi-armed bandit with all these copy templates for notifications. And then, as the experiment arm in an A/B test, we'll have that same bandit, except with the new copy templates added.
I suppose you could even do variations or tuning changes to the algorithm itself, things like that. Maybe that's a whole separate thing. Yeah, exactly. So whether it's a copy template, where we add more copy templates for the bandit to optimize on... So you can optimize globally as well, right?
So you can say, when adding these, not just see which is more performant, but look overall: when we add these 10 new messages, how does that change the whole performance? Yeah. In the A/B test, the control is the multi-armed bandit with 250 templates, and the experiment is the multi-armed bandit with 260, adding the 10 new ones.
Exactly. And then we can measure on a local basis, when we send those 10 new ones, what that does to click-through, but also what the overall impact is on top-line metrics. And this is all driven by AI and machine learning, right? So you have a pretty big data science team building out these models that make those decisions and analyze results.
It is AI and ML driven, but to be honest, it's simpler than it seems. I mean, basically, yeah. I'll give, and you can correct me if I'm wrong on this, we kind of jumped into the technical term with multi-armed bandit. It's basically a computer science, combinatorics-style problem where you have n choices, maybe 10 different levers you can pull.
And you don't necessarily know what reward you're going to get from each lever, right? So for the multi-armed bandit problem there are optimizations that say which lever you should pull when, and an algorithm that optimizes that is basically what you're describing in this case. And correct me if I'm wrong, but instead of levers, showing a notification is your lever, and you're trying to optimize which notification to show when.
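Duolingo hasn't published its bandit internals, so treat this as a generic sketch: Thompson sampling is one standard way to decide which lever to pull, modeling each copy template's click-through rate as a Beta posterior. The class name and template text are illustrative only.

```python
import random

class CopyBandit:
    """Thompson sampling over notification copy templates: each template's
    click-through rate gets a Beta posterior that sharpens as send/click
    data accumulates, so better copy gets chosen more often over time."""

    def __init__(self, templates):
        self.stats = {t: {"sends": 0, "clicks": 0} for t in templates}

    def pick(self) -> str:
        # Draw one sample per template from Beta(clicks + 1, misses + 1)
        # and pull the lever with the highest draw.
        def draw(s):
            return random.betavariate(s["clicks"] + 1, s["sends"] - s["clicks"] + 1)
        return max(self.stats, key=lambda t: draw(self.stats[t]))

    def record(self, template: str, clicked: bool) -> None:
        self.stats[template]["sends"] += 1
        self.stats[template]["clicks"] += int(clicked)

bandit = CopyBandit([
    "Time for your daily Spanish practice!",
    "Last reminder: your streak is on the line.",
])
template = bandit.pick()           # choose copy for this send
bandit.record(template, clicked=True)
```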
One of the things, let me just interject a little ethics point here, because I love talking about this: in some ways I hear all this and I think of, you know, Facebook convincing me to buy some other stupid thing I don't need, or Candy Crush convincing millions of people to buy more gems and spend more money. There are a lot of ways these tools can be used to manipulate, to drive more views on TikTok, to keep me on Twitter for hours. Or, Twitter's a bad example, because they don't do anything like this.
Oh no, they do that. They do. Twitter's retention notifications are crazy. If you don't engage with Twitter for a while, they really start to spam you. Maybe not in this exact way, but yeah. I think TikTok and Reels and things like that are a better example of using this kind of testing and learning to manipulate, versus with Duolingo...
You're helping people learn. I mean, manipulate and influence: are they any different, really? It's really just the connotation of what you're asking for at the end, right? So, the notifications thing. I think this is especially relevant for the intersection of our audience, right, the apptrepreneurs. That's the second time you've heard the term, David. We're no longer catering to indie developers, we're catering to apptrepreneurs.
But there's this idea, specifically when we talk about notifications and optimization, that notifications are bad: they're spammy, they're noise, they're ads, whatever. I guess there are two sides to it. One: yes, Apple has gotten much better at giving you controls now, you can easily turn them off. Do you remember, and this dates me, that in the early days there were two different things, local notifications and push notifications? Push notifications you could disable in iOS, but local notifications you could not.
So everybody, and we were doing this at Elevate, and I know Duolingo was, back in like 2015, was using local notifications, because users couldn't turn them off. And yeah, that's scummy, I guess, but it was also cheaper than push; you had to build your own push infrastructure. Eventually Apple unified this and made it so notifications are all controlled in the same place. But my point is that, in terms of retention, retention is bringing somebody back into the app, and what could work better than staking yourself on their home screen when they're not using the app? This has got to be one of your biggest surface areas on the retention team, I have to imagine.
Oh yeah. The retention team at Duolingo has worked on a lot of different features over the years: we have the daily streak mechanic, we have leaderboards. But notifications are always a huge lever for driving retention. I joined the retention team as an APM, and I think I was the first full-time PM to focus entirely on notifications, which is a sign of how valuable they are for our growth.
So, we're 30 minutes in. Tell us what you've learned. What can others learn from all these tests Duolingo has done about creating impactful, retentive notifications? Yes, so some super basic learnings we've had, and I'm sure you'll link the Twitter thread in the show notes: having a very distinct tone and CTA in your notifications is really important, especially when you're sending multiple notifications, perhaps even over the same day.
At Duolingo we have a few different notifications we might send. One is the practice reminder, which I mentioned; it has a really clear CTA of just, hey, it's time for your daily practice, get back into the app to study Spanish for five minutes, something like that. Then, for users on a daily streak, we send another notification if they haven't practiced by the end of the day, around 10 PM, which we call the streak saver notification. It has a more alarming tone, like last reminder, or hey, it's getting late. When you're sending multiple notifications, it's really important to have a distinct purpose for each one, so that from the user's perspective it's more like Duo is having a conversation with me: he told me I had to practice at 4 PM, I ignored him, and now at 10 PM he's getting kind of scared because I might lose my streak. Having that sort of narrative you can carry through your notifications is really important. So yeah, you mentioned tone there...
What's an example? I guess there's alarmist versus not, but you actually speak in the notifications with the voice of the bird, right, Duo, whose name I just learned. When you guys are writing copy for that, are you trying to be cute with it, or is it pretty basic?
Yeah, we try to vary the tone, because we've learned that variation in tone is good as a user gets multiple notifications over several days. Some of our notifications are really cute and encouraging and positive, like, amazing job, you're on a 10-day streak, come back and do Spanish. Other times Duo might be more passive-aggressive, like, yeah, you can practice Spanish, if you want. So we try to have that variety. So you get really funny with it, you really lean into the meme. Yeah, I mean, that makes sense.
Are you writing those personally, or do you guys have copywriters? Yes, so a lot of it so far has been done by PMs like myself, but we also have content designers who help us review and write some of the copy. But yeah, for a while my job was just writing in the voice of the bird.
I think it's worth pointing out a meta point here on marketing, something I believe in general: an opinionated voice that sounds distinct, even if it's bad, even if it's annoying to some people, is always, I think, going to perform better than a bleached... that's not the word... a sanitized, very sterile voice that's trying to appeal to everybody. I don't think that outperforms, and it sounds like your data bears this out. You'd rather have hits and peaks and be a little bit odd. Because, again, I curate my notifications very heavily, right, and I still get a lot of them.
And so if one of them makes me smile, I can imagine that's got to increase the likelihood I'm going to tap in. Yeah, exactly, especially since we're sending these notifications on up to a seven-day schedule. Maybe the user will just ignore the first three, but if there's a really wacky or fun or creative one they get on day four, maybe that brings them back. So that's why we try to have variation, and we obviously have the algorithm optimizing which ones are most effective over that schedule.
Next up, the next learning was keep it simple. What did that mean? Yeah, it's basically what it sounds like: keeping the messaging, or whatever the theme of the notification is, very simple. This is especially true with newer users, or users who aren't as engaged with the app and are maybe on shorter streaks: keep the copy really short and concise. This became especially true when, I forget whether it was iOS 14 or 15, Apple started chopping off more text in the notification header, so a lot of our existing notifications started getting cut off around the fourth or fifth word. So we've been running a lot of tests with notifications that are just a few words, like, got three minutes? Study Spanish now. And things like that worked really well.
Because, as you said, people get a lot of notifications from a lot of different apps, so if you can quickly grab the user's attention with two or three words, that reduces the cognitive load. I mean, it's just good product management in general: anything that makes your users think hard is probably going to have adverse effects on their affinity for the product or their willingness to engage, because you're just tiring them out. What is interesting is that for our more engaged learners, for example someone on a 50-day streak, more complicated messaging can often work well too. Within those notifications we'll reference specific mechanics: on Duolingo you can use a streak freeze to save your streak if you don't practice for a day, so some of our notifications say, hey, you're out of streak freezes, don't lose your 50-day streak. Obviously that messaging is a little more complicated than something we'd send to less engaged users, but we do vary the messaging for less and more engaged users. And so I'd imagine that's an input to the algorithm as well, like whether someone is on a 50-day streak or is a first-time returner; that helps you choose different copy or a different notification.
Yeah, so it's actually not that complicated. We don't have that many parameters within the algorithm itself; what we do is set specific criteria on those copy templates. So, for example, certain groups just can't get a certain notification. Exactly.
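A sketch of how such eligibility criteria might gate the template pool before the bandit ever runs; the fields and example templates are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class CopyTemplate:
    text: str
    min_streak: int = 0  # e.g. streak-freeze copy only makes sense on a long streak
    ui_languages: set = field(default_factory=lambda: {"en"})

TEMPLATES = [
    CopyTemplate("Got 3 minutes? Study Spanish now."),
    CopyTemplate("You're out of streak freezes. Don't lose your 50 day streak!",
                 min_streak=50),
]

def eligible_templates(streak: int, ui_language: str) -> list:
    """Hard eligibility criteria are checked first; the bandit then
    optimizes only within this filtered subset."""
    return [t for t in TEMPLATES
            if streak >= t.min_streak and ui_language in t.ui_languages]

print(len(eligible_templates(streak=3, ui_language="en")))  # 1: streak copy filtered out
```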
So the bandit will just optimize over the templates a given user is eligible for. Understood, got it. That's so cool. So the next learning was keep titles under 30 characters, which I think you already explained with the keep-it-simple thing. But the next one is interesting: emojis work best in the title, not the body. That just seems so goofy, but the data proved it out. How do you all use emoji, and what were you doing in the body that just didn't work?
Yeah, so emojis are another vector for optimizing notifications; people generally respond well to different emojis. What we've noticed is that emojis generally work better in the title of the notification than in the body. My hypothesis is that in the title, an emoji works really well as an attention grabber, whereas an emoji in the body makes the notification look a little noisy, at which point people start tuning it out. So we've tested this, and it seems to work better to have the emoji in the title, and as the leading character rather than at the end. Right, because aside from your app icon, which gets shown in the notification, you can add a topical extra symbol that kind of draws you in. I'm the same way: when I read text with emoji embedded in it, my boomer brain just can't parse it, it becomes very noisy. See, this is the thing, I had all this vision to begin with. We didn't need to run all these tests, I could have told you. Once I've seen the data, then I'm certain that I was right.
On the subject of notifications, before we move on: this wasn't actually shared in the tweetstorm, or tweet thread, that we'll link to, but as a failed experiment you talked about how using a marketing tone in notifications just doesn't work. Tell me the thinking behind that and what experiments you've run that just fell flat. Yeah, it's a really interesting experiment we ran recently. Basically, we want learners to tap on our notifications, because tapping directly on a notification links you straight into a lesson, and that increases the completion rate of the lesson. So we have an incentive to get users to tap directly on the notification rather than navigating to the app on their own, and we wanted to test copy templates that explicitly instructed users to do that, things that said, tap here to start a quick lesson. As it turns out, that sort of tone doesn't work very well.
In our analysis, our thinking was that this is because it sounds more like an ad, or like you're getting a discount from some app. If Duo says, tap here, you're like, what are you doing, man? Yeah, it sounds more like a brand. If a friend did that, right, tap here to respond to my text, it just doesn't work. Again, it ties back to the idea that an organic tone, having a character talk to you, works better than having a brand talk to you from a marketing standpoint. Yeah, that's just a great lesson in app marketing. Sometimes we feel like we need to be grown-up marketers and adopt all the marketing best practices, and it doesn't really work in a lot of these cases. I think that's a really important thing to keep in mind.
The next thing I want to talk about, and I think this is something you actually blogged about, was streak flexibility. So tell me about those experiments and what you learned. Yeah, so as I mentioned earlier, on Duolingo you have the daily streak: every day you practice, your streak goes up. That was a huge mechanic for our retention in the early days, just because it gives you a reason to come back every day. And then every other social app in the world copied it, right? Exactly, yeah, it's the best retention mechanic you can think of. What we learned over time is that this sort of streak mechanic is very punitive, in the sense that...
If you miss one day by chance, you lose your streak, and that is extremely demotivating. There's a huge churn point there: if you lose the streak, even if you were very committed beforehand, just the experience of losing it is super discouraging, and users will fall off the platform. So we wanted to add more flexibility in how you can maintain your streak. We have this in-app item that you can purchase, and we also sometimes give it out as a reward for doing certain things.
That's the streak freeze. Basically, you can miss a day and your streak stays the same; you can come back the next day and continue learning and building your streak. Adding the streak freeze was amazing for retention. Giving users a little bit of slack makes keeping a streak a lot easier, and it's less punitive and punishing when a user can't come back every single day. And then we've done so many optimizations around the streak freeze.
You can buy streak freezes with our in-app currency, called gems, or, as I mentioned, you can get them as a reward, for completing your daily quest, for example. We've optimized how often they drop as a reward, and we've also increased the number of streak freezes you can hold: now you can equip up to two at a time, so essentially you can miss two days and still keep your streak.
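As a concrete illustration, the end-of-day streak logic being described might reduce to something like this; a hypothetical simplification, not Duolingo's actual rules:

```python
def end_of_day(streak: int, freezes: int, practiced: bool) -> tuple:
    """One nightly tick of a streak with equipped freezes: practice extends
    the streak, an equipped freeze absorbs a missed day, otherwise it resets."""
    if practiced:
        return streak + 1, freezes      # practiced: streak grows
    if freezes > 0:
        return streak, freezes - 1      # missed day: a freeze absorbs it
    return 0, 0                         # no freeze left: streak resets

print(end_of_day(50, 2, practiced=False))  # (50, 1): the streak survives
```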
Obviously, as we're doing all these optimizations, we're trying to keep in mind what we call the sanctity of the streak. To make sure you don't make it meaningless, right? Exactly. And generally, as we've iterated, we've seen retention gains, and people still ascribe value to the streak even though there's a chance they missed a few days in between. Are all the mechanics only proactive? Like, if I just missed a streak day and I realize, I missed it, can I recover it? Do I still have a chance to fix it?
Or do I have to put the freeze in place beforehand? Yeah, so you have to have the streak freeze equipped. We have a separate mechanic called the streak repair. And that's, for example, dropping $1.99 on it? Yeah, that has historically been an in-app purchase, or it's a perk you get in our subscription, Super, where you get a streak repair: in the case that you used all your streak freezes, missed today, and actually lost your streak, you can still get it back.
But over time we're also trying to make sure all these features stay aligned and it doesn't get out of control. I mean, it does feel like, as long as there's some activity, as long as users are thinking about their streak, that's kind of what you want: okay, do I care? But then, going back to the sanctity thing, at some point, if you over-optimize for streaks on their own, a streak is only important insofar as people are doing language learning, right? If they're not doing language learning, then the streak is just an addiction mechanic. Exactly. And you're not making the world a better place anymore, right? Which is a constant risk of being a product manager: optimizing your way into creating something terrible for the world.
So it's an interesting balance. But I've always wondered about the streak, like how the downsides of failing to maintain it affect people. And I'm sure it doesn't become a dominating effect, like people aren't losing streaks so often that it makes a difference. But I'm sure at this stage you're optimizing everything you can.
Like, there's more to be had there. And also, just from a branding perspective, it makes the conversation like, Duo is a little more chill, you know what I mean? Duo is going to give you a break, letting that weekend slide, rather than, there's something I've got to go back to that's trying to be all over me. I think it's great. And actually, I've thought a lot about Apple's fitness streaks: there are times when you're sick and can't work out, or the stand notifications when you're on a flight.
Oh my gosh, get out of here. It's like it might actually do more harm than good. I turned all of it off. When I got my Apple Watch I was all about it, and I was for a while, but then it's like, come on, you're not adapting to me, right? I'm done with this.
So while it's breaking the rules, it's breaking the rules in a healthy way. I think it's pretty cool. We actually talked to the former CMO of Tinder, and Tinder has a few things like that; we talked about how some of their monetization is around breaking the rules. It's interesting how you can break the rules in good ways and in bad ways, and the balance is finding ways to break the rules that are actually positive and beneficial. That's what users will respond to, not just buying your way out of something. Yeah. Pitch that one to your guys to test.
Just tell me about some ideas. Yeah, you guys have given me some great ideas, honestly. Oh man.
Okay, we do need to get to wrapping up, but there was one more question I had: the widget. Widgets have been huge the last few years, and Apple, you know, redesigned the whole home screen.
What kind of results have y'all seen from having the streak widget? I imagine you keep stats on the number of users who actually have it on their home screen and the number of people who tap on it. Has it been a pretty big driver of retention and visibility for the app?
Yeah, it's a great question. So on iOS we have a streak widget; it's basically a home screen widget that shows the current length of your streak, and it has a cute picture of Duo. Over the course of the day, as you don't practice, Duo gets angrier and angrier, or more worried. So it's almost like a Tamagotchi-like thing living on your home screen to get you to practice. That's so cool.
Yes. So we launched it this year, and it exceeded our expectations for how good a retention lever it is, and there are two reasons. One is that for users who are active and on a streak, it's like a constant notification that lives on your home screen reminding you about your streak. But it's also cuter than a notification, it's more adorable, and it's something you opted into adding to your home screen.
So it feels a lot less spammy than a real notification, but it's a constant reminder: hey, this is your streak. Do you actually put educational content in there? Does it in any way actually reinforce learning, or is it mostly about the streak? That's on the roadmap; we don't have it yet, but there are a lot of optimizations and...
I think you could just throw up a number that said your streak is three, right, a very phoned-in widget. I've built these kinds of features on Apple platforms before, and I've learned that if you don't make the widget something that provides value on its own, nobody's going to be really excited about it. But if you put some character into it, make it a little bit of an experience in itself that evokes an emotion, then you have a chance.
I guess my question would be: it doesn't move the needle for you all unless you get people to actually add it, right? So are you actively pushing that, or is it just successful for the people who already have it installed?
Yeah, that's a great question. There are basically two things we have to work on with the widget. One is getting users to install it, so we have in-app promotions: when you finish a session or a lesson, we'll say, hey, we have a widget, you should install it. When we tested showing that card, it actually led a surprisingly high number of users to install the widget, so our widget DAU graph has been going hockey stick recently, which has been awesome. And then, after we get the user to install it, there are also the optimizations of...
...what the widget actually looks like. Both of those, getting users to install it and making it valuable, are levers for retention for us. Yeah, it's interesting that on a product that's a decade-plus old now, there's still more to be found.
And like you said, it's still growing; 50% DAU growth or whatever it was over the last year is pretty crazy. I would never have guessed, back in 2013 when the App Store was new and there was this app Duolingo, and we were like, well, they're giving it away free, we didn't really understand it, that they would get to the point they're at today. But then again, I didn't know what you guys were doing, and now it kind of makes sense. It's pretty amazing.
Yeah, so I think that's a great place to wrap up. Such an incredible app, and what an incredible journey, from early in the App Store to publicly traded company. It's so cool, all the different ways Duolingo incorporates testing and data to help influence that, to continue to grow and keep making it a great experience.
Anything you want to share in closing? I know Duolingo is still hiring; like, you know, across the rest of tech there are a lot of people looking for jobs, so anything you want to share there? Yeah, we're very fortunate that we're still in good standing and still hiring, so look at our careers page if any positions interest you. And download Duolingo if you haven't; we love our users. And we love hacking. Yeah, always be hacking, exactly.
Yeah, all right, thanks so much. This was a fascinating conversation, and again, I so appreciate how open Duolingo is about sharing this stuff. We'll share links in the show notes to a lot of really great blog posts and tweet threads and things like that. So thanks again. Thank you so much, David and Jacob.
Thanks so much for listening. If you have a minute, please leave a review in your favorite podcast player. You can also stop by chat.subclub.com to join our private community.