Suppose a friend told you that he was planning on doing a TED Talk, and he asked your advice on how to make his talk one of the most popular TED Talks out there. What would you tell him?
This is exactly the type of question Data Scientists seek to answer. The way Data Scientists approach such a problem is to gather information on past TED Talks and analyze that information to see which factors distinguish the most popular TED Talks from the less popular ones. For our purposes, we’ll define “popular” TED Talks as Talks that generate a lot of views.
So then following the Data Scientists’ route, we obtain a database that contains all TED Talks posted on the TED website from its inception in June 2006 through September 2017. There are 2,550 talks. The distribution of views per talk across all the different talks is presented in Figure 1.
The distribution of TED Talks by views shows that (i) a small number of talks have received over 5 million views each, (ii) a large number of talks have received several million views each, and (iii) a large number of talks have received fewer than a million views. We can present the same information contained in Figure 1 a little differently, as shown in Figure 2.
Figure 2 shows that the 4% of talks that each had more than 5 million views collectively accounted for 25% of the total number of views for all TED Talks from June 2006 through September 2017. That is, a small number of talks generated a large portion of the total views for all TED Talks. What we want to know is: what characteristics do those top 4% of talks share that the other talks don’t?
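The concentration figure above is simple arithmetic: the share of talks above a threshold, and the share of all views those talks account for. Here is a minimal sketch of that calculation. The function name `concentration` and the view counts are made up for illustration; they are not the actual TED data.

```python
def concentration(views, threshold):
    """Return (share of talks, share of total views) for talks above threshold."""
    top = [v for v in views if v > threshold]
    return len(top) / len(views), sum(top) / sum(views)


# Hypothetical view counts, NOT the real TED dataset.
views = [9_000_000, 1_500_000, 800_000, 700_000,
         500_000, 400_000, 300_000, 200_000]

talk_share, view_share = concentration(views, 5_000_000)
print(f"{talk_share:.1%} of talks account for {view_share:.1%} of views")
```

Run against the real view counts with a 5 million threshold, the same two ratios would correspond to the 4% and 25% reported in Figure 2.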
The database of information on TED Talks contains information on characteristics of speakers and their talks, including, for example: date posted; duration of talk; title and description of talk; identity and occupation of speakers; number of languages for which transcripts of the talk were provided; number of comments by other viewers; and tags — or themes — for each talk. There is also a ranking system TED provides for viewers to rate the talks. Viewers are given a set of 14 descriptors from which they can choose up to 3 to describe a particular talk: Beautiful, Confusing, Courageous, Fascinating, Funny, Informative, Ingenious, Inspiring, Jaw-dropping, Longwinded, Obnoxious, OK, Persuasive, or Unconvincing.
The tags for each talk appear to be inconsistently defined. For the 2,550 TED Talks, there are 19,098 total tags, of which 417 are unique. The 6 most popular tags are: Technology, Science, Global Issues, Culture, TEDx, and Design, which collectively account for 16% of the total tags assigned. The tags assigned per talk vary from 1 to 32 in a seemingly unsystematic manner (see Figure 3). This inconsistency in assignment of tags suggests that the tags variable would not be a good predictor of TED Talk popularity. The analyses were run both with and without information on tags, and, as suspected, the tags didn’t provide any additional information.
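For readers who want to reproduce this kind of tag summary, here is a minimal sketch using only the Python standard library. It assumes each talk's tags are already available as a list of strings. The function name `tag_summary` and the sample tags are hypothetical, and the real dataset may store tags in a different form.

```python
from collections import Counter


def tag_summary(talk_tags, top_n=6):
    """Summarize tags: totals, unique count, per-talk range, and top tags."""
    all_tags = [tag for tags in talk_tags for tag in tags]
    counts = Counter(all_tags)
    per_talk = [len(tags) for tags in talk_tags]
    top = counts.most_common(top_n)
    return {
        "total": len(all_tags),
        "unique": len(counts),
        "min_per_talk": min(per_talk),
        "max_per_talk": max(per_talk),
        "top": top,
        # Share of all assigned tags covered by the top_n tags.
        "top_share": sum(c for _, c in top) / len(all_tags),
    }


# Hypothetical tags for four talks, NOT the real TED dataset.
talk_tags = [
    ["technology", "science"],
    ["technology", "design", "culture"],
    ["science"],
    ["technology", "global issues", "culture", "TEDx"],
]
print(tag_summary(talk_tags, top_n=2))
```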
Jumping Right In!
So, now, if we jump right to the analysis, what does it tell us? If we regress the number of views a TED Talk receives on the various data elements in the dataset, across all 2,550 TED Talks, we get the results presented in Figure 4. To be conservative, I’m labeling as “statistically significant” those results that are significant at the 1% level (i.e., p-value ≤ 0.01). Variables with statistically significant coefficients have been highlighted in yellow.
The first observation from the analysis is that the adjusted R² is 0.82. That leaves a good amount of variation in the views a Talk generates (18%) that isn’t captured by the variables included in the regression.
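To make the R² idea concrete, here is a small sketch of an ordinary least squares fit with a single predictor, plus the standard adjustment for the number of predictors. This is a simplification: the regression in Figure 4 uses many variables at once, and the helper names `fit_line` and `adjusted_r2` and the numbers below are illustrative assumptions, not the actual analysis.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


def adjusted_r2(r2, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors used."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical data: number of languages vs. views in millions.
languages = [10, 20, 30, 40, 50]
views_m = [0.5, 1.1, 1.4, 2.1, 2.4]

intercept, slope, r2 = fit_line(languages, views_m)
print(f"slope={slope:.3f}, R2={r2:.3f}, adjusted R2={adjusted_r2(r2, len(views_m), 1):.3f}")
```

The share of variation left unexplained is simply one minus the (adjusted) R², which is where the 18% comes from.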
The second observation is that the impact of languages is large and positive. So, talks posted in more languages generate more views. Or talks that generate more views are posted in more languages. This is correlation, not necessarily causation.
The third observation is that the year the talk was presented has by far the largest impact on the number of views a talk has received, where talks given in later years are more popular. We’ll explore this more in a minute, but first, let’s go through the other variables in the regression.
The fourth observation is that talks with more ratings generate fewer views. Before we interpret this unintuitive result, let’s consider the impacts of the individual ratings descriptors. It turns out that the ratings descriptors that generate the most views are Confusing and OK, not particularly favorable descriptors. The way I interpret the information on ratings is that it’s the less popular talks that viewers give ratings to, and those ratings are not favorable. So ratings reflect people voicing dissatisfaction with the talk, and people who enjoy talks simply don’t provide ratings.
So now let’s return to the strong relationship between Year and the number of views a TED Talk receives. Consider the pattern in Views per Talk over time, presented in Figure 5.
Talks during the first year received a lot of Views, but there were relatively few talks that year, so those large Views per Talk get less weight in the analysis. Views per Talk peaked in 2013, but they were also relatively high for 2014 and 2015. Also, there were enough talks presented during those years to give the large Views per Talk large weight in the analysis. So it looks like the large positive impact of Year on Views reflects the fact that talks in 2013 through 2015 (later years in the analysis) generated more views. Again, this is correlation, not causation.
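The Views per Talk series is just total views divided by the number of talks in each year. A rough sketch of that grouping is below. It keeps the talk count next to each average, since the count is what determines how much weight a year carries. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def views_per_talk_by_year(talks):
    """talks: iterable of (year, views). Returns {year: (views per talk, talk count)}."""
    totals = defaultdict(lambda: [0, 0])  # year -> [total views, number of talks]
    for year, views in talks:
        totals[year][0] += views
        totals[year][1] += 1
    return {
        year: (total / count, count)
        for year, (total, count) in sorted(totals.items())
    }


# Hypothetical rows, NOT the real TED dataset.
talks = [
    (2006, 4_000_000), (2006, 2_000_000),
    (2013, 1_000_000), (2013, 3_000_000), (2013, 8_000_000),
]
for year, (avg, count) in views_per_talk_by_year(talks).items():
    print(year, f"{avg:,.0f} views per talk across {count} talks")
```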
Let’s take one deeper dive and compare the distributions over time of Talks with fewer than 5 million views and Talks with more than 5 million views. That is, we’re splitting the blue line in Figure 5 into two sub-components. The distribution of Talks over time for Talks with more than 5 million views and Talks with fewer than 5 million views is presented in Figure 6.
It turns out that of the 99 talks in the dataset with more than 5 million views, 22 of them were presented in 2013. So what the large positive coefficient in the regression on Year is saying is that talks that were presented in later years, particularly 2013, generated more views. Again, this is correlation, not causation. It doesn’t say if you want to generate more views, then present your talk in 2013. Rather, it says that talks that generated more views took place in 2013. Correlation, not causation.
So now recall the distribution of Views per Talk in Figure 1. The distribution is nonlinear for talks with more than 5 million views. So then what happens if we look at the analysis of talk characteristics that affect Views separately for the two subgroups? That is, what happens if we subdivide the talks into those with less than 5 million views and those with more than 5 million views, and then we run the analysis separately for each subgroup? Are there differences in the patterns of characteristics that predict numbers of views for the two different groups of talks?
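The subgroup idea can be sketched as follows: split the talks at the 5 million mark, then fit each group separately and compare the estimated relationships. To stay self-contained, this sketch uses a single predictor and a plain least-squares slope, whereas the actual analysis uses the full set of variables. The names `ols_slope` and `slopes_by_group` and the data are assumptions for illustration only.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (single predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def slopes_by_group(rows, threshold):
    """rows: list of (predictor, views). Fit talks below and above threshold separately."""
    groups = {
        "below": [(x, v) for x, v in rows if v < threshold],
        "above": [(x, v) for x, v in rows if v >= threshold],
    }
    return {
        name: ols_slope([x for x, _ in group], [v for _, v in group])
        for name, group in groups.items()
    }


# Hypothetical (languages, views) rows, NOT the real TED dataset.
rows = [
    (10, 1_000_000), (20, 2_000_000), (30, 3_000_000),
    (10, 8_000_000), (20, 7_000_000), (30, 6_000_000),
]
print(slopes_by_group(rows, 5_000_000))
```

In this made-up example the predictor has a positive slope in one group and a negative slope in the other, which is the kind of difference a pooled regression would hide.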
Talks with Less Than 5 Million Views
Let’s first take a look at what the analysis says for talks with fewer than 5 million views, which is presented in Figure 7.
The results of the analysis for talks with fewer than 5 million views show the same pattern as that for all talks combined. This suggests that the weird pattern we saw for the Ratings variables in the analysis of all talks (where people tended to submit more ratings for talks they don’t like) pertained to the less popular talks, that is, talks that had fewer than 5 million views.
Talks with More Than 5 Million Views
So what do the results have to say about the talks with more than 5 million views?
As Figure 8 shows, for the most popular talks, none of the characteristics of the talks are significant predictors of views. In other words, if you ask, “what are the characteristics of the most popular TED talks?” the answer is, “there is no predictor.”
So What’s Going On?
Here’s my hypothesis.
TED Talks can be viewed on TED’s website, but they can also be viewed on YouTube and other platforms, such as Facebook, iTunes, and Hulu. Which talks are people most inclined to view? Do they go to the TED website and start with the most recently presented TED Talks? I don’t think so. I posit that most TED Talks are viewed through either (i) a link sent to people by friends, (ii) a link others posted on social media, or (iii) talks posted under a label of “Top 10 TED Talks,” “Most watched TED Talks,” or some other such label.
In other words, I posit that the most popular TED Talks are the ones that have been caught up in a success-breeds-success loop, which has been facilitated or fostered by choice architecture, so as to propel those Talks into the group of most popular.
Success-breeds-success phenomena occur when things that are popular become even more popular, because they are given more chances to succeed. For example, once a piece of content has garnered enough clicks, other people will click on it simply because many others have also done so.
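One way to get a feel for this kind of loop is a toy simulation in which every talk starts out identical and each new view goes to a talk with probability proportional to the views it already has. This is only an illustrative model of the mechanism, not something estimated from the TED data. The function name and parameters are made up.

```python
import random


def simulate_views(n_talks, n_views, seed=0):
    """Toy rich-get-richer model: each new view picks a talk in proportion to its current views."""
    rng = random.Random(seed)
    counts = [1] * n_talks  # every talk starts with a single view
    for _ in range(n_views):
        chosen = rng.choices(range(n_talks), weights=counts, k=1)[0]
        counts[chosen] += 1
    return counts


counts = simulate_views(n_talks=50, n_views=5_000, seed=42)
top_five = sorted(counts, reverse=True)[:5]
print("Top 5 talks hold", f"{sum(top_five) / sum(counts):.0%}", "of all views")
```

Because the talks in this model have no quality differences at all, any concentration of views that shows up comes purely from early luck being amplified.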
Wikipedia defines choice architecture as the design of different ways choices can be presented to consumers, and the impact of that presentation on consumer decision-making. In other words, choice architecture recognizes that the way you present choices to people can affect which of the options they choose.
Choice architecture feeds success-breeds-success phenomena by labeling certain content as “Top 10,” “Most Viewed,” “Now Trending,” etc. People will tend to skip individual pieces of content posted on the site in favor of what’s most popular. Either they view what’s popular as a proxy for high quality content, or they fear missing out (FOMO) on what so many others have experienced.
So, I posit that the most popular TED Talks are not viewed through a visit to TED’s website. Rather, I propose that the most popular TED Talks are more likely to be viewed because they either serendipitously end up in the path of viewers or they appear under a label of “Most Popular.” A TED Talk becomes among the most popular when it starts to gain momentum in views, gets passed around more on social media, makes it into a Top 20 list and continues to become ever more popular because it’s popular.
The other contributing factor that might make the most popular TED Talks so popular is that they exhibit some intangible quality about the speaker or the talk that appeals to viewers, but that hasn’t been captured in the database of TED Talks data. The 14 ratings descriptors capture some elements of this, such as Ingenious or Inspiring or Funny. But they don’t capture information, for example, about speakers who are dynamic, or wry, or captivating. It’s also possible that intangible characteristics lead the most popular TED Talks to gain the initial momentum they need to get caught up in a success-breeds-success loop, which then propels them into the top Talks.
So What Does This Mean?
The first implication is that the key information we need to answer the questions we seek to answer is often not captured in the data we have. Sure, we might be able to get some small scraps of understanding from the information we have. But relative to the primary understanding we actually seek, the scraps are often irrelevant. However, we won’t understand what we’re missing, unless we have some understanding of the dynamics that drive the situation. In other words, if we jump right into the TED Talk data without first thinking about what might drive popularity of TED Talks, then we’re very likely to completely miss the big picture. We won’t know what we’re missing, unless we take time before jumping into the data to try to understand what really drives the situation.
The second implication is this. In a world flooded with information, in which everyone is vying for our attention, success-breeds-success phenomena and choice architecture are increasingly determining which content ends up becoming popular or successful. That is, a product’s success is increasingly determined as much by extrinsic factors that have nothing to do with the nature or quality of the product as by the product itself. Merit won’t necessarily win the day. Is that what we want?