Executive Summary:
During the height of the COVID-19 pandemic, media companies shifted their business models toward direct-to-consumer (DTC) streaming to capitalize on changing consumer habits. This shift has led to the ongoing “Streaming Wars” among the major players in the media landscape. Although one company may try to outspend another to win, what good is that spending if the company does not understand its consumers’ habits, including why and how much time they spend streaming? We set out to create a model that predicts a consumer’s average weekly streaming hours based on Gender, Race, Age, Household Income, Streaming Platform, Number of Streaming Devices, Favorite Streaming Genre, Most Streamed Day, Most Streamed Time, and Number of People in the Household. Surveys were distributed through friends, family, office coworkers, classmates, and online TV streaming enthusiast groups. From the responses, we built a final significant model using Age, Household Income, and Number of People in the Household. The final model surprised us with a large coefficient of 11.74 for the number of people in a household, compared with the other two coefficients, which were below 1. We believe this model can help media companies understand the consumers behind the viewing numbers and guide where they allocate resources for advertising, content spending, and internal promotional efforts.
https://docs.google.com/spreadsheets/d/1tvlvSJkIJvVeOwz7CSOzHcUDwa9b1LMQ9J3QUaPs63Q/edit?usp=sharing
Introduction:
For the project, we chose to predict the average amount of time Americans will spend streaming per week in the foreseeable future. We were eager to cover this topic to see how several pressures would affect the consumer: rising inflation straining Americans’ budgets, increased competition in the marketplace, digital companies raising prices while downsizing their catalogs, the recent Hollywood writers’ strike, and the counterintuitive rise in shareholder earnings despite unpopular policies being pushed on consumers (such as password crackdowns, multiple price hikes, and digital content removal). With this in mind, we are trying to predict the future amount of time the average person in the United States spends streaming per week.
We chose independent variables such as age, household income, the days most likely to stream, and number of devices used to stream because we believed these factors correlate closely with the amount of time spent streaming. For example, with age, a college student or retiree will have more hours available to stream than a working professional with a 9-5 job and a family. A college student may also be able to take advantage of student discounts by subscribing to a bundle, such as Hulu, Spotify, and Showtime for a low monthly price, just as a family might subscribe to Disney’s bundle (Hulu, Disney Plus, and ESPN+). Another variable we chose was household income. We believed that the more discretionary income a household has, the more subscription platforms and streaming devices it can afford. For example, a lower-income household won’t be able to afford as many platform subscriptions or devices, whereas a higher-income household would have more devices, more subscriptions, and more time to stream. We chose the days most likely to stream as an independent variable because we were curious how many people do most of their streaming on weekends versus weekdays. Since most Americans are typically off on weekends, we thought this would be when most people stream the most, since they have the most free time. Although these aren’t the only independent variables we chose, we thought one of the most important would be the number of devices used. Our logic was that the more devices a household has, the more convenient it is to stream. For example, someone who only has a TV can stream only at home, whereas someone with both a TV and a phone can also stream on the go.
We collected our data by sending out a survey and obtained 267 responses from a variety of demographics, giving us a more diverse picture of American streaming habits.
Analysis:
- Sex (Categorical-Male or Female)
- Description: Males and females can have different streaming habits. We expected males to gravitate toward sports or action and adventure, and females to keep up with newer reality or romance shows, which can produce more “binge-able” hours of streaming. (This variable was converted to a dummy variable in the model)
- Analysis: Sex had the weakest positive correlation with streaming hours. We came close to a 50/50 split in observations, with males at 55% and females at 45%.
- Race (Categorical-Caucasian, African American, Asian, Hispanic)
- Description: Race can affect streaming through cultural factors. In particular, viewers of a given race might dedicate time to shows that represent them accurately. (This variable in the model was converted to percentages)
- Analysis: Our sample overrepresents African Americans at 39% and underrepresents Caucasians at 17%. Race appears to be negatively correlated with the number of hours streamed.
- Age (# of years)
- Description: A person’s age can influence what they watch on streaming services and for how long. Younger audiences tend to rack up more viewing hours than older people, who might have more responsibilities. Also, retirees and college students will have more time to stream than working professionals with families.
- Analysis: Our age responses vary widely, with one outlier at 71 years old. Overall, our respondents skew younger than the average American, at about 32 versus 38.5. This likely reflects whom we distributed the survey to: the majority of our responses come from the MBA student age range of about 25-30. The scatter plot shows no trend, as the points are widely dispersed.
- Household Income (in $)
- Description: Income can factor into viewing hours; we believe that a higher-income household has effectively freed up more time to view streaming content. Income is also situational: it depends on the individuals in the household, who also need to pay for streaming services.
- Analysis: The average household income we collected is around $135,000, which is higher than the average American income. Since this is a survey, respondents may be inflating their income even when reporting anonymously. Other factors can come into play, such as the growing number of people living with their parents or others during the current housing crisis. We hypothesized that this variable would have some correlation with viewing hours. The correlation analysis and scatter plot show that it is the variable most correlated with hours streamed per week.
- Streaming Platform (Categorical-Netflix, AppleTV+, Hulu, Max, Disney+, Amazon, Peacock)
- Description: We are interested in how consumers view streaming content across platforms. A bigger and more varied content library on a platform can lead to more hours streamed. (This variable in the model was converted to percentages)
- Analysis: In the correlation breakdown, it is surprising that the streaming platform a person uses is slightly correlated with the person’s race, at .116. We suspect this is because some streaming platforms cater to specific audiences. Platform is also negatively correlated with income, which can vary from race to race.
- Devices Used to Stream (# of devices)
- Description: The more devices available per household, the more time can be spent streaming content in varying situations. For example, a person’s “main” viewing device can be the standard TV at home, but while on the go or at a doctor’s office they can use other devices and watch more hours of content.
- Analysis: Number of devices correlates most strongly with household income, at .38, which makes sense because people with more money can purchase more devices to watch on. Its correlation with hours streamed is relatively low, at around .12. The average number of devices used in a household is reportedly around 3. This is common in this day and age, when people usually have streaming apps on their phone, TV, or tablet.
- Genre (Categorical-Action/Adventure, Comedy, Documentaries, Romance, Drama, Children)
- Description: This variable breaks down how streaming time differs by favorite genre. We suspect viewing time is low for documentaries or dramas but high for Action/Adventure or Comedy, since audiences of those genres might have more time and watch several similar titles rather than a single piece of content. (This variable in the model was converted to percentages)
- Analysis: Genre turns out to be negatively correlated with hours streamed per week, at -.029. A correlation this close to zero suggests that viewing hours vary across the board and are not necessarily tied to a person’s favorite genre.
- What Day Do You Stream The Most? (Categorical-A Weekday, Saturday, Sunday)
- Description: We expected the average person to consume more content on a weekend day than on a weekday, since someone on a typical 9-5 schedule has more free hours on weekends. However, some shows “premiere” on weekdays, which may draw viewing to a workday. (This variable in the model was converted to percentages)
- Analysis: While a person’s most streamed day is not correlated with streaming hours, its strongest correlation is with another of our variables, Genre, at .145. This is interesting because it may indicate that people watch different genres depending on the day, or that premieres on a streaming network shape viewing habits.
- What Time do You Stream The Most? (Categorical-Mornings, Afternoons, Evenings, Nights)
- Description: A person’s viewing habits might change depending on the time of day they stream the most. We suspected that someone who watches in the Morning or Afternoon will have less viewing time than someone who watches in the Evenings or Nights, when there may be nothing else left to do in the day. (This variable in the model was converted to percentages, either before 5 p.m. or after 5 p.m.)
- Analysis: Time had a low, negative correlation with hours streamed per week, at -.02. Its correlations with the rest of the X variables were similarly low. This suggests that since the pandemic there is no set time of day when people mainly watch streaming content. With advancing technology and the changing nature of jobs, streaming happens around the clock.
- Number of People in Household (# of people)
- Description: The more people in the household, the more viewing time we expect, since there is more opportunity to co-watch content with other household members.
- Analysis: Number of people in the household has a low correlation with hours streamed, at .03. Its strongest correlation is with number of devices used, at .134, which is consistent with device sharing within households, which in turn should create more opportunity for viewing hours. The average respondent to our survey had about 2 people per household, which makes sense for people living with a spouse or significant other.
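The “converted to percentages” notes in the list above can be illustrated with a minimal sketch. It assumes the conversion means replacing each categorical answer with the share of respondents who gave that answer, which is one reading of the .66/.34 coding the report quotes for household size. The function name `share_encode` and the sample answers are hypothetical, not taken from the survey data.

```python
from collections import Counter


def share_encode(responses):
    """Replace each categorical answer with the fraction of respondents who gave it.

    Assumption: this is what "converted to percentages" means in the report.
    """
    counts = Counter(responses)
    total = len(responses)
    return [counts[answer] / total for answer in responses]


# Made-up answers to "What day do you stream the most?" (not the survey data).
days = ["Weekday", "Saturday", "Weekday", "Sunday", "Weekday"]
encoded = share_encode(days)
print(encoded)
```

Under this coding, a category's numeric value says how common the answer is in the sample, not how large it is, which matters when reading any coefficient attached to it.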
While performing the analysis, we checked for multicollinearity and found no issues.
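One common way to screen for multicollinearity is to look at pairwise correlations between the X variables and flag any pair above a cutoff. The sketch below is an illustration of that idea, not necessarily the check used in this study. The 0.8 cutoff is an assumed rule of thumb, the column values are made up, and `devices_x2` is a deliberately redundant column added only to show what a flagged pair looks like.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def flag_collinear(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| at or above the threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical columns, not the survey data.
columns = {
    "age": [25, 31, 28, 45, 38],
    "devices": [3, 2, 4, 3, 2],
    "devices_x2": [6, 4, 8, 6, 4],  # redundant on purpose
}
flagged = flag_collinear(columns)
print(flagged)
```

For comparison, the largest correlation between two X variables quoted in the list above is .38 (devices and household income), well under the assumed cutoff.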
The first regression we ran on the full model produced an R-squared of .69, meaning the model explains about 69% of the variability in Y (hours streamed per week) given all of our X variables. The model’s Significance F was 1.07E-60, far below the 5% threshold, so the model as a whole is significant. (We removed the intercept because, when it was left in, none of our models produced acceptable P-values.) Looking at the P-values (the probability of seeing a result at least this extreme by chance if the variable had no effect), the only significant variable was Household Income at .04, or 4%, which is below the 5% threshold. The variable with the highest P-value was Race (by percentage) at .93, or 93%. The rest of the variables all had P-values above 5%.
We then dropped the Race variable from our model and ran it again. There was no real change in R-squared or Significance F, both of which stayed strong. The next-worst P-value was Genre (by percentage) at .79, or 79%, so we dropped Genre and ran the regression again. We saw minuscule improvements in R-squared and Significance F. The next variable to go was Sex, at .70 or 70%. In regression 4, the effects on Significance F and R-squared were similarly small. The worst remaining variable was Platform (by percentage) at 58%; with the worst P-values steadily coming down, we dropped it and ran the model again. In regression 5, Significance F and R-squared again changed little, but the worst P-value improved to 26%, for day of streaming, which we dropped. In regression 6, we began to close in on a final set of variables with good P-values. R-squared was similar to before, but Significance F improved greatly to 3.02E-65, and the highest P-value was time of day of streaming (by percentage) at 13%. In regression 7, Significance F reached 7.14E-66, and the last variable that needed to be removed was Number of Streaming Devices, at 9.5%.
In our final regression, the remaining 3 variables were Age, Household Income, and Number of People in Household, all with P-values below 5%: .0004, .01, and .001, respectively. Our final R-squared was .68, or 68%, and our Significance F was 1.8778E-66. Overall, we consider this a good model.
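The drop-the-worst-variable procedure described above is a form of backward elimination, and its control flow can be sketched as follows. This is an illustration only: `backward_eliminate` and `stub_fit` are hypothetical names, and the stub holds each variable's P-value fixed at the figure quoted in the text, whereas a real refit (as in the regressions above) produces new P-values every round.

```python
def backward_eliminate(variables, fit_p_values, alpha=0.05):
    """Drop the variable with the highest P-value, refit, and repeat
    until every remaining P-value is below alpha."""
    remaining = list(variables)
    dropped = []
    while remaining:
        p_values = fit_p_values(remaining)  # dict: variable -> P-value
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # everything left is significant
        remaining.remove(worst)
        dropped.append(worst)
    return remaining, dropped


# P-values quoted in the report, held fixed for illustration (an assumption:
# in a real regression they change each time a variable is removed).
REPORTED_P = {
    "Race": 0.93, "Genre": 0.79, "Sex": 0.70, "Platform": 0.58,
    "Day": 0.26, "Time": 0.13, "Devices": 0.095,
    "Age": 0.0004, "Income": 0.01, "People": 0.001,
}


def stub_fit(remaining):
    """Stand-in for rerunning the regression on the remaining variables."""
    return {name: REPORTED_P[name] for name in remaining}


kept, dropped = backward_eliminate(list(REPORTED_P), stub_fit)
print("kept:", kept)
print("dropped in order:", dropped)
```

Passing the fitting step in as a function keeps the loop independent of whichever regression tool produces the P-values.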
Using our final model, our equation is the following: Number of Hours Streamed per Week (Y) = .239(Age in years) + .02(Household Income in thousands of $) + 11.74(Number of People in Household, as a percentage)
- Age in years (.239) Coefficient – A one-year increase in a person’s age corresponds to a .239 increase in hours of streamed content per week, holding the other variables constant.
- Household Income (.02) Coefficient – A $1,000 increase in a person’s household income corresponds to .02 more streaming hours per week.
- Number of People in Household (11.74) Coefficient – A one-unit increase in the household variable corresponds to an increase of 11.74 hours of streamed content per week.
- This coefficient looks oddly large, but because the variable was entered as a percentage (.66 for households with more than 1 person, .34 for households of 1), the term contributes less than 11 hours: 11.74 × .66 ≈ 7.75 hours or 11.74 × .34 ≈ 3.99 hours.
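The equation and the household coding can be put together in a short sketch. It uses the rounded coefficients printed above, so results may differ slightly from a spreadsheet that carries more decimal places. The function names and the example respondent (30 years old, $100,000 income, living alone) are hypothetical.

```python
# Rounded coefficients as printed in the final equation.
COEF_AGE = 0.239          # hours per year of age
COEF_INCOME_K = 0.02      # hours per $1,000 of household income
COEF_HOUSEHOLD = 11.74    # hours per unit of the household "percentage" variable


def household_share(people):
    """Coding quoted in the report: .34 for a 1-person household, .66 for more than 1."""
    return 0.34 if people == 1 else 0.66


def predict_weekly_hours(age_years, income_dollars, people):
    """Evaluate the final no-intercept equation for one respondent."""
    return (COEF_AGE * age_years
            + COEF_INCOME_K * (income_dollars / 1000)
            + COEF_HOUSEHOLD * household_share(people))


# Hypothetical respondent, not a row from the survey.
hours = predict_weekly_hours(age_years=30, income_dollars=100_000, people=1)
print(f"{hours:.2f} predicted hours per week")
```

Note that under this coding a household of 2 and a household of 5 receive the same household term, since both are coded as .66.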
Prediction Testing:
Number of Hours Streamed per Week (Y) = .239() + .02() + 11.74()
Raw data row used: Observation 18 – 45 years old, $125,000 income, 2 people in household (coded as .66).
Number of Hours Streamed per Week (Y) = .239(45) + .02(125) + 11.74(.66)
…Number of Hours Streamed per Week (Y) = 10.755 + 2.5 + 7.7484
……36.0735 predicted hours of streamed content per week vs 35 hours observed
The model’s prediction is about 3% above the observed value for this observation, which we consider a very good result.
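The roughly 3% gap can be reproduced from the two figures quoted above, 36.0735 predicted hours and 35 observed hours. The helper name `percent_error` is hypothetical; it is a minimal sketch of the comparison, not part of the original analysis.

```python
def percent_error(predicted, observed):
    """Absolute difference between prediction and observation, as a percent of the observation."""
    return abs(predicted - observed) / observed * 100


# Figures quoted in the prediction test for Observation 18.
error = percent_error(predicted=36.0735, observed=35)
print(f"{error:.1f}%")
```

Averaging this figure over many observations, rather than checking a single row, would give a fuller picture of predictive accuracy.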
To conclude, our study on predicting weekly streaming hours of people in the United States has plenty of real-world managerial applications. In the media industry, this is very important in today’s market, as media companies like Disney or Warner Brothers compete head-to-head on streaming applications but don’t or can’t fully understand their customers. They may know how many minutes have been streamed, but they do not know the personal factors that drive those hours, which is important context. This model can also help a streaming service whose advertisers want to target a specific audience, inform where content spending should go, and help identify possible “password” sharing based on unusual streaming behavior.