DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and Brand Audiences at Microsoft

Shawndra Hill
Microsoft Research NYC
DataEngConf
Nov 14, 2015
Talkographics:
Using What Viewers Say Online to Calculate Brand and TV
Affinity Networks
www.thesocialtvlab.com
#thesocialtvlab

Tools Enable Fast Connectivity to Shows and Fans

Why Should We Care?

Event Studies Extracting Business Value of Social Media
Social Media TV Triggers
TV Advertising
Sporting Events
Political Events
Social Media-based Recommendation Engine
Predicting TV Show Socialness/Success
Predicting TV Show Viewership
Calculating Customer Lifetime Value
I track and monitor over 1000 shows on Social Media

What you say
(on Twitter),
says a lot about you …

What groups say
(on Twitter),
says a lot about them …

Main Contributions
1. Data: A novel data collection approach that enables
both training and testing social media-based
recommendation systems from publicly available data
2. Approach: A new user generated content
recommendation approach that capitalizes on the
content viewers contribute in public for free on social
media. The approach can complement other product
network-based methods/recommendation.
3. Explanatory power: We demonstrate that the
approach reflects demographics, interests and
geographics, and outperforms aggregate-based
demographics (among other reasonable baselines) for
making TV show (and product) recommendations on
Twitter

Our Recommendation
System “Setup”

Data Collection
Twitter Handles of TV Shows (572 TV Shows)
TV Show Followers (~19 million, sampled to ~114K)
Followers’ Followers and Friends
Followers’ Tweets (#blah, #blah, #blah)

Related Work
(Marketing)
These papers all use Social
Media/User Generated Content in
or from recommendations/reviews
Ghose et al. ranked hotels for a user
using features/amenities from
reviews
Netzer et al. calculated associations
between companies using blog data
and a preset list of terms (company
names, features of cars, etc.).
Lee and Bradlow automated feature
extraction from customer reviews,
focus on the features of the
products that people care about.

Related Work
(CS)
These papers all use “Digital Data”
to predict individual level
demographics.
--Clicks
--Blogs/Text
--Clicks + Time Spent
--Search Queries
“Predicting” demographics is an
important problem for business

Related Work
These papers all use Social
Media/User Generated Content
Link INDIVIDUAL level data
To INDIVIDUAL level demographics
Ontology, Vocabulary, Lexicon Free

Differences from Prior Work
1. We use Twitter text and networks in a “clever” way to evaluate
different recommendation strategies (not just text-based)
2. No preset ontology (open vocabulary): we don’t need to decide how to
represent items, because all items are represented the same way. The
approach is therefore flexible/generalizable to any domain, so we
can easily make cross-category predictions.
3. We use more than just co-occurrence with brand/TV show mentions.
Note: brands and TV shows are rarely mentioned together on Twitter,
so one would need a huge data set to find any signal.
4. By combining groups of tweets with publicly available “survey” data, we
can (try to) predict both demographics and interests of groups of users,
which is useful for many business problems, not just recommender
systems
5. Use aggregate level data to build models – privacy friendly

Data:
Assuming We Don’t Know
Individual Level (PII)
Data

Data: TV show follower network
[Figure: follower networks of shows S1, S2 and S3]
Over 19 million unique followers
S1 – American Idol
S2 – The Voice
S3 – Duets

Data: Sampled followers
• Identified all users who followed >= 2 shows (≈
5.5 million)
• Randomly sampled up to 1,000 in each show’s
local network
[Figure: shows S1, S2 and S3 with sampled followers u, v and w]
S1 – American Idol
S2 – The Voice
S3 – Duets
u – Shawndra
v – Adrian
w – Christophe
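
The sampling step above could be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the talk: the `follows` mapping, the function name and the fixed seed are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_followers(follows, per_show=1000, min_shows=2, seed=0):
    """Sample up to `per_show` users from each show's local network.

    follows: dict mapping a user id to the set of show ids the user follows
             (hypothetical input format).
    Only users who follow at least `min_shows` shows are eligible.
    """
    rng = random.Random(seed)
    by_show = defaultdict(list)
    for user, shows in follows.items():
        if len(shows) >= min_shows:
            for show in shows:
                by_show[show].append(user)
    # Sort before sampling so the result is reproducible for a given seed.
    return {
        show: rng.sample(sorted(users), min(per_show, len(users)))
        for show, users in by_show.items()
    }


follows = {"u": {"S1", "S2"}, "v": {"S2"}, "w": {"S1", "S2", "S3"}}
sample = sample_followers(follows, per_show=1)
```

Here user `v` follows only one show, so `v` is never eligible for any show's sample.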

Data: Status updates
• Collected up to the past 400 tweets from each user in
follower sample
• Each tweet is randomly assigned to one show the user
follows
• Removed user u if
– language != en
– |Followers(u)| > 2000
Sample of 114K users
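
A rough sketch of the filtering and tweet-assignment rules on this slide. The field names `lang` and `n_followers`, and the assumption that tweets arrive newest first, are illustrative and not taken from the talk.

```python
import random


def keep_user(user, max_followers=2000):
    """Keep English-language users with at most `max_followers` followers."""
    return user["lang"] == "en" and user["n_followers"] <= max_followers


def assign_tweets(tweets, followed_shows, max_tweets=400, seed=0):
    """Assign each of a user's most recent tweets to one followed show at random.

    tweets: list of tweet texts, assumed to be ordered newest first.
    Returns a list of (show, tweet) pairs.
    """
    rng = random.Random(seed)
    shows = sorted(followed_shows)
    return [(rng.choice(shows), tweet) for tweet in tweets[:max_tweets]]


users = [
    {"id": "u", "lang": "en", "n_followers": 150},
    {"id": "v", "lang": "es", "n_followers": 90},
    {"id": "w", "lang": "en", "n_followers": 5000},
]
kept = [u["id"] for u in users if keep_user(u)]
pairs = assign_tweets(["t%d" % i for i in range(500)], {"S1", "S2"})
```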

Data: TV show content features
• Scraped from IMDB TV pages
• Features:
– Years in production
– Content rating (e.g., PG)
– Genre keywords
– Length of episode (minutes)
– Average user rating
– Number of user ratings
– Number of user reviews
– Number of critic reviews
– Producers
– Actors
– Plot keywords
– Country of production
– Languages
– Network channel

Models: Overview
• Network models
– Product/show network confidence
– Follower social network *
– Network popularity
• Show follower feature models
– Gender
– Location
– General demographics-based
• User-generated text model
– TF-IDF transform on all words less show related words
– TF-IDF transform on all words
– TF-IDF on show related words
– Co-occurrence of show names
– Bigrams
• Show feature model
– Show content similarity
• Matrix factorization
• Random
– Randomly selected shows
– Randomly selected words

Evaluation framework
function VALIDATE(Engine e, List[Set[user]] tests, List[Set[user]] trains) {
    List[Result] results = [];
    FOR (i IN 1:10) {
        Model m = TRAIN(e, trains[i]);
        FOR (u IN tests[i]) {
            Show randShow = GET_RANDOM_SHOW(u);
            List[Show] recommended = PREDICT(m, u, randShow);
            results += GET_PERFORMANCE(recommended, u, randShow);
        }
    }
    RETURN (SUM(results)/10);
}
10-fold cross validation over 114K users
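
One way the evaluation loop above might look in Python. The slide does not say what GET_PERFORMANCE computes, so the precision-at-n metric here is an assumption, as are the `train_fn`/`predict_fn` callables and the way the folds are built.

```python
import random
from statistics import mean


def validate(train_fn, predict_fn, users, follows, k=10, top_n=10, seed=0):
    """k-fold cross-validation of a show recommender (illustrative sketch).

    train_fn(train_users) -> model
    predict_fn(model, user, seed_show) -> ranked list of recommended shows
    follows: dict mapping a user id to the set of shows the user follows
    """
    rng = random.Random(seed)
    users = sorted(users)
    rng.shuffle(users)
    folds = [users[i::k] for i in range(k)]
    fold_scores = []
    for i in range(k):
        train = [u for j, fold in enumerate(folds) if j != i for u in fold]
        model = train_fn(train)
        precisions = []
        for u in folds[i]:
            # Pick one followed show as input; the rest are the ground truth.
            seed_show = rng.choice(sorted(follows[u]))
            held_out = follows[u] - {seed_show}
            recs = predict_fn(model, u, seed_show)[:top_n]
            # Assumed metric: fraction of recommendations the user follows.
            precisions.append(len(held_out & set(recs)) / max(len(recs), 1))
        fold_scores.append(mean(precisions) if precisions else 0.0)
    return mean(fold_scores)


users = ["u%d" % i for i in range(20)]
follows = {u: {"A", "B"} for u in users}
score = validate(
    train_fn=lambda train: None,
    predict_fn=lambda model, user, show: [s for s in ["A", "B"] if s != show],
    users=users,
    follows=follows,
)
```

In this toy setup every user follows both shows, so recommending "the other show" is always correct.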

Models: Show network confidence
• Compute similarity matrix between all shows
based on confidence from show network
[Figure: follower sets F1 and F2 and their intersection, with example users u and v]
S2 – The Voice
u – Shawndra
v – Adrian
SIM(F_x, F_y) = |F_x ∩ F_y| / |F_x|
Also applied a filter on the support: |F1 ∩ F2| >= 10.
Varying this threshold seemed to have little effect on performance.
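
A small sketch of the confidence-based similarity with the support filter, assuming each show's followers are available as a Python set. The function name and input format are hypothetical.

```python
def confidence_similarity(followers, min_support=10):
    """Compute SIM(x, y) = |F_x & F_y| / |F_x| for every ordered pair of shows.

    followers: dict mapping a show id to the set of its follower ids.
    Pairs whose overlap is below `min_support` are left out.
    """
    sims = {}
    for x, fx in followers.items():
        for y, fy in followers.items():
            if x == y:
                continue
            overlap = len(fx & fy)
            if overlap >= min_support and fx:
                sims[(x, y)] = overlap / len(fx)
    return sims


followers = {"S1": set(range(20)), "S2": set(range(10, 50))}
sims = confidence_similarity(followers)
```

Note that the measure is asymmetric: here `sims[("S1", "S2")]` is 0.5 while `sims[("S2", "S1")]` is 0.25, because the denominators differ.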

Models: Network popularity
• Simply rank by number of Twitter followers
• Ignores features of input user and show
S1 S2 S3
Ranking: S2, S3, S1
S2 – The Voice
S3 – Duets

Models: Text-based pre-processing
• For each show, collect all tweets posted by
followers.
• *Remove tweets that include show names and
hashtags
• Remaining text is tokenized, removing Twitter
handles/Twitter-specific tokens (e.g., RT) and a
“bag of words” count vector is constructed for
each show
• Counts are transformed by:
TF-IDF(c_x, t) = c_x(t) / |{y : c_y(t) > 0}|
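
The transform above divides a show's raw token count by the number of shows whose bag of words contains the token (no logarithm). A minimal sketch, assuming the per-show counts are held in `collections.Counter` objects:

```python
from collections import Counter


def tfidf(counts):
    """Divide each token count by the number of shows that use the token.

    counts: dict mapping a show id to a Counter of token -> count.
    """
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(t for t, n in c.items() if n > 0)
    return {
        show: {t: n / doc_freq[t] for t, n in c.items() if n > 0}
        for show, c in counts.items()
    }


counts = {
    "x": Counter({"love": 4, "game": 2}),
    "y": Counter({"love": 2}),
}
weights = tfidf(counts)
```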

Models: Text-based similarity
Similarity is calculated using cosine similarity:
FOR (x, y) IN (SHOWS X SHOWS)
SIM(x, y) = (v_x · v_y) / (|v_x| |v_y|)
where v_x and v_y are the TF-IDF transformed
bag-of-words vectors for shows x and y
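
Cosine similarity over sparse token-weight vectors can be sketched as below. Representing each show's vector as a dict of token -> weight is an assumption made for the example.

```python
import math


def cosine(vx, vy):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * vy.get(t, 0.0) for t, w in vx.items())
    norm_x = math.sqrt(sum(w * w for w in vx.values()))
    norm_y = math.sqrt(sum(w * w for w in vy.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


same = cosine({"love": 2.0, "game": 1.0}, {"love": 2.0, "game": 1.0})
partial = cosine({"a": 1.0, "b": 1.0}, {"a": 1.0})
disjoint = cosine({"a": 1.0}, {"b": 1.0})
```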

Models: Text-based note
• The follower tweets are not necessarily about the
television shows they follow
• May capture a more general representation of a
show’s follower base

Results: Text-based English only
• Restricting to standard English words only results in a similar level of performance
• 4 million → 40,000 tokens
• Also restricted to non-show tweets only and got about the same performance

Why Our Text-Based
Approach Works

Results: TFIDF Token rank per show
Captures qualities of the show as well as of the fan base
american idol | amsales girls | colbert report | ru paul’s drag race | thunder cats now | beavis and butthead
idol | bridal | petition | gay | samurai | f**k
birthday | wedding | bullying | lesbian | marvel | s**t
snugs | gown | newt | drag | barbarian | f**king
god | bride | republican | equality | cyborg | loco
recap | curvy | tax | marriage | batman | b**h
finale | meditation | president | maternal | comic | ass
bullying | fortune | f**k | cuckoo | wars | hate
love | coziness | debate | s**t | watchmen | damn
excited | respectable | freedom | b***h | spiderman | smoke
happy | hopefulness | unsigned | jewelry | extermination | stupid

Demographics
Demographic type | Demographic categories
Gender | male, female
Age | < 17 yrs, 18-20 yrs, 21-24 yrs, 25-34 yrs, 35-49 yrs, 50-54 yrs, 55-64 yrs, > 65 yrs
Hispanic | hispanic
Parents | parents (have children of any age)
Education level | in high school, in college, graduated college

Linking Words to Categories
For each of these proportion dependent variables, we ran a simple linear regression linking
the frequency of each token in the shows' English-only bag-of-words vectors (i.e., the
number of times the token occurred in a show's bag of words divided by the total number of
tokens in that bag of words) to the demographic proportion: p_{dem} = w_i * t_i +
w_{0i}, solving for w_i and w_{0i}. After running these regressions, we kept only those
tokens that were positively correlated with the demographic dependent variable and ranked
them by R^2 value, in descending order.
Show ID | Word 1 (love) | Word 2 (school) | Word N (work) | Prop Female
1 4 6 .5
2 1 5 .7
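
The per-token regressions could be sketched as follows. For a simple linear regression with one predictor, R^2 equals the squared Pearson correlation, so the sketch uses `numpy.corrcoef` instead of fitting each model explicitly. The array layout and names are assumptions, and the toy numbers are made up for illustration.

```python
import numpy as np


def rank_tokens(freqs, target, vocab):
    """Rank tokens by the R^2 of regressing a demographic proportion on each.

    freqs:  array of shape (n_shows, n_tokens) with token frequencies per show
    target: array of shape (n_shows,) with the demographic proportion per show
    vocab:  list of token strings, one per column of `freqs`
    Only positively correlated tokens are kept.
    """
    freqs = np.asarray(freqs, dtype=float)
    target = np.asarray(target, dtype=float)
    if target.max() == target.min():
        return []  # a constant target cannot be explained by any token
    ranked = []
    for j, token in enumerate(vocab):
        x = freqs[:, j]
        if x.max() == x.min():
            continue  # constant token frequency carries no signal
        r = np.corrcoef(x, target)[0, 1]
        if r > 0:
            ranked.append((token, float(r ** 2)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


freqs = [[0.1, 0.5, 0.2], [0.2, 0.4, 0.2], [0.3, 0.3, 0.2], [0.4, 0.1, 0.2]]
prop_female = [0.2, 0.4, 0.6, 0.8]
ranking = rank_tokens(freqs, prop_female, ["love", "game", "the"])
```

In the toy data, "game" is negatively correlated and "the" is constant, so only "love" survives.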

High Proportion Female vs. Male

Demographic Categories
proportion female: love (0.38), beautiful (0.21), cute (0.20), happy (0.18), amazing (0.16), miss (0.15), mom (0.13), heart (0.13), loving (0.13), smile (0.13), girl (0.13)
proportion male: game (0.19), league (0.17), hulk (0.14), battlefield (0.13), comic (0.12), players (0.12), wars (0.12), beer (0.12), batman (0.11), shot (0.11), zombie (0.11)
proportion < 17 yrs old: ariana (0.24), school (0.23), liam (0.20), direction (0.20), victorious (0.19), follow (0.18), awkward (0.17), harry (0.15), jonas (0.15), bored (0.13), te (0.13)
proportion 21-24 yrs old: f*** (0.11), f***ing (0.10), b**** (0.07), s*** (0.07), hate (0.06), boyfriend (0.05), song (0.05), tenia (0.05), bored (0.05), n**** (0.05), bandsaw (0.05)
proportion 25-34 yrs old: work (0.09), women (0.09), daily (0.08), husband (0.08), lounge (0.08), hire (0.08), st (0.08), interested (0.08), drinks (0.07), keeping (0.07), homeland (0.06)
proportion 35-49 yrs old: great (0.21), service (0.17), taxpayer (0.14), market (0.13), pres (0.13), wine (0.12), recipe (0.12), media (0.12), political (0.12), wealth (0.12), coffee (0.12)
proportion parents: hubby (0.19), morning (0.15), blessed (0.14), husband (0.11), family (0.10), day (0.10), loving (0.10), pray (0.09), bless (0.09), happy (0.09), prayer (0.09)
proportion college grads: gop (0.19), office (0.18), political (0.18), media (0.17), daily (0.17), st (0.17), cc (0.16), pres (0.16), service (0.15), homeland (0.15), route (0.14)

Geographic Categories
north south
oread (0.08) blessed (0.12)
rathskeller (0.08) interjection (0.10)
naqua (0.08) redouble (0.10)
littre (0.08) god (0.10)
hopkinson (0.08) birdseed (0.09)
squiffy (0.08) rachet (0.09)
porcine (0.07) dis (0.09)
psilocybin (0.07) shuffler (0.09)
cloisonne (0.07) nonjudgmental (0.09)
cloaca (0.07) americus (0.07)
comber (0.07) prayerful (0.07)
eero (0.06) boo (0.07)
saarinen (0.06) fineness (0.07)

Interests
cooking gardening travelling pop_culture
preservative (0.08) great (0.11) gop (0.10) love (0.18)
oafish (0.07) recipe (0.11) bistro (0.10) liam (0.15)
crockery (0.07) lots (0.09) candidate (0.10) direction (0.14)
terrine (0.07) market (0.09) latest (0.09) boyfriend (0.13)
cherimoya (0.07) puree (0.09) neil (0.09) awkward (0.13)
food (0.06) organic (0.09) campaign (0.09) hate (0.13)
restaurateur (0.06) dinner (0.09) government (0.08) school (0.12)
irrevocably (0.06) enjoy (0.09) reference (0.08) girl (0.12)
compote (0.06) meditation (0.08) pilot (0.08) follow (0.12)
padus (0.06) handmade (0.08) film (0.08) malik (0.11)

Multidimensional Demographics/Interests
Female, Young: love, direction, girl, cute, malik, boyfriend, liam, awkward, hate, school, Eleanor, follow, moment, swaggie, sister, harry, amazing, song, ariana, mom
Female, Old: great, hubby, recipe, service, healthy, handmade, morning, wonderful, dinner, savory, casserole, blessed, meade, prayer, scallop, discipline, coffee, market, cardamom, foodie
Male, Young: dude, game, battlefield, leagye, zombie, c***, batman, cyborg, metal, silva, play, megadeath, gaming, comic, icehouse, hulks, f***ing, ops, miller, beer
Male, Old: war, game, league, hulk, field, newt, players, devils, occupy, conservative, officials, column, analyst, pitch, comedy, political, pentagon, striker, shark, jones, tactical

Female, Liberal: bachelorette, hubby, amazing, umbria, monogram, happy, floral, excited, silhouette, love, yay, braid, batch, yummy, cute, dixie, capiz, nape, idol, rochelle
Female, Conservative: evelyn, blessed, interjection, morning, redouble, god, braxton, thirdly, boo, Zambian, scallion, nonjudgemental, adverb, salaried, transferee, yaw, rachet, benet, love, authentically
Male, Liberal: tactical, game, battlefield, league, ops, survival, players, midfield, fullback, warfare, mangold, anthropomorphic, hornblower, agitating, theorize, driveshaft, feasibly, toklas, argot
Male, Conservative: comedy, hulk, coxswain, comic, inaudible, automatism, marsupium, stenosis, pitchfork, game, hockey, duty, shot, preseason, concervative, tourney, championship, war, strikeout, saints

Predicting proportions using linear regression (R-sq reported)
Top 3 word tokens
 | Female | Young female | Old female
Female | 0.38 | 0.33 | 0.12
Young female | 0.41 | 0.44 | 0.25
Old female | 0.05 | 0.11 | 0.31
Top 5 tokens
 | Male | Conservative male | Liberal male
Male | 0.40 | 0.36 | 0.34
Conservative male | 0.38 | 0.65 | 0.19
Liberal male | 0.40 | 0.15 | 0.74
Predicting Demographic Proportions Using 10-Fold Cross Validation
Compared to What?

Demographic Text Features Drive Results

Twitter, Facebook, Experian
TV Show | Gender Specific Word Scores (Love, Beautiful, Cute) | Facebook Proportions | Actual Proportion of People Watching Show and on Twitter | Actual Proportions
Show 1 | 20 | .4 | .2 | .3
Show 2 | 50 | .3 | .1 | .1
Show 3 | 80 | .2 | .3 | .1
Show 4 | 500 | .1 | .4 | .3
Show 5 | 300 | .1 | .5 | .04
Show 6 | 200 | .1 | .3 | .1
…

Proportion Facebook, Actual
Demographic Actual ~ Facebook Actual ~ Facebook + Words
Gender 0.57 0.62
Age 0.40 0.43
College Education 0.08 0.29
Percent Hispanic 0.33 0.47

Why Is the Approach Good?
• Performs well for TV Shows/Brands with fewer links
• Performs well for more engaged “Twitter” users
• Performs well for niche shows
• Learning curves indicate that it outperforms with fewer training examples across all
shows (for lower-tier shows there is no other solution); the learning curves also suggest
that we need few samples per show to generate significant relationships between words and
demographics
• Linking to “aggregate” level demographics is straightforward and interpretable
• Does much better at cross-category predictions (on a larger, different data set that
I won't have time to show here, but we can discuss offline)

Additional Product Categories
Dataset | # seed handles | # unique followers | # users in training/test folds | # tweets from in-fold users
Auto | 42 | 1789399 | 68516 | 14912886
Clothing | 83 | 8856664 | 110847 | 26993874

Complements or Substitutes (Input)?
KL Divergence

Results: Popular Versus Niche (Output)

Complements or Substitutes?
(Input)
All Shows, Bottom 50%, Bottom 25%
Based on TV Show Followers

So Where Exactly is the
Value
from (Social) Data for
TV and Brands?
What are the
Applications?

Super Bowl 2014 - Coke
credit: theexaminer.com

Calculating Similarity Between Items
(or Audiences)

For any subgroup of
Tweeters,
we can predict
demographics and
interests,
based on the words they
use.

Main Contributions
1. Data: A novel data collection approach that enables
both training and testing social media-based
recommendation systems from publicly available data
2. Approach: A new user generated content
recommendation approach that capitalizes on the
content viewers contribute in public for free on social
media. The approach can complement other product
network-based methods.
3. Explanatory power: We demonstrate that the
approach reflects demographics, interests and
geographics, and outperforms aggregate-based
demographics (among other reasonable baselines) for
making TV show (and product) recommendations
