Analyzing my Twitter Posts from 2017

A recent post on AEA365, plus my Evaluation Twitter working group, inspired me to finally learn how to scrape tweets in R! The AEA365 post linked to a tutorial on how to get started, which was helpful at first. However, I ran into an issue where only some of my most recent tweets were being scraped, not all of them. I ended up pulling six additional waves of data, using the maxID argument to grab the entire year’s worth of tweets. I then combined the data frames and wrote the result to a CSV for further analysis. I spent some time figuring out how to grab the most frequently used terms (this guide was handy) before I gave up and did everything else in Excel.
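One wrinkle with this approach, sketched below rather than taken from my actual code: each maxID pull includes the boundary tweet itself, so successive waves can overlap. Deduplicating by tweet ID after binding guards against double-counting. The `wave1`/`wave2` data frames here are toy stand-ins for the real userTimeline() results, which carry an `id` column with each tweet's status ID.

```r
# Toy stand-ins for the data frames returned by twitteR's userTimeline();
# the real ones include an "id" column with the tweet's status ID.
wave1 <- data.frame(id = c("101", "102", "103"), text = c("a", "b", "c"),
                    stringsAsFactors = FALSE)
wave2 <- data.frame(id = c("103", "104"), text = c("c", "d"),
                    stringsAsFactors = FALSE)

# Bind the waves, then drop any rows whose tweet ID we've already seen
all_tweets <- rbind(wave1, wave2)
all_tweets <- all_tweets[!duplicated(all_tweets$id), ]

nrow(all_tweets)  # 4 unique tweets
```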

Overall, I posted 996 tweets in 2017, with about 1,071 mentions and 6,892 profile visits. I was going to tell you how many new followers I gained, but Twitter analytics says 484 (with May alone accounting for 258!) and I only have 425 followers total, so I have a feeling it’s counting people who later stopped following me. What happened in May?! I have no idea. Maybe some bot service found me and followed/unfollowed me? Who knows.

Most Frequently Used Terms

Not surprisingly, my most frequently used term on Twitter was “eval.” I think next I’ll analyze the #eval hashtag, but I’ll save that for another day. Also, I mention a lot of people (or maybe they are replies?) so those people are highlighted in red.
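As an aside, those mentioned accounts can be pulled straight out of the tweet text. Here's a hypothetical helper (not part of my analysis) using base R regexes; stringr's str_extract_all() would do the same in one line:

```r
# Hypothetical helper: extract @mentions from tweet text with a regex.
# (stringr::str_extract_all(texts, "@\\w+") is an equivalent one-liner.)
get_mentions <- function(texts) {
  unlist(regmatches(texts, gregexpr("@\\w+", texts)))
}

get_mentions(c("Thanks @aea365!", "chatting with @EvalTwitter today"))
# → "@aea365" "@EvalTwitter"
```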

My Most Liked/Retweeted Tweets

Unfortunately, my most liked and retweeted tweet had absolutely nothing to do with eval, but it was pretty fun and exciting to have it “blow up” like it did.

Otherwise, here are some of my other most liked and/or retweeted tweets:

Overall, this was a fun little exercise! What else should I analyze?

R Code

If you are interested in my messy R code, here it is. The scraping was fairly straightforward, but cleaning up the text was something I had never done before, so that part of the code could probably be tidied up.

#Load required packages
library(stringr)
library(twitteR)
library(purrr)
library(tidytext)
library(dplyr)
library(tidyr)
library(lubridate)
library(scales)
library(broom)
library(ggplot2)
library(tm) #needed for Corpus, tm_map, and TermDocumentMatrix below

#Get access to Twitter. 
#Instructions here: http://www.interhacktives.com/2017/01/25/scrape-tweets-r-journalists/
consumerKey = "INSERT"  
consumerSecret = "INSERT"
accessToken = "INSERT"
accessSecret = "INSERT"
options(httr_oauth_cache=TRUE)
setup_twitter_oauth(consumer_key = consumerKey, consumer_secret = consumerSecret,
                    access_token = accessToken, access_secret = accessSecret)

# Scrape tweets
danatweets1 <- userTimeline("danawanzer", n = 3200)
danatweets1_df <- tbl_df(map_df(danatweets1, as.data.frame))
danatweets2 <- userTimeline("danawanzer", n = 3200, maxID = 928590693353869000)
danatweets2_df <- tbl_df(map_df(danatweets2, as.data.frame))
danatweets3 <- userTimeline("danawanzer", n = 3200, maxID = 895516853119758337)
danatweets3_df <- tbl_df(map_df(danatweets3, as.data.frame))
danatweets4 <- userTimeline("danawanzer", n = 3200, maxID = 870360671313043456)
danatweets4_df <- tbl_df(map_df(danatweets4, as.data.frame))
danatweets5 <- userTimeline("danawanzer", n = 3200, maxID = 851081522169864192)
danatweets5_df <- tbl_df(map_df(danatweets5, as.data.frame))
danatweets6 <- userTimeline("danawanzer", n = 3200, maxID = 846499061079289856)
danatweets6_df <- tbl_df(map_df(danatweets6, as.data.frame))
danatweets7 <- userTimeline("danawanzer", n = 3200, maxID = 836019959239094273)
danatweets7_df <- tbl_df(map_df(danatweets7, as.data.frame))

danatweets <- rbind(danatweets1_df, 
                    danatweets2_df,
                    danatweets3_df, 
                    danatweets4_df, 
                    danatweets5_df, 
                    danatweets6_df,
                    danatweets7_df)

write.csv(danatweets, "danatweets.csv")

# Most common words
myCorpus <- Corpus(VectorSource(danatweets$text))
removeURL <- function(x) gsub("http[^[:space:]]*", "", x)
myCorpus <- tm_map(myCorpus, content_transformer(removeURL))
removeNumPunct <- function(x) gsub("[^[:alpha:][:space:]]*", "", x)
myCorpus <- tm_map(myCorpus, content_transformer(removeNumPunct))
myCorpus <- tm_map(myCorpus, content_transformer(tolower)) #lowercase first so stopwords match
myStopwords <- stopwords(kind = "en")
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)

tdm <- TermDocumentMatrix(myCorpus,
                          control = list(removePunctuation = TRUE,
                                         stopwords = TRUE))
freq.terms <- findFreqTerms(tdm, lowfreq = 20)
term.freq <- rowSums(as.matrix(tdm))
term.freq <- subset(term.freq, term.freq >= 20)
df <- data.frame(term=names(term.freq), freq = term.freq)
df$term <- factor(df$term, levels = df$term[order(df$freq)])
write.csv(df, "freqterms.csv")

ggplot(df, aes(x=term, y=freq)) + 
  geom_bar(stat="identity") +
  xlab("Terms") + ylab("Count") + coord_flip() +
  theme(axis.text=element_text(size=7))

# Which words are associated with "eval"?
findAssocs(tdm, "eval", .2)