Method and Descriptive Results

Author

Ben Burnley & Annelise Russell

Set Up

# set up 
library(tidyverse)
library(here)

# set theme 
theme_set(theme_bw())

# directory holding the classified tweet files
directory <- here("output", "reg_multinom")

# full paths to every .rds file in that directory
file_paths <- list.files(path = directory, pattern = "\\.rds$",
                         full.names = TRUE, ignore.case = TRUE)

# read each file, harmonize column types so the files can be stacked,
# then combine everything into one dataset
data <- file_paths |> 
  map(\(path) {
    read_rds(path) |> 
      mutate(
        tweet_id = as.character(tweet_id),
        party_code = as.factor(party_code)
      )
  }) |> 
  bind_rows()

# add party-control variables by Congress
# majority: party holding the Senate majority; president: party of the president
data <- data |> 
  mutate(
    majority = case_when(
      congress %in% c(113, 117) ~ "Democratic Majority",
      congress %in% c(114, 115, 116) ~ "Republican Majority",
      TRUE ~ NA_character_
    ),
    president = case_when(
      congress %in% c(113, 114, 117) ~ "D",
      congress %in% c(115, 116) ~ "R",
      TRUE ~ NA_character_
    )
  )

Data and Methods

For this analysis, we use every tweet posted from 2013 through 2022 by the official accounts associated with sitting United States senators, a total of 934,424 unique tweets. All tweets were collected through the now-discontinued Twitter API.

Tweets from 2013 and 2015 had been hand-coded by policy topic for a prior study. Leveraging this high-quality human coding, we trained a regularized multinomial regression model to classify the full set of tweets into five categories: four policy topics (domestic, economic, foreign, and social policy) and a category for tweets with no policy content. The predictors were the tweet text, the senator's party, and the senator's state. Preprocessing steps included tokenization, stop word removal, stemming, and tf-idf transformation. For efficiency, only the top 5,000 tokens were retained, party and state were one-hot encoded, and the final predictor matrix was stored in sparse form. The regularization penalty was tuned over a grid of 30 candidate values, and the final value was chosen to maximize ROC-AUC.
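The training code itself is not shown in this document. As a rough illustration, the steps above map onto the tidymodels ecosystem; the sketch below assumes tidymodels with textrecipes and the glmnet engine (textrecipes also needs the stopwords and SnowballC packages installed). The column names `text` and `state`, the lasso mixture (`mixture = 1`), the penalty range, and 5-fold cross-validation are all assumptions, and `build_policy_workflow()` and `tune_policy_model()` are hypothetical helpers, not the authors' code.

```r
# Sketch only: assumes tidymodels + textrecipes + glmnet (plus stopwords, SnowballC).
library(tidymodels)
library(textrecipes)

# Hypothetical helper: preprocessing recipe + penalized multinomial model.
# Column names `text`, `party_code`, `state`, `policy_area` are assumptions.
build_policy_workflow <- function(train) {
  rec <- recipe(policy_area ~ text + party_code + state, data = train) |>
    step_tokenize(text) |>                        # split tweet text into word tokens
    step_stopwords(text) |>                       # remove stop words
    step_stem(text) |>                            # stem the remaining tokens
    step_tokenfilter(text, max_tokens = 5000) |>  # keep the 5,000 most frequent tokens
    step_tfidf(text) |>                           # weight tokens by tf-idf
    step_dummy(party_code, state, one_hot = TRUE) # one-hot encode party and state

  # Penalized multinomial regression; mixture = 1 (lasso) is an assumption
  spec <- multinom_reg(penalty = tune(), mixture = 1) |>
    set_engine("glmnet")

  # Hand the predictors to glmnet as a sparse dgCMatrix
  sparse_bp <- hardhat::default_recipe_blueprint(composition = "dgCMatrix")

  workflow() |>
    add_recipe(rec, blueprint = sparse_bp) |>
    add_model(spec)
}

# 30 candidate penalties, evenly spaced on the log10 scale (range is an assumption)
penalty_grid <- grid_regular(penalty(range = c(-5, 0)), levels = 30)

# Hypothetical helper: tune on cross-validation folds, keep the best ROC-AUC, refit
tune_policy_model <- function(train, v = 5) {
  wf <- build_policy_workflow(train)
  folds <- vfold_cv(train, v = v, strata = policy_area)
  res <- tune_grid(wf, resamples = folds, grid = penalty_grid,
                   metrics = metric_set(roc_auc))
  best <- select_best(res, metric = "roc_auc")
  finalize_workflow(wf, best) |> fit(data = train)
}
```

Given a training split, `tune_policy_model(train)` would return a fitted workflow whose `predict()` output supplies a policy label for each new tweet.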

The hand-coded 2013 tweets were split into training (75%) and testing (25%) sets, stratified by policy topic. The model was trained on the training set, evaluated on the held-out 2013 tweets, and then validated on the full hand-coded 2015 dataset. The table below reports performance on each of these three samples.
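A minimal sketch of the stratified 75/25 split, assuming the rsample package; `split_2013()` is a hypothetical helper, and the seed is arbitrary because the study's seed is not reported.

```r
library(rsample)

# Hypothetical helper: 75/25 split of the hand-coded 2013 tweets, stratified on
# policy_area so every topic appears in roughly equal proportion in both sets.
split_2013 <- function(coded_2013, seed = 2013) {
  set.seed(seed)  # arbitrary seed (assumption)
  initial_split(coded_2013, prop = 0.75, strata = policy_area)
}

# training(split) is used for fitting; testing(split) is the held-out 2013 sample;
# the full hand-coded 2015 data is scored separately as the validation sample.
```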

Metric        2013 Training   2013 Testing   2015 Validation
Accuracy      .846            .917           .835
Recall        .819            .902           .802
Specificity   .959            .977           .953
F1            .830            .909           .835

Confusion matrices and ROC-AUC performance by policy topic are provided in the appendix. The final model was then applied to the remaining, uncoded tweets to estimate their policy topics.
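The metrics in the table, and the appendix confusion matrices, could be computed per evaluation sample with yardstick. This is a sketch under assumptions: `preds` is a hypothetical data frame with factor columns `truth` (hand code) and `estimate` (model label) sharing the same levels, and whether the table used yardstick's default macro-averaging for multiclass recall, specificity, and F1 is not stated in the source.

```r
library(yardstick)
library(dplyr)

# The four class metrics reported in the table; for multiclass outcomes yardstick
# macro-averages recall, specificity, and F1 by default.
policy_metrics <- metric_set(accuracy, recall, spec, f_meas)

# Hypothetical helper: metrics for one evaluation sample
evaluate_sample <- function(preds) {
  policy_metrics(preds, truth = truth, estimate = estimate)
}

# Hypothetical helper: confusion matrix (rows = predicted, columns = truth)
confusion_for <- function(preds) {
  conf_mat(preds, truth = truth, estimate = estimate)
}
```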

Results

Policy Topics Over Time

data |> 
  # one row per day and topic; n = number of tweets
  count(date, policy_area) |> 
  ggplot(aes(date, n, color = policy_area))+
  geom_smooth(se = FALSE)+
  labs(
    x = NULL,
    y = "Estimated Daily Count",
    color = "Policy Topic"
  )+
  theme(
    legend.position = "bottom"
  )

This plot shows the smoothed estimated daily count of tweets in each policy topic over the decade. Tweets with no policy content (labeled “None”) are the most common throughout the decade and appear to be more prevalent during the Trump presidency (2017-2020). Some policy topics also appear to peak with the election cycle. For example, social policy has two distinct peaks, in 2017 and 2019, neither of which was a federal election year. Economic policy and foreign policy show similar peaks in those same years.
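The smoothed curves make "more prevalent" a visual judgment. A minimal sketch that puts a number on it computes each topic's share of all tweets per calendar year; it assumes `date` can be passed to `lubridate::year()`, and `topic_share_by_year()` is a hypothetical helper.

```r
library(dplyr)

# Hypothetical helper: share of each estimated policy topic within each calendar year
topic_share_by_year <- function(df) {
  df |>
    mutate(year = lubridate::year(date)) |>
    count(year, policy_area) |>
    group_by(year) |>
    mutate(share = n / sum(n)) |>
    ungroup()
}

# Example: yearly share of tweets with no policy content
# topic_share_by_year(data) |> filter(policy_area == "None")
```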

Most Common Policy Topic By Year

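data |> 
  # derive the calendar year from the tweet date, then count tweets per year and topic
  mutate(year = lubridate::year(date)) |> 
  count(year, policy_area) |> 
  filter(policy_area != "None") |> 
  ggplot(aes(policy_area, n, fill = policy_area))+
  geom_col()+
  coord_flip()+
  facet_wrap(~year)+
  labs(
    x = NULL,
    y = "Estimated Number of Tweets"
  )+
  theme(
    legend.position = "none"
  )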

The plot above shows the total number of tweets in each policy topic by year. In every year except 2015, social policy is the most common policy topic, followed by domestic policy. Foreign and economic policy make up a much smaller share of policy tweets. There do not appear to be any strong election-year effects. As noted above, social policy is most prominent during the Trump presidency.
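To confirm which topic leads in each period without reading bar lengths, a small sketch (assuming dplyr; `top_policy_topic()` is a hypothetical helper) returns the most common non-"None" topic per year or per Congress, keeping ties.

```r
library(dplyr)

# Hypothetical helper: most common policy topic (excluding "None") per period.
# `period` is a bare column name such as year or congress; ties are all kept.
top_policy_topic <- function(df, period) {
  df |>
    filter(policy_area != "None") |>
    count({{ period }}, policy_area) |>
    group_by({{ period }}) |>
    slice_max(order_by = n, n = 1, with_ties = TRUE) |>
    ungroup()
}

# top_policy_topic(data, congress)
# top_policy_topic(mutate(data, year = lubridate::year(date)), year)
```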

Most Common Policy Topic by Congress

data |> 
  group_by(congress) |> 
  count(policy_area) |> 
  filter(policy_area != "None") |> 
  ggplot(aes(policy_area, n, fill = policy_area))+
  geom_col()+
  coord_flip()+
  facet_wrap(~congress)+
  labs(
    x = NULL
  )+
  theme(
    legend.position = "none"
  )

This plot repeats the previous one, aggregated by Congress instead of by year. The same patterns hold.

Changes in Policy Topic by Party

Total Tweet Frequency

data |> 
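  # independents (party code 328) are grouped with Democrats, as in the ideology section
  mutate(party = if_else(party_code %in% c("100", "328"), "Dem", "Rep")) |> 
  count(date, party) |> 
  ggplot(aes(date, n, color = party))+
  geom_smooth(se = FALSE)+
  scale_color_manual(values = c(Dem = "dodgerblue", Rep = "red"))+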
  labs(
    x = NULL,
    y = "Estimated Daily Count",
    color = NULL
  )+
  theme(
    legend.position = "bottom"
  )

First, consider the total number of tweets per day from senators of each party. As previous research has shown, Democrats are more active than Republicans for most of the period. Democrats increase their tweeting in the lead-up to and the first year of the Trump presidency. Similarly, Republicans increase their tweeting in the lead-up to and during the Biden presidency. Together, these patterns highlight the role Twitter can play as a tool for the opposition.
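A rough numeric check on "more active across most of the data" is the share of days on which Democrats sent more tweets than Republicans. This sketch assumes dplyr and tidyr, groups independents (party code 328) with Democrats as the ideology section does, and `daily_party_gap()` is a hypothetical helper.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: daily tweet counts by party in wide form, flagging the days
# on which Democrats tweeted more than Republicans.
daily_party_gap <- function(df) {
  df |>
    mutate(party = if_else(party_code %in% c("100", "328"), "Dem", "Rep")) |>
    count(date, party) |>
    pivot_wider(names_from = party, values_from = n, values_fill = 0L) |>
    mutate(dem_ahead = Dem > Rep)
}

# Share of days on which Democrats out-tweeted Republicans:
# mean(daily_party_gap(data)$dem_ahead)
```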

Domestic Policy

data |> 
  # independents (party code 328) are grouped with Democrats, as in the ideology section
  mutate(party = if_else(party_code %in% c("100", "328"), "Dem", "Rep")) |> 
  filter(policy_area == "Domestic Policy") |> 
  count(date, party) |> 
  ggplot(aes(date, n, color = party))+
  geom_smooth(se = FALSE)+
  scale_color_manual(values = c(Dem = "dodgerblue", Rep = "red"))+
  annotate(geom = "rect", xmin = as.POSIXct("2013-01-01"), xmax = as.POSIXct("2014-12-31") , ymin = 35, ymax = 36, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2015-01-01"), xmax = as.POSIXct("2016-12-31") , ymin = 35, ymax = 36, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2017-01-01"), xmax = as.POSIXct("2018-12-31") , ymin = 35, ymax = 36, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2019-01-01"), xmax = as.POSIXct("2020-12-31") , ymin = 35, ymax = 36, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2021-01-01"), xmax = as.POSIXct("2022-12-31") , ymin = 35, ymax = 36, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2013-01-01"), xmax = as.POSIXct("2014-12-31") , ymin = 34, ymax = 35, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2015-01-01"), xmax = as.POSIXct("2016-12-31") , ymin = 34, ymax = 35, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2017-01-01"), xmax = as.POSIXct("2018-12-31") , ymin = 34, ymax = 35, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2019-01-01"), xmax = as.POSIXct("2020-12-31") , ymin = 34, ymax = 35, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2021-01-01"), xmax = as.POSIXct("2022-12-31") , ymin = 34, ymax = 35, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "text", y = 35.5, x = as.POSIXct("2013-09-01"), label = "Presidency", color = "darkgray")+
  annotate(geom = "text", y = 34.5, x = as.POSIXct("2013-06-07"), label = "Senate", color = "darkgray")+
  labs(
    x = NULL,
    y = "Estimated Daily Count",
    color = NULL,
    caption = "Colored bars at top show which party held White House and Senate Majority"
  )+
  theme(
    legend.position = "bottom"
  )

This plot shows changes in domestic policy tweets over the decade. Democrats begin tweeting more about domestic policy after losing control of the Senate in 2015, and this increase continues into the Trump presidency. Under the Biden administration and unified Democratic government in 2021-2022, Democratic domestic policy tweeting reaches its highest level. Republicans turn to domestic policy mainly in the second half of the Trump presidency and sustain that level through the end of the data.
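The comparison across periods of Senate control can be summarized as mean daily counts by party and majority. A minimal sketch, assuming dplyr and the `majority` column built in the setup code; `mean_daily_by_control()` is a hypothetical helper. Days on which a party sent no tweets on the topic drop out of `count()`, so these are means over active days only.

```r
library(dplyr)

# Hypothetical helper: mean daily tweets on one topic by party under each Senate
# majority (means over days with at least one tweet on the topic).
mean_daily_by_control <- function(df, topic) {
  df |>
    filter(policy_area == topic) |>
    mutate(party = if_else(party_code %in% c("100", "328"), "Dem", "Rep")) |>
    count(majority, party, date) |>
    group_by(majority, party) |>
    summarise(mean_daily = mean(n), .groups = "drop")
}

# mean_daily_by_control(data, "Domestic Policy")
```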

Social Policy

data |> 
  # independents (party code 328) are grouped with Democrats, as in the ideology section
  mutate(party = if_else(party_code %in% c("100", "328"), "Dem", "Rep")) |> 
  filter(policy_area == "Social Policy") |> 
  count(date, party) |> 
  ggplot(aes(date, n, color = party))+
  geom_smooth(se = FALSE)+
  scale_color_manual(values = c(Dem = "dodgerblue", Rep = "red"))+
  annotate(geom = "rect", xmin = as.POSIXct("2013-01-01"), xmax = as.POSIXct("2014-12-31") , ymin = 57, ymax = 59, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2015-01-01"), xmax = as.POSIXct("2016-12-31") , ymin = 57, ymax = 59, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2017-01-01"), xmax = as.POSIXct("2018-12-31") , ymin = 57, ymax = 59, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2019-01-01"), xmax = as.POSIXct("2020-12-31") , ymin = 57, ymax = 59, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2021-01-01"), xmax = as.POSIXct("2022-12-31") , ymin = 57, ymax = 59, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2013-01-01"), xmax = as.POSIXct("2014-12-31") , ymin = 55, ymax = 57, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2015-01-01"), xmax = as.POSIXct("2016-12-31") , ymin = 55, ymax = 57, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2017-01-01"), xmax = as.POSIXct("2018-12-31") , ymin = 55, ymax = 57, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2019-01-01"), xmax = as.POSIXct("2020-12-31") , ymin = 55, ymax = 57, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2021-01-01"), xmax = as.POSIXct("2022-12-31") , ymin = 55, ymax = 57, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "text", y = 58, x = as.POSIXct("2013-09-01"), label = "Presidency", color = "darkgray")+
  annotate(geom = "text", y = 56, x = as.POSIXct("2013-06-07"), label = "Senate", color = "darkgray")+
  labs(
    x = NULL,
    y = "Estimated Daily Count",
    color = NULL,
    caption = "Colored bars at top show which party held White House and Senate Majority"
  )+
  theme(
    legend.position = "bottom"
  )

This plot shows changes in social policy tweets over the decade. The most striking feature is a Democratic peak in 2017, which is linked to the Trump administration's attempt to repeal the Affordable Care Act; Democrats responded by tweeting more about health care during this period. As with domestic policy, Republican social policy tweeting increases in late 2019, and this increase is largely sustained through the end of the dataset.
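To date a visible spike precisely, one option is to find the calendar month in which a party sent the most tweets on a topic. A sketch assuming dplyr and lubridate; `peak_month()` is a hypothetical helper.

```r
library(dplyr)

# Hypothetical helper: the calendar month in which one party sent the most tweets
# on a topic (e.g., to date the 2017 Democratic social policy peak).
peak_month <- function(df, topic, party_label) {
  df |>
    mutate(
      party = if_else(party_code %in% c("100", "328"), "Dem", "Rep"),
      month = lubridate::floor_date(as.Date(date), unit = "month")
    ) |>
    filter(policy_area == topic, party == party_label) |>
    count(month) |>
    slice_max(order_by = n, n = 1)
}

# peak_month(data, "Social Policy", "Dem")
```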

Foreign Policy

data |> 
  # independents (party code 328) are grouped with Democrats, as in the ideology section
  mutate(party = if_else(party_code %in% c("100", "328"), "Dem", "Rep")) |> 
  filter(policy_area == "Foreign Policy") |> 
  count(date, party) |> 
  ggplot(aes(date, n, color = party))+
  geom_smooth(se = FALSE)+
  scale_color_manual(values = c(Dem = "dodgerblue", Rep = "red"))+
  annotate(geom = "rect", xmin = as.POSIXct("2013-01-01"), xmax = as.POSIXct("2014-12-31") , ymin = 30, ymax = 32, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2015-01-01"), xmax = as.POSIXct("2016-12-31") , ymin = 30, ymax = 32, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2017-01-01"), xmax = as.POSIXct("2018-12-31") , ymin = 30, ymax = 32, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2019-01-01"), xmax = as.POSIXct("2020-12-31") , ymin = 30, ymax = 32, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2021-01-01"), xmax = as.POSIXct("2022-12-31") , ymin = 30, ymax = 32, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2013-01-01"), xmax = as.POSIXct("2014-12-31") , ymin = 28, ymax = 30, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2015-01-01"), xmax = as.POSIXct("2016-12-31") , ymin = 28, ymax = 30, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2017-01-01"), xmax = as.POSIXct("2018-12-31") , ymin = 28, ymax = 30, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2019-01-01"), xmax = as.POSIXct("2020-12-31") , ymin = 28, ymax = 30, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2021-01-01"), xmax = as.POSIXct("2022-12-31") , ymin = 28, ymax = 30, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "text", y = 31, x = as.POSIXct("2013-09-01"), label = "Presidency", color = "darkgray")+
  annotate(geom = "text", y = 29, x = as.POSIXct("2013-06-07"), label = "Senate", color = "darkgray")+
  labs(
    x = NULL,
    y = "Estimated Daily Count",
    color = NULL,
    caption = "Colored bars at top show which party held White House and Senate Majority"
  )+
  theme(
    legend.position = "bottom"
  )

Foreign policy is the first topic on which Republicans out-tweet Democrats across the entire sample. The two trend lines generally move together, suggesting that the salience of foreign policy events may drive foreign policy communication more than partisan differences do.
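Whether the two lines "move together" can be checked with a correlation between the parties' daily counts. This sketch assumes dplyr and tidyr and uses raw daily counts over days with at least one tweet on the topic, not the smoothed curves in the plot, so it is only a rough check; `party_daily_correlation()` is a hypothetical helper.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: Pearson correlation between Democratic and Republican
# daily tweet counts on one topic.
party_daily_correlation <- function(df, topic) {
  wide <- df |>
    filter(policy_area == topic) |>
    mutate(party = if_else(party_code %in% c("100", "328"), "Dem", "Rep")) |>
    count(date, party) |>
    pivot_wider(names_from = party, values_from = n, values_fill = 0L)
  cor(wide$Dem, wide$Rep)
}

# party_daily_correlation(data, "Foreign Policy")
```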

Economic Policy

data |> 
  # independents (party code 328) are grouped with Democrats, as in the ideology section
  mutate(party = if_else(party_code %in% c("100", "328"), "Dem", "Rep")) |> 
  filter(policy_area == "Economic Policy") |> 
  count(date, party) |> 
  ggplot(aes(date, n, color = party))+
  geom_smooth(se = FALSE)+
  scale_color_manual(values = c(Dem = "dodgerblue", Rep = "red"))+
  annotate(geom = "rect", xmin = as.POSIXct("2013-01-01"), xmax = as.POSIXct("2014-12-31") , ymin = 30, ymax = 32, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2015-01-01"), xmax = as.POSIXct("2016-12-31") , ymin = 30, ymax = 32, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2017-01-01"), xmax = as.POSIXct("2018-12-31") , ymin = 30, ymax = 32, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2019-01-01"), xmax = as.POSIXct("2020-12-31") , ymin = 30, ymax = 32, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2021-01-01"), xmax = as.POSIXct("2022-12-31") , ymin = 30, ymax = 32, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2013-01-01"), xmax = as.POSIXct("2014-12-31") , ymin = 28, ymax = 30, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2015-01-01"), xmax = as.POSIXct("2016-12-31") , ymin = 28, ymax = 30, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2017-01-01"), xmax = as.POSIXct("2018-12-31") , ymin = 28, ymax = 30, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2019-01-01"), xmax = as.POSIXct("2020-12-31") , ymin = 28, ymax = 30, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2021-01-01"), xmax = as.POSIXct("2022-12-31") , ymin = 28, ymax = 30, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2013-01-01"), xmax = as.POSIXct("2014-12-31") , ymin = 26, ymax = 28, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2015-01-01"), xmax = as.POSIXct("2016-12-31") , ymin = 26, ymax = 28, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2017-01-01"), xmax = as.POSIXct("2018-12-31") , ymin = 26, ymax = 28, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2019-01-01"), xmax = as.POSIXct("2020-12-31") , ymin = 26, ymax = 28, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2021-01-01"), xmax = as.POSIXct("2022-12-31") , ymin = 26, ymax = 28, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "text", y = 31, x = as.POSIXct("2013-09-01"), label = "Presidency", color = "darkgray")+
  annotate(geom = "text", y = 29, x = as.POSIXct("2013-06-07"), label = "Senate", color = "darkgray")+
  annotate(geom = "text", y = 27, x = as.POSIXct("2013-06-07"), label = "House", color = "darkgray")+
  labs(
    x = NULL,
    y = "Estimated Daily Count",
    color = NULL,
    caption = "Colored bars at top show which party held White House, Senate, and House Majority"
  )+
  theme(
    legend.position = "bottom"
  )

Economic policy is the first topic on which the parties trade places multiple times in how frequently they communicate about it. This plot also shows which party held the House majority. Democrats tweet most about the economy at the start of unified Republican control of both chambers and the White House. Republicans follow suit, increasing their economic policy tweeting at the start of the Biden presidency. Republicans also begin to outpace Democrats on this topic in 2019, once Democrats had taken the House and ended the Republican trifecta. A next step should be to examine whether divided government increases attention to economic policy more than to other topics.
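As a descriptive first pass at that next step (not a formal test), one could compare each topic's share of policy tweets under divided and unified government. The sketch assumes dplyr; the divided Congresses are the 113th (Democratic president, Republican House), 114th (Democratic president, Republican Congress), and 116th (Republican president and Senate, Democratic House). `topic_share_by_government()` is a hypothetical helper.

```r
library(dplyr)

# Hypothetical helper: share of each policy topic among policy tweets under
# divided vs. unified government, classified by Congress.
topic_share_by_government <- function(df) {
  df |>
    filter(policy_area != "None") |>
    mutate(government = if_else(congress %in% c(113, 114, 116), "Divided", "Unified")) |>
    count(government, policy_area) |>
    group_by(government) |>
    mutate(share = n / sum(n)) |>
    ungroup()
}

# topic_share_by_government(data) |> filter(policy_area == "Economic Policy")
```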

No Policy Content

data |> 
  # independents (party code 328) are grouped with Democrats, as in the ideology section
  mutate(party = if_else(party_code %in% c("100", "328"), "Dem", "Rep")) |> 
  filter(policy_area == "None") |> 
  count(date, party) |> 
  ggplot(aes(date, n, color = party))+
  geom_smooth(se = FALSE)+
  scale_color_manual(values = c(Dem = "dodgerblue", Rep = "red"))+
  # top band (90-93): party holding the White House, matching the "Presidency" label
  annotate(geom = "rect", xmin = as.POSIXct("2013-01-01"), xmax = as.POSIXct("2014-12-31") , ymin = 90, ymax = 93, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2015-01-01"), xmax = as.POSIXct("2016-12-31") , ymin = 90, ymax = 93, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2017-01-01"), xmax = as.POSIXct("2018-12-31") , ymin = 90, ymax = 93, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2019-01-01"), xmax = as.POSIXct("2020-12-31") , ymin = 90, ymax = 93, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2021-01-01"), xmax = as.POSIXct("2022-12-31") , ymin = 90, ymax = 93, alpha = 0.25, fill = "dodgerblue")+
  # lower band (87-90): Senate majority, matching the "Senate" label
  annotate(geom = "rect", xmin = as.POSIXct("2013-01-01"), xmax = as.POSIXct("2014-12-31") , ymin = 87, ymax = 90, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "rect", xmin = as.POSIXct("2015-01-01"), xmax = as.POSIXct("2016-12-31") , ymin = 87, ymax = 90, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2017-01-01"), xmax = as.POSIXct("2018-12-31") , ymin = 87, ymax = 90, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2019-01-01"), xmax = as.POSIXct("2020-12-31") , ymin = 87, ymax = 90, alpha = 0.25, fill = "red")+
  annotate(geom = "rect", xmin = as.POSIXct("2021-01-01"), xmax = as.POSIXct("2022-12-31") , ymin = 87, ymax = 90, alpha = 0.25, fill = "dodgerblue")+
  annotate(geom = "text", y = 91.5, x = as.POSIXct("2013-09-01"), label = "Presidency", color = "darkgray")+
  annotate(geom = "text", y = 88.5, x = as.POSIXct("2013-06-07"), label = "Senate", color = "darkgray")+
  labs(
    x = NULL,
    y = "Estimated Daily Count",
    color = NULL,
    caption = "Colored bars at top show which party held White House and Senate Majority"
  )+
  theme(
    legend.position = "bottom"
  )

Finally, Republicans and Democrats trade places in how often they tweet about topics that lack policy content and fall outside our coding scheme. There is no clear indication that these shifts are related to party control of the Senate or the presidency.

Ideology and Policy Topic

data |> 
  # group independents (party code 328) with Democrats
  mutate(party = if_else(party_code %in% c("100", "328"), "Dem", "Rep")) |> 
  group_by(bioname, policy_area, nominate_dim1, party) |> 
  count() |> 
  ggplot(aes(nominate_dim1, n, color = party))+
  geom_point(alpha = 0.5)+
  scale_color_manual(values = c(Dem = "dodgerblue", Rep = "red"), guide = "none")+
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)+
  facet_wrap(~policy_area, scales = "free")+
  labs(
    x = "DW-NOMINATE First Dimension",
    y = "Total Number of Tweets"
  )

Lastly, we take a quick look at how ideology relates to how frequently a senator tweets about each policy topic. Each senator in the dataset is plotted by their DW-NOMINATE first-dimension score and their total number of tweets in that policy area. Within each facet, a separate linear regression line is fit for each party to show whether within-party ideological differences are associated with different levels of tweeting about that topic. For Democrats, these lines typically slope downward, meaning that more liberal senators (lower DW-NOMINATE scores) tweet more about policy. Republicans do not show the same pattern; if anything, more moderate Republicans tweet slightly more about policy than more conservative members.
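The slopes of those regression lines can be reported directly instead of read off the plot. For a simple regression, the OLS slope equals cov(x, y) / var(x), which is what this sketch computes per party and topic; it assumes dplyr, and `ideology_slopes()` is a hypothetical helper.

```r
library(dplyr)

# Hypothetical helper: within-party OLS slope of each senator's tweet count on
# DW-NOMINATE dimension 1, per policy topic (cov(x, y) / var(x) is the OLS slope).
ideology_slopes <- function(df) {
  df |>
    mutate(party = if_else(party_code %in% c("100", "328"), "Dem", "Rep")) |>
    count(bioname, nominate_dim1, party, policy_area) |>
    group_by(party, policy_area) |>
    summarise(slope = cov(nominate_dim1, n) / var(nominate_dim1), .groups = "drop")
}

# ideology_slopes(data)
```

Because the counts are career totals, senators present for fewer years mechanically have fewer tweets, so the slopes describe the plotted totals rather than tweeting rates.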

Appendix

2013 Training Confusion Matrix

2015 Validation Confusion Matrix

ROC-AUC Plots