Analyzing American Time Use Surveys in R
An Example: Mothers’ Time Use by Marital Status
Joanna R. Pepin
23 December, 2020
Getting the data
This tutorial uses data downloaded from IPUMS’ ATUS-X extract builder. The data includes samples from 2003 - 2019.
Variables:
Type | Variable | Label |
---|---|---|
H | RECTYPE | Record Type |
H | YEAR | Survey year |
H | CASEID | ATUS Case ID |
H | REGION | Region |
H | HH_SIZE | Number of people in household |
H | HH_CHILD | Children under 18 in household |
H | AGEYCHILD | Age of youngest household child |
H | HH_NUMADULTS | Number of adults in household |
H | HHTENURE | Living quarters owned, rented, or occupied without rent |
P | PERNUM | Person number (general) |
P | LINENO | Person line number |
P | DAY | ATUS interview day of the week |
P | WT06 | Person weight, 2006 methodology |
P | AGE | Age |
P | SEX | Sex |
P | RACE | Race |
P | HISPAN | Hispanic origin |
P | MARST | Marital status |
P | EDUC | Highest level of school completed |
P | FULLPART | Full time/part time employment status |
P | SPOUSEPRES | Spouse or unmarried partner in household |
P | SPSEX | Sex of respondent’s spouse or unmarried partner |
P | HH_NUMOWNKIDS | Number of own children under 18 in household |
P | KIDUND13 | Own child under 13 in household |
A | ACTLINE | Activity line number |
A | ACTIVITY | Activity |
A | DURATION | Duration of activity |
Extract Notes:
- DATA FORMAT: .dat (fixed-width text)
- STRUCTURE: Hierarchical
- SAMPLE MEMBERS: Respondents
Data Download and Import
IPUMS NOTE: To load data, you must download both the extract’s data and the DDI.
Download the .DAT file after ATUSX processes your request (you will get an email).
To download the DDI file, right click on the DDI link under the CODEBOOK heading, and select “save link as”
Detailed instructions for importing the data can be found at: https://cran.r-project.org/web/packages/ipumsr/vignettes/ipums.html.
This R code was executed in the open source edition of RStudio.
User specific customization
Specify the file path to the local folder where you saved the ATUS data extracts.
Notice the “/” lean to the right and not the left, if you are using a Windows computer.
Users should also change the "atus_00049.xml"
to the file name of their own ATUSX extract.
dataDir <- file.path("C:/Users/Joanna/Dropbox/Data/ATUS")
rawdata <- "atus_00049.xml"
Load the libraries
library(ipumsr)
library(tidyverse)
library(labelled) # convert labeled data to factors
library(ggeffects) # Linear models and marginals
Libraries can be installed with the commands install.packages("ipumsr")
and install.packages("tidyverse")
. These only need to be installed once but users must load the libraries each time Rstudio is started.
Load ATUS data into R
Users should change the "atus_00049.xml"
to the file name of their own ATUSX extract.
ddi <- read_ipums_ddi(file.path(dataDir, rawdata))
data <- read_ipums_micro(ddi)
## Use of data from IPUMS ATUS is subject to conditions including that users
## should cite the data appropriately. Use command `ipums_conditions()` for more
## details.
Set-up the Data
Overview of the data
# Make sure data is now a dataframe.
class(data)
#Look at the structure of the data
summary(data)
# Check-out the variable names
names(data)
# Make the variable names lowercase
data <- data %>% rename_all(tolower)
Change the variables’ class from labelled to integer, character, or factor
fcols <- c("hhtenure", "day", "sex", "race", "hispan", "marst", "spousepres", "spsex",
"fullpart", "kidund13", "region")
icols <- c("year", "hh_size", "hh_child", "ageychild", "hh_numownkids",
"age", "educ", "wt06")
ccols <- c("caseid", "activity")
data[fcols] <- lapply(data[fcols], to_factor)
data[icols] <- lapply(data[icols], as.integer)
data[ccols] <- lapply(data[ccols], as.character)
Prepare duration and activity variables
# Change NA to 0 for duration & activity minutes
data[["duration"]][is.na(data[["duration"]])] <- 0
summary(data$duration)
data[["activity"]][is.na(data[["activity"]])] <- "0"
summary(data$activity)
# Check that duration = 1440
data %>%
group_by(caseid) %>%
summarise(total= sum(duration))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 210,586 x 2
## caseid total
## <chr> <dbl>
## 1 20030100013280 1440
## 2 20030100013344 1440
## 3 20030100013352 1440
## 4 20030100013848 1440
## 5 20030100014165 1440
## 6 20030100014169 1440
## 7 20030100014209 1440
## 8 20030100014427 1440
## 9 20030100014550 1440
## 10 20030100014758 1440
## # ... with 210,576 more rows
Create activity variables by person
Create the activity variable components
This creates a logical object in RStudio for each activity category.
The activity codes can be found in ATUS’Activity Coding Lexicon.
NOTE: The figure published in PAA Affairs used a more conservative measure of housework activities. This code includes all housework activities.
# Childcare
ccare <- data$activity %in%
c(030100:030400, 080100:080200, 180381)
# Housework
hswrk <- data$activity %in%
c(020101:030000, 080200:080300, 080700:080800, 090100:100000, 160106, 070101, 180701, 180904, 180807, 180903, 080699, 160106)
# Leisure
leisure <- data$activity %in%
c(120100:130000, 130100:140000, 160101, 160102, 181200:181400)
# Sleep
sleep <- data$activity %in%
c(010100:020000)
#Leisure sub-types
socl <- data$activity %in%
c(120100:120200, 120200:120300, 120400:120500, 120501, 120502, 120504, 120599, 130200:130300, 130302, 130402)
actl <- data$activity %in%
c(120307, 120309:120313, 130100:130200, 130301, 130401, 130499)
pass <- data$activity %in%
c(120301:120306, 120308, 120399, 120503)
# Television
tele <- data$activity %in%
c(120303, 120304)
Create the activity identification variables in the dataset
Next, we use those logical objects to create the activity category variables in our dataset. First, we create a new variable called “actcat”, with missing values. This variable will identify the activity categories, using the logical objects we created.
data$actcat<-NA
data$actcat[ccare] <- "child care"
data$actcat[hswrk] <- "housework"
data$actcat[leisure] <- "leisure"
data$actcat[sleep] <- "sleep"
data$actcat <- as.character(data$actcat)
We’ll also create a leisure category variable and one for television. Sub-categories of activties need their own variables. Here, television is a subcategory of passive leisure, which is a subcategory of leisure. Thus, we have more refined identification separated into different variables.
data$leiscat<-NA
data$leiscat[socl] <- "social leisure"
data$leiscat[actl] <- "active leisure"
data$leiscat[pass] <- "passive leisure"
data$leiscat <- as.character(data$leiscat)
data$telecat<-NA
data$telecat[tele] <- "television"
data$telecat <- as.character(data$telecat)
caseid | activity | duration | actcat | leiscat | telecat |
---|---|---|---|---|---|
20030100013280 | 0 | 0 | NA | NA | NA |
20030100013280 | 0 | 0 | NA | NA | NA |
20030100013280 | 130124 | 60 | leisure | active leisure | NA |
20030100013280 | 0 | 0 | NA | NA | NA |
20030100013280 | 10201 | 30 | sleep | NA | NA |
20030100013280 | 0 | 0 | NA | NA | NA |
20030100013280 | 10101 | 600 | sleep | NA | NA |
20030100013280 | 0 | 0 | NA | NA | NA |
20030100013280 | 120303 | 150 | leisure | passive leisure | television |
20030100013280 | 0 | 0 | NA | NA | NA |
20030100013280 | 110101 | 5 | NA | NA | NA |
20030100013280 | 0 | 0 | NA | NA | NA |
20030100013280 | 120303 | 175 | leisure | passive leisure | television |
20030100013280 | 0 | 0 | NA | NA | NA |
20030100013280 | 10101 | 270 | sleep | NA | NA |
20030100013280 | 0 | 0 | NA | NA | NA |
20030100013280 | 10201 | 10 | sleep | NA | NA |
20030100013280 | 0 | 0 | NA | NA | NA |
20030100013280 | 130124 | 140 | leisure | active leisure | NA |
20030100013280 | 0 | 0 | NA | NA | NA |
Duration variables
Next we need to create summary variables with the total duration of each activity for each person
# Master activity variables
data <- data %>%
group_by(caseid) %>%
summarise (leisure = sum(duration[actcat == "leisure"], na.rm=TRUE),
sleep = sum(duration[actcat == "sleep"], na.rm=TRUE),
hswrk = sum(duration[actcat == "housework"], na.rm=TRUE),
ccare = sum(duration[actcat == "child care"], na.rm=TRUE)) %>%
inner_join(data, by='caseid')
## `summarise()` ungrouping output (override with `.groups` argument)
# Leisure activity variables
data <- data %>%
group_by(caseid) %>%
summarise (socl = sum(duration[leiscat == "social leisure"], na.rm=TRUE),
actl = sum(duration[leiscat == "active leisure"], na.rm=TRUE),
pass = sum(duration[leiscat == "passive leisure"], na.rm=TRUE)) %>%
inner_join(data, by='caseid')
## `summarise()` ungrouping output (override with `.groups` argument)
# Television variables
data <- data %>%
group_by(caseid) %>%
summarise (tele = sum(duration[telecat == "television"], na.rm=TRUE)) %>%
inner_join(data, by='caseid')
## `summarise()` ungrouping output (override with `.groups` argument)
caseid | leisure | sleep | hswrk | ccare | socl | actl | pass | tele |
---|---|---|---|---|---|---|---|---|
20030100013280 | 525 | 910 | 0 | 0 | 0 | 200 | 325 | 325 |
20030100013280 | 525 | 910 | 0 | 0 | 0 | 200 | 325 | 325 |
20030100013280 | 525 | 910 | 0 | 0 | 0 | 200 | 325 | 325 |
20030100013280 | 525 | 910 | 0 | 0 | 0 | 200 | 325 | 325 |
20030100013280 | 525 | 910 | 0 | 0 | 0 | 200 | 325 | 325 |
Create tidy data
Currently, the data is in long format, with each person represented in multiple rows in the dataset. We will aggregate the data so each person is represented once per row.
caseid | pernum | lineno | sex | actline | activity | sleep |
---|---|---|---|---|---|---|
20030100013280 | NA | NA | NA | NA | 0 | 910 |
20030100013280 | 1 | 1 | Male | NA | 0 | 910 |
20030100013280 | NA | NA | NA | 1 | 130124 | 910 |
20030100013280 | NA | NA | NA | NA | 0 | 910 |
20030100013280 | NA | NA | NA | 2 | 10201 | 910 |
20030100013280 | NA | NA | NA | NA | 0 | 910 |
20030100013280 | NA | NA | NA | 3 | 10101 | 910 |
20030100013280 | NA | NA | NA | NA | 0 | 910 |
20030100013280 | NA | NA | NA | 4 | 120303 | 910 |
20030100013280 | NA | NA | NA | NA | 0 | 910 |
Create a summary dataset for the variables that repeat each row, per person.
max <- data %>%
select(caseid, tele, socl, actl, pass, ccare, hswrk, leisure, sleep, year) %>%
group_by(caseid) %>%
summarise_all(list(~max(.)))
Notice that each person is now represented only once per row.
## # A tibble: 5 x 9
## caseid tele socl actl pass ccare hswrk leisure sleep
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 20030100013280 325 0 200 325 0 0 525 910
## 2 20030100013344 60 530 0 60 60 0 610 680
## 3 20030100013352 120 220 0 120 0 325 340 640
## 4 20030100013848 265 0 0 265 5 270 265 755
## 5 20030100014165 60 0 60 60 120 70 130 460
Create a summary dataset for the variables that appear in the rectype = 1 rows, by person.
rec1 <- data %>%
filter(rectype == 1) %>%
select(caseid, hh_size, hh_child, ageychild, hh_numadults, hhtenure, region)
## # A tibble: 5 x 7
## caseid hh_size hh_child ageychild hh_numadults hhtenure region
## <chr> <int> <int> <int> <int+lbl> <fct> <fct>
## 1 20030100~ 3 0 999 3 Owned or being bough~ West
## 2 20030100~ 4 1 0 2 Owned or being bough~ West
## 3 20030100~ 2 0 999 2 Owned or being bough~ West
## 4 20030100~ 4 1 9 2 Owned or being bough~ South
## 5 20030100~ 4 1 14 2 Owned or being bough~ South
Create a summary dataset for the variables that appear in the rectype = 2 rows, by person.
rec2 <- data %>%
filter(rectype == 2) %>%
select(caseid, age, sex, race, hispan, marst, educ, fullpart, spousepres, spsex,
hh_numownkids, kidund13, day, wt06)
## # A tibble: 5 x 14
## caseid age sex race hispan marst educ fullpart spousepres spsex
## <chr> <int> <fct> <fct> <fct> <fct> <int> <fct> <fct> <fct>
## 1 20030~ 60 Male Blac~ Not H~ Marr~ 41 Part ti~ Spouse pr~ Fema~
## 2 20030~ 41 Fema~ Whit~ Not H~ Marr~ 30 Part ti~ Spouse pr~ Male
## 3 20030~ 26 Fema~ Whit~ Not H~ Marr~ 31 Part ti~ Spouse pr~ Male
## 4 20030~ 36 Fema~ Blac~ Not H~ Marr~ 21 NIU (No~ Spouse pr~ Male
## 5 20030~ 51 Male Whit~ Not H~ Marr~ 42 Full ti~ Spouse pr~ Fema~
## # ... with 4 more variables: hh_numownkids <int>, kidund13 <fct>, day <fct>,
## # wt06 <int>
Put them all together.
atus <- reduce(list(max, rec1, rec2),
left_join, by = "caseid")
Take a look at the person-level file with activity duration summary variables
## # A tibble: 5 x 29
## caseid tele socl actl pass ccare hswrk leisure sleep year hh_size
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 20030~ 325 0 200 325 0 0 525 910 2003 3
## 2 20030~ 60 530 0 60 60 0 610 680 2003 4
## 3 20030~ 120 220 0 120 0 325 340 640 2003 2
## 4 20030~ 265 0 0 265 5 270 265 755 2003 4
## 5 20030~ 60 0 60 60 120 70 130 460 2003 4
## # ... with 18 more variables: hh_child <int>, ageychild <int>,
## # hh_numadults <int+lbl>, hhtenure <fct>, region <fct>, age <int>, sex <fct>,
## # race <fct>, hispan <fct>, marst <fct>, educ <int>, fullpart <fct>,
## # spousepres <fct>, spsex <fct>, hh_numownkids <int>, kidund13 <fct>,
## # day <fct>, wt06 <int>
Transform variables
# Gender
atus$sex <-atus$sex %>%
droplevels()
levels(atus$sex) <- c('Men', 'Women')
# Age
summary(atus$age)
# Marital status
# cohabitors in one group regardless of marital history
# use spousepres variable as indicator of marital status at time of ATUS diary
# adding the "\n" between Divorced and Separated puts "separated" on the next line
atus <- atus %>%
mutate(
mar = case_when(
spousepres == "Spouse present" ~ "Married",
spousepres == "Unmarried partner present" ~ "Cohabiting",
marst == "Never married" & spousepres == "No spouse or unmarried partner present" ~ "Single",
marst != "Widowed" & marst != "Never married" &
spousepres == "No spouse or unmarried partner present" ~ "Divorced\nSeparated",
TRUE ~ NA_character_
))
atus$mar <- factor(atus$mar, levels = c("Married", "Cohabiting", "Single", "Divorced\nSeparated"), ordered = FALSE)
# Spouse/partner sex
atus <- atus %>%
mutate(
spsex = case_when(
spsex == "Male" ~ "Male",
spsex == "Female" ~ "Female",
spsex == "NIU (Not in universe)" ~ "NIU",
TRUE ~ NA_character_
))
atus$spsex <- factor(atus$spsex, levels = c("Male", "Female", "NIU"))
# Extended Family Member
atus <- atus %>%
mutate(
exfam = case_when(
((spousepres == "Spouse present" | spousepres == "Unmarried partner present") & hh_numadults <= 2) |
((spousepres == "No spouse or unmarried partner present") & hh_numadults <=1) ~ "No extra adults",
((spousepres == "Spouse present" | spousepres == "Unmarried partner present") & hh_numadults >= 3) |
((spousepres == "No spouse or unmarried partner present") & hh_numadults >=2) ~ "Extra adults",
TRUE ~ NA_character_
))
atus$exfam <- factor(atus$exfam, levels = c("No extra adults", "Extra adults"), ordered = FALSE)
#Kid under 2
atus <- atus %>%
mutate(
kidu2 = case_when(
ageychild <=2 ~ "Child < 2",
TRUE ~ "No children < 2"
))
atus$kidu2 <- as_factor(atus$kidu2)
# Number of own HH kids
summary(atus$hh_numownkids)
# Employment
atus <- atus %>%
mutate(
employ = case_when(
fullpart == "Full time" ~ "Full-time",
fullpart == "Part time" ~ "Part-time",
fullpart == "NIU (Not in universe)" ~ "Not employed",
TRUE ~ NA_character_
))
atus$employ <- factor(atus$employ, levels = c("Full-time", "Part-time", "Not employed"))
# Education
atus <- atus %>%
mutate(
educ = case_when(
(educ >= 10 & educ <= 17) ~ "Less than high school",
(educ >= 20 & educ <= 21) ~ "High school",
(educ >= 30 & educ <= 31) ~ "Some college",
(educ >= 40 & educ <= 43) ~ "BA or higher",
TRUE ~ NA_character_
))
atus$educ <- factor(atus$educ, levels = c("Less than high school", "High school", "Some college", "BA or higher"))
# Race
atus <- atus %>%
mutate(
raceth = case_when(
race == "White only" & hispan == "Not Hispanic" ~ "White",
race == "Black only" & hispan == "Not Hispanic" ~ "Black",
race == "Asian only" & hispan == "Not Hispanic" ~ "Asian",
hispan != "Not Hispanic" ~ "Hispanic",
TRUE ~ "Other"
))
atus$raceth <- factor(atus$raceth, levels = c("White", "Black", "Hispanic", "Asian", "Other"))
# Weekend
atus <- atus %>%
mutate(
weekend = case_when(
day == "Sunday" | day == "Saturday" ~ "Weekend",
day == "Monday" | day == "Tuesday" | day == "Wednesday" |
day == "Thursday" | day == "Friday" ~ "Weekday",
TRUE ~ NA_character_
))
atus$weekend <- factor(atus$weekend, levels = c("Weekday", "Weekend"))
# Region
summary(atus$region)
# Home ownership
atus <- atus %>%
mutate(
ownrent = case_when(
hhtenure == "Owned or being bought by a household member" ~ "Own",
hhtenure == "Rented for cash" ~ "Rent",
hhtenure == "Occupied without payment of cash rent" ~ "Other",
hhtenure == "NIU (Not in universe)" ~ "Other",
TRUE ~ NA_character_
))
atus$ownrent <- factor(atus$ownrent, levels = c("Own", "Rent", "Other"))
Sample selection
Limit the dataset to White, Black, and Hispanic mothers of children with kids 13 or younger.
Let’s also restrict the dataset to mothers who are prime working age.
atus <- filter(atus, sex == "Women") # women
atus <- filter(atus, spsex == "Male" | spsex == "NIU") # Different sex couples or singles
atus <- filter(atus, hh_numownkids >=1) # mothers
atus <- filter(atus, ageychild <=13) #mothers of kids 13 or younger
atus <- filter(atus, age >= 18 & age <=54) #prime working age
atus <- filter(atus, raceth != "Asian" & raceth != "Other") # white, black, and Hispanic
Use listwise deletion to deal with missing data
# Keep only the variables of interest
atus <- select(atus, caseid, wt06, year, ccare, hswrk, leisure, sleep, socl, actl, pass, tele, age, sex,
mar, exfam, kidu2, hh_numownkids, employ, educ, raceth, weekend, region, ownrent)
# Listwise deletion
atus <- na.omit(atus)
Limit analysis to the last 5 years of survey data
# If you don't want to look at only recent survey years, add # below the next line of code.
# Don't forget to then update the caption of the figure with the correct survey years.
atus <- filter(atus, year >= 2015)
Here’s a look at the data
## tibble [6,655 x 23] (S3: tbl_df/tbl/data.frame)
## $ caseid : chr [1:6655] "20150101150147" "20150101150578" "20150101150641" "20150101150748" ...
## $ wt06 : int [1:6655] 8560512 2336830 10731959 8452077 1724919 7661737 7344661 2004625 2986700 4031170 ...
## $ year : int [1:6655] 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
## $ ccare : num [1:6655] 60 135 95 423 0 159 885 0 335 221 ...
## $ hswrk : num [1:6655] 20 20 90 150 0 302 75 230 165 170 ...
## $ leisure : num [1:6655] 5 95 60 120 275 95 50 460 150 80 ...
## $ sleep : num [1:6655] 720 630 610 561 745 509 360 690 565 520 ...
## $ socl : num [1:6655] 5 0 0 0 0 75 0 260 0 30 ...
## $ actl : num [1:6655] 0 35 60 0 0 20 0 0 0 0 ...
## $ pass : num [1:6655] 0 45 0 120 275 0 50 160 150 20 ...
## $ tele : num [1:6655] 0 45 0 120 275 0 50 160 150 20 ...
## $ age : int [1:6655] 51 33 36 32 31 31 30 41 29 25 ...
## $ sex : Factor w/ 2 levels "Men","Women": 2 2 2 2 2 2 2 2 2 2 ...
## $ mar : Factor w/ 4 levels "Married","Cohabiting",..: 1 1 1 1 3 1 1 4 1 1 ...
## $ exfam : Factor w/ 2 levels "No extra adults",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ kidu2 : Factor w/ 2 levels "No children < 2",..: 1 1 2 2 1 2 2 1 2 2 ...
## $ hh_numownkids: int [1:6655] 1 3 2 4 1 2 3 1 2 2 ...
## $ employ : Factor w/ 3 levels "Full-time","Part-time",..: 1 3 1 3 3 3 3 1 1 2 ...
## $ educ : Factor w/ 4 levels "Less than high school",..: 2 4 2 3 4 4 2 2 4 2 ...
## $ raceth : Factor w/ 5 levels "White","Black",..: 1 1 1 1 2 1 1 1 1 1 ...
## $ weekend : Factor w/ 2 levels "Weekday","Weekend": 2 2 1 1 2 1 1 2 2 2 ...
## $ region : Factor w/ 4 levels "Northeast","Midwest",..: 3 2 3 2 3 2 2 3 2 1 ...
## ..- attr(*, "label")= chr "Region"
## $ ownrent : Factor w/ 3 levels "Own","Rent","Other": 1 1 2 1 2 1 1 2 1 1 ...
## - attr(*, "na.action")= 'omit' Named int [1:2594] 45 49 63 82 92 131 132 140 154 157 ...
## ..- attr(*, "names")= chr [1:2594] "45" "49" "63" "82" ...
Create Linear Models and Margins
Use the ggeffects
package to compute marginal effects from statistical models and save the results as tidy data frames.
For each of activity variables of interest, run a linear model (lm
) and then calculate the marginal effects (ggeffect
)
ggeffect
calculates the adjusted predictions at the means, using proportions to fix factors.
#Childcare
lm_care <- lm(ccare ~ mar + exfam + hh_numownkids + kidu2 + educ + employ + raceth + age + weekend + region + ownrent, data = atus, weight=wt06)
pcare <- ggeffect(lm_care, terms = "mar")
#Housework
lm_hswrk <- lm(hswrk ~ mar + exfam + hh_numownkids + kidu2 + educ + employ + raceth + age + weekend + region + ownrent, data = atus, weight=wt06)
phswrk <- ggeffect(lm_hswrk, terms = "mar")
#Leisure
lm_leis <- lm(leisure ~ mar + exfam + hh_numownkids + kidu2 + educ + employ + raceth + age + weekend + region + ownrent, data = atus, weight=wt06)
pleis <- ggeffect(lm_leis, terms = "mar")
#Sleep
lm_sleep <- lm(sleep ~ mar + exfam + hh_numownkids + kidu2 + educ + employ + raceth + age + weekend + region + ownrent, data = atus, weight=wt06)
psleep <- ggeffect(lm_sleep, terms = "mar")
Combine predictions into one dataframe
For each of the marginal effects datasets, label the group values as the activity
levels(pcare$group)[levels(pcare$group)=="1"] <- "Childcare"
levels(phswrk$group)[levels(phswrk$group)=="1"] <- "Housework"
levels(pleis$group)[levels(pleis$group)=="1"] <- "Leisure"
levels(psleep$group)[levels(psleep$group)=="1"] <- "Sleep"
Maniuplate the marital status variable
# Change the variable to be a factor variable from numeric
pcare$x <- as.factor(pcare$x)
phswrk$x <- as.factor(phswrk$x)
pleis$x <- as.factor(pleis$x)
psleep$x <- as.factor(psleep$x)
Combine the data tables.
pred <- rbind(pcare, phswrk, pleis, psleep)
Prepare the categorical variables for data visualization.
# Revalue the marital status factors to be readable
levels(pred$x)[levels(pred$x)=="3"] <- "Married"
levels(pred$x)[levels(pred$x)=="1"] <- "Cohabiting"
levels(pred$x)[levels(pred$x)=="4"] <- "Single"
levels(pred$x)[levels(pred$x)=="2"] <- "Divorced\nSeparated"
# Order the marital status and activity factors
pred$x <- ordered(pred$x, levels = c("Married", "Cohabiting", "Single", "Divorced\nSeparated"))
pred$group <- ordered(pred$group, levels = c("Childcare", "Housework", "Leisure", "Sleep"))
Calculate the difference variable (mothers time use compared to married mothers time use, by activity).
pred <- pred %>%
group_by(group) %>%
mutate(diff = predicted - predicted[x == "Married"])
Data visualization
pred %>%
filter(x != "Married") %>%
ggplot(aes(x, diff, fill = x, label = round(diff, 0))) +
geom_col() +
facet_grid(~group) +
ggtitle("Married Mothers Report More Housework and Less Leisure and Sleep Than Other Mothers") +
labs(x = NULL, y = NULL, subtitle = "Predicted minutes per day with model controls",
caption = "Source: American Time Use Surveys (2015 - 2019) \n Models control for extra adults, number of household kids, kids under 2,
education, employment, race-ethnicity, age, weekend diary day, home ownership and region") +
theme_minimal() +
theme(plot.subtitle = element_text(vjust = 1),
plot.caption = element_text(vjust = 1),
legend.position="none",
strip.text.x = element_text(face = "bold"),
plot.title = element_text(face = "bold"),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank()) +
scale_y_continuous(breaks = c(-30, -20, -10, 0, 10, 20, 30), labels=c("-30", "-20", "-10", "Married mothers' \n minutes per day", "10", "20", "30")) +
scale_fill_manual(values=c("#18BC9C", "#F39C12", "#E74C3C")) +
geom_errorbar(aes(ymin=diff-std.error, ymax=diff+std.error), width=.2,
position=position_dodge(.9), color="grey")