Turn the dark theme if you are reading it at night! (button with sun at the left)
library(foreign); library(knitr); library(ggplot2); library(kableExtra)
library(jtools); library(memisc); library(sjPlot); library(ggstance)
library(broom.mixed); library(stargazer); library(ggthemr)
df <- read.spss("dataht5.sav", use.value.labels = T, to.data.frame = T)
df <- na.omit(df[c("happy", "gndr", "inwyys", "yrbrn", "sclmeet")])
1. Visualizing distributions
Even nicer amount of themes for plotting!
I just can’t stop choosing between different themes, help
Happy (dependent variable)
g <- ggplot(df, aes(x = happy))
g + geom_bar(stat = 'count') +
labs(x = "Level of happiness", y = "Count")
Level of happiness looks like skewed to the right distribution, but can be considered as quite close to normal distribution.
Gender
g <- ggplot(df, aes(x = gndr))
g + geom_bar(stat = 'count') +
labs(x = "Gender", y = "Count")
There are more female respondents than male ones - nearly on one fourth larger sample.
Age
df$age <- as.numeric(levels(df$inwyys))[df$inwyys] - as.numeric(levels(df$yrbrn))[df$yrbrn]
ggplot(df, aes(x = age)) +
geom_density(fill = "salmon", stat = 'density') +
stat_function(fun = dnorm, args = list(mean = mean(df$age), sd = sd(df$age))) +
labs(x = "Age", y = "Density")
Distrubution of age have two peakes at ~30 and at ~60 but still can be considered as close to equal (if we really want to).
Meeting with friends and colleagues frequency
ggplot(df, aes(x = sclmeet)) +
geom_bar(stat = 'count') +
labs(x = "Frequency of meeting", y = "Count")
Too can be considered as normal distribution (if look from far distance).
2. Simple model with control variables (age & gender)
Great visualization tool for regression!
model1 <- lm(data = df, as.numeric(happy) ~ age + gndr)
tab_model(model1)
 | as.numeric(happy) | ||
---|---|---|---|
Predictors | Estimates | CI | p |
(Intercept) | 8.25 | 8.00 – 8.49 | <0.001 |
age | -0.02 | -0.02 – -0.01 | <0.001 |
gndr [Female] | -0.13 | -0.32 – 0.05 | 0.145 |
Observations | 2346 | ||
R2 / R2 adjusted | 0.019 / 0.018 |
And gender is not significant! It can be seen not only by p-value larger than 0.05, but also that confident interval includes 0.
Even though age is significant it still causes very small effect at first glance - but if we consider that age don’t start from 0 and difference can be 50 or more years difference in effect can be really large (per -0.02 for each year).
Looks like the older we get the less happier we feel :(
3. More complex model with meeting frequency
model2 <- lm(data = df, as.numeric(happy) ~ age + gndr + sclmeet)
# https://strengejacke.github.io/sjPlot/articles/tab_model_estimates.html
tab_model(model2)
 | as.numeric(happy) | ||
---|---|---|---|
Predictors | Estimates | CI | p |
(Intercept) | 7.15 | 6.67 – 7.64 | <0.001 |
age | -0.01 | -0.02 – -0.01 | <0.001 |
gndr [Female] | -0.14 | -0.32 – 0.04 | 0.122 |
sclmeet [Less than once a month] |
0.59 | 0.15 – 1.02 | 0.008 |
sclmeet [Once a month] | 0.80 | 0.34 – 1.25 | 0.001 |
sclmeet [Several times a month] |
0.94 | 0.51 – 1.37 | <0.001 |
sclmeet [Once a week] | 1.19 | 0.74 – 1.64 | <0.001 |
sclmeet [Several times a week] |
1.20 | 0.74 – 1.66 | <0.001 |
sclmeet [Every day] | 1.20 | 0.70 – 1.70 | <0.001 |
Observations | 2346 | ||
R2 / R2 adjusted | 0.038 / 0.035 |
The second model is more complex and include frequency of meeting with friends and colleagues: - gender again is not significant! - frequency of meeting is significant and have strong positive effect on level of happiness!
stargazer(anova(model1, model2), type = "html", title="ANoVA", intercept.bottom = FALSE,
single.row = TRUE, align=TRUE, ci=TRUE, ci.level=0.95, no.space=TRUE)
Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
Res.Df | 2 | 2,340.000 | 4.243 | 2,337 | 2,338.5 | 2,341.5 | 2,343 |
RSS | 2 | 10,810.390 | 147.594 | 10,706.020 | 10,758.210 | 10,862.570 | 10,914.750 |
Df | 1 | 6.000 | 6.000 | 6.000 | 6.000 | 6.000 | |
Sum of Sq | 1 | 208.730 | 208.730 | 208.730 | 208.730 | 208.730 | |
F | 1 | 7.594 | 7.594 | 7.594 | 7.594 | 7.594 | |
Pr(> F) | 1 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | |
The model2 with frequency of meeting is significantly better than model1: lower RSS shows that.
tab_model(model1, model2)
 | as.numeric(happy) | as.numeric(happy) | ||||
---|---|---|---|---|---|---|
Predictors | Estimates | CI | p | Estimates | CI | p |
(Intercept) | 8.25 | 8.00 – 8.49 | <0.001 | 7.15 | 6.67 – 7.64 | <0.001 |
age | -0.02 | -0.02 – -0.01 | <0.001 | -0.01 | -0.02 – -0.01 | <0.001 |
gndr [Female] | -0.13 | -0.32 – 0.05 | 0.145 | -0.14 | -0.32 – 0.04 | 0.122 |
sclmeet [Less than once a month] |
0.59 | 0.15 – 1.02 | 0.008 | |||
sclmeet [Once a month] | 0.80 | 0.34 – 1.25 | 0.001 | |||
sclmeet [Several times a month] |
0.94 | 0.51 – 1.37 | <0.001 | |||
sclmeet [Once a week] | 1.19 | 0.74 – 1.64 | <0.001 | |||
sclmeet [Several times a week] |
1.20 | 0.74 – 1.66 | <0.001 | |||
sclmeet [Every day] | 1.20 | 0.70 – 1.70 | <0.001 | |||
Observations | 2346 | 2346 | ||||
R2 / R2 adjusted | 0.019 / 0.018 | 0.038 / 0.035 |
Still, each model describes only up to 0.04 variance max at the best case and that’s too bad!
4. Interpretation
Assumptions
plot_model(model2, type = "diag")[[1]]
We don’t have any multicollinearity!
plot_model(model2, type = "diag")[[2]]
And even residuals distributed normally and there are no major outliers!
plot_model(model2, type = "diag")[[3]]
Here again - residuals at the best shape and very nicely fits under the normal distribution line.
plot_model(model2, type = "diag")[[4]]
And here the baddest picture - homoscedasticity. Looks as bad as \[R^2\] of our models.
5. Visualization of the better model
shine bright!
plot_summs(model2, plot.distributions = TRUE, inner_ci_level = .9)
Again, here can be seen coefficients with confidence intervals of the best (second) model: - age is nearly on the line - gender crosses the line with it’s confidence interval - and for different levels of meeting frequency everything great!
Effects
plot_model(model2, type = "pred")[[1]]
Age has significant negative effect and with greater age level of happiness expected to lower.
plot_model(model2, type = "pred")[[2]]
Gender is not significant so that picture can be scipped.
plot_model(model2, type = "pred")[[3]]
And level of happiness significantly affected by meeting frequency with positive direction!
Summary
It does not matter what gender are you - just spend time with your friends more while you are young and be happy!