I am new to R.
I want to draw box plots for 4 continuous variables and present them in the same plot. I am trying to show the box plot of each variable in 2 study groups, using facet_wrap in ggplot.
The grouping variable is cognitive_groups (it has two values, 0 and 1).
The 4 variables are the memory (presented here), attention, executive and language domains.
Here is the code:
cogdb_bl %>%
  filter(!is.na(cognitive_groups)) %>%
  ggplot(aes(x = memory)) +
  geom_boxplot(aes(y = "")) +
  facet_wrap(~cognitive_groups) +
  theme_bw() +
  coord_flip() +
  labs(title = "Cognitive domains in baseline groups",
       x = "Z score")
Here is the output,
How do I present the other variables alongside the memory?
THANKS!
Do you mean like this? A tribble by the way is a nice way to create a minimal sample of data.
library(tidyverse)
tribble(
  ~participant, ~memory, ~attention, ~language, ~executive, ~cognitive,
  "A", 2, 5, 2, 2, 0,
  "B", 2, 2, 5, 2, 1,
  "C", 2, 2, 2, 2, 0,
  "D", 2, 3, 2, 6, 1,
  "E", 2, 2, 2, 2, 0,
  "F", 2, 2, 8, 2, 0,
  "G", 2, 4, 2, 2, 1,
  "H", 2, 2, 7, 2, 1
) |>
  pivot_longer(c(memory, attention, language, executive),
               names_to = "domain", values_to = "score") |>
  ggplot(aes(domain, score)) +
  geom_boxplot() +
  facet_wrap(~cognitive) +
  theme_bw() +
  coord_flip() +
  labs(
    title = "Cognitive domains in baseline groups",
    y = "Z score"
  )
Created on 2022-04-20 by the reprex package (v2.0.1)
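The same reshaping can be sketched against a data frame shaped like the one in the question. The column names (cognitive_groups, memory, attention, executive, language) come from the question; the values below are simulated, since the real cogdb_bl is not shown, and `long` and `p` are illustrative names.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)
# Simulated stand-in for cogdb_bl (assumption: one row per participant,
# z-scores in four columns, and some missing group labels)
cogdb_bl <- data.frame(
  cognitive_groups = c(rep(c(0, 1), each = 10), NA),
  memory    = rnorm(21),
  attention = rnorm(21),
  executive = rnorm(21),
  language  = rnorm(21)
)

# Drop rows without a group, then stack the four domains into one column
long <- cogdb_bl |>
  filter(!is.na(cognitive_groups)) |>
  pivot_longer(c(memory, attention, executive, language),
               names_to = "domain", values_to = "z_score")

# One panel per group, one box per domain
p <- ggplot(long, aes(domain, z_score)) +
  geom_boxplot() +
  facet_wrap(~cognitive_groups) +
  theme_bw() +
  coord_flip() +
  labs(title = "Cognitive domains in baseline groups",
       x = NULL, y = "Z score")
p
```

Swapping the roles, i.e. aes(factor(cognitive_groups), z_score) with facet_wrap(~domain), would give one panel per domain instead.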
I want to calculate the sum of y along the x-axis. The range for summation is contained in the separate columns xmin and xmax.
df <- data.frame (group = c("A","A","A","A","A","B","B","B","B","B" ),
x = c(1,2,3,4,5,1,2,3,4,5),
y= c(1,2,3,2,1,4,5,6,5,4),
xmin=c(2,2,2,2,2,1,1,1,1,1),
xmax=c(4,4,4,4,4,5,5,5,5,5))
For group A that is a range x from 2 to 4, sum{2+3+2}=7
For group B, range x from 1 to 5 sum{4+5+6+5+4}=24
Is there a way to do it?
I have tried around a bit but I'm not sure if the following goes in the right direction
df %>% rowwise() %>% mutate(sumX=sum(df$y[df$x>=df$min & df$x<=df$max]))
Use data.table::between to subset the rows that fall inside the range, then sum y by group with tapply.
subset(df, do.call(data.table::between, c(list(x), list(xmin, xmax)))) |>
with(tapply(y, group, sum))
# A B
# 7 24
Note: this uses the native pipe |>, which requires R >= 4.1.
Data:
df <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B",
"B", "B"), x = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5), y = c(1, 2, 3,
2, 1, 4, 5, 6, 5, 4), xmin = c(2, 2, 2, 2, 2, 1, 1, 1, 1, 1),
xmax = c(4, 4, 4, 4, 4, 5, 5, 5, 5, 5)), class = "data.frame", row.names = c(NA,
-10L))
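For comparison, a base R sketch of the same idea without data.table, assuming both ends of the range are inclusive (as in the expected sums in the question). `in_range` and `sums` are illustrative names.

```r
df <- data.frame(
  group = rep(c("A", "B"), each = 5),
  x     = rep(1:5, times = 2),
  y     = c(1, 2, 3, 2, 1, 4, 5, 6, 5, 4),
  xmin  = rep(c(2, 1), each = 5),
  xmax  = rep(c(4, 5), each = 5)
)

# TRUE for rows whose x lies inside [xmin, xmax], both ends included
in_range <- df$x >= df$xmin & df$x <= df$xmax

# Sum y over the kept rows, separately for each group
sums <- tapply(df$y[in_range], df$group[in_range], sum)
sums
```

Note that the attempt in the question refers to df$min and df$max, which are not columns of df; the columns are named xmin and xmax.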
I am conducting analyses of survey data nested by country and year. The respondents surveyed are never the same, but the countries surveyed are repeated.
The data looks something like this, where y is the DV, x is the IV, and g is a group variable that I am interested in interacting with the IV x. The data is nested by country co and by year t.
dat <- data.frame(y = c(1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 5, 2, 4, 3, 1),
x = c(1, 6, 3, 9, 3, 6, 4, 4, 9, 2, 8, 2, 5, 3, 7),
g = c(1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2),
t = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
co = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B"))
Basically, I want to conduct a longitudinal analysis where x*g predicts y. Given it's a longitudinal analysis, I think I need to interact the effect with year t, correct? Also, I think I need to control for the random effect and slopes of country co. So this is what I've done:
model1 <- glmer(y ~ x*t*g + (1+x|co) + (1|co), data = dat)
stargazer(model1, type = "text")
===============================================
                        Dependent variable:
                    ---------------------------
                                 y
-----------------------------------------------
x                             -0.390
                              (0.818)
t                             -3.317
                              (6.415)
g                             -1.971
                              (3.012)
x:t                            0.471
                              (1.218)
x:g                            0.379
                              (0.514)
t:g                            1.566
                              (4.312)
x:t:g                         -0.359
                              (0.798)
Constant                       5.517
                              (4.367)
-----------------------------------------------
Observations                    15
Log Likelihood                -22.627
Akaike Inf. Crit.             71.253
Bayesian Inf. Crit.           80.458
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01
I am not sure if this is the correct way to conduct longitudinal analysis like this, so wanted to ask if someone could confirm or correct this. Thanks.
I have a tibble created like this:
tibble(district = c(1, 5, 3, 5, 2, 7, 8, 1, 1, 2, 2, 4, 5, 6, 8, 6, 3),
housing = c(1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 3, 2, 1, 1, 1, 3, 2))
Now I would like to know how the type of housing is distributed per district. Since the number of respondents per district differs, I would like to work with percentages. Basically I'm looking for two plots:
1) One barplot in which the percentage of each housing category is visualized in 1 bar per district (since these are percentages, all the bars would be of equal height).
2) A pie chart for every district, with the percentage of housing categories for that specific district.
However, I am unable to group the data in the desired way, let alone compute percentages from it. How can I make those plots?
Thanks ahead!
Give this a shot:
library(tidyverse)
library(ggplot2)
# original data
df <- data.frame(district = c(1, 5, 3, 5, 2, 7, 8, 1, 1, 2, 2, 4, 5, 6, 8, 6, 3),
housing = c(1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 3, 2, 1, 1, 1, 3, 2))
# group by district
df <- df %>%
group_by(district) %>%
summarise(housing=sum(housing))
# make percentages
df <- df %>%
mutate(housing_percentage=housing/sum(df$housing)) %>%
mutate(district=as.character(district)) %>%
mutate(housing_percentage=round(housing_percentage,2))
# bar graph
ggplot(data=df) +
geom_col(aes(x=district, y=housing_percentage))
# pie chart
ggplot(data=df, aes(x='',y=housing_percentage, fill=district)) +
geom_bar(width = 1, stat = "identity", color = "white") +
coord_polar("y", start = 0) +
theme_void()
Which yields the following plots:
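If housing is a category code rather than a quantity (an assumption based on the wording of the question), summing it per district is not meaningful; the share of each housing type within a district can instead be computed by counting rows. A minimal sketch, where `counts`, `shares`, `bar` and `pie` are illustrative names:

```r
library(ggplot2)

df <- data.frame(
  district = c(1, 5, 3, 5, 2, 7, 8, 1, 1, 2, 2, 4, 5, 6, 8, 6, 3),
  housing  = c(1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 3, 2, 1, 1, 1, 3, 2)
)

# Count every (district, housing) pair, then divide each row by its total
counts <- table(district = df$district, housing = df$housing)
shares <- as.data.frame(prop.table(counts, margin = 1), responseName = "share")

# 1) One stacked bar per district; shares within a district add up to 1
bar <- ggplot(shares, aes(x = district, y = share, fill = housing)) +
  geom_col() +
  scale_y_continuous(labels = scales::label_percent())

# 2) One pie per district, built from the same shares
pie <- ggplot(shares, aes(x = "", y = share, fill = housing)) +
  geom_col(width = 1, color = "white") +
  coord_polar("y", start = 0) +
  facet_wrap(~district) +
  theme_void()

bar
pie
```

Because the shares are computed per district, every bar has the same height regardless of how many respondents a district has.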
I want to make a barplot with ggplot2 that displays multiple bars within each group, but my plot shows 4 bars instead of 8 for each group. I would appreciate your help.
Here is my code:
levels = c('D', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9')
method = c('G1', 'G2', 'G3', 'G4', 'G5', 'G6', 'G7','G8')
ave = c(4, 4, 4, 4, 5, 1, 2, 6, 3, 5, 2, 2, 2, 2, 5, 3, 4, 1, 1, 1, 2,
2, 2, 2, 3, 3, 2, 1, 1, 1, 1, 3, 4, 5, 6, 8, 9, 7, 1, 2, 3, 3, 4, 5, 7,
6, 1, 1, 1, 2, 5, 7, 7, 8, 9, 1, 4, 6, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
levels = factor(c(rep(levels,8)))
method = factor(c(rep(method,10)))
dat = data.frame(levels,ave,method)
dodge = position_dodge(width = .9)
p = ggplot(dat,mapping =aes(x = as.factor(levels),y = ave,fill =
as.factor(method)))
p + geom_bar(stat = "identity",position = "dodge") +
xlab("levels") + ylab("Mean")
It looks like geom_bar will only plot bars for observations that exist; if you want to have bars for every method (assuming you want each level to have a bar for each method), you need to have observations in your data corresponding to those pairings. Currently, each level is paired with only four of the eight methods, because rep(levels, 8) and rep(method, 10) cycle through both vectors in lockstep. To artificially generate the missing pairings, you can use tidyr::complete() and tidyr::expand() before plotting. For each new pairing, ave will automatically be assigned NA, but you can change this behavior using the fill parameter in tidyr::complete().
Here's an example where ave is set to 0 for every new pairing instead of NA:
library(tidyverse)

dat %>%
  complete(expand(dat, levels, method), fill = list(ave = 0)) %>%
  # the piped data frame is already the first argument of ggplot()
  ggplot(mapping = aes(x = as.factor(levels),
                       y = ave,
                       fill = as.factor(method))) +
  geom_bar(stat = "identity", position = position_dodge(width = 1)) +
  xlab("levels") +
  ylab("Mean")
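To check which level/method pairings actually exist, they can be tabulated. The sketch below rebuilds the two label vectors from the question under the shorter names `lv` and `md`. The `each = 8` variant is an assumption about how ave is ordered (all eight methods for 'D', then all eight for 'S1', and so on); the question does not say.

```r
lv <- c("D", "S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9")
md <- c("G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8")

# Pairing as built in the question: both vectors are recycled in lockstep,
# so the pattern repeats every 40 rows and half of the pairings never occur
pairs_question <- data.frame(levels = rep(lv, 8), method = rep(md, 10))
n_question <- nrow(unique(pairs_question))
methods_per_level <- rowSums(table(pairs_question) > 0)

# Assumed alternative: repeat each level 8 times so it meets every method once
pairs_each <- data.frame(levels = rep(lv, each = 8), method = rep(md, times = 10))
n_each <- nrow(unique(pairs_each))

n_question          # distinct pairings in the question's data
methods_per_level   # number of distinct methods observed for each level
n_each              # distinct pairings when each = 8 is used
```

If the each = 8 ordering matches how ave was recorded, every level already has one row per method and no padding with complete() is needed.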
There are quite a few answers to this question, not only on Stack Overflow but across the internet. However, none could solve my problem. I have two problems
I tried to simulate some data for you:
df <- structure(list(Group = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
2, 2, 2), var1 = c(2, 3, 1, 2, 3, 2, 3, 3, 5, 6, 7, 6, 8, 5,
5), var2 = c(9, 9, 9, 8, 7, 8, 9, 3, 2, 2, 1, 1, 2, 3, 3), var3 = c(6,
7, 6, 6, 5, 6, 7, 1, 2, 1, 2, 3, 1, 1, 2)), .Names = c("Group",
"var1", "var2", "var3"), row.names = c(NA, -15L), class = "data.frame")
then I do as follows:
fit <- lda(Group~., data=df)
plot(fit)
I end up with the groups appearing in two different plots.
How can I plot my results in one figure, e.g. like "Linear discriminant analysis plot" or "Linear discriminant analysis plot using ggplot2", or any other beautiful plot?
The plot() function actually calls plot.lda(), the source code of which you can check by running getAnywhere("plot.lda"). This plot() function does quite a lot of processing of the LDA object that you pass in before plotting. As a result, if you want to customize how your plots look, you will probably have to write your own function that extracts information from the lda object and then passes it to a plot function. Here is an example (I don't know much about LDA, so I just trimmed the source code of the default plot.lda and used the ggplot2 package, which is very flexible, to create a bunch of plots).
#If you don't have ggplot2 package, here is the code to install it and load it
install.packages("ggplot2")
library("ggplot2")
library("MASS")
#this is your code. The only thing I've changed here is the Group labels because you want a character vector instead of numeric labels
df <- structure(list(Group = c("a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b"),
var1 = c(2, 3, 1, 2, 3, 2, 3, 3, 5, 6, 7, 6, 8, 5, 5),
var2 = c(9, 9, 9, 8, 7, 8, 9, 3, 2, 2, 1, 1, 2, 3, 3),
var3 = c(6, 7, 6, 6, 5, 6, 7, 1, 2, 1, 2, 3, 1, 1, 2)),
.Names = c("Group","var1", "var2", "var3"),
row.names = c(NA, -15L), class = "data.frame")
fit <- lda(Group~., data=df)
#here is the custom function I made that extracts the proper information from the LDA object. You might want to write your own version of this to make sure it works with all cases (all I did here was trim the original plot.lda() function, but I might've deleted some code that might be relevant for other examples)
ggplotLDAPrep <- function(x){
  if (!is.null(Terms <- x$terms)) {
    data <- model.frame(x)
    X <- model.matrix(delete.response(Terms), data)
    g <- model.response(data)
    xint <- match("(Intercept)", colnames(X), nomatch = 0L)
    if (xint > 0L)
      X <- X[, -xint, drop = FALSE]
  } else {
    #the trimmed function only handles fits created with the formula interface
    stop("ggplotLDAPrep() needs an lda object fitted with a formula")
  }
  means <- colMeans(x$means)
  #project the centered data onto the discriminant axes
  X <- scale(X, center = means, scale = FALSE) %*% x$scaling
  rtrn <- data.frame(X, labels = as.character(g))
  return(rtrn)
}
fitGraph <- ggplotLDAPrep(fit)
#Here are some examples of using ggplot to display your results. If you like what you see, I suggest to learn more about ggplot2 and then you can easily customize your plots
#this is similar to the result you get when you ran plot(fit)
ggplot(fitGraph, aes(LD1))+geom_histogram()+facet_wrap(~labels, ncol=1)
#Same as previous, but all the groups are on the same graph
ggplot(fitGraph, aes(LD1,fill=labels))+geom_histogram()
#The following plot won't work with your data because it has only two groups and therefore no LD2, but it is equivalent to the scatter plot in the external example you linked. I've loaded that example here as a demo
ldaobject <- lda(Species~., data=iris)
fitGraph <- ggplotLDAPrep(ldaobject)
ggplot(fitGraph, aes(LD1,LD2, color=labels))+geom_point()
I didn't customize the ggplot settings much, but you can make your graphs look like anything you want if you play around with it. Hope this helps!
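An alternative sketch that skips the custom function: predict() on an lda fit returns the discriminant scores in its x element. As far as I can tell from the MASS source, predict() and plot.lda() center the data differently, so the scores may differ from the ggplotLDAPrep() output by a constant offset when group sizes are unequal; treat this as a starting point. `fit_iris`, `scores` and `p` are illustrative names.

```r
library(MASS)
library(ggplot2)

fit_iris <- lda(Species ~ ., data = iris)

# predict() with no newdata scores the training rows; $x holds LD1, LD2, ...
scores <- data.frame(predict(fit_iris)$x, labels = iris$Species)

# Overlaid densities of the first discriminant, one curve per group
p <- ggplot(scores, aes(LD1, fill = labels)) +
  geom_density(alpha = 0.5)
p
```

With only two groups, as in the question's data, scores would contain a single LD1 column, so the density plot still applies while the LD1/LD2 scatter plot does not.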