How can I use the ggplot function to visualise grouped data? - r

I have a data set which has the time taken for individuals to read a sentence (response_time) under the experimental factors of the condition of the sentence (normal or visually degraded) and the number of cups of coffee (caffeine) that an individual has drunk. I want to visualise the data using ggplot, but with the data grouped according to the condition of the sentence and the coffee drunk - e.g. the response times recorded for individuals reading a normal sentence and having drunk one cup of coffee.
This is what I have tried so far, but the graph comes up as one big blob (not separated by group) and has over 15 warnings!!
participant condition response_time caffeine
<dbl> <fct> <dbl> <fct>
1 1 Normal 984 1
2 2 Normal 1005 1
3 3 Normal 979 3
4 4 Normal 1040 2
5 5 Normal 1008 2
6 6 Normal 979 3
tidied_data_2 %>%
ggplot(aes(x = condition:caffeine, y = response_time, colour = condition:caffeine)) +
geom_violin() +
geom_jitter(width = .1, alpha = .25) +
guides(colour = FALSE) +
stat_summary(fun.data = "mean_cl_boot", colour = "black") +
theme_minimal() +
theme(text = element_text(size = 13)) +
labs(x = "Condition X Caffeine", y = "Response Time (ms)")
Any suggestions on how to better code what I want would be great.

As a wiki answer because too long for a comment.
I'm not sure what you intend with condition:caffeine; I've never seen that syntax in ggplot. Try aes(x = as.character(caffeine), y = ..., color = as.character(caffeine)) instead (or, because caffeine is already a factor in your case, you can just use aes(x = caffeine, y = ..., color = caffeine)).
If your idea is to separate by condition, you could just use aes(x = caffeine, y = ..., color = condition), as they are going to be separated by x anyway.
On another note: why not plot an actual scatter plot, making this a proper two-dimensional graph? Suggestion below.
library(ggplot2)
library(dplyr)
tidied_data_2 <- read.table(text = "participant condition response_time caffeine
1 1 Normal 984 1
2 2 Normal 1005 1
3 3 Normal 979 3
4 4 Normal 1040 2
5 5 Normal 1008 2
6 6 Normal 979 3", head = TRUE)
tidied_data_2 %>%
ggplot(aes(x = as.character(caffeine), y = response_time, colour = as.character(caffeine))) +
## geom_violin does not make sense with so few observations
# geom_violin() +
## I've removed alpha so you can see the dots better
geom_jitter(width = .1) +
## guides(colour = FALSE) is deprecated in recent ggplot2; "none" does the same
guides(colour = "none") +
stat_summary(fun.data = "mean_cl_boot", colour = "black") +
theme_minimal() +
theme(text = element_text(size = 13)) +
labs(x = "Condition X Caffeine", y = "Response Time (ms)")
What I would rather do:
tidied_data_2 %>%
## in this example as.integer(as.character(x)) is unnecessary, but it is necessary for your data sample
ggplot(aes(x = as.integer(as.character(caffeine)), y = response_time)) +
geom_jitter(width = .1) +
theme_minimal()
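The suggestion above of putting caffeine on x and condition on colour can be sketched as follows. The data frame toy is made up purely for illustration (it only mimics the shape of tidied_data_2), and position_jitterdodge() is one possible way to keep the two conditions side by side, not necessarily what the answerer had in mind. As a side note, when both variables are factors, base R's `:` operator builds their interaction, which is presumably what condition:caffeine was meant to do.

```r
library(ggplot2)

# Hypothetical data in the same shape as the question's tidied_data_2
# (values are invented for illustration only).
set.seed(1)
toy <- data.frame(
  participant   = 1:24,
  condition     = factor(rep(c("Normal", "Degraded"), each = 12)),
  caffeine      = factor(rep(1:3, times = 8)),
  response_time = round(rnorm(24, mean = 1000, sd = 30))
)

# For two factors, `:` returns a factor with one level per combination.
combo <- toy$condition:toy$caffeine

# Caffeine on x, condition as colour; jitterdodge separates the conditions
# within each caffeine level and adds a little horizontal jitter.
p <- ggplot(toy, aes(x = caffeine, y = response_time, colour = condition)) +
  geom_point(position = position_jitterdodge(jitter.width = 0.1,
                                             dodge.width = 0.6)) +
  theme_minimal() +
  labs(x = "Caffeine (cups)", y = "Response Time (ms)", colour = "Condition")
p
```

With enough observations per group, geom_violin() or geom_boxplot() could be added with position = position_dodge(width = 0.6) so that they line up with the points.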

Related

How to combine vlines from one dataframe with series from another dataframe using GGPLOT2 in R

I am trying to make a graph that will plot the cumulative sum value of different customers which will reset whenever a new order is placed. When a new order is placed, it will be indicated with a DateTick = 1 and I've tried to add this to my plots with vlines. Unfortunately, the plot will only show me either the correct Vlines or the correct series lines.
The data I'm using looks something like this
> head(CUSTWP)
# A tibble: 6 x 6
# Groups: Customer [1]
Customer YearWeek `Corrected Delta` `Ordered Quantity TU` DateTick ROP
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 CustLoc1 2020-01 46 NA 0 46
2 CustLoc1 2020-02 148 NA 0 194
3 CustLoc1 2020-03 150 NA 0 344
4 CustLoc1 2020-04 186 NA 0 530
5 CustLoc1 2020-05 205 NA 0 735
6 CustLoc1 2020-06 246 NA 0 981
I used below mentioned code to create the graphs.
p <- CUSTWP[CUSTWP$DateTick==1,]
p <- p[,1:2]
vline.dat <- data.frame(z=p$Customer, vl=p$YearWeek)
ggplot(CUSTWP, aes(YearWeek,`ROP`, group=1)) + geom_line(color= 'red', size = 0.8) + geom_vline(aes(xintercept=vl), data=vline.dat, linetype=4) +
facet_grid(Customer ~ ., scales = "free_y") + theme_light() + ggtitle('Reordering Points') +
theme(axis.text.x = element_text(angle = 20, vjust = 1, hjust=0.9), text = element_text(size = 14)) +
scale_x_discrete(guide = guide_axis(check.overlap = TRUE))
When I execute the code, I get a result as can be seen in the link.
The issue with this graph is that the Vlines are the orders DateTicks for all customers rather than the DateTicks grouped by customer. I've tried a different code that somehow produces the correct graphs but also a bunch of incorrect graphs with below-mentioned code.
p <- CUSTWP[CUSTWP$DateTick==1,]
p <- p[,1:2]
vline.dat <- data.frame(z=p$Customer, vl=p$YearWeek)
ggplot(CUSTWP, aes(YearWeek,`ROP`, group=1)) + geom_line(color= 'red', size = 0.8) + geom_vline(aes(xintercept=vl), data=vline.dat, linetype=4) +
facet_grid(Customer ~ z, scales = "free_y") + theme_light() + ggtitle('Reordering Points') +
theme(axis.text.x = element_text(angle = 20, vjust = 1, hjust=0.9), text = element_text(size = 14)) +
scale_x_discrete(guide = guide_axis(check.overlap = TRUE))
The above code creates a matrix of plots but the only correct ones are the plots on the diagonal line running from top left to bottom right.
I would really appreciate your input on this as I've been stuck on this for quite some time. Thank you in advance and apologies for the incorrect posting standards, this is my first post.
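A likely cause, offered as a sketch rather than a confirmed answer: facet_grid() can only assign a layer's rows to panels if that layer's data contains the faceting variable under the same name. In the code above the customer column of vline.dat is called z, so in the first attempt every line is drawn in every panel. The example below uses an invented stand-in for CUSTWP and keeps the column name Customer in the vline data.

```r
library(ggplot2)

# Hypothetical stand-in for CUSTWP (values invented for illustration).
CUSTWP <- data.frame(
  Customer = rep(c("CustLoc1", "CustLoc2"), each = 6),
  YearWeek = rep(sprintf("2020-%02d", 1:6), times = 2),
  DateTick = c(0, 0, 1, 0, 0, 0,   0, 0, 0, 0, 1, 0),
  ROP      = c(46, 194, 0, 186, 391, 637,   10, 30, 60, 100, 0, 25)
)

# Keep the Customer column, with the same name as in the main data,
# so each vertical line is matched to its own facet panel.
vline.dat <- CUSTWP[CUSTWP$DateTick == 1, c("Customer", "YearWeek")]

p <- ggplot(CUSTWP, aes(YearWeek, ROP, group = 1)) +
  geom_line(colour = "red") +
  geom_vline(aes(xintercept = YearWeek), data = vline.dat, linetype = 4) +
  facet_grid(Customer ~ ., scales = "free_y") +
  theme_light() +
  ggtitle("Reordering Points")
p
```

Here each panel should show only the order weeks of its own customer, without needing the Customer ~ z grid.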

ggplot and lapply /mapply for nested list and data frames

Edit:
I did find a way to do what I need, but now I'm having trouble getting a title to appear for each of the plots that are created so I know which site I am looking at:
lapply(seq(gl), function(i){
lapply(seq(gl[[i]]), function(j){
ggplot() +
geom_point(data = gl[[i]][[j]], aes(x = `UTC_date.1`, y = `actSWE_mm`, color = `swe_Res_mm`))+
geom_segment(data = gl[[i]][[j]], aes(x = `UTC_date.1`, y = `actSWE_mm`, xend = `UTC_date.1`, yend = `swe_mm`), alpha=.2)+
scale_color_steps2(low = "blue", mid = "white", high = "red") +
guides(color = FALSE) + geom_point(data = gl[[i]][[j]], aes(x = `UTC_date.1`, y = `swe_mm`), shape = 1) +
facet_wrap(vars(year), scales="free_x") + theme_bw()
})})
I tried adding:
theme(plot.title = paste(names(gl)[i], names(gl[[i]])[j], sep = "_"))
but that does not seem to work.
Original:
I have a list of 12 dataframes representing each month. Within each data frame are timeseries measurements of several different sites. Below is a table example (not actual data) of the data for January (monthSplit is the list - monthSplit$January):
site_id UTC_date.1 swe_mm actSWE_mm swe_Res_mm Month Year
<int> <date> <dbl> <dbl> <dbl> <chr> <num>
1003 2005-01-01 2 54.2 0.241 53.059 "January" 2005
1003 2005-01-02 2 54.2 0.241 53.059 "January" 2005
958 2005-01-01 2 154.2 0.241 153.059 "January" 2005
946 2005-01-01 2 154.2 152.25 1.95 "January" 2005
946 2005-01-02 2 500.2 550.241 50.059 "January" 2005
I'm having two problems when trying to perform ggplot over a list of dataframes that need to be further subset by the unique sites.
I tried to create a ggplot function and use mapply:
plot_fun = function(d) {
ggplot(d, aes(x = `UTC_date.1`, y = `actSWE_mm`)) +
geom_segment(aes(xend = `UTC_date.1`, yend = `swe_mm`), alpha=.2) + geom_point(aes(color = `swe_Res_mm`)) +
scale_color_steps2(low = "blue", mid = "white", high = "red") +
guides(color = FALSE) + geom_point(aes(y = `swe_mm`), shape = 1) +
facet_wrap(vars(year), scales="free_x") + theme_bw()
}
pltlist = mapply(plot_fun, d = monthSplit, SIMPLIFY=FALSE)
This yielded a plot in the right format, but it was not split by site_id. It created one plot per month, faceted by year. For example, the September plot contained 13 panels in one window, one for each year from 2003 to 2015. The problem is that all the sites were lumped together in each panel.
When looking at the actual data (as is the case with the above plot function), nothing meaningful is gained from the plots because the range of data varies so broadly in the y-axis.
I was wondering how I would go about splitting the list of plots further by site_id so that only one site appears in each plot for comparison.
Add group = site_id if you want to have one color point and line per site_id, e.g.
plot_fun = function(d) {
  ggplot(d, aes(x = UTC_date.1, y = actSWE_mm, group = site_id)) +
    geom_segment(aes(xend = UTC_date.1, yend = swe_mm), alpha = .2) +
    geom_point(aes(color = swe_Res_mm)) +
    scale_color_steps2(low = "blue", mid = "white", high = "red") +
    # guides(color = FALSE) is deprecated in recent ggplot2; "none" does the same
    guides(color = "none") +
    geom_point(aes(y = swe_mm), shape = 1) +
    facet_wrap(vars(year), scales = "free_x") +
    theme_bw()
}
(Note I had to delete all your '`' characters as that is the code character).
Note that this proposal does not give more plots, but more lines per plot.
If you want to have one plot per site_id, you might split your datasets by that variable, or include it in the facet_wrap:
facet_wrap(facets = ~ year + site_id, scales="free_x")
And if the scales are very different per site, I use log scales. However, zeros and negative values cannot be graphed then, that is a drawback.
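The "split your datasets by that variable" option, together with the per-plot title asked about in the edit, might look like the sketch below. The data are invented and make_site() is a hypothetical helper used only to build them. The title attempt in the question likely fails because theme(plot.title = ...) expects an element_text() that styles the title, whereas the title text itself is set with ggtitle() or labs(title = ...).

```r
library(ggplot2)

# Hypothetical helper that fabricates a few rows for one site.
make_site <- function(offset) {
  data.frame(
    UTC_date.1 = as.Date("2005-01-01") + 0:4,
    swe_mm     = offset + c(2, 4, 6, 8, 10),
    actSWE_mm  = offset + c(3, 3, 7, 9, 8),
    year       = 2005
  )
}

# Invented stand-in for monthSplit: one data frame per month, several sites each.
monthSplit <- list(
  January = rbind(data.frame(site_id = 1003, make_site(0)),
                  data.frame(site_id = 946,  make_site(100)))
)

# Nested list: month -> site -> data frame.
gl <- lapply(monthSplit, function(d) split(d, d$site_id))

# One plot per month/site combination, titled "<month>_<site>".
plots <- lapply(names(gl), function(m) {
  lapply(names(gl[[m]]), function(s) {
    ggplot(gl[[m]][[s]], aes(x = UTC_date.1, y = actSWE_mm)) +
      geom_segment(aes(xend = UTC_date.1, yend = swe_mm), alpha = .2) +
      geom_point() +
      geom_point(aes(y = swe_mm), shape = 1) +
      facet_wrap(vars(year), scales = "free_x") +
      ggtitle(paste(m, s, sep = "_")) +
      theme_bw()
  })
})
```

Iterating over names() instead of seq() keeps the month and site labels at hand for the title; plots[[1]][[1]] then shows a single site for January.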

How to create two barplots with different x and y axis in the same plot in R?

I need to plot two grouped barplots from two data frames that have different numbers of rows: 6 and 5.
I tried many pieces of code in R, but I don't know how to fix it.
Here are my data frames. The Freq column must be on the y axis, and the inter and intra columns must be on the x axis.
> freqinter
inter Freq
1 0.293040975264367 17
2 0.296736775990729 2
3 0.297619926364764 4
4 0.587377012109561 1
5 0.595245125315916 4
6 0.597022018595893 2
> freqintra
intra Freq
1 0 3
2 0.293040975264367 15
3 0.597022018595893 4
4 0.598809552335782 2
5 0.898227748764939 6
I expect to plot the barplots in the same plot and to distinguish inter and intra values by colour.
I want a picture like this one:
You probably want a histogram. Use the raw data if possible. For example:
library(tidyverse)
freqinter <- data.frame(x = c(
0.293040975264367,
0.296736775990729,
0.297619926364764,
0.587377012109561,
0.595245125315916,
0.597022018595893), Freq = c(17,2,4,1,4,2))
freqintra <- data.frame(x = c(
0 ,
0.293040975264367,
0.597022018595893,
0.598809552335782,
0.898227748764939), Freq = c(3,15,4,2,6))
df <- bind_rows(freqinter, freqintra, .id = "id") %>%
uncount(Freq)
ggplot(df, aes(x, fill = id)) +
geom_histogram(binwidth = 0.1, position = 'dodge', col = 1) +
scale_fill_grey() +
theme_minimal()
With the data you posted, I don't think you can make this graph look good. You can't have bars thin enough to differentiate 0.293 and 0.296 when your data ranges from 0 to 0.9.
Maybe you could try to treat it as a factor just to illustrate what you want to do:
freqinter <- data.frame(x = c(
0.293040975264367,
0.296736775990729,
0.297619926364764,
0.587377012109561,
0.595245125315916,
0.597022018595893), Freq = c(17,2,4,1,4,2))
freqintra <- data.frame(x = c(
0 ,
0.293040975264367,
0.597022018595893,
0.598809552335782,
0.898227748764939), Freq = c(3,15,4,2,6))
df <- bind_rows(freqinter, freqintra, .id = "id")
ggplot(df, aes(x = as.factor(x), y = Freq, fill = id)) +
geom_bar(stat = "identity", position = position_dodge2(preserve = "single")) +
theme(axis.text.x = element_text(angle = 90)) +
scale_fill_discrete(labels = c("inter", "intra"))
You can also check the problem by not treating your x variable as a factor:
ggplot(df, aes(x = x, y = Freq, fill = id)) +
geom_bar(stat = "identity", width = 0.05, position = "dodge") +
theme(axis.text.x = element_text(angle = 90)) +
scale_fill_discrete(labels = c("inter", "intra"))
Either the bars must be very thin (small width), or you'll get overlapping x intervals breaking the plot.

ggplot2 plot two data sets into one picture

this must be a FAQ, but I can't find an exactly similar example in the other answers (feel free to close this if you can point a similar Q&A). I'm still a newbie with ggplot2 and can't seem to wrap my head around it quite so easily.
I have 2 data.frames (that come from separate mixed models) and I'm trying to plot them both into the same graph. The data.frames are:
newdat
id Type pred SE
1 1 15.11285 0.6966029
2 1 13.68750 0.9756909
3 1 13.87565 0.6140860
4 1 14.61304 0.6187750
5 1 16.33315 0.6140860
6 1 16.19740 0.6140860
1 2 14.88805 0.6966029
2 2 13.46270 0.9756909
3 2 13.65085 0.6140860
4 2 14.38824 0.6187750
5 2 16.10835 0.6140860
6 2 15.97260 0.6140860
and
newdat2
id pred SE
1 14.98300 0.6960460
2 13.25893 0.9872502
3 13.67650 0.6150701
4 14.39590 0.6178266
5 16.37662 0.6171588
6 16.08426 0.6152017
As you can see, the second data.frame doesn't have Type, whereas the first does, and therefore has 2 values for each id.
What I can do with ggplot, is plot either one, like this:
fig1
fig2
As you can see, in fig 1 ids are stacked by Type on the x-axis to form two groups of 6 ids. However, in fig 2 there is no Type, but instead just the 6 ids.
What I would like to accomplish is to plot fig2 to the left/right of fig1 with similar grouping. So the resulting plot would look like fig 1 but with 3 groups of 6 ids.
The problem is also, that I need to label and organize the resulting figure so that for newdat the x-axis would include a label for "model1" and for newdat2 a label for "model2", or some similar indicator that they are from different models. And to make things even worse, I need some labels for Type in newdat.
My (hopefully) reproducible (but obviously very bad) code for fig 1:
library(ggplot2)
pd <- position_dodge(width=0.6)
ggplot(newdat,aes(x=Type,y=newdat$pred,colour=id))+
geom_point(position=pd, size=5) +
geom_linerange(aes(ymin=newdat$pred-1.96*SE,ymax=newdat$pred+1.96*SE), position=pd, size=1.5, linetype=1) +
theme_bw() +
scale_colour_grey(start = 0, end = .8, name="id") +
coord_cartesian(ylim=c(11, 18)) +
scale_y_continuous(breaks=seq(10, 20, 1)) +
scale_x_discrete(name="Type", limits=c("1","2"))
Code for fig 2 is identical, but without the limits in the last line and with id defined for x-axis in ggplot(aes())
As I understand it, defining stuff at ggplot() makes that stuff "standard" along the whole graph, and I've tried to remove the common stuff and separately define geom_point and geom_linerange for both newdat and newdat2, but no luck so far... Any help is much appreciated, as I'm completely stuck.
How about first adding some new variables to each dataset and then combining them:
newdat$model <- "model1"
newdat2$model <- "model2"
newdat2$Type <- 3
df <- rbind(newdat, newdat2)
# head(df)
Then we can plot with:
library(ggplot2)
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE),
position = position_dodge(width = 0.6),
size = 1.5, linetype = 1)
Alternatively, you can pass an additional aesthetic to geom_linerange to further delineate the model type:
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE, linetype = model),
position = position_dodge(width = 0.6),
size = 1.5)
Finally, you may want to consider facets:
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE),
position = position_dodge(width = 0.6),
size = 1.5) +
facet_wrap(~ id)
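For the "model1"/"model2" labels requested in the question, one further possibility (an assumption on my part, not something the answer above proposes) is to facet by model instead of id, so the strip text names the model while Type stays on the x axis. The sketch reuses the first three ids of each data frame from the question.

```r
library(ggplot2)

# First three ids of each data frame, values copied from the question.
newdat <- data.frame(
  id   = rep(1:3, times = 2),
  Type = rep(1:2, each = 3),
  pred = c(15.11285, 13.68750, 13.87565, 14.88805, 13.46270, 13.65085),
  SE   = c(0.6966029, 0.9756909, 0.6140860, 0.6966029, 0.9756909, 0.6140860)
)
newdat2 <- data.frame(
  id   = 1:3,
  pred = c(14.98300, 13.25893, 13.67650),
  SE   = c(0.6960460, 0.9872502, 0.6150701)
)

# Same preparation as in the answer: tag each model, give newdat2 a dummy Type.
newdat$model  <- "model1"
newdat2$model <- "model2"
newdat2$Type  <- 3
df <- rbind(newdat, newdat2)   # rbind matches data frame columns by name

pd <- position_dodge(width = 0.6)
p <- ggplot(df, aes(x = factor(Type), y = pred, colour = factor(id))) +
  geom_point(position = pd, size = 5) +
  geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE),
                 position = pd) +
  # One panel per model; free x scale and spacing so model2 shows only its own Type.
  facet_grid(. ~ model, scales = "free_x", space = "free_x") +
  scale_colour_grey(start = 0, end = .8, name = "id") +
  labs(x = "Type") +
  theme_bw()
p
```

The dummy Type value of 3 for model2 could be relabelled with scale_x_discrete(labels = ...) if a more meaningful tick label is wanted.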

ggplot2: Adding another legend to a plot (two times)

I have the following data set, which is used for plotting a bubble plot of frequencies.
Freq are the frequencies at time 1
Freq.1 are the frequencies at time 2
id names variable value Freq Freq.1
1 1 item1 1 13 11
2 2 item2 1 9 96
3 3 item1 2 10 28
4 4 item2 2 15 8
5 5 item1 3 9 80
6 6 item2 3 9 10
7 7 item1 4 11 89
8 8 item2 4 14 8
9 9 item1 5 3 97
10 10 item2 5 25 82
I am using the following code for plotting, and I do like the plot. However I am having some troubles with the legend that I explain below:
theme_nogrid <- function (base_size = 12, base_family = "") {
theme_bw(base_size = base_size, base_family = base_family) %+replace%
theme(panel.grid = element_blank())
}
plot1<- ggplot(Data, aes(x = variable, y = value, size = Freq, color=Freq.1))+
geom_point( aes(size = Freq, stat = "identity", position = "identity"),
shape = 19, color="black", alpha=0.5) +
geom_point( aes(size = Freq.1, stat = "identity", position = "identity"),
shape = 19, color="red", alpha=0.5) +
scale_size_continuous(name= "Frequencies ", range = c(2,30))+
theme_nogrid()
1- I would like to have two legends: one for color, the other one for size, but I can't find the right arguments to do it (I have consulted the guide and theme documentation and I can't solve my problem with my own ideas).
2- After having the two legends, I would like to increase the size of the legend shape so that it looks bigger (not the text, not the background, just the shape, without actually changing the plot).
Here is an example of what I have and what I would like (it is an example from my real data). As you can see, it is almost impossible to distinguish the color in the first image.
Sorry if it's a newbie question, but I can't really find an example of that.
Thanks,
Angulo
Try something like this
library(ggplot2)
library(tidyr)
d <- gather(Data, type, freq, Freq, Freq.1)
ggplot(d, aes(x = variable, y = value))+
geom_point(aes(size = freq, colour = type), shape = 19, alpha = 0.5) +
scale_size_continuous(name = "Frequencies ", range = c(2, 30)) +
scale_colour_manual(values = c("red", "blue")) +
theme_nogrid() +
guides(colour = guide_legend(override.aes = list(size = 10)))
The last line will make the circles in the "colour" legend larger.
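The same reshaping can also be written with tidyr's pivot_longer(), which supersedes gather(). The sketch below uses invented values for Data (the table in the question cannot be reconstructed unambiguously) and also sets the legend order explicitly; both choices are assumptions for illustration.

```r
library(ggplot2)
library(tidyr)

# Hypothetical data with the column names used in the question.
Data <- data.frame(
  names    = rep(c("item1", "item2"), times = 3),
  variable = rep(1:3, each = 2),
  value    = c(13, 9, 10, 15, 9, 9),
  Freq     = c(11, 96, 28, 8, 80, 10),
  Freq.1   = c(20, 5, 40, 12, 60, 30)
)

# Long format: one row per point and time, with the time in `type`.
d <- pivot_longer(Data, cols = c(Freq, Freq.1),
                  names_to = "type", values_to = "freq")

p <- ggplot(d, aes(x = variable, y = value)) +
  geom_point(aes(size = freq, colour = type), shape = 19, alpha = 0.5) +
  scale_size_continuous(name = "Frequencies", range = c(2, 30)) +
  scale_colour_manual(values = c("red", "blue")) +
  # Colour legend first, drawn with larger keys; size legend second.
  guides(colour = guide_legend(order = 1, override.aes = list(size = 10)),
         size   = guide_legend(order = 2)) +
  theme_bw() +
  theme(panel.grid = element_blank())
p
```

override.aes only changes how the legend keys are drawn, so the points in the plot itself keep their mapped sizes.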
