ggplot without the use of subset

I'm using ggplot2 with the faceting option to plot several results from a data.frame.
It's a data.frame with three factors:
participant (N) with 6 levels;
condition (C) with 6 levels;
stimuli (S) with 10 levels.
I plot the results of one participant in one condition using the subset function, and then I facet with ggplot. However, I was wondering whether there is an easier solution in ggplot2.
Thanks for any help; I'm currently learning R and ggplot2.

It sounds like you're trying to ask how to set up a two-way facet. I'm going to guess that stimuli is your predictor variable.
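One way is like this (group = 1 lets geom_line() join the points when stimuli is a factor):
library(ggplot2)
ggplot(mydata, aes(x = stimuli, y = my.response, group = 1)) +
  facet_wrap(condition ~ participant) +
  geom_line()
or replace geom_line() with geom_point().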
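Here is a self-contained sketch of the two-way layout using simulated data. The names mydata, participant, condition, stimuli and my.response are assumptions based on the question. It uses facet_grid() instead of facet_wrap() because facet_grid() arranges the panels as a condition-by-participant grid. facet_wrap(condition ~ participant) produces the same 36 panels, wrapped into rows.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data in the layout described in the question:
# 6 participants x 6 conditions x 10 stimuli, one response per cell
mydata <- expand.grid(participant = factor(paste0("P", 1:6)),
                      condition   = factor(paste0("C", 1:6)),
                      stimuli     = factor(paste0("S", 1:10),
                                           levels = paste0("S", 1:10)))
mydata$my.response <- rnorm(nrow(mydata))

# One panel per condition (rows) and participant (columns), no subset() needed;
# group = 1 connects the points across the factor levels of stimuli
p <- ggplot(mydata, aes(x = stimuli, y = my.response, group = 1)) +
  geom_line() +
  geom_point() +
  facet_grid(condition ~ participant)
```

To look at a single participant or condition, you can still filter the data first. The faceting itself handles every combination at once.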

Related

Determine order of several boxplots in one plot in R ggplot

I tried to create a relatively simple boxplot in R's ggplot2: one value on the x axis and several variables on the y axis. I'm using code similar to this:
library(ggplot2)
ggplot() +
  # Boxplot 1
  geom_boxplot(data = df[which(df$Xvalue == "Boxplot1"), ],
               mapping = aes(X, "Y")) +
  # Boxplot 2
  geom_boxplot(data = df[which(df$Xvalue == "Boxplot2"), ],
               mapping = aes(X, "Y")) +
  # Boxplot 3
  geom_boxplot(data = df[which(df$Xvalue == "Boxplot3"), ],
               mapping = aes(X, "Y"))
The boxplots in my real code are ordered alphabetically. However, I need them in a custom, categorical order.
I'm aware I could restructure my data frame so that I don't need a subset and a new geom_boxplot command for each boxplot. However, I've structured the data this way for other reasons, and that's not the solution I'm looking for right now.
Maybe there is an easy way using scale_Y_manual or something else? Any help is appreciated!
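This question has no answer in the thread. A common approach is to keep the separate geom_boxplot() layers and set the order of the discrete y axis with scale_y_discrete(limits = ...). The sketch below relies on some assumptions. The df, X and Xvalue columns are simulated. Each layer also gets its own constant y label instead of "Y", so the three boxes land on different rows. Recent ggplot2 versions draw the boxes horizontally on their own because x is continuous and y is discrete.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data in the question's layout: numeric X plus a label column Xvalue
df <- data.frame(Xvalue = rep(c("Boxplot1", "Boxplot2", "Boxplot3"), each = 20),
                 X = rnorm(60))

p <- ggplot() +
  geom_boxplot(data = df[df$Xvalue == "Boxplot1", ], mapping = aes(X, "Boxplot1")) +
  geom_boxplot(data = df[df$Xvalue == "Boxplot2", ], mapping = aes(X, "Boxplot2")) +
  geom_boxplot(data = df[df$Xvalue == "Boxplot3", ], mapping = aes(X, "Boxplot3")) +
  # limits sets the order of the discrete y axis, from bottom to top
  scale_y_discrete(limits = c("Boxplot2", "Boxplot3", "Boxplot1"))
```

The data structure stays as it is. Only the vector passed to limits needs to change when you want a different order.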

How to run and annotate separate TukeyHSD for individual facets in ggplot2 boxplot?

I am not very good at R and am trying to pull together code that is not quite working the way I would like. I would really appreciate any help with this!
I would like to perform a TukeyHSD test among treatment groups within each facet of my ggplot boxplots. Currently, though, my figure applies a single TukeyHSD across all the boxplots, which results in a huge number of groupings, as you can see in the figure:
[Figure: my current plot]
As I mentioned, it would be preferable to run TukeyHSD separately within each Depth facet: first "0", then "5", then "30". Is this possible by modifying the code I have been using?
library(ggplot2)
library(emmeans)
library(multcomp)
# read the data copied to the clipboard (works on Windows)
data1 <- read.delim(file = "clipboard")
data1$Treatment <- as.factor(data1$Treatment)
data1$Depth <- as.factor(data1$Depth)
model <- aov(MBC ~ Treatment * Depth, data = data1)
model
# compact letter display for all Depth x Treatment combinations
cld_dat <- as.data.frame(cld(emmeans(model, ~ Depth * Treatment),
                             Letters = letters))
ggplot(data1, aes(x = Treatment, y = MBC, fill = Treatment)) +
  geom_boxplot() +
  ylab("MBC") +
  ggtitle("Melinis") +
  facet_wrap(~ Depth, ncol = 3) +
  geom_text(data = cld_dat, aes(y = 140, label = .group))
One more question, if possible: how would I add another y variable, "CB", as a second row laid out the same way as the first-row variable "MBC"?
Thank you for any suggestions!
Data columns: Treatment, Depth, MBC, CB
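For the second question (CB as a second row), one possible approach is to stack MBC and CB into a long data frame. You can then facet rows by variable and columns by Depth. This is a sketch on simulated data and does not come from the thread. The column names Variable and Value are hypothetical.

```r
library(ggplot2)

set.seed(2)
# Hypothetical data with the question's columns: Treatment, Depth, MBC, CB
data1 <- expand.grid(Treatment = factor(c("T1", "T2", "T3")),
                     Depth = factor(c("0", "5", "30"), levels = c("0", "5", "30")),
                     rep = 1:4)
data1$MBC <- rnorm(nrow(data1), mean = 100, sd = 15)
data1$CB  <- rnorm(nrow(data1), mean = 50, sd = 5)

# Stack the two responses: one row per observation per variable
long <- rbind(
  data.frame(data1[c("Treatment", "Depth")], Variable = "MBC", Value = data1$MBC),
  data.frame(data1[c("Treatment", "Depth")], Variable = "CB",  Value = data1$CB)
)
# Fix the row order: MBC on top, CB below
long$Variable <- factor(long$Variable, levels = c("MBC", "CB"))

# Rows = response variable, columns = Depth; free_y gives each row its own y scale
p <- ggplot(long, aes(x = Treatment, y = Value, fill = Treatment)) +
  geom_boxplot() +
  facet_grid(Variable ~ Depth, scales = "free_y")
```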

If I understand correctly which factor has those facet levels, what you need is
cld(emmeans(model, ~ Treatment | Depth))
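Here is a sketch of how that call fits into the original plot, using simulated data. The treatment labels, depths and values are hypothetical. With ~ Treatment | Depth, emmeans compares treatments only within each Depth, so each facet gets its own letters. The error term still comes from the full two-way model, so the result is not identical to running a separate TukeyHSD on each facet's subset. The letter display for emmeans objects also needs the multcompView package installed.

```r
library(ggplot2)
library(emmeans)
library(multcomp)  # provides the cld() generic

set.seed(42)
# Hypothetical data standing in for the clipboard data
data1 <- expand.grid(Treatment = factor(c("T1", "T2", "T3")),
                     Depth = factor(c("0", "5", "30"), levels = c("0", "5", "30")),
                     rep = 1:5)
data1$MBC <- rnorm(nrow(data1), mean = 100, sd = 15)

model <- aov(MBC ~ Treatment * Depth, data = data1)

# "| Depth" runs the pairwise comparisons separately within each Depth;
# the result keeps a Depth column, so geom_text() places letters in the right facet
cld_dat <- as.data.frame(cld(emmeans(model, ~ Treatment | Depth),
                             Letters = letters))

p <- ggplot(data1, aes(x = Treatment, y = MBC, fill = Treatment)) +
  geom_boxplot() +
  facet_wrap(~ Depth, ncol = 3) +
  geom_text(data = cld_dat, aes(y = 140, label = .group))
```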

Apply ggplot2 across columns

I am working with a dataframe with many columns and would like to produce certain plots of the data using ggplot2, namely, boxplots, histograms, density plots. I would like to do this by writing a single function that applies across all attributes (columns), producing one boxplot (or histogram etc) and then storing that as a given element of a list into which all the boxplots will be chained, so I could later index it by number (or by column name) in order to return the plot for a given attribute.
The issue I have is that, if I try to apply across columns with something like apply(df,2,boxPlot), I have to define boxPlot as a function that takes just a vector x. And when I do so, the attribute/column name and index are no longer retained. So e.g. in the code for producing a boxplot, like
bp <- ggplot(df, aes(x=Group, y=Attr, fill=Group)) +
geom_boxplot() +
labs(title="Plot of length per dose", x="Group", y =paste(Attr)) +
theme_classic()
the function has no idea how to extract the info necessary for Attr from just vector x (as this is just the column data and doesn't carry the column name or index).
(Note the x-axis is a factor variable called 'Group', which has 6 levels A,B,C,D,E,F, within X.)
Can anyone help with a good way of automating this procedure? (Ideally it should work for all types of ggplots; the problem here seems to simply be how to refer to the attribute name, within the ggplot function, in a way that can be applied / automatically replicated across the columns.) A for-loop would be acceptable, I guess, but if there's a more efficient/better way to do it in R then I'd prefer that!
Edit: something like what is achieved by the top answer to this question: apply box plots to multiple variables. Except that, with that answer's code, you would still need a for-loop to change the index in y=y[2] in the ggplot code to get all the boxplots. He also used expand.grid to include different x possibilities (I have only one, the Group factor), but it would be easy to simplify that if the looping problem could be handled.
I'd also prefer just base R if possible; dplyr only if absolutely necessary.

Here's an example of iterating over all columns of a data frame to produce a list of plots while keeping the column name in the ggplot axis label:
library(tidyverse)
# imap() passes each column as .x and its name as .y
plots <-
  imap(select(mtcars, -cyl), ~ {
    ggplot(mtcars, aes(x = cyl, y = .x)) +
      geom_point() +
      ylab(.y)
  })
plots$mpg
You can also do this without purrr and dplyr:
library(ggplot2)
to_plot <- setdiff(names(mtcars), 'cyl')
# Map() pairs each column (.x) with its name (.y)
plots <-
  Map(function(.x, .y) {
    ggplot(mtcars, aes(x = cyl, y = .x)) +
      geom_point() +
      ylab(.y)
  }, mtcars[to_plot], to_plot)
plots$mpg
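Here is a base R variant that matches the question's boxplot layout and assumes ggplot2 3.0.0 or later. It loops over column names rather than column vectors and looks up each column with the .data pronoun, so the name stays available for titles and axis labels. In this sketch, factor(cyl) stands in for the question's Group factor.

```r
library(ggplot2)

# Every column except the grouping variable
to_plot <- setdiff(names(mtcars), "cyl")

# .data[[col]] fetches the column by name inside aes()
plots <- lapply(to_plot, function(col) {
  ggplot(mtcars, aes(x = factor(cyl), y = .data[[col]], fill = factor(cyl))) +
    geom_boxplot() +
    labs(title = paste("Boxplot of", col), x = "cyl", y = col) +
    theme_classic()
})
names(plots) <- to_plot
```

You can then retrieve a single plot by name or number with plots$mpg or plots[[1]]. For other plot types, swap geom_boxplot() for geom_histogram() or geom_density() and map only x = .data[[col]].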

How to respect quantitative nature of discrete/group variables in R ggplot2? [duplicate]

This question already has an answer here:
How to plot a boxplot with correctly spaced continuous x-axis values in ggplot2
(1 answer)
Closed 3 years ago.
I'd like to make a plot with R's ggplot2 functions to highlight the relation between a categorical X and a continuous Y variable. But my categorical variable is quantitative (e.g. integers), and I would like my plots to respect the positions implied by the quantitative values of X.
Imagine the following dataset:
library(tidyverse)
df <- data.frame(Category = sample(c(1, 2, 5), 1000, replace = TRUE)) %>%
  mutate(Value = Category + rnorm(1000))
The easiest boxplot would be :
ggplot(df, aes(x=as.factor(Category), y=Value)) +
geom_boxplot() +
labs(x="Category")
But what I would like is :
add_row(df, Category=3:4, Value=NA) %>%
ggplot(aes(x=as.factor(Category), y=Value)) +
geom_boxplot() +
labs(x="Category")
Do you know a proper way to achieve this other than the ugly trick above, which does not scale? We can imagine many boxplots, or even categories that are decimal values (with, of course, a limited number of categories). All in all, I want to distribute my boxplots along the x-axis according to the quantitative values of the categories. The same question could of course apply to bar plots instead of boxplots.
Thanks a lot!

As mentioned by @camille, you should write:
ggplot(df, aes(x=Category, y=Value, group = Category)) +
geom_boxplot() +
labs(x="Category")
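Here is a base R version of the same idea, using the question's simulated data. Keeping Category numeric places each box at its value, and group = Category produces one box per value. The width and breaks settings are optional additions that were not part of the answer. If decimal categories sit close together, a smaller width prevents the boxes from overlapping.

```r
library(ggplot2)

set.seed(1)
# Same simulated data as in the question, built without dplyr
df <- data.frame(Category = sample(c(1, 2, 5), 1000, replace = TRUE))
df$Value <- df$Category + rnorm(1000)

# Numeric x keeps the real spacing (gaps at 3 and 4); group splits the boxes
p <- ggplot(df, aes(x = Category, y = Value, group = Category)) +
  geom_boxplot(width = 0.5) +
  scale_x_continuous(breaks = sort(unique(df$Category))) +
  labs(x = "Category")
```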

R continuous vs categorical percentage share with geom_line

I'd like to create a ggplot geom_line graph with a continuous variable on the x-axis and the percentage share of a categorical variable on the y-axis.
E.g. for mtcars, I would like hp on the x-axis and the percentage of cars that have 6 cylinders on the y-axis.
library(ggplot2)
ggplot(aes(x = hp, y = cyl), data = mtcars) +
  geom_line()
I think it needs to be defined in geom_line with fun.y or something similar.

Compute the frequencies beforehand, using reshape for instance:
library(reshape)
library(ggplot2)
M <- melt(mtcars, id.vars = "hp", measure.vars = "cyl")
# count the cars at each hp value
C <- cast(M, hp ~ variable, fun.aggregate = length)
C$f <- C$cyl / sum(C$cyl)
ggplot(C, aes(x = hp, y = f)) +
  geom_line()
Note that in this case a line plot doesn't make much sense, because the data points are too far apart. You could use a bar plot instead:
ggplot(C, aes(x = hp, y = f)) +
  geom_bar(stat = "identity")
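The code above divides the number of cars at each hp by the total number of cars. That gives the distribution of hp, not the 6-cylinder share the question asks about. Below is a base R sketch of the share itself. The helper column six is hypothetical, and the mean of a logical gives the proportion of TRUE values.

```r
library(ggplot2)

# TRUE for 6-cylinder cars; averaging per hp gives the share at each hp value
d <- transform(mtcars, six = cyl == 6)
share <- aggregate(six ~ hp, data = d, FUN = mean)
share$pct <- 100 * share$six

p <- ggplot(share, aes(x = hp, y = pct)) +
  geom_line() +
  geom_point() +
  labs(y = "% of cars with 6 cylinders")
```

Most hp values in mtcars belong to only one to three cars, so this line mostly jumps between 0% and 100%. Binning hp first gives a smoother picture.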