Multiple plots by factor in ggplot (facets) - r

I have a data frame with two qualitative variables (Q1, Q2) which are both measured on a scale of LOW, MEDIUM, HIGH and a continuous variable CV on a scale 0-100.
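s <- 5
lv <- c("LOW", "MED", "HIGH")  # explicit order; ordered() alone would sort alphabetically
trial <- data.frame(id = 1:s,
Q1 = factor(sample(lv, size = s, replace = TRUE), levels = lv, ordered = TRUE),
Q2 = factor(sample(lv, size = s, replace = TRUE), levels = lv, ordered = TRUE),
CV = runif(s, 0, 100))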
I need to use ggplot to show a faceted plot (preferably a horizontal boxplot/jitter) of the continuous variable for each qualitative variable (x2) for each level (x3). This would result in a 3 x 2 layout.
As I'm very new to ggplot, I'm unsure how this should be achieved. I've played with qplot and can't work out how to control the facets to display both Q1 and Q2 boxplots on the same chart!
Do I need to run multiple qplots to the same window (in base I would use par to control layout), or can it be achieved from a single command? Or should I try to melt the data twice?
trial = rbind(data.frame(Q = "Q1",Level = trial[,2], CV = trial[,4]),
data.frame(Q = "Q2",Level = trial[,3], CV = trial[,4]))
I'll keep trying and hope somebody can provide some hints in the meantime.

I'm not entirely clear on what you want, but maybe this helps:
ggplot(trial, aes(Level, CV)) +
geom_boxplot() +
geom_jitter() +
facet_wrap(~Q) +
coord_flip()
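For a self-contained run, the reshaping step and the plot can be combined as follows. This is only a minimal sketch: the name trial_long, the seed, and the larger sample size are assumptions that are not in the original post, and the factor levels are given explicitly so the boxes sort LOW, MED, HIGH.

```r
library(ggplot2)

set.seed(42)  # assumed seed, only for reproducibility
s <- 30       # assumed size; larger than in the question so the boxes are visible
lv <- c("LOW", "MED", "HIGH")
trial <- data.frame(
  id = 1:s,
  Q1 = factor(sample(lv, s, replace = TRUE), levels = lv),
  Q2 = factor(sample(lv, s, replace = TRUE), levels = lv),
  CV = runif(s, 0, 100)
)

# Stack Q1 and Q2 into long format: one row per (id, question)
trial_long <- rbind(
  data.frame(Q = "Q1", Level = trial$Q1, CV = trial$CV),
  data.frame(Q = "Q2", Level = trial$Q2, CV = trial$CV)
)

# One panel per question, one horizontal box per level
p <- ggplot(trial_long, aes(x = Level, y = CV)) +
  geom_boxplot(outlier.shape = NA) +        # outliers hidden; the jitter shows every point
  geom_jitter(width = 0.15, height = 0) +
  facet_wrap(~Q) +
  coord_flip()
print(p)
```

Swapping facet_wrap(~Q) for facet_grid(Level ~ Q) should give the 3 x 2 panel layout mentioned in the question, at the cost of one box per panel.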


How to group by more than one variable to get 4 graphs contingent on two groupings in one plot?

I have a long-format data set (“uni_l_all”) and I’m trying to get a plot with 4 graphs, showing different trajectories contingent on high vs. low self-efficacy (“self_efficacy”, binary variable) and on training group vs. control group (“train2”, binary variable).
Grouping by self_efficacy to get two graphs works well.
But when I try to introduce “train2”, I still receive only 2 (crazy looking) graphs.
Do you have an idea how to solve this? How can I add train2 in my functions?
mean_uni_l_all <- group_by(uni_l_all, self_efficacy, time) %>%
summarise(ent_act = mean(ent_act, na.rm = TRUE))
ggplot(na.omit(mean_uni_l_all), aes(x = time, y = ent_act, colour = ese_mean, group = self_efficacy)) +
geom_point() + geom_line()
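One possible way to get four trajectories is to include train2 in the summary and then group on the combination of both variables. The sketch below runs on made-up data: only the names uni_l_all, self_efficacy, train2, time, and ent_act come from the question, while the values and the "high"/"low" and "training"/"control" labels are assumptions.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for uni_l_all (40 people, 4 time points)
set.seed(1)
uni_l_all <- expand.grid(id = 1:40, time = 1:4)
uni_l_all$self_efficacy <- ifelse(uni_l_all$id %% 2 == 0, "high", "low")
uni_l_all$train2 <- ifelse(uni_l_all$id <= 20, "training", "control")
uni_l_all$ent_act <- rnorm(nrow(uni_l_all), mean = 3 + 0.2 * uni_l_all$time)

# Summarise over BOTH grouping variables, not just self_efficacy
mean_uni_l_all <- uni_l_all %>%
  group_by(self_efficacy, train2, time) %>%
  summarise(ent_act = mean(ent_act, na.rm = TRUE), .groups = "drop")

# interaction() yields one line per self_efficacy x train2 combination
p <- ggplot(mean_uni_l_all,
            aes(x = time, y = ent_act,
                colour = self_efficacy, linetype = train2,
                group = interaction(self_efficacy, train2))) +
  geom_point() +
  geom_line()
print(p)
```

Adding facet_grid(train2 ~ self_efficacy) to the plot would instead put each trajectory in its own panel.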

Question: Use a factor's index to plot variables

I'm very new to R so I'm sorry if this is something really simple.
I've had a look on a bunch of cheat sheets and can't see anything obvious.
I have a simple set of data that has date, temperature, and 4 different factors (based on the bloom of a tree // 1 = "", 2 = "bloom", 3 = "full", 4 = "scatter")
What I want to do, but have no idea how, is to do a scatter plot of the date and temperature of each factor individually.
One approach is to use ggplot2 with facet_wrap. First, set the level names of the Bloom factor so that the panels get useful labels.
Then use ggplot to plot the data, grouped and coloured by the Bloom factor, and add facet_wrap with a formula that splits the data into one panel per level of Bloom.
library(ggplot2)
levels(TreeData$Bloom) <- c("None","Bloom","Full","Scatter")
ggplot(TreeData, aes(x=Date,y=Temp,group = Bloom, color = Bloom)) +
geom_point(show.legend = FALSE) +
facet_wrap(. ~ Bloom)
Per your comment, if you wanted individual graphs you could use base R subsetting with TreeData[TreeData$Bloom == "Full",]. Note that "Full" is the factor level we set earlier.
ggplot(TreeData[TreeData$Bloom == "Full",], aes(x=Date,y=Temp)) +
geom_point() + labs(title="Full Bloom")
Data
set.seed(1)
TreeData <- data.frame(
  Date = rep(seq.Date(from = as.Date("2019-04-01"), to = as.Date("2019-08-01"), by = "week"), each = 10),
  Temp = round(runif(n = 180, min = 22, max = 38)),
  Bloom = as.factor(sample(1:4, 180, replace = TRUE)))
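If a separate plot is wanted for every bloom stage, the subsetting step could be repeated for each level with split() and lapply(). The following is a sketch rather than part of the original answer; the names by_bloom and plots are hypothetical, and the level labels are set while the factor is built instead of afterwards.

```r
library(ggplot2)

set.seed(1)
TreeData <- data.frame(
  Date = rep(seq.Date(as.Date("2019-04-01"), as.Date("2019-08-01"), by = "week"), each = 10),
  Temp = round(runif(180, min = 22, max = 38)),
  Bloom = factor(sample(1:4, 180, replace = TRUE), levels = 1:4,
                 labels = c("None", "Bloom", "Full", "Scatter"))
)

# One data frame per factor level, then one plot per data frame
by_bloom <- split(TreeData, TreeData$Bloom)
plots <- lapply(names(by_bloom), function(lvl) {
  ggplot(by_bloom[[lvl]], aes(x = Date, y = Temp)) +
    geom_point() +
    labs(title = lvl)
})
names(plots) <- names(by_bloom)

print(plots[["Full"]])  # show a single stage
```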

Ordering in ggplot2 [plotting pvals by BP for each chr]

I'm trying to plot points along the genome: there will be plot points for every chromosome. My data file looks like this:
CHROM BP P DP
1 234567 0.0000555 30
.....
Y 12345678 0.09 14
I'm using ggplot2 to plot P values, coloured by DP, for each chromosome, using the following:
mc.points <- ggplot(sample,aes(x = BP,y = P, colour =DP)) +
geom_point() +
labs(x = "Chromosome",y = "P") +
scale_color_gradient2(low = "green", high = "red")
However, instead of being plotted at each BP in the right chromosomal order, the points are plotted by BP without regard to chromosome number.
Is there a way to sort the data to make this happen (i.e. order by chromosome, then BP)? I've tried to make CHROM and BP factors, but this seems to crash R. In addition, if this is possible, is there a way to label the ticks on the x-axis with chromosome numbers rather than BP (similar to a Manhattan plot)?
I can provide dummy data if need be, but it is quite long.
Just to provide an update: facet_grid seems to solve my problem, but I was wondering whether I can transform this. It splits the panels by chromosome but doesn't plot them on the same x-axis in consecutive order; instead it draws 22 different plots that share the same x-axis scale. Any solutions?
Have you tried something like this untested code before the plot?
sample$BP <- factor(sample$BP,
levels=sample[ !duplicated(sample[,"BP"]), "BP"][
order(sample[!duplicated(sample[ ,"BP"]), "CHROM"] )]
)
It would have been easier, and perhaps more compact, if you had included a suitable sample for testing. In the future you should NOT use the name `sample`, since it is an important R function name.
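An alternative that keeps a continuous x-axis is to compute a cumulative position per point, as Manhattan plots usually do, and to label the axis at each chromosome's midpoint. This is a sketch on made-up data, not the answerer's method: the data frame name gwas, the helper column pos, and all values are assumptions, and only the column names CHROM, BP, P, and DP come from the question.

```r
library(ggplot2)

# Hypothetical dummy data with the columns from the question
set.seed(1)
chrom_levels <- c(as.character(1:22), "X", "Y")
n <- 20 * length(chrom_levels)
gwas <- data.frame(
  CHROM = rep(chrom_levels, each = 20),
  BP    = sample.int(1e6, n),
  P     = runif(n),
  DP    = sample(10:40, n, replace = TRUE)
)

# Natural chromosome order (1..22, X, Y) instead of alphabetical
gwas$CHROM <- factor(gwas$CHROM, levels = chrom_levels)
gwas <- gwas[order(gwas$CHROM, gwas$BP), ]

# Shift each chromosome by the summed maximum BP of the chromosomes before it
chrom_len <- tapply(gwas$BP, gwas$CHROM, max)
offset <- c(0, cumsum(as.numeric(chrom_len))[-length(chrom_len)])
names(offset) <- names(chrom_len)
gwas$pos <- gwas$BP + unname(offset[as.character(gwas$CHROM)])

# Axis ticks at the midpoint of each chromosome, labelled by chromosome
mid <- tapply(gwas$pos, gwas$CHROM, function(v) mean(range(v)))

p <- ggplot(gwas, aes(x = pos, y = P, colour = DP)) +
  geom_point() +
  scale_x_continuous(breaks = as.numeric(mid), labels = names(mid)) +
  scale_colour_gradient(low = "green", high = "red") +
  labs(x = "Chromosome", y = "P")
print(p)
```

Using the largest observed BP as the chromosome length is a simplification; real chromosome lengths could be substituted for chrom_len if they are available.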

ggplot2: how to overlay 2 plots when using stat_summary

I am totally new to R, so maybe the answer to the question is trivial, but I couldn't find any solution after searching the net for days.
I am using ggplot2 to create graphs containing the mean of my samples with the confidence interval in a ribbon (I can't post the picture, but it looks something like this: S1).
I have a data frame (df) with time in the first column and the values of the variable measured in the other columns (each column is a replicate of the measurement).
I do the following:
mdf<-melt(df, id='time', variable_name="samples")
p <- ggplot(data=mdf, aes(x=time, y=value)) +
geom_point(size=1,colour="red")
stat_sum_df <- function(fun, geom="crossbar", ...) {
stat_summary(fun.data=fun, geom=geom, colour="red", ...)
}
p + stat_sum_df("mean_cl_normal", geom = "smooth")
and I get the graph I have shown at the beginning.
My question is: if I have two different data frames, each one with a different variable, measured in the same sample at the same time, how can I plot the two graphs in the same plot? Everything I have tried ends up computing the statistics on both sets of data together or on just one of them, but not on each separately. Is it possible just to overlay the plots?
And a second small question: is it possible to change the colour of the ribbon?
Thanks!
Something like this:
library(ggplot2)
a <- data.frame(x=rep(c(1,2,3,5,7,10,15,20), 5),
y=rnorm(40, sd=2) + rep(c(4,3.5,3,2.5,2,1.5,1,0.5), 5),
g = rep(c('a', 'b'), each = 20))
ggplot(a, aes(x=x,y=y, group = g, colour = g)) +
geom_point(aes(colour = g)) +
geom_smooth(aes(fill = g))
I'd suggest reading up on the basics of ggplot. Check ?ggplot2 for help, browse the available help topics, and look in particular at how the group aesthetic may be manipulated.
You may also find the ggplot2 discussion group on Google Groups useful, and might want to join it. Quick-R has a lot of examples of ggplot graphs too, as does Stack Overflow.
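If the two variables really do have to stay in separate data frames, each stat_summary layer can be given its own data argument, and the ribbon colour is set through fill. The sketch below makes several assumptions: the frames mdf1 and mdf2 and the helper summary_layers are hypothetical, and mean_se (mean plus or minus one standard error) stands in for mean_cl_normal so that the Hmisc package is not needed.

```r
library(ggplot2)

set.seed(1)
times <- rep(c(1, 2, 3, 5, 7, 10, 15, 20), each = 5)
# Two hypothetical long-format data frames with the same columns
mdf1 <- data.frame(time = times, value = rnorm(40, mean = 4 - times / 6, sd = 1))
mdf2 <- data.frame(time = times, value = rnorm(40, mean = 1 + times / 8, sd = 1))

# Ribbon (mean +/- SE), mean line, and raw points for one data frame
summary_layers <- function(d, col) {
  list(
    stat_summary(data = d, fun.data = mean_se, geom = "ribbon",
                 fill = col, alpha = 0.3),
    stat_summary(data = d, fun = mean, geom = "line", colour = col),
    geom_point(data = d, colour = col, size = 1)
  )
}

# The shared mapping lives in ggplot(); each layer brings its own data
p <- ggplot(mapping = aes(x = time, y = value)) +
  summary_layers(mdf1, "red") +
  summary_layers(mdf2, "blue")
print(p)
```

Because the statistics are computed per layer, each ribbon summarises only its own data frame.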

ggplot boxplots with scatterplot overlay (same variables)

I'm an undergrad researcher and I've been teaching myself R over the past few months. I just started trying ggplot, and have run into some trouble. I've made a series of boxplots looking at the depth of fish at different acoustic receiver stations. I'd like to add a scatterplot that shows the depths of the receiver stations. This is what I have so far:
data <- read.csv(".....MPS.csv", header=TRUE)
df <- data.frame(f1=factor(data$Tagging.location),
f2=factor(data$Station),data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), data$depth)
df$f1f2 <- interaction(df$f1, df$f2)
plot1 <- ggplot(aes(y = data$Detection.depth, x = f2, fill = f1), data = df) +
geom_boxplot() + stat_summary(fun.data = give.n, geom = "text",
position = position_dodge(height = 0, width = 0.75), size = 3)
plot1+xlab("MPS Station") + ylab("Depth(m)") +
theme(legend.title=element_blank()) + scale_y_reverse() +
coord_cartesian(ylim=c(150, -10))
plot2 <- ggplot(aes(y=data$depth, x=f2), data=df2) + geom_point()
plot2+scale_y_reverse() + coord_cartesian(ylim=c(150,-10)) +
xlab("MPS Station") + ylab("Depth (m)")
Unfortunately, since I'm a new user in this forum, I'm not allowed to upload images of these two plots. My x-axis is "Stations" (which has 12 options) and my y-axis is "Depth" (0-150 m). The boxplots are colour-coded by tagging site (which has 2 options). The depths are coming from two different columns in my spreadsheet, and they cannot be combined into one.
My goal is to combine those two plots by adding "plot2" (the station depth scatterplot) to the "plot1" boxplots (detection depths). They both look at the same variables (depth and station) and must share the same y-axis scale.
I think I could figure out a messy workaround if I were using the R base program, but I would like to learn ggplot properly, if possible. Any help is greatly appreciated!
Update: I was confused by the language used in the original post, and wrote a slightly more complicated answer than necessary. Here is the cleaned up version.
Step 1: Setting up. Here, we make sure the depth values in both data frames have the same variable name (for readability).
df <- data.frame(f1=factor(data$Tagging.location), f2=factor(data$Station), depth=data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), depth=data$depth)
Step 2: Now you can plot this with the 'ggplot' function and split the data by using the `col=f1` argument. We'll plot the detection data as boxplots, and then plot the depths of the stations as colored points (assuming each station has only one depth). We specify the two different layers by passing the data to each 'geom' function instead of to the main 'ggplot' function. It should look something like this:
ggplot() +
  geom_boxplot(data=df, aes(x=f2, y=depth, col=f1)) +
  geom_point(data=df2, aes(x=f2, y=depth), colour="blue") +
  scale_y_reverse()
In this plot example, we use boxplots to represent the detection data and color those boxplots by the site label. The stations, however, we plot separately using a specific color of points, so we will be able to see them clearly in relation to the boxplots.
You should be able to adjust the plot from here to suit your needs.
I've created some dummy data and loaded into the chart to show you what it would look like. Keep in mind that this is purely random data and doesn't really make sense.
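The original dummy data is not shown here, so the following is a stand-in sketch. The station names, the site labels "SiteA" and "SiteB", and all values are assumptions; only the column names f1, f2, and depth follow the steps above.

```r
library(ggplot2)

set.seed(1)
stations <- paste0("S", 1:12)  # 12 hypothetical station names

# Detection depths: many rows per station, two tagging locations
df <- data.frame(
  f1 = factor(sample(c("SiteA", "SiteB"), 600, replace = TRUE)),
  f2 = factor(sample(stations, 600, replace = TRUE), levels = stations),
  depth = runif(600, 0, 150)
)
# Station depths: exactly one row per station
df2 <- data.frame(
  f2 = factor(stations, levels = stations),
  depth = runif(12, 50, 150)
)

# Each geom gets its own data; both share the reversed depth axis
p <- ggplot() +
  geom_boxplot(data = df, aes(x = f2, y = depth, colour = f1)) +
  geom_point(data = df2, aes(x = f2, y = depth), colour = "blue", size = 3) +
  scale_y_reverse() +
  labs(x = "MPS Station", y = "Depth (m)", colour = NULL)
print(p)
```

Giving both f2 factors the same levels keeps the stations in the same order in the boxplot and point layers.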
