Set free y limits in ggplot2 facets while using coord_cartesian

I have a data frame 'data' with three columns. The first column identifies the compound, the second the concentration of the compound and the third my measured data called 'Area'.
# A tibble: 12 x 3
   Compound    Conc     Area
   <chr>      <dbl>    <dbl>
 1 Compound 1     0      247
 2 Compound 1     5    44098
 3 Compound 1   100   981797
 4 Compound 1  1000  7084602
 5 Compound 2     0      350
 6 Compound 2     5   310434
 7 Compound 2   100  6621537
 8 Compound 2  1000 49493832
 9 Compound 3     0       26
10 Compound 3     5     7707
11 Compound 3   100   174026
12 Compound 3  1000  1600143
I want to create a faceted plot per compound using geom_point() and fit geom_smooth() over the complete x range. To look at the lower concentration range in detail, I applied coord_cartesian() to limit the x axis to 0 to 110.
However, the y axis of each facet still extends to the overall maximum of that compound, so the visible low-concentration points are squashed near zero. Because the scales differ greatly between compounds, I can't use a fixed ylim; it would have to be different for each compound (my real data has more than 20 compounds).
Is there a way to make each facet's y axis run from 0 to the maximum value that is actually visible in that facet?
The code I have (without any attempt at limiting the y axis) is:
library(ggplot2)

ggplot(data = data, aes(Conc, Area)) +
geom_point(size = 2.5) +
geom_smooth(method = "lm") +
facet_wrap(~Compound, ncol = 3, scales = "free_y") +
theme_bw() +
theme(legend.position = "bottom") +
coord_cartesian(xlim = c(0,110))

I figured out a workaround that gives the result I want.
I first subset the data to the visible concentration range, then loop over the compounds and build one plot per compound.
The subset is used to set the ylim in coord_cartesian() for each compound.
The resulting list of plots can then be arranged in a grid with the gridExtra package.
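library(dplyr)
library(ggplot2)
library(gridExtra)

# Keep only the points inside the visible x range (0 to 110)
data_100 <- data %>%
  filter(Conc <= 110)

loop.vector <- unique(data$Compound)
plot_list <- list()

for (i in seq_along(loop.vector)) {
  # Upper y limit: the largest Area that is actually visible for this compound
  y_max <- max(data_100$Area[data_100$Compound == loop.vector[i]])
  p <- ggplot(subset(data, Compound == loop.vector[i]), aes(Conc, Area)) +
    geom_point(size = 2.5) +
    geom_smooth(method = "lm", se = FALSE) +   # still fitted on all concentrations
    theme_bw() +
    theme(legend.position = "bottom") +
    coord_cartesian(xlim = c(0, 110), ylim = c(0, y_max)) +
    labs(title = loop.vector[i])
  plot_list[[i]] <- p
}

# Arrange the individual plots in a grid
grid.arrange(grobs = plot_list, ncol = 3)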
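A loop-free alternative is sketched below, assuming it is acceptable to draw the fitted line yourself instead of through geom_smooth(). One lm() per compound is fitted on all concentrations (the same straight line geom_smooth(method = "lm") draws), evaluated only over 0 to 110, and plotted together with the visible points. Since facet_wrap(scales = "free_y") trains each facet's y range only on the data passed to its layers, every facet then stops at its own visible maximum, and expand_limits(y = 0) keeps 0 as the lower bound. The data frame is re-typed from the printout above; x_max, fits, lines_df and visible are illustrative names.

```r
library(ggplot2)

# The 12 rows printed in the question
data <- data.frame(
  Compound = rep(c("Compound 1", "Compound 2", "Compound 3"), each = 4),
  Conc = rep(c(0, 5, 100, 1000), times = 3),
  Area = c(247, 44098, 981797, 7084602,
           350, 310434, 6621537, 49493832,
           26, 7707, 174026, 1600143)
)

x_max <- 110

# One straight-line fit per compound, using ALL concentrations
fits <- lapply(split(data, data$Compound), function(d) lm(Area ~ Conc, data = d))

# Evaluate each fitted line only over the visible x range
grid_x <- seq(0, x_max, length.out = 50)
lines_df <- do.call(rbind, lapply(names(fits), function(cmp) {
  data.frame(
    Compound = cmp,
    Conc = grid_x,
    Area = unname(predict(fits[[cmp]], newdata = data.frame(Conc = grid_x)))
  )
}))

# Only the points that fall inside the visible x range
visible <- subset(data, Conc <= x_max)

p <- ggplot(visible, aes(Conc, Area)) +
  geom_point(size = 2.5) +
  geom_line(data = lines_df, colour = "blue") +
  facet_wrap(~Compound, ncol = 3, scales = "free_y") +
  expand_limits(y = 0) +   # every facet starts at 0
  theme_bw()
print(p)
```

The trade-off is that the fit is no longer computed by ggplot2, so any change of model (e.g. a different method in geom_smooth) has to be mirrored in the lm() call.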

Related

How can I use the ggplot function to visualise grouped data?

I have a data set with the time individuals took to read a sentence (response_time) under two experimental factors: the condition of the sentence (normal or visually degraded) and the number of cups of coffee the individual had drunk (caffeine). I want to visualise the data using ggplot, grouped by both sentence condition and coffee, e.g. the response times recorded for individuals who read a normal sentence after drinking one cup of coffee.
This is what I have tried so far, but the graph appears as one big blob (not separated by group) and produces more than 15 warnings.
  participant condition response_time caffeine
        <dbl> <fct>             <dbl> <fct>
1           1 Normal              984 1
2           2 Normal             1005 1
3           3 Normal              979 3
4           4 Normal             1040 2
5           5 Normal             1008 2
6           6 Normal              979 3
tidied_data_2 %>%
ggplot(aes(x = condition:caffeine, y = response_time, colour = condition:caffeine)) +
geom_violin() +
geom_jitter(width = .1, alpha = .25) +
guides(colour = FALSE) +
stat_summary(fun.data = "mean_cl_boot", colour = "black") +
theme_minimal() +
theme(text = element_text(size = 13)) +
labs(x = "Condition X Caffeine", y = "Response Time (ms)")
Any suggestions on how to better code what I want would be great.
Posted as a community wiki answer because it is too long for a comment.
Not sure what you are intending with condition:caffeine; I've never seen that syntax in ggplot. Try aes(x = as.character(caffeine), y = ..., color = as.character(caffeine)) instead (or, because it is a factor in your case anyway, you can just use aes(x = caffeine, y = ..., color = caffeine)).
If your idea is to separate by condition, you could just use aes(x = caffeine, y = ..., color = condition), since the caffeine levels are already separated along x anyway.
On another note, why not actually make a scatter plot, i.e. a proper two-dimensional graph? Suggestion below.
library(ggplot2)
library(dplyr)
# stat_summary(fun.data = "mean_cl_boot") also needs the Hmisc package installed

tidied_data_2 <- read.table(text = "participant condition response_time caffeine
1 1 Normal 984 1
2 2 Normal 1005 1
3 3 Normal 979 3
4 4 Normal 1040 2
5 5 Normal 1008 2
6 6 Normal 979 3", header = TRUE)

tidied_data_2 %>%
  ggplot(aes(x = as.character(caffeine), y = response_time, colour = as.character(caffeine))) +
  ## geom_violin does not make sense with so few observations
  # geom_violin() +
  ## I've removed alpha so you can see the dots better
  geom_jitter(width = .1) +
  guides(colour = "none") +
  stat_summary(fun.data = "mean_cl_boot", colour = "black") +
  theme_minimal() +
  theme(text = element_text(size = 13)) +
  labs(x = "Caffeine (cups of coffee)", y = "Response Time (ms)")
What I would rather do:
tidied_data_2 %>%
## in this example as.integer(as.character(x)) is unnecessary, but it is necessary for your data sample
ggplot(aes(x = as.integer(as.character(caffeine)), y = response_time)) +
geom_jitter(width = .1) +
theme_minimal()
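For reference, in base R `:` applied to two factors of equal length returns their interaction (a factor with one level per combination, labelled like Normal:1), so condition:caffeine is itself valid. The sketch below, using the six rows from the question (all in the Normal condition), checks that it matches interaction(..., sep = ":") and then shows the aes(x = caffeine, colour = condition) idea with points dodged by condition via position_jitterdodge(). The column names group_colon and group_inter are illustrative; with the full data, Normal and degraded points would sit side by side at each caffeine level.

```r
library(ggplot2)

# The six rows shown in the question
tidied_data_2 <- data.frame(
  participant = 1:6,
  condition = factor(rep("Normal", 6)),
  response_time = c(984, 1005, 979, 1040, 1008, 979),
  caffeine = factor(c(1, 1, 3, 2, 2, 3))
)

# For two factors, `:` builds their interaction, one level per combination
tidied_data_2$group_colon <- with(tidied_data_2, condition:caffeine)
tidied_data_2$group_inter <- with(tidied_data_2,
                                  interaction(condition, caffeine, sep = ":", drop = TRUE))

# Caffeine on the x axis, condition as colour, points dodged by condition
p <- ggplot(tidied_data_2, aes(x = caffeine, y = response_time, colour = condition)) +
  geom_point(position = position_jitterdodge(jitter.width = 0.1, dodge.width = 0.6)) +
  labs(x = "Caffeine (cups of coffee)", y = "Response Time (ms)") +
  theme_minimal()
print(p)
```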

How to plot lines and dots in the same plot while using different sized data

This toy data frame represents my data.
  Time Gene Value
1    0    A     1
2    1    A     2
3    2    A     3
4    0    B     1
5  1.2    B     2
6  1.7    B     2
7  2.1    B     2
8    3    B     2
Using the following code I can turn this into a line plot with two lines, one for A and one for B.
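library(ggplot2)

Data <- data.frame(Time  = c(0, 1, 2, 0, 1.2, 1.7, 2.1, 3),
                   Gene  = c("A", "A", "A", "B", "B", "B", "B", "B"),
                   Value = c(1, 2, 3, 1, 2, 2, 2, 2))

ggplot(data=Data, aes(x=Time, y=Value, group=Gene)) +
  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_line(aes(color=Gene), linetype="longdash", linewidth=2)+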
theme_classic()+
labs(title= paste("Genes over time course"),
x="Time",
y="Expression")+
theme(plot.title=element_text(size=20, face="bold",hjust = 0.5),
axis.text.x=element_text(size=10),
axis.text.y=element_text(size=10),
axis.title.x=element_text(size=15),
axis.title.y=element_text(size=15),
legend.text=element_text(size=10))
However, I would like Gene A to be represented by only dots, and Gene B to be represented by only a line. How can I accomplish this given the data?
Using data=~subset(., ...) we can control which data goes to each layer.
ggplot(Data, aes(x = Time, y = Value, color = Gene, group = Gene)) +
geom_line(data = ~ subset(., Gene != "A")) +
geom_point(data = ~ subset(., Gene == "A"))
(You can also use dplyr::filter in place of subset; the results are the same.)
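The ~ subset(., ...) form works because a layer's data argument also accepts a function: ggplot2 calls it with the plot's data and uses the data frame it returns, and a one-sided formula is converted into such a function. The sketch below uses plain functions instead of formulas and checks with ggplot_build() how many rows each layer actually received.

```r
library(ggplot2)

# Toy data from the question
Data <- data.frame(
  Time = c(0, 1, 2, 0, 1.2, 1.7, 2.1, 3),
  Gene = c("A", "A", "A", "B", "B", "B", "B", "B"),
  Value = c(1, 2, 3, 1, 2, 2, 2, 2)
)

# Each layer receives the plot data and keeps only the rows it should draw
p <- ggplot(Data, aes(x = Time, y = Value, color = Gene, group = Gene)) +
  geom_line(data = function(d) subset(d, Gene == "B")) +
  geom_point(data = function(d) subset(d, Gene == "A"))

built <- ggplot_build(p)
nrow(built$data[[1]])  # 5 rows drawn by the line layer (Gene B)
nrow(built$data[[2]])  # 3 rows drawn by the point layer (Gene A)
```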

stat_cor function incorrectly calculating correlation p value

I am using stat_cor with ggplot to add r and p values to a scatter plot. It calculates the p value incorrectly, based on the number of observations (rows) in my long-format data frame: the p value corresponds to what it would be if there were as many subjects as there are observations. The subject id variable is correctly recognized as a factor when I check the structure of the data frame. Does anybody know how to fix this?
Long df example
  subject sex condition    x y
1       1   M   control  7.9 1
2       1   M     cond1 12.3 2
3       1   M     cond2 10.7 3
4       2   F   control  6.3 4
5       2   F     cond1 10.6 5
6       2   F     cond2 11.1 6
Here is the code
library(ggplot2)
library(ggpubr)
scatter <- ggplot(df, aes(x = x, y = y)) +
geom_point(aes(colour = condition)) +
geom_smooth(method = "lm") +
ggtitle("title") +
theme(axis.text=element_text(size=14),
axis.title=element_text(size=14,face="bold"),
plot.title = element_text(size = 20, face = "bold"))
scatter + stat_cor(method = "pearson", label.x = -2, label.y = 3)
Adding stat_cor(method = "pearson", ...) to the plot should calculate the Pearson correlation between x and y and its p value (this page has the correct formula: http://www.stat.wmich.edu/s216/book/node122.html).
Instead, it calculates the p value as if the sample size n were the number of rows in the long-format df.
I "fixed" this by reshaping the data frame to wide format. I was hoping for (and imagine there still is) a fix that gives the correct values while staying in long format, since most of R prefers long format.
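To see why the p value follows the number of rows, here is a sketch with base R's cor.test(). It assumes stat_cor() runs the standard Pearson test, which, like cor.test(), treats every row as an independent observation and has no notion of subjects. The test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, so for the same r a larger n gives a smaller p. The data are the six rows above; p_for_n is a hypothetical helper.

```r
# The six long-format rows from the question (2 subjects x 3 conditions)
df <- data.frame(
  subject = factor(rep(1:2, each = 3)),
  sex = rep(c("M", "F"), each = 3),
  condition = rep(c("control", "cond1", "cond2"), times = 2),
  x = c(7.9, 12.3, 10.7, 6.3, 10.6, 11.1),
  y = 1:6
)

ct <- cor.test(df$x, df$y, method = "pearson")
r <- unname(ct$estimate)

# Two-sided p value of the Pearson test for correlation r and sample size n
p_for_n <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

# n is simply the number of rows: the subject column plays no role
p_manual <- p_for_n(r, nrow(df))
c(cor_test = ct$p.value, manual = p_manual)

# Same r, more rows: a smaller p value
p_for_n(r, 20) < p_for_n(r, 6)
```

Accounting for repeated measures therefore means changing the analysis (e.g. one value per subject, or a repeated-measures correlation method), not just the data layout.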

Character values on a continuous axis in R ggplot2

Is there a way to include character values on the axes when plotting continuous data with ggplot2? I have censored data such as:
   x  y Freq
1 -3 16    3
2 -2 12    4
3  0 10    6
4  2  7    7
5  2  4    3
The last row of data is right censored. I am plotting it with the code below:
library(ggplot2)

a1 = data.frame(x=c(-3,-2,0,2,2), y=c(16,12,10,7,4), Freq=c(3,4,6,7,3))
fit = ggplot(a1, aes(x,y)) + geom_text(aes(label=Freq), size=5)+
theme_bw() +
scale_x_continuous(breaks = seq(min(a1$x)-1,max(a1$x)+1,by=1),
labels = seq(min(a1$x)-1,max(a1$x)+1,by=1),
limits = c(min(a1$x)-1,max(a1$x)+1))+
scale_y_continuous(breaks = seq(min(a1$y),max(a1$y),by=2))
The 3 points at (2,4) are right censored. I would like them to be plotted one unit to the right, with the corresponding x-axis tick mark labelled '>=2' instead of 3. Any ideas if this is possible?
It is quite possible. I edited the data so that the censored point (2,4) becomes (3,4). Then I modified your labels, which can be anything you want as long as there is exactly one label per break.
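library(ggplot2)

# Move the censored point (2, 4) one unit to the right
a1$x[a1$x == 2 & a1$y == 4] <- 3

ggplot(a1, aes(x,y)) + geom_text(aes(label=Freq), size=5)+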
theme_bw() +
scale_x_continuous(breaks = seq(min(a1$x)-1,max(a1$x),by=1),
labels = c(seq(min(a1$x)-1,max(a1$x)-1,by=1), ">=2"),
limits = c(min(a1$x)-1,max(a1$x)))+
scale_y_continuous(breaks = seq(min(a1$y),max(a1$y),by=2))
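A slightly more general version of the same trick is sketched below, assuming you add a hypothetical logical column censored that marks the right-censored rows. Instead of editing x in place, it stores the plotting position in a new column x_plot (illustrative name) and builds the labels from the breaks, so the '>=2' label always lines up with the shifted position.

```r
library(ggplot2)

# 'censored' is a hypothetical flag column marking right-censored rows
a1 <- data.frame(x = c(-3, -2, 0, 2, 2), y = c(16, 12, 10, 7, 4),
                 Freq = c(3, 4, 6, 7, 3),
                 censored = c(FALSE, FALSE, FALSE, FALSE, TRUE))

cut_x <- 2  # censoring threshold

# Censored rows are drawn one unit to the right of the threshold
a1$x_plot <- ifelse(a1$censored, cut_x + 1, a1$x)

# One break per integer; the last break gets the ">=" label
brks <- seq(min(a1$x_plot) - 1, cut_x + 1, by = 1)
labs_x <- as.character(brks)
labs_x[brks == cut_x + 1] <- paste0(">=", cut_x)

p <- ggplot(a1, aes(x_plot, y)) +
  geom_text(aes(label = Freq), size = 5) +
  theme_bw() +
  scale_x_continuous(breaks = brks, labels = labs_x, limits = range(brks)) +
  scale_y_continuous(breaks = seq(min(a1$y), max(a1$y), by = 2))
print(p)
```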

Getting percentage using histogram when used with facetting

I have the following data frame z:
  x y
1 1 a
2 2 a
3 1 a
4 2 a
5 1 b
6 9 b
7 9 b
8 8 b
9 7 b
When I do:
p = ggplot(z,aes(x,group=y)) + geom_histogram(aes(y = ..density..,group=y)) + facet_grid(y ~ .)
p
I get the faceted plots, but not with the percentages on the y-axis for each symbol within z$y.
Basically, I want a histogram chart, but with the percentages that show the frequency distribution within each value of z$y i.e. a,b.
In this case, under 'a', 50% is 1 and 50% is 2, and under 'b', 20% is 1, 40% is 9, 20% is 7 and 20% is 8. I want this charted as histograms using faceting.
That is not a histogram (there is no density estimation), but a bar chart.
library(ggplot2)
library(scales)

d <- data.frame(
  value = c(1,2,1,2,1,9,9,8,7),
  group = c(rep("a",4),rep("b",5))
)
# With counts
ggplot(d) + geom_bar(aes(factor(value))) + facet_grid(group ~ .)
# With percentages: group = 1 makes prop the share within each facet
# (after_stat() needs ggplot2 >= 3.3.0)
ggplot(d) +
  geom_bar(aes(factor(value), after_stat(prop), group = 1)) +
  scale_y_continuous(labels = percent_format()) +
  facet_grid(group ~ .)
Note: very old versions of ggplot2 used scale_y_continuous(formatter = 'percent'); in current versions use scale_y_continuous(labels = percent_format()) instead, and make sure to load the scales package.
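As a cross-check of the percentages the question expects (50% / 50% under a; 20%, 40%, 20%, 20% under b), the sketch below computes the within-group shares explicitly with prop.table(..., margin = 1) and plots them with geom_col(). Precomputing makes "percentage within each facet" explicit instead of relying on stat_count(); d is the question's data re-typed, and tab and pct_df are illustrative names.

```r
library(ggplot2)

d <- data.frame(
  value = c(1, 2, 1, 2, 1, 9, 9, 8, 7),
  group = c(rep("a", 4), rep("b", 5))
)

# Row-wise proportions: each group's shares sum to 1
tab <- prop.table(table(group = d$group, value = d$value), margin = 1)

# Long format (columns group, value, Freq); drop value/group combinations that never occur
pct_df <- subset(as.data.frame(tab), Freq > 0)

p <- ggplot(pct_df, aes(value, Freq)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Share within group") +
  facet_grid(group ~ .)
print(p)
```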
