I have a set of 10 density estimates, obtained from 5 sites using two different methods (REM and DS). Each density estimate has its own confidence interval, and the intervals are asymmetric around the point estimates.
I want a scatter plot with the x-axis showing the density estimate from REM and the y-axis showing the density estimate from DS. I then want to draw a bubble around each point, representing the confidence intervals.
At the moment I can only seem to set specific height and width values for these confidence intervals, which would be fine if they were even. Since they are uneven, the bubbles will not be circles but should be more of an egg-shaped ellipse, off-centre from the point estimate.
This is the code I've used, in which you can see the respective confidence intervals. The plot shows what this produces, which would be fine if the confidence intervals were even. How would I adapt this to make the confidence intervals uneven?
Thank you.
library(ggplot2)
library(ggforce)  # provides geom_ellipse()
# sample data
df <- data.frame(site=c(1, 2, 3, 4, 5),
                 rem=c(17.7, 14.1, 10.6, 13.2, 1.0),
                 rem_lower=c(8.2, 6.6, 4.2, 3.2, 0.2),
                 rem_upper=c(27.1, 21.5, 17.0, 23.1, 1.7),
                 ds=c(16.6, 18.5, 5.2, 21.8, 2.4),
                 ds_lower=c(6.3, 5.1, 2.7, 4.5, 0.5),
                 ds_upper=c(40.4, 39.9, 10.9, 44.7, 8.3))
# calculate the width and height of each ellipse
width <- df$rem_upper - df$rem_lower
height <- df$ds_upper - df$ds_lower
# plot the data with ellipses
ggplot(df, aes(x = rem, y = ds, color = factor(site))) +
  geom_point(size = 5) +
  geom_ellipse(aes(x0 = rem, y0 = ds, a = width, b = height, fill = factor(site),
                   angle = 45), alpha = 0.3) +
  scale_fill_manual(values = c("#1f78b4", "#33a02c", "#e31a1c", "#ff7f00", "#6a3d9a")) +
  labs(x = "rem", y = "DS") +
  theme_classic()
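One possible way to get off-centre, egg-shaped bubbles is to build each outline by hand as a polygon, using a different semi-axis in each direction from the point estimate. This is only a sketch, not a ggforce feature: the helper name egg_outline and the choice to stitch four quarter-ellipses together are assumptions made for illustration.

```r
library(ggplot2)

df <- data.frame(site = c(1, 2, 3, 4, 5),
                 rem = c(17.7, 14.1, 10.6, 13.2, 1.0),
                 rem_lower = c(8.2, 6.6, 4.2, 3.2, 0.2),
                 rem_upper = c(27.1, 21.5, 17.0, 23.1, 1.7),
                 ds = c(16.6, 18.5, 5.2, 21.8, 2.4),
                 ds_lower = c(6.3, 5.1, 2.7, 4.5, 0.5),
                 ds_upper = c(40.4, 39.9, 10.9, 44.7, 8.3))

# Hypothetical helper: outline made of four quarter-ellipses, so the shape
# reaches exactly to each confidence bound while staying anchored at (x0, y0).
egg_outline <- function(x0, y0, x_lo, x_hi, y_lo, y_hi, n = 101) {
  theta <- seq(0, 2 * pi, length.out = n)
  a <- ifelse(cos(theta) >= 0, x_hi - x0, x0 - x_lo)  # right vs left semi-axis
  b <- ifelse(sin(theta) >= 0, y_hi - y0, y0 - y_lo)  # upper vs lower semi-axis
  data.frame(x = x0 + a * cos(theta), y = y0 + b * sin(theta))
}

# One outline per site, stacked into a single data frame
eggs <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  out <- egg_outline(df$rem[i], df$ds[i],
                     df$rem_lower[i], df$rem_upper[i],
                     df$ds_lower[i], df$ds_upper[i])
  out$site <- df$site[i]
  out
}))

p <- ggplot(df, aes(x = rem, y = ds, colour = factor(site))) +
  geom_polygon(data = eggs,
               aes(x = x, y = y, fill = factor(site), group = site),
               alpha = 0.3) +
  geom_point(size = 3) +
  labs(x = "REM", y = "DS", colour = "site", fill = "site") +
  theme_classic()
p
```

Each outline touches the lower and upper bounds of both intervals, so the bubble is wider on the side where the interval is longer.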
Aim: simultaneously display columns and points of two different datasets, both taken from the same sites (x-axis).
I am able to plot a column chart of discrete site names (x) and continuous weight data (y). I have also been able to add an appropriately scaled second y axis, to plot points at each site, representing other continuous weight data of a much smaller scale than on the primary y axis.
However, the points seem to be using the primary y axis scale as coordinates and not the new secondary y axis, as intended. How do I ensure the new points are plotted against the new y-axis scale, while maintaining the primary column plot as it is?
Thanks
data:
Site_No (x) = 1:10
Total_Solids (y) = 30, 35, 32, 50, 55, 57, 45, 49, 55, 46
TOC (y2) = 1.3, 1.5, 1.7, 1.45, 1.03, 2.4, 1.9, 1.8, 1.1, 1.6
Code:
ggplot(df) +
geom_col(aes(x = Site_No, y = Total_Solids)) +
geom_point(aes(x = Site_No, y = TOC)) +
scale_y_continuous(name = "Total Solids (g)",
sec.axis = sec_axis(~ ./20, name = "Total Organic Carbon (g)"))
It's not sufficient to "rescale the scale". You also have to rescale the data to be plotted on the secondary axis using the inverse of the rescaling factor applied to the scale:
df <- data.frame(
Site_No = c(1:10),
Total_Solids = c(30, 35, 32, 50, 55, 57, 45, 49, 55, 46),
TOC = c(1.3, 1.5, 1.7, 1.45, 1.03, 2.4, 1.9, 1.8, 1.1, 1.6)
)
library(ggplot2)
ggplot(df) +
geom_col(aes(x = Site_No, y = Total_Solids)) +
geom_point(aes(x = Site_No, y = TOC * 20)) +
scale_y_continuous(name = "Total Solids (g)",
sec.axis = sec_axis(~ ./20, name = "Total Organic Carbon (g)"))
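As a variation on the same idea, the factor can be derived from the data instead of being hard-coded as 20. This is a sketch under the assumption that you want the largest TOC value to line up with the largest Total_Solids value; the name scale_factor is made up for the example.

```r
library(ggplot2)

df <- data.frame(
  Site_No = 1:10,
  Total_Solids = c(30, 35, 32, 50, 55, 57, 45, 49, 55, 46),
  TOC = c(1.3, 1.5, 1.7, 1.45, 1.03, 2.4, 1.9, 1.8, 1.1, 1.6)
)

# Assumed rule: stretch TOC so that its maximum matches the maximum of Total_Solids
scale_factor <- max(df$Total_Solids) / max(df$TOC)

p <- ggplot(df) +
  geom_col(aes(x = Site_No, y = Total_Solids)) +
  # data are multiplied by the factor ...
  geom_point(aes(x = Site_No, y = TOC * scale_factor)) +
  scale_y_continuous(name = "Total Solids (g)",
                     # ... and the secondary axis divides by the same factor
                     sec.axis = sec_axis(~ . / scale_factor,
                                         name = "Total Organic Carbon (g)"))
p
```

The key point is that the multiplication in geom_point and the division in sec_axis use the same number, so the points and the secondary axis labels stay consistent.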
I have created an R dataframe as follows
A<-data.frame("Col1"= c(21.5 ,22.5 ,15.5, 20.5 ,17.5 ,14.5 ,23.5, 11.5, 16.5, 25.5 ,18.5, 24.5 ,10.5 , 9.5, 19.5, 26.5, 13.5, 12.5 ,27.5, 4.5 , 5.5, 8.5, 6.5, 7.5))
A$Col2=c(0.619219548, 0.723265668,0.122833055, 0.536849680, 0.257225692 ,0.081648474, 0.794797325 ,0.023125359, 0.194364553, 0.909681117, 0.343930779, 0.857658382, 0.018791029 ,0.014457257, 0.467485576 ,0.950865217, 0.062140165, 0.040464671, 0.989875246, 0.001502443,0.003637989 ,0.012290763, 0.005796326, 0.007959621)
I have created the following plot on log scale using ggplot2 package
library(scales)
library(ggplot2)
chart_1 <- ggplot(A, aes(x = Col1, y = Col2)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_x_log10(minor_breaks = seq(0, max(A$Col1) * 10, 0.1), breaks = pretty_breaks()) +
  scale_y_log10(minor_breaks = seq(0, 100, 0.1)) +
  annotation_logticks(sides = "lb", outside = FALSE,
                      short = unit(1, "mm"), mid = unit(3, "mm"), long = unit(6, "mm")) +
  theme(panel.grid.major = element_line(colour = "red", size = 0.5),
        panel.grid.minor = element_line(colour = "green", size = 0.2))
With this I am able to generate a y-axis with 9 uniform annotation log ticks between each pair of major gridlines, i.e. each of the intervals 0.001 - 0.01, 0.01 - 0.1 and 0.1 - 1 is split into 10 divisions. I would like the same to be done along the x-axis dynamically, but I have not been able to accomplish it. Could someone guide me in this regard? Many thanks in advance.
I believe your code is working just fine.
annotation_logticks draws 10 tick marks within each decade of the default log10 scale.
This way, you have 10 tick marks between 0.01 and 0.1, 10 tick marks between 0.1 and 1, 10 tick marks between 1 and 10 (on your x-axis you can see the marks at 5, 6, 7, 8, 9 and 10), and 10 tick marks between 10 and 100 (20, 30, 40 ... 100). You can see the tick marks at 20 and 30 on your x-axis.
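To make the tick positions concrete, here is a small base R sketch that lists where log10 tick marks fall between 1 and 100. It only illustrates the arithmetic (1 to 9 times each power of ten); it does not call into ggplot2, and the variable names are made up for the example.

```r
# Powers of ten that start each decade in the range 1 to 100
decades <- 10^(0:1)

# Multiples 1..9 of each decade start, plus the closing value 100
ticks <- c(as.vector(outer(1:9, decades)), 100)
ticks
```

Only part of this sequence is visible on the x-axis in the question because the data cover just a slice of the 1 to 100 range.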
For example, I have a sample data of human height in a DataFrame:
df <- data.frame(height = c(1.5, 1.6, 1.7, 1.8, 1.9), number = c(20, 30, 50, 30, 20))
How can I calculate the 90% quantile of this sample?
I know ggplot2 has a function that can plot the ECDF of the sample:
ggplot(df, aes(x = height, y = number)) + stat_ecdf()
but I only need a specified quantile, not the plot.
I could repeat each height number times to make a vector and use the quantile function on that vector, but as the counts get larger, this method seems very inefficient.
EDIT:
It seems stat_ecdf is not supposed to be used in this way, and when the data distribution is skewed:
df <- data.frame(height = c(1.5, 1.6, 1.7, 1.8, 1.9), number = c(100, 2, 3, 4, 5))
only quantile of the repeated vector gives the desired result:
quantile(c(rep(1.5,100), rep(1.6,2), rep(1.7,3), rep(1.8,4), rep(1.9,5)))
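One way to avoid building the repeated vector is to work from the cumulative counts directly. The sketch below assumes the inverse-ECDF definition of a quantile (type = 1 in quantile()), which differs from the interpolating default (type = 7); the function name weighted_quantile is hypothetical.

```r
# Quantile of values that each occur `counts` times, without expanding the data.
# Assumes the inverse-ECDF definition: the smallest value whose cumulative
# share of observations is at least p.
weighted_quantile <- function(values, counts, probs) {
  ord <- order(values)
  values <- values[ord]
  cum_share <- cumsum(counts[ord]) / sum(counts)
  sapply(probs, function(p) values[which(cum_share >= p)[1]])
}

df <- data.frame(height = c(1.5, 1.6, 1.7, 1.8, 1.9),
                 number = c(20, 30, 50, 30, 20))
skewed <- data.frame(height = c(1.5, 1.6, 1.7, 1.8, 1.9),
                     number = c(100, 2, 3, 4, 5))

q_even <- weighted_quantile(df$height, df$number, 0.9)
q_skewed <- weighted_quantile(skewed$height, skewed$number, 0.9)
```

For a cross-check on small data, compare against quantile(rep(height, number), 0.9, type = 1). If you need the interpolated default instead, the repeated-vector approach is still the simplest reference.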
I am trying to add significance asterisks to my ggplot boxplot, using groups (fill) and facets.
Using geom_signif() I can add bars such as:
I am trying to do the same for the dodged boxplots too.. similar to
(Imagine there were significance values above the smaller lines...)
The code for the former graph:
data:
library(ggplot2)
library(ggsignif)
df <- data.frame(iris,petal.colour=c("red","blue"), country=c("UK","France","France"))
First plot:
ggplot(df, aes(country,Sepal.Length))+
geom_boxplot(position="dodge",aes(fill=petal.colour))+
facet_wrap(~Species, ncol=3)+
geom_signif(comparisons = list(c("France", "UK")), map_signif_level=TRUE,
tip_length=0,y_position = 9, textsize = 4)
and for the smaller bars
+geom_signif(annotations = c("", ""),
y_position = 8.5,
xmin=c(0.75,1.75), xmax=c(1.25,2.25),tip_length=0)
It would be great to let R do the work, but if it's easier to manually add text above these smaller lines then that's fine with me.
I can't figure out how to get them to work for that group using geom_signif. See the first part for my attempt. I was able to get it to work using ggpubr and stat_compare_means, which I believe is an extension of geom_signif.
ggplot(df, aes(country,Sepal.Length)) +
geom_boxplot(position="dodge",aes(fill=petal.colour)) +
facet_wrap(~Species, ncol=3) +
geom_signif(comparisons = list(c("France", "UK")), map_signif_level=TRUE,
tip_length=0,y_position = 9, textsize = 4) +
geom_signif(y_position = 8.5,
xmin=c(0.75,1.75), xmax=c(1.25,2.25), tip_length=0, map_signif_level = c("***" = 0.001, "**" = 0.01, "*" = 0.05))
Warning messages:
1: In wilcox.test.default(c(4.9, 4.7, 5, 5.4, 5, 4.4, 5.4, 4.8, 4.3, :
cannot compute exact p-value with ties
2: In wilcox.test.default(c(7, 6.9, 5.5, 5.7, 6.3, 6.6, 5.2, 5.9, 6, :
cannot compute exact p-value with ties
3: In wilcox.test.default(c(6.3, 5.8, 6.3, 6.5, 4.9, 7.3, 7.2, 6.5, :
cannot compute exact p-value with ties
4: Computation failed in `stat_signif()`:
arguments imply differing number of rows: 6, 0
5: Computation failed in `stat_signif()`:
arguments imply differing number of rows: 6, 0
6: Computation failed in `stat_signif()`:
arguments imply differing number of rows: 6, 0
Using ggpubr and stat_compare_means. Note you can use different labels, and tests, etc. See ?stat_compare_means.
library(ggpubr)
ggplot(df, aes(country,Sepal.Length)) +
geom_boxplot(position="dodge",aes(fill=petal.colour)) +
facet_wrap(~Species, ncol=3) +
stat_compare_means(aes(group = country), label = "p.signif", label.y = 10, label.x = 1.5) +
stat_compare_means(aes(group = petal.colour), label = "p.format", label.y = 8.5)
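If adding the labels by hand is acceptable, as the question allows, another option is to compute one p-value per facet and country yourself and place the labels with geom_text. This is a sketch, not the ggsignif or ggpubr way of doing it: the helper data frame cells, the Wilcoxon test, the star thresholds and the label height of 8.7 are all assumptions chosen for the example.

```r
library(ggplot2)

df <- data.frame(iris, petal.colour = c("red", "blue"),
                 country = c("UK", "France", "France"))

# One row per Species x country cell
cells <- unique(df[c("Species", "country")])

# Wilcoxon test between the two petal colours within each cell
# (exact = FALSE avoids the "cannot compute exact p-value with ties" warning)
cells$p <- mapply(function(sp, ct) {
  sub <- df[df$Species == sp & df$country == ct, ]
  wilcox.test(Sepal.Length ~ petal.colour, data = sub, exact = FALSE)$p.value
}, as.character(cells$Species), as.character(cells$country), USE.NAMES = FALSE)

# Assumed thresholds for the star labels
cells$label <- as.character(cut(cells$p,
                                breaks = c(0, 0.001, 0.01, 0.05, 1),
                                labels = c("***", "**", "*", "ns"),
                                include.lowest = TRUE))

p <- ggplot(df, aes(country, Sepal.Length)) +
  geom_boxplot(aes(fill = petal.colour), position = "dodge") +
  facet_wrap(~Species, ncol = 3) +
  # labels are centred above each country's pair of dodged boxes
  geom_text(data = cells, aes(x = country, y = 8.7, label = label))
p
```

Because cells carries the Species column, each label lands in the matching facet. The short bars from the question's second geom_signif call can be kept underneath these labels.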
Maybe you can save the plot as a .pdf file and use Adobe Illustrator to manually add whatever you want to the plot. A PDF exported from R is a vector graphic, so it can be edited in Adobe Illustrator.
Or maybe you can try to set
map_signif_level = c("***"=0.001, "**"=0.01, "*"=0.05)
in geom_signif
Hope that helps
I'm fairly new to R so please comment on anything you see.
I have data taken at different timepoints, under two conditions (for one timepoint), and I want to plot this as a bar plot with error bars and with the bars at the appropriate timepoint.
I currently have this (stolen from another question on this site):
library(ggplot2)
example <- data.frame(tp = factor(c(0, "14a", "14b", 24, 48, 72)), means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2), std = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3))
ggplot(example, aes(x = tp, y = means)) +
  geom_bar(position = position_dodge(), stat = "identity") +  # stat = "identity" plots the means as given
  geom_errorbar(aes(ymin=means-std, ymax=means+std))
Now my timepoints are a factor, but the unequal spacing of the measurements across time makes the plot less nice.
This is how I imagine the graph :
I find the ggplot2 package can give you very nice graphs, but I have a lot more difficulty understanding it than I have with other R stuff.
Before we get into R, you have to realize that even in a bar plot the x axis needs a numeric value. If you treat them as factors then the software assumes equal spacing between the bars by default. What would be the x-values for each of the bars in this case? It can be (0, 14, 14, 24, 48, 72) but then it will plot two bars at point 14 which you don't seem to want. So you have to come up with the x-values.
Joran provides an elegant solution by modifying the width of the bars at position 14. Modifying the code given by joran to make the bars fall at the right position in the x-axis, the final solution is:
library(ggplot2)
example <- data.frame(tp = factor(c(0, "14a", "14b", 24, 48, 72)), means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2), std = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3))
example$tp1 <- gsub("a|b","",example$tp)
example$grp <- c('a','a','b','a','a','a')
example$tp2 <- as.numeric(example$tp1)
ggplot(example, aes(x = tp2, y = means,fill = grp)) +
geom_bar(position = "dodge",stat = "identity") +
geom_errorbar(aes(ymin=means-std, ymax=means+std),position = "dodge")
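An alternative to dodging is to choose the x-values yourself, as described above. The sketch below assumes it is acceptable to shift the two measurements at 14 slightly apart, to 13 and 15; the column name x_pos and the bar width of 2 are made up for the example.

```r
library(ggplot2)

example <- data.frame(
  tp    = c("0", "14a", "14b", "24", "48", "72"),
  means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2),
  std   = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3)
)

# Assumed positions: the two conditions at timepoint 14 sit at 13 and 15
example$x_pos <- c(0, 13, 15, 24, 48, 72)

p <- ggplot(example, aes(x = x_pos, y = means)) +
  geom_col(width = 2) +
  geom_errorbar(aes(ymin = means - std, ymax = means + std), width = 1) +
  # label each bar with its original timepoint name
  scale_x_continuous(breaks = example$x_pos, labels = example$tp)
p
```

With a continuous x-axis the gaps between bars reflect the real time differences, and each error bar stays centred on its own bar because both layers share the same x_pos.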