Plot a column chart with two y-axes in ggplot2 - r

Aim: simultaneously display columns and points of two different datasets, both taken from the same sites (x-axis).
I am able to plot a column chart of discrete site names (x) and continuous weight data (y). I have also been able to add an appropriately scaled second y axis, to plot points at each site, representing other continuous weight data of a much smaller scale than on the primary y axis.
However, the points seem to be using the primary y axis scale as coordinates and not the new secondary y axis, as intended. How do I ensure the new points are plotted against the new y-axis scale, while maintaining the primary column plot as it is?
Thanks
data:
Site_No (x) = 1:10
Total_Solids (y) = 30, 35, 32, 50, 55, 57, 45, 49, 55, 46
TOC (y2) = 1.3, 1.5, 1.7, 1.45, 1.03, 2.4, 1.9, 1.8, 1.1, 1.6
Code:
ggplot(df) +
geom_col(aes(x = Site_No, y = Total_Solids)) +
geom_point(aes(x = Site_No, y = TOC)) +
scale_y_continuous(name = "Total Solids (g)",
sec.axis = sec_axis(~ ./20, name = "Total Organic Carbon (g)"))

It's not sufficient to "rescale the scale". You also have to rescale the data to be plotted on the secondary axis using the inverse of the rescaling factor applied to the scale:
df <- data.frame(
Site_No = c(1:10),
Total_Solids = c(30, 35, 32, 50, 55, 57, 45, 49, 55, 46),
TOC = c(1.3, 1.5, 1.7, 1.45, 1.03, 2.4, 1.9, 1.8, 1.1, 1.6)
)
library(ggplot2)
ggplot(df) +
geom_col(aes(x = Site_No, y = Total_Solids)) +
geom_point(aes(x = Site_No, y = TOC * 20)) +
scale_y_continuous(name = "Total Solids (g)",
sec.axis = sec_axis(~ ./20, name = "Total Organic Carbon (g)"))
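The factor of 20 above is chosen by eye. As a rough sketch, the factor could instead be derived from the data so that the largest TOC value lines up with the tallest column. The names `scale_factor` and `TOC_scaled` are introduced here for illustration only, and the plotting step is skipped if ggplot2 is not installed.

```r
df <- data.frame(
  Site_No = 1:10,
  Total_Solids = c(30, 35, 32, 50, 55, 57, 45, 49, 55, 46),
  TOC = c(1.3, 1.5, 1.7, 1.45, 1.03, 2.4, 1.9, 1.8, 1.1, 1.6)
)

# Hypothetical helper value: ratio between the maxima of the two series
scale_factor <- max(df$Total_Solids) / max(df$TOC)

# The points are multiplied by the factor so they sit on the primary scale ...
df$TOC_scaled <- df$TOC * scale_factor

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df) +
    geom_col(aes(x = Site_No, y = Total_Solids)) +
    geom_point(aes(x = Site_No, y = TOC_scaled)) +
    # ... and the secondary axis divides by the same factor to label them
    scale_y_continuous(
      name = "Total Solids (g)",
      sec.axis = sec_axis(~ . / scale_factor, name = "Total Organic Carbon (g)")
    )
  print(p)
}
```

The key point is that the multiplication in the data and the division in `sec_axis()` use the same value, so the two can never drift apart.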

Related

Plotting unequal error bars as bubbles on a scatterplot in ggplot2

I have a set of 10 density estimates, obtained from 5 sites using two different methods (REM and DS). Each density estimate has its own confidence interval, and the intervals are unequal.
I want a scatter plot with the x-axis showing the density estimate from REM and the y-axis showing the density estimate from DS. I then want a bubble around each point, representing the confidence intervals.
At the moment I can only seem to set specific height and width values for these confidence intervals, which would be fine if they were even. Since they are uneven, the bubbles will not be circles but should be more of an egg-shaped ellipse, off-centre from the point estimate.
This is the code I've used, in which you can see the respective confidence intervals. The plot shows what this makes, if the confidence intervals were even. How would I adapt this to make the confidence intervals uneven?
Thank you.
library(ggplot2)
library(ggforce) # provides geom_ellipse()
# sample data
df <- data.frame(site=c(1, 2, 3, 4, 5),
rem=c(17.7, 14.1, 10.6, 13.2, 1.0),
rem_lower=c(8.2, 6.6, 4.2, 3.2, 0.2),
rem_upper=c(27.1, 21.5, 17.0, 23.1, 1.7),
ds=c(16.6, 18.5, 5.2, 21.8, 2.4),
ds_lower=c(6.3, 5.1, 2.7, 4.5, 0.5),
ds_upper=c(40.4, 39.9, 10.9, 44.7, 8.3))
# calculate the width and height of each ellipse
width <- df$rem_upper - df$rem_lower
height <- df$ds_upper - df$ds_lower
# plot the data with ellipses
ggplot(df, aes(x = rem, y = ds, color = factor(site))) +
geom_point(size = 5) +
geom_ellipse(aes(x0 = rem, y0 = ds, a = width, b = height, fill = factor(site),
angle = 45), alpha = 0.3) +
scale_fill_manual(values = c("#1f78b4", "#33a02c", "#e31a1c", "#ff7f00", "#6a3d9a")) +
labs(x = "rem", y = "DS") +
theme_classic()
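One possible way to get an off-centre, egg-shaped bubble is to build the outline yourself from four quarter-ellipses, each using the distance from the point estimate to the relevant interval limit, and draw it with `geom_polygon()`. This is only a sketch under assumptions: `egg_outline()` is a hypothetical helper, the shapes are aligned with the axes (no 45 degree rotation), and the outline is a visual device rather than a joint confidence region.

```r
df <- data.frame(site = 1:5,
                 rem = c(17.7, 14.1, 10.6, 13.2, 1.0),
                 rem_lower = c(8.2, 6.6, 4.2, 3.2, 0.2),
                 rem_upper = c(27.1, 21.5, 17.0, 23.1, 1.7),
                 ds = c(16.6, 18.5, 5.2, 21.8, 2.4),
                 ds_lower = c(6.3, 5.1, 2.7, 4.5, 0.5),
                 ds_upper = c(40.4, 39.9, 10.9, 44.7, 8.3))

# Hypothetical helper: outline made of four quarter-ellipses whose radii are
# the distances from the estimate to the lower/upper interval limits
egg_outline <- function(row, n = 200) {
  t <- seq(0, 2 * pi, length.out = n + 1)
  rx <- ifelse(cos(t) >= 0, row$rem_upper - row$rem, row$rem - row$rem_lower)
  ry <- ifelse(sin(t) >= 0, row$ds_upper - row$ds, row$ds - row$ds_lower)
  data.frame(site = row$site,
             x = row$rem + rx * cos(t),
             y = row$ds + ry * sin(t))
}

# One outline per site, stacked into a single data frame
eggs <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) egg_outline(df[i, ])))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot() +
    geom_polygon(data = eggs,
                 aes(x = x, y = y, group = site, fill = factor(site)),
                 alpha = 0.3) +
    geom_point(data = df, aes(x = rem, y = ds, colour = factor(site)), size = 3) +
    labs(x = "REM", y = "DS", fill = "site", colour = "site") +
    theme_classic()
  print(p)
}
```

Each outline touches the four interval limits of its site, so the point estimate sits off-centre whenever the interval is asymmetric.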

Add categorical grouping to scatter plot of continuous data in R?

Sorry if image 1 is a little basic - layout sent by my project supervisor! I have created a scatterplot of total grey seal abundance (Total) over observation time (Obsv_time), and fitted a gam over the top, as seen in image 2:
plot(Total ~ Obsv_time,
data = R_Count,
ylab = "Total",
xlab = "Observation Time (Days)",
pch = 20, cex = 1, bty = "l",col="dark grey")
lines(R_Count$Obsv_time, fitted(gam.tot2))
I would like to somehow show on the graph the corresponding Season (Image 1) - from a categorical factor variable (4 levels: Pre-breeding, Breeding, Post-breeding, Moulting), which corresponds to Obsv_time.
I am unsure if I need to plot a secondary axis or just add labels to the graph...and how to do each! Thanks!
Image 1: wanted graph layout - indicate season from factor variable
Image 2: scatterplot with GAM curve
You can do this with base R graphics. Leave off the x-axis in the original plot, and add an axis with the season labels separately. You can indicate the season by overlaying rectangles.
## Some bogus data
x = sort(runif(50,0,250))
y = 800*(sin(x/40) + x/100 + rnorm(50,0, 0.2)) + 500
FittedY = 800*(sin(x/40) + x/100)+500
plot(x,y, pch= 20, col='lightgray', ylim=c(300,2700), xaxt='n',
xlab="", ylab='Total')
lines(x, FittedY)
axis(1, at=c(25,95,155,215), tick=FALSE,
labels=c('PreBreed', 'Repro', 'PostBreed', 'Moulting'))
rect(c(-10,65,125,185), 0, c(65,125,185,260), 3000,
col=rainbow(4, alpha=0.05), border=NA)
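The rectangle edges and label positions above are hard-coded. If the data already contain a season factor, as in the question, they could be computed from it instead. The sketch below assumes bogus data again and invents a `season` factor with `cut()`; the break values are made up for illustration.

```r
set.seed(42)
## Bogus data plus an assumed season factor (break values are invented)
x <- sort(runif(50, 0, 250))
y <- 800 * (sin(x / 40) + x / 100 + rnorm(50, 0, 0.2)) + 500
season <- cut(x, breaks = c(0, 65, 125, 185, 250), include.lowest = TRUE,
              labels = c("Pre-breeding", "Breeding", "Post-breeding", "Moulting"))

## First and last observation time within each season, and the midpoint
left  <- as.numeric(tapply(x, season, min))
right <- as.numeric(tapply(x, season, max))
mid   <- (left + right) / 2

plot(x, y, pch = 20, col = "lightgray", xaxt = "n", xlab = "", ylab = "Total")
usr <- par("usr")  # current plot limits: x1, x2, y1, y2
rect(left, usr[3], right, usr[4], col = rainbow(4, alpha = 0.05), border = NA)
axis(1, at = mid, tick = FALSE, labels = levels(season))
```

Because the edges come from the observed times, small gaps appear between neighbouring bands; if the true season boundaries are known, they can be used for `left` and `right` directly.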
If you are able to use ggplot2, you could add another factor variable to your data frame (or compute it from time) to hold the season. Then it is just a matter of mapping the color aesthetic (or any other) to this season variable.
require(ggplot2)
df <- data.frame(total = c(26, 41, 31, 75, 64, 32, 7, 89),
time = c(1, 2, 3, 4, 5, 6, 7, 8))
df$season <- cut(df$time, breaks=c(0, 2, 4, 6, 8),
labels=c("winter", "spring", "summer", "autumn"))
ggplot(df, aes(x=time, y=total)) +
geom_smooth(color="black") +
geom_point(aes(color=season))

How to show plotted data with big value differences?

I have the data car_crashes that I am plotting using ggplot. It has 3 different data sets, as seen below,
but since Average of Cars is huge, the other values barely show up because they are in the range of 100. If I remove the Average of Cars data, the plot actually looks like this
Is there a way I can show all the data in one plot so that I can at least see the num of crashes plot?
The code I used is below:
carcrashes_figure <- ggplot()+geom_area(aes(YEAR_WW,AverageofCars,group = 1,colour = 'Average of cars'),car_crashes,fill = "dodgerblue1",alpha = 0.4)+
geom_line(aes(YEAR_WW,averageofcars,group = 1,linetype ='num of crashes'),car_crashes,fill = "dodgerblue3",colour = "dodgerblue3",size = 1.6) +
geom_line(aes(car_crashes$YEAR_WW,constantline,group = 1, size = 'constant line' ),car_crashes1,fill = "green4",colour = "green4")+
theme_bw() +
theme(axis.text.x = element_text(angle=70, vjust=0.6, face = 'bold'))+
theme(axis.text.y = element_text(angle=0, vjust=0.2, face = 'bold'))+
scale_colour_manual('', values = "dodgerblue1")+
scale_size_manual('',values = 1.4)+
scale_linetype_manual('',values = 1)+
scale_y_continuous()+
theme(legend.text = element_text(size = 8, colour = "black", angle = 0))
carcrashes_figure
I agree with @Jim Quirk's idea of using a separate y-axis. As far as I know, ggplot2 isn't very good at this, so I used base graphics.
# making example ts_data
set.seed(1); data <- matrix(c(rnorm(21, 1000, 100), rnorm(21, 53, 10), rep(53, 21)), ncol=3)
ts_data <- ts(data, start = 1980, frequency = 1)
par(mar=c(4, 4.2, 1.5, 4.2)) # enlarge a right margin
# plot(ts_data[,1]) # check y-range
plot(ts_data[,2:3], plot.type = "single", ylab="num of crashes & constant line",
col=c(2,3), ylim=c(35,100), lwd=2) # draw "num of crashes" and "constant line"
par(usr = c(par("usr")[1:2], 490, 1310)) # set the second y coordinates
axis(4) # write it on the right side
polygon(x = c(1980:2000, rev(1980:2000)), y = c(ts_data[,1], rep(0,21)),
col="#0000FF20", border = "blue") # paint "Average of cars"
mtext(side=4, "Average of cars", line=2.5)
legend("topright",paste(c("num of crashes","constant line","Average of cars")),
pt.cex=c(0,0,3), lty=c(1,1,0), pch=15, cex=0.9, col=c(2, 3, "#0000FF20"), bty="n",
inset=c(0.02,-0.02), y.intersp=1.5)
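For comparison, ggplot2 versions that provide `sec_axis()` can draw a second y-axis as long as it is a one-to-one transformation of the first. The sketch below rescales the large series linearly onto the range of the small one and inverts that mapping for the axis labels. The data are simulated, and the names `a`, `b` and `cars_scaled` are assumptions made for this example; a line is used instead of a filled area because an area anchored at zero would be misleading after the shift.

```r
set.seed(1)
# Simulated stand-in for the car_crashes data
d <- data.frame(year = 1980:2000,
                cars = rnorm(21, 1000, 100),
                crashes = rnorm(21, 53, 10),
                constant = 53)

# Linear map from the cars scale onto the crashes scale: scaled = a + b * cars
b <- diff(range(d$crashes)) / diff(range(d$cars))
a <- min(d$crashes) - b * min(d$cars)
d$cars_scaled <- a + b * d$cars

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x = year)) +
    geom_line(aes(y = cars_scaled), colour = "blue") +
    geom_line(aes(y = crashes), colour = "red") +
    geom_line(aes(y = constant), colour = "green4") +
    # The secondary axis applies the inverse map so labels read in cars units
    scale_y_continuous(name = "num of crashes & constant line",
                       sec.axis = sec_axis(~ (. - a) / b, name = "Average of cars")) +
    theme_bw()
  print(p)
}
```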

Animate ggplot2 stacked line chart in R

I'm trying to animate a stacked line chart in ggplot2.
Here's the plot I'd like to animate:
Here's the code to generate a similar plot:
#Data
mydata <- data.frame(year=rep(1:6, times=4),
activity=as.factor(rep(c("research","coursework","clinical work","teaching"), each=6)),
time=c(40, 35, 40, 60, 85, 90,
50, 40, 10, 0, 5, 0,
5, 20, 20, 40, 10, 10,
5, 5, 30, 0, 0, 0))
mydata$activity <- ordered(mydata$activity, levels = c("research","clinical work","coursework","teaching"))
labels <- data.frame(activity=c("research","coursework","clinical work","teaching"),
xaxis=c(5, 1.8, 2.5, 2.97),
yaxis=c(25, 70, 48, 90))
#Plot
ggplot(mydata, aes(x=year, y=time, fill=activity)) +
geom_area(stat="smooth", span=.35, color="black") +
theme(legend.position = "none") +
geom_text(data=labels, aes(x=xaxis, y=yaxis, label=activity)) +
ggtitle("Time in Different Activities by Year in Program") +
ylab("Percentage of Time") +
xlab("Year in Program")
I'm looking for the first image to display all axes and text. The second iteration, I'd like to gradually reveal over time, from left to right, the "Research" stacked line (including color and border). The third iteration, I'd like to gradually reveal, from left to right, the "Clinical Work" stacked line. Fourth, the "Coursework" stacked line. And finally, the "Teaching" stacked line.
Ideally, the output format would be very smooth (no jagged jumps) and would be compatible with PowerPoint.
Here is an R-based solution. It saves individual figures (.png) that can be iterated through within a presentation.
Alternatively, you could create an animation (for example by converting to .gif) using ImageMagick http://www.imagemagick.org/
library(ggplot2)
#Data
mydata <- data.frame(year=rep(1:6, times=4),
activity=as.factor(rep(c("research","coursework","clinical work","teaching"), each=6)),
time=c(40, 35, 40, 60, 85, 90,
50, 40, 10, 0, 5, 0,
5, 20, 20, 40, 10, 10,
5, 5, 30, 0, 0, 0))
#order the activities and then the dataframe
mydata$activity <- ordered(mydata$activity, levels = c("research","clinical work","coursework","teaching"))
mydata <- mydata[order(mydata$activity),]
#labels
labels <- data.frame(activity=c("research","coursework","clinical work","teaching"),
xaxis=c(5, 1.8, 2.5, 2.97),
yaxis=c(25, 70, 48, 90))
#creates a function to draw a plot for the first `leg` activities
draw.stacks<-function(leg){
int <- leg*6
#seq_len() gives an empty index when leg is 0, whereas 1:0 would select row 1
a<-ggplot(data=mydata[seq_len(int),], aes(x=year, y=time, fill=activity))+
geom_area(stat="smooth", span=.35, color="black") +
theme_bw()+
scale_fill_discrete(limits = c("research","clinical work","coursework","teaching"), guide="none")+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
coord_cartesian(xlim=c(1,6),ylim=c(0,100))+
geom_text(data=labels, aes(x=xaxis, y=yaxis, label=activity)) +
ggtitle("Time in Different Activities by Year in Program") +
ylab("Percentage of Time") +
xlab("Year in Program")
print(a)
}
# save individual png figures
for (i in 0:4) {
png(paste("activity", i, "png", sep="."))
draw.stacks(i)
dev.off()
}
Sorry for bringing in a non-programmer solution, but I would simply generate plots for each iteration separately, put them in PowerPoint (one plot per slide), and use some fancy slide transition effects (I tried the Random Bars effect on your example, and it looked nice).
If you are determined to find an R-based solution, you can take a look at the animation package (see a Strategic Zombie Simulation example here).

R: Bar plot on a continuous x-axis (time-scaled)

I'm fairly new to R so please comment on anything you see.
I have data taken at different timepoints, under two conditions (for one timepoint), and I want to plot this as a bar plot with error bars and with the bars at the appropriate timepoint.
I currently have this (stolen from another question on this site):
library(ggplot2)
example <- data.frame(tp = factor(c(0, "14a", "14b", 24, 48, 72)), means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2), std = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3))
ggplot(example, aes(x = tp, y = means)) +
geom_bar(position = position_dodge()) +
geom_errorbar(aes(ymin=means-std, ymax=means+std))
Now my timepoints are a factor, but the unequal spacing of the measurements across time makes the plot less nice.
This is how I imagine the graph:
I find the ggplot2 package can give you very nice graphs, but I have a lot more difficulty understanding it than I have with other R stuff.
Before we get into R, you have to realize that even in a bar plot the x axis needs a numeric value. If you treat them as factors then the software assumes equal spacing between the bars by default. What would be the x-values for each of the bars in this case? It can be (0, 14, 14, 24, 48, 72) but then it will plot two bars at point 14 which you don't seem to want. So you have to come up with the x-values.
Joran provides an elegant solution by modifying the width of the bars at position 14. Modifying the code given by joran to make the bars fall at the right position in the x-axis, the final solution is:
library(ggplot2)
example <- data.frame(tp = factor(c(0, "14a", "14b", 24, 48, 72)), means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2), std = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3))
example$tp1 <- gsub("a|b","",example$tp)
example$grp <- c('a','a','b','a','a','a')
example$tp2 <- as.numeric(example$tp1)
ggplot(example, aes(x = tp2, y = means,fill = grp)) +
geom_bar(position = "dodge",stat = "identity") +
geom_errorbar(aes(ymin=means-std, ymax=means+std),position = "dodge")
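If dodging is not wanted, another option is to assign the x-values by hand, as the paragraph above suggests: keep every bar at its true timepoint and shift only the two bars that share timepoint 14 half a bar to either side. This is a sketch, not part of the original answer; `bar_width`, `offset` and `x_pos` are names made up here, and the width of 4 x-axis units is an arbitrary choice.

```r
example <- data.frame(
  tp = c("0", "14a", "14b", "24", "48", "72"),
  means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2),
  std = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3)
)

bar_width <- 4  # assumed bar width, in x-axis units

# Numeric timepoint with the condition suffix stripped
example$time <- as.numeric(gsub("[ab]$", "", example$tp))

# Shift condition "a" left and condition "b" right by half a bar
offset <- ifelse(grepl("a$", example$tp), -bar_width / 2,
                 ifelse(grepl("b$", example$tp), bar_width / 2, 0))
example$x_pos <- example$time + offset

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(example, aes(x = x_pos, y = means)) +
    geom_col(width = bar_width) +
    geom_errorbar(aes(ymin = means - std, ymax = means + std), width = 1) +
    # Label the axis at the true timepoints, not at the shifted positions
    scale_x_continuous(breaks = unique(example$time))
  print(p)
}
```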
