transforming axis labels with a multiplier ggplot2 - r

Previously in ggplot2, I used a formatter function to multiply values in the Y axis by 100:
formatter100 <- function(x) {
  x * 100
}
With the new ggplot2 (v0.9.1), I am having trouble converting axis labels with a new transformation function:
mult_trans <- function() {
  trans_new("mult", function(x) 100 * x, function(x) x / 100)
}
Here is the example plot function:
library(ggplot2)
library(scales)
test<-data.frame(ecdf=c(0.02040816,0.04081633,0.06122449,0.08163265,0.10204082,0.14285714,0.14285714,0.16326531,0.24489796,0.24489796,0.24489796,0.24489796,0.26530612,0.28571429,0.30612245,0.32653061,0.36734694,0.36734694,0.38775510,0.40816327,0.42857143,0.46938776,0.46938776,0.48979592,0.53061224,0.53061224,0.59183673,0.59183673,0.59183673,0.61224490,0.63265306,0.65306122,0.67346939,0.69387755,0.71428571,0.73469388,0.75510204,0.77551020,0.79591837,0.81632653,0.83673469,0.85714286,0.87755102,0.89795918,0.91836735,0.93877551,0.95918367,0.97959184,0.99900000),lat=c(50.7812,66.4062,70.3125,97.6562,101.5620,105.4690,105.4690,109.3750,113.2810,113.2810,113.2810,113.2810,125.0000,136.7190,148.4380,164.0620,167.9690,167.9690,171.8750,175.7810,183.5940,187.5000,187.5000,191.4060,195.3120,195.3120,234.3750,234.3750,234.3750,238.2810,261.7190,312.5000,316.4060,324.2190,417.9690,507.8120,511.7190,562.5000,664.0620,683.5940,957.0310,1023.4400,1050.7800,1070.3100,1109.3800,1484.3800,1574.2200,1593.7500,1750.0000))
xbreaks<-c(50,100,150,200,300,500,1000,2000)
ybreaks<-c(1,2,5,10,20,30,40,50,60,70,80,90,95,98,99,99.5,99.9)/100
p <- ggplot( test, aes(lat, ecdf) )
p<-p +
geom_point()+
scale_x_log10(breaks=xbreaks, labels = comma(xbreaks))+
scale_y_continuous(trans='probit',
labels = trans_format(mult_trans()),
"cumulative probability %",
breaks=ybreaks)+
xlab("latency ms")
p
This returns the error:
Error in scale$labels(breaks) : could not find function "trans"
Looks like I've misunderstood how to use transforms properly.

Transformations actually transform the scale itself. In the case of your y axis, you're just formatting the breaks. So you don't really want to use trans_new, just a regular format function. Similarly, you want to use comma_format rather than comma, as the former actually returns a function as needed:
mult_format <- function() {
function(x) format(100*x,digits = 2)
}
p <- ggplot( test, aes(lat, ecdf) )
p<-p +
geom_point()+
scale_x_log10(breaks = xbreaks,labels = comma_format()) +
scale_y_continuous(trans='probit',
labels = mult_format(),
breaks=ybreaks) +
xlab("latency ms")
p
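To make the distinction concrete, here is a minimal sketch of a label function factory. The name mult_label and its mult argument are made up for illustration and are not part of ggplot2 or scales. The idea is only that labels accepts a function that takes the numeric breaks and returns one string per break. This version formats with the default precision instead of digits = 2, since two significant digits may round away the decimal that separates breaks such as 99.5 and 99.9.

```r
# Sketch: a label function factory (mult_label is a made-up name).
# ggplot2 calls the returned function with the numeric break values
# and expects one label per break.
mult_label <- function(mult = 100) {
  function(x) {
    # trim = TRUE avoids padding; drop0trailing = TRUE turns "50.0" into "50"
    format(mult * x, trim = TRUE, drop0trailing = TRUE)
  }
}

ybreaks <- c(1, 50, 99.5) / 100
labels_pct <- mult_label()(ybreaks)
print(labels_pct)
```

In the plot above it would be passed as scale_y_continuous(trans = 'probit', labels = mult_label(), breaks = ybreaks).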

I had multiplied the values of the y axis by 6 but I wanted to display the unmultiplied values as labels, so I ended up adding this:
scale_y_continuous(breaks=seq(-100,100,2)*6,labels=seq(-100,100,2))

Related

For loops for ggplot2 not returning y axis labels

I am trying to plot multiple y values vs a single x value. I want the graphs to be in separate figures. The following code generates the graphs, but the y axes are labeled "taxonomy_metadata_combined_p21[,i]" and I want them to have the same labels as the column titles. I have tried multiple different things, but how can I change the y axis label?
for(i in 2:ncol(taxonomy_metadata_combined_p21)) {
print(ggplot(taxonomy_metadata_combined_p21, aes(x = Txt_Sex, y = taxonomy_metadata_combined_p21[ , i])) +
geom_boxplot())+
ylab(colnames(taxonomy_metadata_combined_p21)[i])
}
Your ylab() was added after the print() call, so the label was never attached to the plot that got printed. If you add ylab() first and print afterwards, it should work.
As a side note, it is not recommended to use y = taxonomy_metadata_combined_p21[ , i] as an aesthetic. The ggplot2 authors instead recommend the .data pronoun when the column name is available as a string.
Reprex with built-in data:
library(ggplot2)
df <- rev(iris)
for (i in 2:ncol(df)) {
p <- ggplot(df, aes(x = Species, y = .data[[colnames(df)[i]]])) +
geom_boxplot() +
ylab(colnames(df)[i])
print(p)
}
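If you would rather keep the plots around than print them inside the loop, one option is to build them with lapply and store them in a named list. This is only a sketch using iris; value_cols and plots are illustrative names, and the plotting part runs only if ggplot2 is installed.

```r
# Sketch: build one boxplot per numeric column and keep them in a named list.
df <- iris
value_cols <- names(df)[vapply(df, is.numeric, logical(1))]

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  plots <- lapply(value_cols, function(col) {
    # .data[[col]] looks the column up by its name, so the default
    # axis title would already be sensible; ylab(col) makes it explicit.
    ggplot(df, aes(x = Species, y = .data[[col]])) +
      geom_boxplot() +
      ylab(col)
  })
  names(plots) <- value_cols
  invisible(lapply(plots, print))   # one figure per column
}
```

Each plot can then be retrieved by column name, for example plots[["Sepal.Width"]].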

How to add multiple geoms to ggplot using a list (or for loop)?

My question is: is it possible to dynamically create a list of geoms, which I can add to a ggplot, enabling me to plot several separate series of data all at once?
Reproducible Example
The following code demonstrates my question:
library(ggplot2)
# Function to generate fake data
generate_fake_results = function(){
results = list()
for(i in c(1:10)){
x = c((1+10*i):(10+10*i))
results = append(results, list(data.frame(
x = as.Date("2000-01-01") + x,
y = sin(x),
ylower1 = sin(x) - 0.25,
ylower2 = sin(x) - 0.5,
yupper1 = sin(x) + 0.25,
yupper2 = sin(x) + 0.50
)
)
)
}
return(results)
}
fake_data = generate_fake_results()
# Function to plot the mean, upper and lower bounds of a model
# The dataset contains two upper and lower bounds; the 80% and 95% confidence interval
predict_margin_func = function(r, color='blue', alpha=0.1){
return(
list(
geom_ribbon(aes(x=as.Date(r$x,"%Y-%m-%d"),
ymin=r$ylower1,
ymax=r$yupper1), fill=color, alpha=alpha),
geom_ribbon(aes(x=as.Date(r$x,"%Y-%m-%d"),
ymin=r$ylower2,
ymax=r$yupper2), fill=color, alpha=alpha),
geom_line(aes(x=as.Date(r$x,"%Y-%m-%d"), y=r$y), size=1.25, color=color)
)
)
}
# This plots the graph that I want, but... I have to manually add each forecast
# from my fake_data list "manually"
ggplot() +
predict_margin_func(fake_data[[1]]) +
predict_margin_func(fake_data[[2]]) +
predict_margin_func(fake_data[[3]]) +
predict_margin_func(fake_data[[4]]) +
predict_margin_func(fake_data[[5]])
# I'd rather use a for loop to do this dynamically, but I can't get it to work.
# If I do this, it doesn't work:
plot_list = list()
for(i in c(1:length(fake_data))){
plot_list = append(plot_list, predict_margin_func(fake_data[[i]]))
}
ggplot() +
plot_list
While solution 1 "works", I'd much rather use something like solution 2, where I don't have to manually add each series I want to plot, as this is more easily extensible if the number of forecasts in the results list changes.
The results in plot_list seem to be 10 copies of the last result (the highest i from the for loop). I'm guessing R is doing some clever trick that I don't want in this specific case: the entries in the list are references to a thing, whereas I want the thing that is being referred to.
Does anyone have an idea what I could do here? I could maybe also reshape my data, but I wondered if it were possible to do using a list.
Up front: I can fix your for loop (see below), but I think a better solution is to do:
ggplot() + lapply(fake_data, predict_margin_func)
As to why your for loop is failing ...
ggplot2 tends to evaluate lazily, so fake_data[[i]] is resolved when the plot is rendered, not when the geom is created. By then the loop has finished and i holds its final value, which is why you're only seeing the last series. While "R ggplot2 for loop plots same data" appears to be a similar question, the proposed answers don't really apply. Why? Because you're hard-coding values into your geoms instead of assigning data= within them and using non-standard evaluation within the geom itself.
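The same effect can be reproduced without ggplot2. The sketch below is a simplified illustration of R's lazy argument evaluation, not a trace of ggplot2's internals; make_getter and make_getter_forced are made-up names.

```r
# Sketch of the lazy-evaluation pitfall in plain R.
make_getter <- function(r) {
  function() r          # r is a promise; nothing has evaluated it yet
}

lazy <- list()
for (i in 1:3) lazy[[i]] <- make_getter(i)
# The promises are only evaluated here, after the loop has finished
lazy_values <- vapply(lazy, function(g) g(), integer(1))

make_getter_forced <- function(r) {
  force(r)              # evaluate the argument now, while i still has this value
  function() r
}

eager <- list()
for (i in 1:3) eager[[i]] <- make_getter_forced(i)
eager_values <- vapply(eager, function(g) g(), integer(1))

print(lazy_values)    # every getter sees the final value of i
print(eager_values)   # each getter kept its own value
```

This is also why the lapply version behaves: lapply evaluates each element before handing it over, so nothing is left pointing at a loop counter.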
Here, I add data=r to each geom and remove their references to r$.
predict_margin_func2 = function(r, color='blue', alpha=0.1){
return(
list(
geom_ribbon(data = r, aes(x=as.Date(x,"%Y-%m-%d"),
ymin=ylower1,
ymax=yupper1), fill=color, alpha=alpha),
geom_ribbon(data = r, aes(x=as.Date(x,"%Y-%m-%d"),
ymin=ylower2,
ymax=yupper2), fill=color, alpha=alpha),
geom_line(data = r, aes(x=as.Date(x,"%Y-%m-%d"), y=y), size=1.25, color=color)
)
)
}
plot_list = list()
for(i in c(1:length(fake_data))){
plot_list = append(plot_list, predict_margin_func2(fake_data[[i]]))
}
ggplot() + plot_list
(produces the same plot as above).
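Since the question also mentions reshaping the data, here is a sketch of that route: stack the forecasts into one data frame with an id column and let group separate the series. The column name series and the variable combined are my own choices, the fake data is regenerated so the block is self-contained, and the plotting part runs only if ggplot2 is installed.

```r
# Sketch: combine all forecasts into one data frame with a series id,
# then draw each layer once with group = series.
fake_data <- lapply(1:10, function(i) {
  x <- (1 + 10 * i):(10 + 10 * i)
  data.frame(
    x = as.Date("2000-01-01") + x,
    y = sin(x),
    ylower1 = sin(x) - 0.25, yupper1 = sin(x) + 0.25,
    ylower2 = sin(x) - 0.50, yupper2 = sin(x) + 0.50
  )
})

# Tag each data frame with its position in the list and stack them
combined <- do.call(rbind, Map(function(d, id) cbind(d, series = id),
                               fake_data, seq_along(fake_data)))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(combined, aes(x = x, group = series)) +
    geom_ribbon(aes(ymin = ylower1, ymax = yupper1), fill = "blue", alpha = 0.1) +
    geom_ribbon(aes(ymin = ylower2, ymax = yupper2), fill = "blue", alpha = 0.1) +
    geom_line(aes(y = y), color = "blue")
  print(p)
}
```

With this layout the number of forecasts can change without touching the plotting code.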

ggplot2: display every nth value on discrete axis

How can I automate displaying only 1 in every n values on a discrete axis?
I can get every other value like this:
library(ggplot2)
my_breaks <- function(x, n = 2) {
return(x[c(TRUE, rep(FALSE, n - 1))])
}
ggplot(mpg, aes(x = class, y = cyl)) +
geom_point() +
scale_x_discrete(breaks = my_breaks)
But I don't think it's possible to specify the n parameter to my_breaks, is it?
Is this possible another way? I'm looking for a solution that works for both character and factor columns.
Not quite like that, but scale_x_discrete can take a function as the breaks argument, so we just need to adapt your code into a function factory (a function that returns a function) and things will work:
every_nth = function(n) {
return(function(x) {x[c(TRUE, rep(FALSE, n - 1))]})
}
ggplot(mpg, aes(x = class, y = cyl)) +
geom_point() +
scale_x_discrete(breaks = every_nth(n = 3))
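A variation on the same idea, in case you want to keep every tick mark and only thin out the text: pass a function factory to labels instead of breaks and blank out the skipped entries. This is a sketch; label_every_nth is a made-up name, and the class names are typed in by hand so the block runs without ggplot2.

```r
# Sketch: blank out labels instead of dropping breaks.
label_every_nth <- function(n) {
  function(x) {
    x <- as.character(x)
    keep <- (seq_along(x) - 1) %% n == 0   # positions 1, n + 1, 2n + 1, ...
    x[!keep] <- ""
    x
  }
}

classes <- c("2seater", "compact", "midsize", "minivan",
             "pickup", "subcompact", "suv")
thinned <- label_every_nth(3)(classes)
print(thinned)
```

It would be used as scale_x_discrete(labels = label_every_nth(3)), and it should not matter whether the column is a character or a factor, as long as the scale passes the breaks as a vector.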
Since ggplot2 3.3.0 it is also possible to address dense labels on a discrete axis with scale_x_discrete(guide = guide_axis(n.dodge = 2)), which staggers the labels across two rows.
See the "rewrite of axis code" section of the release notes for more details.

ggplot secondary y axes showing z scores using sec_axis

ggplot2 now allows for adding a secondary y-axis if it is a one-to-one transformation of the primary axis.
For my graph, I would like to plot the original units on the left y-axis and z-scores on the right y-axis, but I am having trouble working out how to do this in practice.
The documentation suggests that secondary axes are added using the sec_axis() function, e.g.,
scale_y_continuous(sec.axis = sec_axis(~.+10))
creates a second y-axis 10 units higher than the first.
Z-scores can be created in R using the scale() function. So I assumed I could do something like this to get a second y-axis displaying z-scores:
scale_y_continuous(sec.axis = sec_axis(scale(~.)))
However, this returns an "invalid first argument" error.
Does anyone have any ideas how to make this work?
You could use the z-score transformation formula. This works well:
library(tidyverse)
library(scales)
df <- data.frame(val = c(1:30), var = rnorm(30, 10,2))
p <- ggplot() + geom_line(data = df, aes( x = val, y = var))
p <- p + scale_y_continuous("variable", sec.axis = sec_axis(~ (. - mean(df$var)) / sd(df$var), name = "standardized variable"))
p
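Or, more compactly (note that scale(.) standardizes whatever values sec_axis passes in, which span the axis range rather than df$var, so the result may not match z-scores computed from the data):
p + scale_y_continuous("variable", sec.axis = sec_axis(~ scale(.), "standardized variable"))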
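A sketch of the same idea with the mean and standard deviation computed once from the data, so the formula stays short and can be checked against scale() directly. The helper name to_z and the seed are my own choices; the plotting part runs only if ggplot2 is installed.

```r
# Sketch: secondary axis in z-score units, with mean and sd taken from the data.
set.seed(42)  # arbitrary seed so the example is reproducible
df <- data.frame(val = 1:30, var = rnorm(30, 10, 2))

mu <- mean(df$var)
sigma <- sd(df$var)
to_z <- function(y) (y - mu) / sigma   # made-up helper name

# Sanity check: same numbers as scale() applied to the data itself
z_matches <- isTRUE(all.equal(to_z(df$var), as.numeric(scale(df$var))))
print(z_matches)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = val, y = var)) +
    geom_line() +
    scale_y_continuous(
      "variable",
      sec.axis = sec_axis(~ to_z(.), name = "standardized variable")
    )
  print(p)
}
```

Because the transformation is a fixed linear function of the primary axis, it satisfies the one-to-one requirement mentioned in the question.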

facet_wrap Title wrapping & Decimal places on free_y axis (ggplot2)

I have a set of code that produces multiple plots using facet_wrap:
ggplot(summ,aes(x=depth,y=expr,colour=bank,group=bank)) +
geom_errorbar(aes(ymin=expr-se,ymax=expr+se),lwd=0.4,width=0.3,position=pd) +
geom_line(aes(group=bank,linetype=bank),position=pd) +
geom_point(aes(group=bank,pch=bank),position=pd,size=2.5) +
scale_colour_manual(values=c("coral","cyan3", "blue")) +
facet_wrap(~gene,scales="free_y") +
theme_bw()
With the reference datasets, this code produces figures like this:
I am trying to accomplish two goals here:
Keep the auto scaling of the y axis, but make sure only 1 decimal place is displayed across all the plots. I have tried creating a new column of the rounded expr values, but it causes the error bars to not line up properly.
I would like to wrap the titles. I have tried changing the font size as in Change plot title sizes in a facet_wrap multiplot, but some of the gene names are too long and will end up being too small to read if I cram them on a single line. Is there a way to wrap the text, using code within the facet_wrap statement?
This probably cannot serve as a definitive answer, but here are some pointers regarding your questions:
Formatting the y-axis scale labels.
First, let's try the direct solution using the format function. Here we round the values with round and then format all y-axis labels to show one decimal place.
formatter <- function(...){
function(x) format(round(x, 1), ...)
}
mtcars2 <- mtcars
sp <- ggplot(mtcars2, aes(x = mpg, y = qsec)) + geom_point() + facet_wrap(~cyl, scales = "free_y")
sp <- sp + scale_y_continuous(labels = formatter(nsmall = 1))
The issue is that this approach is sometimes not practical. Take the leftmost plot from your figure, for example. Using the same formatting, all y-axis labels would be rounded to -0.3, which is not desirable.
The other solution is to modify the breaks for each plot into a set of rounded values. But again, taking the leftmost plot of your figure as an example, it would end up with just one label point, -0.3.
Yet another solution is to format the labels in scientific notation. For simplicity, you can modify the formatter function as follows:
formatter <- function(...){
function(x) format(x, ..., scientific = TRUE, digits = 2)
}
Now you can have a uniform format for the y-axes of all plots. My suggestion, though, is to set the labels to 2 decimal places after rounding.
Wrap facet titles
This can be done using labeller argument in facet_wrap.
# Replace the numeric cyl codes with descriptive text labels
mtcars2$cyl <- c("Four Cylinder", "Six Cylinder", "Eight Cylinder")[match(mtcars2$cyl, c(4,6,8))]
# Redraw the graph
sp <- ggplot(mtcars2, aes(x = mpg, y = qsec)) + geom_point() +
facet_wrap(~cyl, scales = "free_y", labeller = labeller(cyl = label_wrap_gen(width = 10)))
sp <- sp + scale_y_continuous(labels = formatter(nsmall = 2))
Note that the wrap function breaks lines only at spaces. So, in your case, you might need to modify your variables.
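For labels without spaces, one possible workaround is to insert spaces before wrapping. The sketch below does this in base R with strwrap; wrap_label, its sep argument, and the gene name are all hypothetical, and it assumes the long names contain a separator such as an underscore.

```r
# Sketch: wrap facet labels that have no spaces.
wrap_label <- function(x, width = 10, sep = "_") {
  spaced <- gsub(sep, " ", x, fixed = TRUE)   # give strwrap() places to break
  vapply(
    spaced,
    function(s) paste(strwrap(s, width = width), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE
  )
}

wrapped <- wrap_label("Eight Cylinder")
print(wrapped)
cat(wrap_label("heat_shock_protein_70"), "\n")   # hypothetical gene name
```

Since labeller() accepts a plain function of a character vector for a given variable, something like facet_wrap(~gene, scales = "free_y", labeller = labeller(gene = wrap_label)) should work, though I have not run it against the data in the question.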
This only solves the first part of the question. You can create a function to format your axis labels and pass it to scale_y_continuous.
df <- data.frame(x=rnorm(11), y1=seq(2, 3, 0.1) + 10, y2=rnorm(11))
library(ggplot2)
library(reshape2)
df <- melt(df, 'x')
# Before
ggplot(df, aes(x=x, y=value)) + geom_point() +
facet_wrap(~ variable, scales="free")
# label function
f <- function(x){
format(round(x, 1), nsmall=1)
}
# After
ggplot(df, aes(x=x, y=value)) + geom_point() +
facet_wrap(~ variable, scales="free") +
scale_y_continuous(labels=f)
scale_*_continuous(..., labels = function(x) sprintf("%0.0f", x)) worked in my case.
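The sprintf approach generalizes to any number of decimals with a small function factory. This is a sketch; fixed_decimals is a made-up name.

```r
# Sketch: a label function factory for a fixed number of decimals.
fixed_decimals <- function(digits = 1) {
  fmt <- paste0("%.", digits, "f")   # e.g. "%.1f" for one decimal place
  function(x) sprintf(fmt, x)
}

one_decimal <- fixed_decimals(1)(c(-0.3, 2, 10.26))
print(one_decimal)
print(fixed_decimals(0)(c(-0.3, 2, 10.26)))
```

For the faceted plot in the question this would be scale_y_continuous(labels = fixed_decimals(1)), which keeps the free y scales and only changes how the breaks are printed.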
