I would like to create a graph that has superscripts on the axis instead of displaying unformatted numbers using ggplot2. I know that there are a lot of answers which change the axis label, but not the axis text. I am not trying to change the label of the graph, but the text on the axis.
Example:
x<-c('2^-5','2^-3','2^-1','2^1','2^2','2^3','2^5','2^7','2^9','2^11','2^13')
y<-c('2^-5','2^-3','2^-1','2^1','2^2','2^3','2^5','2^7','2^9','2^11','2^13')
df<-data.frame(x,y)
p<-ggplot()+
geom_point(data=df,aes(x=x,y=y),size=4)
p
So I would like the x-axis to display the same numbers as superscripts, without the caret.
EDIT:
A mostly base R approach (dplyr is used only for the pipe and mutate_all):
library(dplyr)
library(ggplot2)
df %>%
mutate_all(as.character)->new_df
res<-unlist(Map(function(x) eval(parse(text=x)),new_df$x))#replace with y for y
to_use<-unlist(lapply(res,as.expression))
split_text<-strsplit(gsub("\\^"," ",names(to_use))," ")
join_1<-as.numeric(sapply(split_text,"[[",1)) #tidyr::separate might help, less robust for numeric(I think)
join_2<-as.numeric(sapply(split_text,"[[",2))
to_use_1<-sapply(seq_along(join_1),function(x) parse(text=paste(join_1[x],"^",
join_2[x])))
The above can be reduced to fewer steps; I posted the stepwise approach I took. The result is shown for x only, and the same can be done for y:
new_df %>%
ggplot()+
geom_point(aes(x=x,y=y),size=4)+
scale_x_discrete(breaks=df$x,labels=to_use_1)#replace with y and scale_y_discrete for y
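The steps above can probably be condensed, because each string such as "2^-5" is already a valid plotmath expression. The following is only a minimal sketch: caret_labels is a hypothetical helper name, and the data is a shortened copy of the example. The factor levels are set explicitly because a plain character column would be ordered alphabetically on the axis.

```r
# Hypothetical helper: turn strings such as "2^-5" into plotmath expressions.
caret_labels <- function(breaks) parse(text = breaks)

vals <- c("2^-5", "2^-3", "2^-1", "2^1", "2^2", "2^3")
labs <- caret_labels(vals)  # expression vector, one element per string

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Fix the level order so the axis follows the order of `vals`.
  demo <- data.frame(x = factor(vals, levels = vals),
                     y = factor(vals, levels = vals))
  p <- ggplot(demo, aes(x = x, y = y)) +
    geom_point(size = 4) +
    scale_x_discrete(labels = caret_labels) +
    scale_y_discrete(labels = caret_labels)
  # print(p) draws the plot
}
```

Passing the function itself to labels means no separate lookup vector has to be kept in sync with the breaks.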
Original and erroneous answer:
I have deviated from standard tidyverse practice by using $. You could replace it with . and it might work, although it is not really important here since the focus is on the labels:
library(dplyr)
df %>%
mutate(new_x=gsub("\\^"," ",x),
new_y=gsub("\\^"," ",y))->new_df
new_df %>%
ggplot()+
geom_point(aes(x=x,y=y),size=4)+
scale_x_discrete(breaks=x,labels=new_df$new_x)+
scale_y_discrete(breaks=y,labels=new_df$new_y)
This can be done with the functions scale_x_log2 and scale_y_log2, which can be found in the GitHub package jrnoldmisc.
First, install the package.
devtools::install_github("jrnold/rubbish")
Then, coerce the variables to numeric. I will work with a copy of the original data frame.
df1 <- df
df1[] <- lapply(df1, function(x){
x <- as.character(x)
sapply(x, function(.x)eval(parse(text = .x)))
})
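If eval(parse()) feels too permissive for strings that come from elsewhere, the conversion can also be done arithmetically. This is a minimal sketch, assuming every value has the form base^exponent; caret_to_numeric and the demo data frame are hypothetical names used only for illustration.

```r
# Hypothetical helper: convert "base^exponent" strings to numbers
# without evaluating arbitrary code.
caret_to_numeric <- function(s) {
  parts <- strsplit(as.character(s), "^", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1])^as.numeric(p[2]),
         numeric(1))
}

demo <- data.frame(x = c("2^-5", "2^3", "2^13"),
                   y = c("2^-1", "2^1", "2^2"))
demo1 <- demo
demo1[] <- lapply(demo1, caret_to_numeric)  # keeps the data frame shape
```

The assignment into demo1[] mirrors the lapply pattern above, so the columns stay in place and only their type changes.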
Now, graph it.
library(jrnoldmisc)
library(ggplot2)
library(MASS)
library(scales)
a <- ggplot(df1, aes(x = x, y = y, size = 4)) +
geom_point(show.legend = FALSE) +
scale_x_log2(limits = c(0.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^x, n = 10)) +
scale_y_log2(limits = c(0.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^x, n = 10))
a + annotation_logticks(base = 2)
Edit.
Following the discussion in the comments, here are the two other ways that were seen to give different axis labels.
Axis labels at every tick mark: set limits = c(1.01, NA) and the function argument n = 11, an odd number.
Axis labels at odd-numbered exponents: keep limits = c(0.01, NA) and change the inverse function to function(x) 2^(x - 1), with n = 11.
Just the instructions, no plots.
The first.
a <- ggplot(df1, aes(x = x, y = y, size = 4)) +
geom_point(show.legend = FALSE) +
scale_x_log2(limits = c(1.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^(x), n = 11)) +
scale_y_log2(limits = c(1.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^(x), n = 11))
a + annotation_logticks(base = 2)
And the second.
a <- ggplot(df1, aes(x = x, y = y, size = 4)) +
geom_point(show.legend = FALSE) +
scale_x_log2(limits = c(0.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^(x - 1), n = 11)) +
scale_y_log2(limits = c(0.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^(x - 1), n = 11))
a + annotation_logticks(base = 2)
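To check which breaks a given value of n produces without drawing the plot, the breaks function can be called directly. This is a minimal sketch, assuming the scales package is installed. The range c(2^-5, 2^13) is only an illustrative stand-in for the scale limits, and the values returned depend on base::pretty(), so no expected output is shown.

```r
if (requireNamespace("scales", quietly = TRUE)) {
  rng <- c(2^-5, 2^13)  # illustrative range, not the actual scale limits
  b10 <- scales::trans_breaks("log2", function(x) 2^x, n = 10)(rng)
  b11 <- scales::trans_breaks("log2", function(x) 2^x, n = 11)(rng)
  # Inspect the exponents that would be labelled for each setting.
  print(log2(b10))
  print(log2(b11))
}
```

Comparing the two printed vectors shows how a small change in n can shift which exponents are labelled.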
You can provide a function to the labels argument of the scale_x_*** and scale_y_*** functions to generate labels with superscripts (or other formatting). See examples below.
library(jrnoldmisc)
library(ggplot2)
df<-data.frame(x=2^seq(-5,5,2),
y=2^seq(-5,5,2))
ggplot(df) +
geom_point(aes(x=x,y=y),size=2) +
scale_x_log2(breaks=2^seq(-5,5,2),
labels=function(x) parse(text=paste("2^",round(log2(x),2))))
ggplot(df) +
geom_point(aes(x=x,y=y),size=2) +
scale_x_continuous(breaks=c(2^-5, 2^seq(1,5,2)),
labels=function(x) parse(text=paste("2^",round(log2(x),2))))
ggplot(df) +
geom_point(aes(x=x,y=y),size=2) +
scale_x_log10(breaks=10^seq(-1,1,1),
labels=function(x) parse(text=paste("10^",round(log10(x),2))))
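The label functions above differ only in the base, so they could be generated by one small factory. This is a minimal sketch; label_power is a hypothetical name, not a ggplot2 or scales function.

```r
# Hypothetical factory: returns a label function for a given base.
label_power <- function(base) {
  function(x) parse(text = paste0(base, "^", round(log(x, base), 2)))
}

labs2  <- label_power(2)(c(0.25, 8))      # expressions 2^-2 and 2^3
labs10 <- label_power(10)(c(0.1, 1, 10))  # expressions 10^-1, 10^0, 10^1
```

It would be used as, for example, scale_x_log10(breaks = 10^seq(-1, 1, 1), labels = label_power(10)).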
Transforming ggplot2 axes to log10 using scales::trans_breaks() can sometimes (if the range is small enough) produce un-pretty breaks, at non-integer powers of ten.
Is there a general purpose way of setting these breaks to occur only at 10^x, where x are all integers, and, ideally, consecutive (e.g. 10^1, 10^2, 10^3)?
Here's an example of what I mean.
library(ggplot2)
# dummy data
df <- data.frame(fct = rep(c("A", "B", "C"), each = 3),
x = rep(1:3, 3),
y = 10^seq(from = -4, to = 1, length.out = 9))
p <- ggplot(df, aes(x, y)) +
geom_point() +
facet_wrap(~ fct, scales = "free_y") # faceted to try and emphasise that it's general purpose, rather than specific to a particular axis range
The unwanted result -- y-axis breaks are at non-integer powers of ten (e.g. 10^2.8)
p + scale_y_log10(
breaks = scales::trans_breaks("log10", function(x) 10^x),
labels = scales::trans_format("log10", scales::math_format(10^.x))
)
I can achieve the desired result for this particular example by adjusting the n argument to scales::trans_breaks(), as below. But this is not a general purpose solution, of the kind that could be applied without needing to adjust anything on a case-by-case basis.
p + scale_y_log10(
breaks = scales::trans_breaks("log10", function(x) 10^x, n = 1),
labels = scales::trans_format("log10", scales::math_format(10^.x))
)
Should add that I'm not wed to using scales::trans_breaks(), it's just that I've found it's the function that gets me closest to what I'm after.
Any help would be much appreciated, thank you!
Here is an approach that at the core has the following function.
breaks = function(x) {
brks <- extended_breaks(Q = c(1, 5))(log10(x))
10^(brks[brks %% 1 == 0])
}
It gives extended_breaks() a narrow set of 'nice numbers' and then filters out non-integers.
This gives us the following for your example case:
library(ggplot2)
library(scales)
#> Warning: package 'scales' was built under R version 4.0.3
# dummy data
df <- data.frame(fct = rep(c("A", "B", "C"), each = 3),
x = rep(1:3, 3),
y = 10^seq(from = -4, to = 1, length.out = 9))
ggplot(df, aes(x, y)) +
geom_point() +
facet_wrap(~ fct, scales = "free_y") +
scale_y_continuous(
trans = "log10",
breaks = function(x) {
brks <- extended_breaks(Q = c(1, 5))(log10(x))
10^(brks[brks %% 1 == 0])
},
labels = math_format(format = log10)
)
Created on 2021-01-19 by the reprex package (v0.3.0)
I haven't tested this on many other ranges that might be difficult, but it should generalise better than setting the number of desired breaks to 1. Difficult ranges might be those that lie strictly between two consecutive powers of 10 without including either, for example 11-99 or 101-999.
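For comparison, integer powers of ten can also be generated directly from the limits, without extended_breaks(). This is a minimal sketch of a different, simpler technique than the one above; integer_log10_breaks is a hypothetical name. It returns every integer power inside the range, so a very wide range yields many breaks, and a range such as 11-99 yields none.

```r
# Hypothetical helper: all integer powers of ten inside the given limits.
integer_log10_breaks <- function(limits) {
  lo <- ceiling(log10(min(limits)))
  hi <- floor(log10(max(limits)))
  if (lo > hi) return(numeric(0))  # no integer power of ten in range
  10^seq(lo, hi)
}

integer_log10_breaks(c(0.5, 2000))  # powers 10^0 through 10^3
integer_log10_breaks(c(11, 99))     # empty: nothing between 10 and 100
```

It could be passed as breaks = integer_log10_breaks in the scale call, since ggplot2 calls a breaks function with the limits in data units.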
I'd like to plot histogram and density on the same plot. What I would like to add to the following is custom y-axis label which would be something like sprintf("[%s] %s", ..density.., ..count..) - two numbers at one tick value. Is it possible to obtain this with scale_y_continuous or do I need to work this around somehow?
Below is my current progress using scales::trans_new and sec_axis. The sec_axis version is acceptable, but the most desirable output is the one shown in the image below.
library(ggplot2)
library(scales) # provides trans_new() and percent
set.seed(1)
var <- rnorm(4000)
binwidth <- 2 * IQR(var) / length(var) ^ (1 / 3)
count_and_proportion_label <- function(x) {
sprintf("%s [%.2f%%]", x, x/sum(x) * 100)
}
ggplot(data = data.frame(var = var), aes(x = var, y = ..count..)) +
geom_histogram(binwidth = binwidth) +
geom_density(aes(y = ..count.. * binwidth)) +
scale_y_continuous(
# this way
trans = trans_new(name = "count_and_proportion",
format = count_and_proportion_label,
transform = function(x) x,
inverse = function(x) x),
# or this way
sec.axis = sec_axis(trans = ~./sum(.),
labels = percent,
name = "proportion (in %)")
)
I've also tried to create the breaks beforehand, based on the output of graphics::hist, but the two histograms differ.
bins <- (max(var) - min(var))/binwidth
hdata <- hist(var, breaks = bins, right = FALSE)
# hist generates different bins than `ggplot2`
At the end I would like to get something like this:
Would it be acceptable to add percentage as a secondary axis? E.g.
your_plot + scale_y_continuous(sec.axis = sec_axis(~.*2, name = "[%]"))
Perhaps it would be possible to overlay the secondary axis on the primary one, but I'm not sure how you would go about doing that.
You can achieve your desired output by creating a custom set of labels, and adding it to the plot:
library(tidyverse)
library(ggplot2)
set.seed(1)
var <- rnorm(400)
bins <- .1
df <- data.frame(yvals = seq(0, 20, 5), labels = c("[0%]", "[10%]", "[20%]", "[30%]", "[40%]"))
df <- df %>% tidyr::unite("custom_labels", labels, yvals, sep = " ", remove = TRUE)
ggplot(data = data.frame(var = var), aes(x = var, y = ..count..)) +
geom_histogram(aes(y = ..count..), binwidth = bins) +
geom_density(aes(y = ..count.. * bins), color = "black", alpha = 0.7) +
ylab("[density] count") +
scale_y_continuous(breaks = seq(0, 20, 5), labels = df$custom_labels)
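The labels can also be computed from the breaks instead of being typed by hand. This is a minimal sketch, assuming the bracketed number should be each count as a percentage of all observations (the original question may intend a different quantity); count_pct_label is a hypothetical name.

```r
# Hypothetical factory: label each break as "[percent of total] count".
count_pct_label <- function(n_total) {
  function(breaks) sprintf("[%.1f%%] %s", 100 * breaks / n_total, breaks)
}

count_pct_label(400)(c(0, 10, 20))
# "[0.0%] 0"  "[2.5%] 10" "[5.0%] 20"

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  set.seed(1)
  v <- rnorm(400)
  p <- ggplot(data.frame(v = v), aes(x = v)) +
    geom_histogram(binwidth = 0.1) +
    scale_y_continuous(labels = count_pct_label(length(v))) +
    ylab("[percent] count")
  # print(p) draws the plot
}
```

Because the function receives whatever breaks ggplot2 chooses, the labels stay correct when the data or the bin width changes.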
I would like to make a barplot in R where the last bar represents the sum of all values whose frequency is greater than a certain threshold. I want to show this in the x-axis label of the last bar. For instance:
library(ggplot2)
x <- c(1, 2, 3, 4, 5)
y <- c(4000, 3000, 2000, 1000, 500)
df <- data.frame(x, y)
names(df) <- c("Var1", "Freq")
theme_set(theme_classic())
g <- ggplot(df, aes(Var1, Freq))
g + geom_bar(stat = "identity", width = 0.5, fill = 'tomato2') +
xlab('Var1') +
ylab('Freq') +
theme(axis.text.x = element_text(angle = 0,
vjust = 0.6,
colour = "black"),
axis.text.y = element_text(colour = "black"))
The above code produces a chart similar to this:
But on the last bar, I want the last value of the x-axis (x = 5) to be displayed as >= 5.
So far, I've tried to use scale_x_discrete. So I added to the above code the following lines:
n <- 5
# I'm not very creative with names.
.foo <- function(x, n) {
if (x == n) {
element <- paste('\u2265', toString(x), sep = ' ')
} else {
element <- toString(x)
}
}
labels <- sapply(seq(n), .foo, n)
g + scale_x_discrete(breaks = sapply(seq(n), function(x) toString(x)),
labels = labels)
This code formats the x-axis as I wish but it overrides the barplot, leaving an empty chart:
How can I do this?
Change the labels in scale_x_continuous, setting the breaks explicitly so that their number matches the number of labels:
... + scale_x_continuous(breaks = 1:5, labels = c("1", "2", "3", "4", "\u2265 5"))
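The same idea can be written as a label function, so that the labels follow the breaks instead of being listed by hand. This is a minimal sketch; label_last_ge is a hypothetical name.

```r
# Hypothetical factory: prefix the break equal to n with a ">=" sign.
label_last_ge <- function(n) {
  function(breaks) {
    ifelse(!is.na(breaks) & breaks == n,
           paste("\u2265", breaks),
           as.character(breaks))
  }
}

out <- label_last_ge(5)(1:5)  # "1" "2" "3" "4" and the prefixed "5"
```

It would be used as scale_x_continuous(breaks = 1:5, labels = label_last_ge(5)).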
One approach would be to avoid changing the axis tick labels directly and instead convert the categorical data in Var1 to a factor, then lump the levels of that factor with forcats::fct_lump_min so that the final level is ≥5:
# Insert after df generated, before plot call
library(dplyr) # for %>% and mutate
library(forcats)
df <- df %>%
mutate(Var1 = as_factor(Var1),
Var1 = fct_lump_min(Var1, min = 501, w = Freq, other_level = "≥5"))
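When no levels actually need to be merged and only the last label should change, a base R factor with custom labels may be enough. This is a minimal sketch that rebuilds the example data under the hypothetical name bars; because Var1 becomes a factor, ggplot2 treats the axis as discrete.

```r
bars <- data.frame(Var1 = 1:5,
                   Freq = c(4000, 3000, 2000, 1000, 500))
n <- 5
lv <- sort(unique(bars$Var1))

# Relabel only the level equal to n; all other labels stay as they are.
bars$Var1 <- factor(bars$Var1,
                    levels = lv,
                    labels = ifelse(lv == n,
                                    paste("\u2265", lv),
                                    as.character(lv)))
levels(bars$Var1)
```

The plotting code can then stay unchanged, with no scale_x_* call needed for the labels.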
The problem, as pointed out in Z.Lin's comment, was that I was assigning geom_bar(...) to the ggplot object before using scale_x_discrete. Here is the solution:
library(ggplot2)
...
labels <- sapply(seq(n), .foo, n)
g <- ggplot(df, aes(Var1, Freq)) +
scale_x_discrete(breaks = sapply(seq(n), function(x) toString(x)),
labels = labels)
g + geom_bar(stat = "identity", width = 0.5, fill = color) +
...
I have the following function:
gg.barplots <- function(inp, order, xlab.strg, ylab.strg) {
require(RColorBrewer)
require(ggplot2)
require(reshape2)
arg <- c(expression(hat(p)[M]), expression(hat(p)[C]))
p <- order
col <- c(colorRampPalette(brewer.pal(9,'Blues')[2:9])(p+2),
colorRampPalette(brewer.pal(9,'Oranges')[2:9])(p+2))
lab <- c(0:p, paste(">",p,sep=""))
freq.mat <- data.frame(labels = lab, inp)
names(freq.mat) <- c("x", "Magnitude-only", "Complex-valued")
freq.mat$x <- factor(freq.mat$x, levels = c(levels(freq.mat$x)[-1],levels(freq.mat$x)[1]))
## force the orders to be as we want them to appear, using the factor function with levels specified.
freq.df <- melt(data = freq.mat, id.vars = 1, measure.vars = 2:3)
fill.vars <- paste(rep(names(freq.mat)[-1], times = p), rep(freq.mat$x, each = 2), sep = ":")
fill.vars <- factor(fill.vars, levels = fill.vars)
freq.df <- data.frame(fill.vars, freq.df[rep(c(0,p+2), times = p + 2) + rep(1:(p + 2), each = 2), ])
ggplot(data=freq.df, aes(x = x, y = value, fill = fill.vars)) +
geom_bar(stat="identity", position=position_dodge(), colour = "black") +
scale_fill_manual(values = col[rep(c(0,p+2), times = p + 2) + rep(1:(p + 2), each = 2)]) +
theme_bw() +
xlab(arg) +
ylab(ylab.strg) +
xlab(xlab.strg) +
ylab(ylab.strg)
}
which gives me two dodged barplots, as in the following example:
dput(out.AR2$AR.rate)
structure(c(0.25178, 0.06735, 0.64564, 0.03523, 0.04396, 0.0027,
0.90415, 0.04919), .Dim = c(4L, 2L), .Dimnames = list(c("0",
"1", "2", ">2"), NULL))
and calling the function:
gg.barplots(inp = out.AR2$AR.rate, order = 2, xlab.strg = "AR order", ylab.strg = "Proportions")
which results in the following figure:
Now I feel that (even ignoring the inherent ugliness of the current legend in this plot) the whole legend is not necessary. I think it would be enough to show only two colors (say, the mid-value of the Oranges scale and the mid-value of the Blues scale) to represent the important parts of the plot. The remainder (the AR orders in the legend) is already there in the figure.
My question is: how do I make a legend that has only these two colors, with the words Complex-valued and Magnitude-only associated with them? I have tried several things and I am a bit lost, sorry.
Your function is a little messy - you could probably split it into two functions, one to clean and one to plot.
Anyways, the easiest way to get what you want is to use the breaks argument to scale_fill_manual. This allows you to choose only those levels you want in the legend:
gg.barplots <- function(inp, order, xlab.strg, ylab.strg) {
require(RColorBrewer)
require(ggplot2)
require(reshape2)
arg <- c(expression(hat(p)[M]), expression(hat(p)[C]))
p <- order
col <- c(colorRampPalette(brewer.pal(9,'Blues')[2:9])(p+2),
colorRampPalette(brewer.pal(9,'Oranges')[2:9])(p+2))
lab <- c(0:p, paste(">",p,sep=""))
freq.mat <- data.frame(labels = lab, inp)
names(freq.mat) <- c("x", "Magnitude-only", "Complex-valued")
freq.mat$x <- factor(freq.mat$x, levels = c(levels(freq.mat$x)[-1],levels(freq.mat$x)[1]))
## force the orders to be as we want them to appear, using the factor function with levels specified.
freq.df <- melt(data = freq.mat, id.vars = 1, measure.vars = 2:3)
fill.vars <- paste(rep(names(freq.mat)[-1], times = p), rep(freq.mat$x, each = 2), sep = ":")
fill.vars <- factor(fill.vars, levels = fill.vars)
freq.df <- data.frame(fill.vars, freq.df[rep(c(0,p+2), times = p + 2) + rep(1:(p + 2), each = 2), ])
ggplot(data=freq.df, aes(x = x, y = value, fill = fill.vars)) +
geom_bar(stat="identity", position=position_dodge(), colour = "black") +
scale_fill_manual(values = col[rep(c(0,p+2), times = p + 2) + rep(1:(p + 2), each = 2)], breaks = c("Magnitude-only:2", "Complex-valued:2")) +
theme_bw() +
xlab(arg) +
ylab(ylab.strg) +
xlab(xlab.strg) +
ylab(ylab.strg)
}
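The effect of breaks is easier to see on a small example. This is a minimal sketch with made-up data and color codes (toy, cols and legend_breaks are all hypothetical): four fill levels are drawn, but only the two named in breaks appear in the legend, relabelled through labels.

```r
# Made-up data: two series ("A", "B") at two x values.
toy <- data.frame(x     = rep(c("0", "1"), each = 2),
                  grp   = c("A:0", "B:0", "A:1", "B:1"),
                  value = c(0.25, 0.04, 0.65, 0.90))
cols <- c("A:0" = "#9ECAE1", "B:0" = "#FDAE6B",
          "A:1" = "#3182BD", "B:1" = "#E6550D")
legend_breaks <- c("A:1", "B:1")  # the only levels shown in the legend

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = x, y = value, fill = grp)) +
    geom_col(position = position_dodge(), colour = "black") +
    scale_fill_manual(values = cols,
                      breaks = legend_breaks,
                      labels = c("A", "B"),
                      name   = NULL) +
    theme_bw()
  # print(p) draws the plot
}
```

All four bars keep their own fill; breaks only controls which keys the legend lists.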
I have the following code:
library("ggplot2")
set.seed(12351234)
names <- factor(rep(paste("C", 1:10, sep = "_"), each = 10))
time <- rep(1:10, 10)
outcome <- rnorm(mean = 1e7, sd = 1e7, n = length(time))
outcome <-ifelse(outcome < 0, 0, outcome)
data.toy <- data.frame(names, time, outcome)
ggplot(data = data.toy, aes(y = outcome, x = time)) + geom_bar(stat = "identity", aes(fill = names)) + scale_x_continuous(breaks = unique(data.toy$time))
and it produces the following image: http://picpaste.com/data_toy-OR0jVHj5.png
I am wondering if there is a way to remove the horizontal "gray" space between the bars on the x-axis (the space that the arrows are pointing at). I suspect I am using this geom incorrectly as time is not categorical and there is a more appropriate geom for this.