Plotting mean and std. dev. of logarithmic data in R

I'd like to plot some data stored in two vectors (x and y) on a log-log scale.
Furthermore, I want to add the mean and the standard deviation (the latter using bars).
My problem is that there are zeros in my y data vector, so the "mean" function gets log(0) (= -Inf) as an argument and also returns -Inf:
qplot(x, y, log="xy") + stat_summary(fun.y=mean, geom="point")
How can I make the "mean" function work on the 'normal' data and not on the log'ed data?
Cheers,
Manuel

Calculate the stats before the transformation.
Ignoring the log scales for now, I think what you want to plot is something like this (assuming your data are in a data frame dfr with columns x and y):
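# one-row data frame of summary statistics, computed on the untransformed data
# (mapping mean(x) etc. directly inside aes() would draw one copy per data row,
# so the alpha of the stacked rectangles would add up)
dfr_stats <- data.frame(
mx = mean(dfr$x), sx = sd(dfr$x),
my = mean(dfr$y), sy = sd(dfr$y)
)
p <- ggplot(dfr) +
geom_point(aes(x, y)) +
geom_point(aes(x = mx, y = my), data = dfr_stats, colour = "blue", size = 5) +
geom_rect(
aes(xmin = mx - sx, xmax = mx + sx, ymin = my - sy, ymax = my + sy),
data = dfr_stats,
alpha = 0.2
)
p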
Adding the log scales is then done as usual:
p +
scale_x_log10() +
scale_y_log10()
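Of course, your zeroes will not show on the graph, as they shouldn't: log(0) is -Inf. To deal with them, you have a choice between removing them from the dataset or substituting a small positive number. Likewise, if mean(y) - sd(y) is zero or negative, the lower edge of the rectangle cannot be placed on a log axis.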
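A small base-R sketch of the two options for the zeros, using a made-up vector y (hypothetical values, not the OP's data). Dropping the zeros changes the raw-scale mean noticeably; substituting a small number barely changes the raw-scale mean, but the mean of the logged data then depends strongly on which number you pick, which is one more reason to compute the statistics before the log transformation:

```r
# Made-up data containing zeros (hypothetical, for illustration only)
y <- c(0, 1, 10, 100, 0)

# Option 1: drop the zeros before computing the statistics
y_drop <- y[y > 0]

# Option 2: substitute a small positive number; eps is an arbitrary choice
substitute_zeros <- function(v, eps) ifelse(v == 0, eps, v)

mean(y)                                  # raw-scale mean with zeros kept: 22.2
mean(y_drop)                             # raw-scale mean without zeros: 37
mean(substitute_zeros(y, 0.01))          # barely differs from mean(y): 22.204
mean(log10(substitute_zeros(y, 0.01)))   # mean of logs with eps = 0.01: -0.2
mean(log10(substitute_zeros(y, 1e-6)))   # mean of logs with eps = 1e-6: -1.8
```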
EDIT: If you want stats for y values grouped by an x value, it sounds like your x-variable is a factor, in which case you probably want a barchart. Log y scales for barcharts are a bad idea, but you could possibly justify a square root transformation instead.

Read the help page for coord_trans. Using coord_trans(x = 'log10', y = 'log10') (the arguments were called xtrans and ytrans in older ggplot2 versions) would help you create a log-log plot, since coordinate transformations occur after all statistics have been calculated.
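To make the ordering argument concrete, here is a hand-computed comparison with made-up numbers (not the OP's data). Taking the log of the mean, which is where a coordinate transformation ends up placing the summary point, stays finite even with a zero in the data; the mean of the logs, which is what stat_summary() computes under a log scale, collapses to -Inf:

```r
# Made-up y values at one x position; one of them is zero
y <- c(0, 1, 10, 100)

# coord_trans() route: statistic on the raw data, then the axis transform
stat_then_log <- log10(mean(y))   # log10(27.75), about 1.44

# scale_y_log10() route: transform first, then the statistic
log_then_stat <- mean(log10(y))   # log10(0) is -Inf, so the mean is -Inf

stat_then_log
log_then_stat
```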

Related

Problems with ggplot2 and geom_errorbar()

Greetings,
I'm having a hard time with ggplot2 and the geom_errorbar function.
I have a data frame with individuals (rows) and size (column 1) and density (column 2). My aim is to plot the influence of density on size in a quadratic model:
lm(size ~ poly(density, 2, raw=TRUE))
For that I used:
ggplot(df, aes(x = density, y = size, col = Sexo)) +
geom_smooth(method = lm, formula = y ~ x + I(x^2), size = 1) +
geom_point()
It went fine. But now I want to plot the same data set with geom_errorbar. I tried:
ggplot(cg.cvic, aes(x = as.factor(density), y = size, col = sex)) +
geom_errorbar(ymin = size - sd, ymax = size + sd)
And I'm getting the response:
Error in size - sd : non-numeric argument to binary operator
What am I doing wrong?

Firstly, there is no column sd in your data frame. Moreover, R has a built-in function sd(), which is a function, not a variable or a number. So from R's perspective you are trying to do arithmetic between a variable and a function, and R tells you that one of the arguments is non-numeric and that you are trying to perform on it an operation that can only be performed on numbers. You have to extract the standard deviation of your model predictions somehow, store it as a column in your data frame, and then map it inside aes() in ggplot (e.g. aes(ymin = size - size_sd, ymax = size + size_sd)). And don't name it sd; use something else.
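A minimal sketch of that advice, assuming a data frame with the column names from the question (density, size, sex) filled with made-up values. As a simpler stand-in for the model-based spread mentioned above, it uses the standard deviation of the observed sizes within each density/sex group: the summaries are computed first with base R's aggregate(), stored under a name that does not clash with sd(), and then mapped inside aes(). The plotting part only runs if ggplot2 is installed:

```r
# Hypothetical example data; only the column names follow the question
set.seed(42)
df <- data.frame(
  density = rep(c(10, 20, 30), each = 8),
  sex     = rep(c("F", "M"), times = 12),
  size    = rnorm(24, mean = 50, sd = 5)
)

# Mean and standard deviation of size for each density/sex combination.
# FUN returns a named vector, which aggregate() stores as a matrix column.
summ <- aggregate(size ~ density + sex, data = df,
                  FUN = function(v) c(mean = mean(v), sd = sd(v)))
summ <- do.call(data.frame, summ)   # flatten the matrix column
# Avoid naming the column "sd" so it cannot be confused with stats::sd()
names(summ) <- c("density", "sex", "size_mean", "size_sd")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  pd <- position_dodge(width = 0.3)   # keep F and M side by side
  p <- ggplot(summ, aes(x = factor(density), y = size_mean, colour = sex)) +
    geom_point(position = pd) +
    # ymin/ymax are mapped inside aes() so they refer to columns of summ
    geom_errorbar(aes(ymin = size_mean - size_sd, ymax = size_mean + size_sd),
                  width = 0.2, position = pd)
  # print(p) draws the plot
}
```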

ggplot boxplot on log scale, mean via stat_summary appears wrong [duplicate]

I have a bunch of measurements over time and I want to plot them in R. Here is a sample of my data. I've got 6 measurements for each of 4 time points:
values <- c (1012.0, 1644.9, 837.0, 1200.9, 1652.0, 981.5,
2236.9, 1697.5, 2087.7, 1500.8,
2789.3, 1502.9, 2051.3, 3070.7, 3105.4,
2692.5, 1488.5, 1978.1, 1925.4, 1524.3,
2772.0, 1355.3, 2632.4, 2600.1)
time <- factor (rep (c(0, 12, 24, 72), c(6, 6, 6, 6)))
The scale of these data is arbitrary, and in fact I'm going to normalize it so that the average of t=0 is 1.
norm <- values / mean (values[time == 0])
So far so good. Using ggplot, I plot both the individual points, as well as a line that goes through the average at each time point:
require (ggplot2)
p <- ggplot(data = data.frame(time, norm), mapping = aes (x = time, y = norm)) +
stat_summary (fun.y = mean, geom="line", mapping = aes (group = 1)) +
geom_point()
However, now I want to apply a logarithmic scale, and this is where my trouble starts. When I do:
q <- ggplot(data = data.frame(time, norm), mapping = aes (x = time, y = norm)) +
stat_summary (fun.y = mean, geom="line", mapping = aes (group = 1)) +
geom_point() +
scale_y_log2()
The line does NOT go through 0 at t=0, as you would expect because log (1) == 0. Instead the line crosses the y-axis slightly below 0. Apparently, ggplot applies the mean after log transformation, which gives a different result. I want it to take the mean before log transformation.
How can I tell ggplot to apply the mean first? Is there a better way to create this chart?

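scale_y_log2() (in current ggplot2: scale_y_continuous(trans = "log2")) transforms the data first, so stat_summary() then computes the mean of the logged values.
coord_trans() does the opposite: the statistics are computed on the untransformed data first, and only then is the axis transformed.
So you need coord_trans(y = "log2") instead of scale_y_log2().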
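The difference between the two routes can be reproduced by hand with the question's own values and time vectors, without ggplot2. The scale route averages the log2 values within each time point; the coord_trans() route averages the raw values and only then takes log2. Only the second one is exactly 0 at t = 0, as the question expects, and the first one is never larger than the second (the mean of logs is at most the log of the mean):

```r
# Data from the question
values <- c(1012.0, 1644.9, 837.0, 1200.9, 1652.0, 981.5,
            2236.9, 1697.5, 2087.7, 1500.8,
            2789.3, 1502.9, 2051.3, 3070.7, 3105.4,
            2692.5, 1488.5, 1978.1, 1925.4, 1524.3,
            2772.0, 1355.3, 2632.4, 2600.1)
time <- factor(rep(c(0, 12, 24, 72), c(6, 6, 6, 6)))
norm <- values / mean(values[time == 0])

# Scale route: transform with log2, then take the mean per time point
scale_route <- tapply(log2(norm), time, mean)

# coord_trans(y = "log2") route: mean per time point, then transform
coord_route <- log2(tapply(norm, time, mean))

round(rbind(scale_route, coord_route), 3)
```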

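A workaround, if you don't want to use coord_trans() and still want to transform the data, is to create a summary function that back-transforms the values, takes the mean, and transforms the result again. For a log10 scale:
f1 <- function(x) {
log10(mean(10 ^ x))
}
stat_summary(fun = f1, geom = "line", mapping = aes(group = 1))  # 'fun' was 'fun.y' before ggplot2 3.3.0
For the log2 scale in the question, the base has to match: use log2(mean(2 ^ x)) instead.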
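A generalisation of f1 as a sketch: the hypothetical helper back_mean() below builds the summary function for any log base, so it can be matched to the scale in use (base 2 for the question's log2 axis). Applied to log-transformed data, it returns the log of the mean of the raw data, i.e. the same value the coord_trans() route gives:

```r
# Hypothetical helper: returns a summary function that undoes the log,
# averages on the original scale, and re-applies the log
back_mean <- function(base = 10) {
  function(x) log(mean(base^x), base = base)
}

f2 <- back_mean(2)              # for a log2 y scale

raw <- c(0.7, 1.1, 1.4, 0.8)    # made-up raw values with mean 1
f2(log2(raw))                   # same as log2(mean(raw)), i.e. about 0
mean(log2(raw))                 # the plain mean of logs is below 0
```

In the plot this would be used as stat_summary(fun = back_mean(2), geom = "line", mapping = aes(group = 1)).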

The best solution I found for this issue was to use a combination of coord_trans() and scale_y_continuous(breaks = breaks).
As previously suggested, coord_trans() will scale your axis without transforming the data; however, it will leave you with an ugly axis.
Setting the limits in coord_trans() works for some things, but if you want to fix your axis to have specific labels, also include scale_y_continuous() with the breaks you'd like (breaks being a numeric vector of tick positions):
coord_trans(y = 'log10') +
scale_y_continuous(breaks = breaks)
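The answer leaves the breaks vector up to you. One way to build it, sketched here as a hypothetical helper pow_breaks(), is to take every power of the base that spans the positive range of the data; the result could then be passed as scale_y_continuous(breaks = pow_breaks(norm, 2)) together with coord_trans(y = 'log2'):

```r
# Hypothetical helper: all powers of `base` from just below min(x)
# to just above max(x), for use as axis breaks on a log axis
pow_breaks <- function(x, base = 10) {
  x <- x[is.finite(x) & x > 0]    # logs are only defined for x > 0
  lo <- floor(log(min(x), base))
  hi <- ceiling(log(max(x), base))
  base^(lo:hi)
}

pow_breaks(c(0.6, 1.9), base = 2)   # 2^(-1:1)
pow_breaks(c(326, 18823))           # 10^(2:5)
pow_breaks(c(0, 5, 50))             # the zero is ignored: 10^(0:2)
```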

geom_tile with unequally spaced y values (e.g. 2^X)

I want to recreate an "image" plot in ggplot (because of some other aspects of the package). However, I'm facing a problem caused by my y scale, which is defined by unequally but logically spaced values, e.g. I would have z values for y = 2, 4, 8, 16, 32. This causes the tiles to not be equally large, so I have these white bands in my figure. I can solve this by transforming the y values into a factor, but I don't want to do this because I'm also trying to plot other geom objects on the figure which require a numeric scale.
This clarifies my problem a bit:
# random data, with y scale numeric
d <- data.frame(Var1=rep(1901:2000,5),Var2=rep(c(2,4,8,16,32),each=100),value=rnorm(500,50,5))
line=data.frame(Var1=1901:2000,Var2=rnorm(50,1.5,0.5))
ggplot(d, aes(x=Var1, y=Var2)) +
geom_tile(aes(fill=value)) +
geom_line(data=line)
# y as factor
d2 = d
d2$Var2=as.factor(d2$Var2)
ggplot(d2, aes(x=Var1, y=Var2)) +
geom_tile(aes(fill=value)) +
geom_line(data=line)
I tried attributing the line values to the value of the nearest factor level, but this introduces a big error. Also, I tried the size option in geom_tile, but this didn't work out either.
In the example the y data is log transformed, but this is just for the ease of making a fake dataset.
Thank you.

Something like this??
ggplot(d, aes(x=Var1, y=Var2)) +
geom_tile(aes(fill=value)) +
geom_line(data=line)+
scale_y_continuous(trans="log2")
Note the addition of scale_y_continuous(trans="log2")
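Why this removes the white bands, as far as I understand it: the scale transformation is applied before geom_tile() works out the tile height from the spacing of the data, and on a log2 scale the values 2, 4, ..., 32 become 1, 2, ..., 5, which are equally spaced. A quick base-R check of the spacing:

```r
y <- c(2, 4, 8, 16, 32)

diff(y)          # raw spacing grows: 2 4 8 16
diff(log2(y))    # spacing after the log2 transform: all 1
```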
EDIT Based on OP's comment below.
There is no built-in "reverse log2 transform", but it is possible to create new transformations using the trans_new(...) function in package scales. And, naturally, someone has already thought of this (see the question "ggplot2 reverse log coordinate transform"); the code below is based on the answer there.
library(scales)
reverselog2_trans <- function(base = 2) {
trans <- function(x) -log(x, base)
inv <- function(x) base^(-x)
trans_new(paste0("reverselog-", format(base)), trans, inv, log_breaks(base = base), domain = c(1e-100, Inf))
}
ggplot(d, aes(x=Var1, y=Var2)) +
geom_tile(aes(fill=value)) +
geom_line(data=line)+
scale_y_continuous(trans="reverselog2")
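trans_new() expects the transform and its inverse to undo each other. A quick base-R check of the two closures used above (copied out of reverselog2_trans() so that the scales package is not needed) confirms this and shows why the axis is reversed: larger y values map to smaller positions.

```r
base <- 2
trans <- function(x) -log(x, base)   # forward transform used in the answer
inv   <- function(x) base^(-x)       # its inverse

y <- c(2, 4, 8, 16, 32)
trans(y)         # -1 -2 -3 -4 -5: the order is reversed
inv(trans(y))    # back to 2 4 8 16 32
```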

Perhaps another approach using a discrete scale and facets might be a possibility:
d <- data.frame(Var1=rep(1901:2000,5),Var2=rep(c(2,4,8,16,32),each=100),value=rnorm(500,50,5), chart="tile" )
d$Var2 <- factor(d$Var2, levels=rev(unique(d$Var2)))
line <- data.frame(Var1=1901:2000,Var2=rnorm(50,1.5,0.5), chart="line")
ggplot(d, aes(x=Var1, y=Var2)) +
geom_tile(aes(y = Var2, fill=value) ) +
geom_line( data=line ) +
scale_y_discrete() +
facet_grid( chart ~ ., scales = "free_y", space = "free_y")
which gives a chart with the tile panel and the line panel stacked in two rows (figure not preserved).

Transform only one axis to log10 scale with ggplot2

I have the following problem: I would like to visualize a discrete and a continuous variable on a boxplot in which the latter has a few extreme high values. This makes the boxplot meaningless (the points and even the "body" of the chart are too small), which is why I would like to show this on a log10 scale. I am aware that I could leave out the extreme values from the visualization, but I do not intend to.
Let's see a simple example with diamonds data:
m <- ggplot(diamonds, aes(y = price, x = color))
The problem is not serious here, but I hope you could imagine why I would like to see the values at a log10 scale. Let's try it:
m + geom_boxplot() + coord_trans(y = "log10")
As you can see the y axis is log10 scaled and looks fine but there is a problem with the x axis, which makes the plot very strange.
The problem does not occur with scale_log, but this is not an option for me, as I cannot use a custom formatter this way. E.g.:
m + geom_boxplot() + scale_y_log10()
My question: does anyone know a solution to plot the boxplot with a log10 scale on the y axis whose labels could be freely formatted with a formatter function like in this thread?
Editing the question to help answerers based on answers and comments:
What I am really after: one log10-transformed axis (y) with non-scientific labels. I would like to label it like dollar (formatter=dollar) or any custom format.
If I try @hadley's suggestion I get the following warnings:
> m + geom_boxplot() + scale_y_log10(formatter=dollar)
Warning messages:
1: In max(x) : no non-missing arguments to max; returning -Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
With unchanged y-axis labels (figure not preserved).

The simplest way is to give the 'trans' (formerly 'formatter') argument of either scale_x_continuous or scale_y_continuous the name of the desired log function:
library(ggplot2) # which formerly required pkg:plyr
m + geom_boxplot() + scale_y_continuous(trans='log10')
EDIT:
Or, if you don't like that: the log argument belongs to qplot(), not to ggplot(). ggplot() does not use extra arguments such as log="y" or log10="y", so passing them there leaves the axis untransformed. With qplot() it does work:
qplot(color, price, data = diamonds, geom = "boxplot", log = "y")
EDIT2 & 3:
Further experiments (after discarding the one that attempted successfully to put "$" signs in front of logged values):
# Need a function that accepts an x argument;
# it receives log10 values, so back-transform before formatting
fmtExpLg10 <- function(x) paste(round(10^x/1000, 2), "K $", sep="")
ggplot(diamonds, aes(color, log10(price))) +
geom_boxplot() +
scale_y_continuous("Price, log10-scaling", labels = fmtExpLg10)
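Because the y aesthetic is already log10(price), the labels function receives positions on the log10 axis and has to undo the log before formatting. A quick check of the labeller on two example prices (chosen for illustration) and on the axis position 3:

```r
fmtExpLg10 <- function(x) paste(round(10^x / 1000, 2), "K $", sep = "")

# The scale passes log10 positions, not prices
fmtExpLg10(log10(c(326, 18823)))   # "0.33K $" "18.82K $"
fmtExpLg10(3)                      # 10^3 = 1000, so "1K $"
```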
Note added mid 2017 in comment about package syntax change:
scale_y_continuous(formatter = 'log10') is now scale_y_continuous(trans = 'log10') (ggplot2 v2.2.1)

I had a similar problem, and this scale worked for me like a charm:
library(scales)  # for comma()
breaks <- 10^(1:10)
scale_y_log10(breaks = breaks, labels = comma(breaks))
As you want the intermediate levels, too (10^3.5), you need to tweak the formatting:
breaks <- 10^(1:10 * 0.5)
m <- ggplot(diamonds, aes(y = price, x = color)) + geom_boxplot()
m + scale_y_log10(breaks = breaks, labels = comma(breaks, accuracy = 1))
The resulting plot is not preserved here.

Another solution using scale_y_log10 with trans_breaks, trans_format and annotation_logticks()
library(ggplot2)
m <- ggplot(diamonds, aes(y = price, x = color))
m + geom_boxplot() +
scale_y_log10(
breaks = scales::trans_breaks("log10", function(x) 10^x),
labels = scales::trans_format("log10", scales::math_format(10^.x))
) +
theme_bw() +
annotation_logticks(sides = 'lr') +
theme(panel.grid.minor = element_blank())

I think I got it at last by doing some manual transformations with the data before visualization:
d <- diamonds
# computing logarithm of prices
d$price <- log10(d$price)
And work out a formatter to later compute 'back' the logarithmic data:
formatBack <- function(x) 10^x
# or with special formatter (here: "dollar")
formatBack <- function(x) paste(round(10^x, 2), "$", sep=' ')
And draw the plot with given formatter:
m <- ggplot(d, aes(y = price, x = color))
m + geom_boxplot() + scale_y_continuous(labels = formatBack)  # older ggplot2: formatter='formatBack'
Sorry for bothering the community with a question I could have solved before! The funny part is: I was working hard to make this plot work a month ago but did not succeed. After asking here, I got it.
Anyway, thanks to @DWin for motivation!
