Problems with ggplot2 and geom_errorbar() - r

Greetings,
I'm having a hard time with ggplot2 and the geom_errorbar() function.
I have a data frame with individuals (rows), size (column 1) and density (column 2). My aim is to plot the influence of density on size in a quadratic model:
lm(size ~ poly(density, 2, raw=TRUE))
For that I used:
ggplot(df, aes(x = density, y = size, col = Sexo)) +
geom_smooth(method = lm, formula = y ~ x + I(x^2), size = 1)+
geom_point()
It went fine. But now I want to plot the same data set with geom_errorbar. I tried:
ggplot(cg.cvic, aes(x = as.factor(density), y = size, col = sex)) +
geom_errorbar(ymin = size-sd, ymax = size + sd)
And I'm getting the response:
Error in size - sd : non-numeric argument to binary operator
What am I doing wrong?

First, there is no column named sd in your data frame. Moreover, R has a built-in function sd(), which is a function, not a variable or a number. So from R's perspective you are trying to subtract a function from a variable; R tells you that one of the arguments is non-numeric and that you are trying to perform an operation on it that can only be performed on numbers. You have to compute the standard deviation of your model predictions somehow, store it in your data frame, and then use it in ggplot. And don't name the column sd; use something else.
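As a rough sketch of that workflow: the example below uses simulated data in place of the asker's data frame, and it summarises the raw sizes per group (mean and standard deviation) rather than extracting anything from the fitted model, which is a simplification. The column names size_mean and size_sd are made up for illustration. Note that ymin and ymax go inside aes(), so they are looked up in the data frame.

```r
library(ggplot2)

# Simulated stand-in for the asker's data (assumption, not the real data)
set.seed(1)
df <- data.frame(
  density = rep(c(1, 2, 4, 8), each = 10),
  sex     = rep(c("F", "M"), times = 20),
  size    = rnorm(40, mean = 10, sd = 2)
)

# Mean and standard deviation of size for every density/sex combination
stats <- aggregate(size ~ density + sex, data = df,
                   FUN = function(v) c(mean = mean(v), sd = sd(v)))
stats <- do.call(data.frame, stats)  # flatten the matrix column
names(stats) <- c("density", "sex", "size_mean", "size_sd")

# ymin/ymax are mapped inside aes(), so they refer to columns of stats
dodge <- position_dodge(width = 0.3)
p <- ggplot(stats, aes(x = factor(density), y = size_mean, colour = sex)) +
  geom_point(position = dodge) +
  geom_errorbar(aes(ymin = size_mean - size_sd, ymax = size_mean + size_sd),
                width = 0.2, position = dodge)
p
```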

Related

Add a manually designed non-linear line in ggplot2?

I would like to add a non-linear model line to a graph in R, but instead of letting ggplot find the best fit, I just want to preset its parameters and thus be able to see how multiple manually designed models fit on top of the data. I tried the following:
ggplot(cars, aes(x = speed, y = dist)) +
geom_point() +
geom_smooth(method = "nls", method.args = list(formula = y ~ 0.76*exp(x*0.5), color = blue, data = data))
But got the error:
Computation failed in 'stat_smooth()':
formal argument "data" matched by multiple actual arguments
With slight adjustments, I also get the error 'what' must be a function or character string. Does anyone know if manually designating a line like this is possible? I could not find any other Stack Overflow post about this specific topic.
You might be looking for geom_function():
gg0 <- ggplot(cars, aes(x = speed, y = dist)) + geom_point()
gg0 + geom_function(fun = function(x) 0.76*exp(x*0.5), colour = "blue") +
coord_cartesian(ylim=c(0,100))
I added coord_cartesian because the specified function attains really large values for the upper end of the x-range of this graph ...
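Since the question asks about comparing several hand-specified models, here is a minimal sketch that stacks two geom_function() layers and labels them through the colour aesthetic so that a legend is drawn. The second curve (0.17 * x^2) and the legend labels are invented for illustration only.

```r
library(ggplot2)

p <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  # Curve from the question, with fixed parameters
  geom_function(aes(colour = "0.76 * exp(0.5 * x)"),
                fun = function(x) 0.76 * exp(0.5 * x)) +
  # A second, made-up candidate model for comparison
  geom_function(aes(colour = "0.17 * x^2"),
                fun = function(x) 0.17 * x^2) +
  # Zoom in without dropping data; the exponential grows very quickly
  coord_cartesian(ylim = c(0, 120)) +
  labs(colour = "Model")
p
```

Because the constant strings are mapped inside aes(), each curve gets its own legend entry, and nothing is fitted to the data.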

Contour plot or heatmap from three continuous variables

I have a model which has told me there is an interaction between two variables, a and b, which significantly influences my response variable, c. All three are continuous numeric variables. For detail, c is the rate of change in my response variable, b is the rate of change in my predictor and a is mean annual rainfall. The unit of analysis is pixels in a raster. So my model is telling me that mean annual rainfall modifies how my predictor affects my response.
To visualise this interaction I would like to use a contour plot/heat map/level plot with a and b on the x and y axes and c providing the colour, to show me how my response variable changes within the space described by a and b. I can do this with a scatter plot, but it's not very pretty or easy to interpret:
qplot(b, a, colour = c) +
scale_colour_gradient(low="green", high="red")
When I try to plot a contour plot/heat map/level plot, though, all I get is errors, blank plots or ugly plots.
geom_contour gives me an error:
ggplot(data = Mod, aes(x = Rain, y = Bomas, z = Fire)) +
geom_contour()
Warning message:
Not possible to generate contour data
geom_raster initially gives me Error: cannot allocate vector of size 81567.2 Gb but when I round my data it produces:
ggplot(data = df, aes(x = a, y = b, z = c)) +
geom_raster(aes(fill = c))
Adding interpolate = TRUE to the geom_raster code just makes the lines a little blurry.
geom_tile produces a blank graph but with a scale bar for c:
ggplot(data = df, aes(x = a, y = b, z = c)) +
geom_tile(aes(color = c))
I've also tried using stat_density2d and setting the fill and/or the colour to c, but just got an error, and I've tried using levelplot in the lattice package as well but that produces this:
levelplot(c ~ a * b, data = df,
aspect = "asp", contour = TRUE,
xlab = "a",
ylab = "b")
I suspect the problems I'm encountering are because the functions are not set up to deal with continuous x and y variables; all the examples seem to use factors. I would have thought I could compensate for that by changing bin widths, but that doesn't seem to work either. Is there a function that allows you to make a heat map with 3 continuous variables? Or do I need to treat my a and b variables as factors and manually make a data frame with bins appropriate for my data?
If you want to experiment for yourself then you get similar problems to what I'm having with:
df <- data.frame(a = rnorm(1068),
b = rnorm(1068),
c = rnorm(1068))
You can get automatic bins, and for example calculate the means by using stat_summary_2d:
ggplot(df, aes(a, b, z = c)) +
stat_summary_2d() +
geom_point(shape = 1, col = 'white') +
viridis::scale_fill_viridis()
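If the automatic bins are too coarse or too fine, the bin count and the summary function can be set explicitly, so there is no need to bin a and b by hand. A small sketch on the same kind of random data; the choice of 15 bins and of the median is arbitrary and only for illustration:

```r
library(ggplot2)

# Random data with the same shape as in the question
set.seed(42)
df <- data.frame(a = rnorm(1068), b = rnorm(1068), c = rnorm(1068))

p <- ggplot(df, aes(a, b, z = c)) +
  # 15 x 15 grid of bins; each tile is filled by the median of c in that bin
  stat_summary_2d(fun = median, bins = 15) +
  scale_fill_gradient(low = "green", high = "red")
p
```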
Another good option is to slice your data by the third variable, and plot small multiples. This doesn't really show very well for random data though:
library(ggplot2)
ggplot(df, aes(a, b)) +
geom_point() +
facet_wrap(~cut_number(c, 4))

how to combine in ggplot line / points with special values?

I'm quite new to ggplot, but I like the systematic way you build your plots. Still, I'm struggling to achieve the desired results. I can replicate plots where you have categorical data. However, for my use I often need to fit a model to certain observations and then highlight them in a combined plot. With the usual plot function I would do:
library(splines)
set.seed(10)
x <- seq(-1,1,0.01)
y <- x^2
s <- interpSpline(x,y)
y <- y+rnorm(length(y),mean=0,sd=0.1)
plot(x,predict(s,x)$y,type="l",col="black",xlab="x",ylab="y")
points(x,y,col="red",pch=4)
points(0,0,col="blue",pch=1)
legend("top",legend=c("True Values","Model values","Special Value"),text.col=c("red","black","blue"),lty=c(NA,1,NA),pch=c(4,NA,1),col=c("red","black","blue"),cex = 0.7)
My biggest problem is how to build the data frame for ggplot so that it then draws the legend automatically. In this example, how would I translate this into ggplot to get a similar plot? Or is ggplot not made for this kind of plot?
Note this is just a toy example. Usually the model values are derived from a more complex model, just in case you want to use a stat in ggplot.
The key part here is that you can map colors in aes by giving a string, which will produce a legend. In this case, there is no need to include the special value in the data.frame.
df <- data.frame(x = x, y = y, fit = predict(s, x)$y)
ggplot(df, aes(x, y)) +
geom_line(aes(y = fit, col = 'Model values')) +
geom_point(aes(col = 'True values')) +
geom_point(aes(col = 'Special value'), x = 0, y = 0) +
scale_color_manual(values = c('True values' = "red",
'Special value' = "blue",
'Model values' = "black"))
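A possible variation, sketched under a few assumptions: the special value is kept in its own one-row data frame (called special here, a made-up name), and guide_legend(override.aes = ...) is used so that the legend shows a line for the model and point symbols for the other two entries, similar to the base legend() call in the question. The override vectors follow the default alphabetical order of the legend entries.

```r
library(ggplot2)
library(splines)

# Same toy data as in the question
set.seed(10)
x <- seq(-1, 1, 0.01)
fit <- predict(interpSpline(x, x^2), x)$y
df <- data.frame(x = x, y = x^2 + rnorm(length(x), mean = 0, sd = 0.1), fit = fit)
special <- data.frame(x = 0, y = 0)  # hypothetical helper data frame

p <- ggplot(df, aes(x, y)) +
  geom_line(aes(y = fit, colour = "Model values")) +
  geom_point(aes(colour = "True values"), shape = 4) +
  geom_point(data = special, aes(colour = "Special value"), shape = 1) +
  scale_colour_manual(
    name = NULL,
    values = c("True values" = "red",
               "Special value" = "blue",
               "Model values" = "black"),
    # Legend order is alphabetical: Model values, Special value, True values
    guide = guide_legend(override.aes = list(
      linetype = c("solid", "blank", "blank"),
      shape    = c(NA, 1, 4)
    ))
  )
p
```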

How to add gaussian curve to histogram created with qplot?

I have question probably similar to Fitting a density curve to a histogram in R. Using qplot I have created 7 histograms with this command:
qplot(V1, data=data, binwidth=10, facets=V2~.)
For each slice, I would like to add a fitted Gaussian curve. When I try to use the lines() method, I get the error:
Error in plot.xy(xy.coords(x, y), type = type, ...) :
plot.new has not been called yet
What is the command to do it correctly?
Have you tried stat_function?
+ stat_function(fun = dnorm)
You'll probably want to plot the histograms using aes(y = ..density..) in order to plot the density values rather than the counts (since ggplot2 3.4.0 the preferred spelling is aes(y = after_stat(density))).
A lot of useful information can be found in this question, including some advice on plotting different normal curves on different facets.
Here are some examples:
dat <- data.frame(x = c(rnorm(100),rnorm(100,2,0.5)),
a = rep(letters[1:2],each = 100))
Overlay a single normal density on each facet:
ggplot(data = dat,aes(x = x)) +
facet_wrap(~a) +
geom_histogram(aes(y = ..density..)) +
stat_function(fun = dnorm, colour = "red")
From the question I linked to, create a separate data frame with the different normal curves:
library(plyr)  # provides ddply()
grid <- with(dat, seq(min(x), max(x), length.out = 100))
normaldens <- ddply(dat, "a", function(df) {
data.frame(
predicted = grid,
density = dnorm(grid, mean(df$x), sd(df$x))
)
})
And plot them separately using geom_line:
ggplot(data = dat,aes(x = x)) +
facet_wrap(~a) +
geom_histogram(aes(y = ..density..)) +
geom_line(data = normaldens, aes(x = predicted, y = density), colour = "red")
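For readers who would rather not load plyr, roughly the same per-facet curves can be built with base R. This is only a sketch of the same idea; it uses split() and lapply() in place of ddply(), and the newer after_stat(density) spelling, which needs a reasonably recent ggplot2.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(x = c(rnorm(100), rnorm(100, 2, 0.5)),
                  a = rep(letters[1:2], each = 100))

# Common x grid on which every normal curve is evaluated
grid <- seq(min(dat$x), max(dat$x), length.out = 100)

# One normal density per level of a, using that group's own mean and sd
normaldens <- do.call(rbind, lapply(split(dat, dat$a), function(d) {
  data.frame(a = d$a[1],
             predicted = grid,
             density = dnorm(grid, mean(d$x), sd(d$x)))
}))

p <- ggplot(dat, aes(x = x)) +
  facet_wrap(~a) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  geom_line(data = normaldens, aes(x = predicted, y = density), colour = "red")
p
```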
ggplot2 uses a different graphics paradigm than base graphics. Although you can use grid graphics with it, the best way is to add a new stat_function layer to the plot. The ggplot2 code is the following.
Note that I couldn't get this to work using qplot, but the transition to ggplot is reasonably straightforward. The most important difference is that your data must be in data.frame format.
Also note the explicit mapping of the y aesthetic, aes(y=..density..). This is slightly unusual, but it puts the histogram on the density scale so that it matches the curve drawn by stat_function:
library(ggplot2)
data <- data.frame(V1 = rnorm(700), V2=sample(LETTERS[1:7], 700, replace=TRUE))
ggplot(data, aes(x=V1)) +
stat_bin(aes(y=..density..)) +
stat_function(fun=dnorm) +
facet_grid(V2~.)

Plotting mean and std. dev. of logarithmic data in R

I'd like to plot some data stored in two vectors (x and y) on a log-log scale.
Furthermore, I want to add the mean and the standard deviation (the latter using bars).
My problem is that there are zeros in my y data vector, so the "mean" function gets log(0) (= -Inf) as an argument and also returns -Inf:
qplot(x, y, log="xy") + stat_summary(fun.y=mean, geom="point")
How can I make the "mean" function work on the 'normal' data and not on the log'ed data?
Cheers,
Manuel
Calculate the stats before the transformation.
Ignoring the log scales for now, I think what you want to plot is something like this
p <- ggplot(dfr) +
geom_point(aes(x, y)) +
geom_point(
aes(
x = mean(x),
y = mean(y)
),
colour = "blue",
size = 5
) +
geom_rect(
aes(
xmin = mean(x) - sd(x),
xmax = mean(x) + sd(x),
ymin = mean(y) - sd(y),
ymax = mean(y) + sd(y)
),
alpha = 0.2
)
p
Now adding in the log scale is done as usual
p +
scale_x_log10() +
scale_y_log10()
Of course, your zeroes will not show on the graph, as they shouldn't. To deal with them, you have a choice between removing them from the dataset or substituting a small positive number.
EDIT: If you want stats for y values grouped by an x value, it sounds like your x-variable is a factor, in which case you probably want a barchart. Log y scales for barcharts are a bad idea, but you could possibly justify a square root transformation instead.
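Read the help page for coord_trans. Using coord_trans(xtrans = 'log10', ytrans = 'log10') would help you create a log-log plot, since coordinate transformations occur after all statistics have been calculated. (In recent ggplot2 versions these arguments are named x and y, i.e. coord_trans(x = 'log10', y = 'log10').)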
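To make "calculate the stats before the transformation" concrete, here is a small sketch with simulated data that contains a couple of zeros. The means and standard deviations are computed per x value on the untransformed y, stored in a separate data frame (stats, with made-up column names y_mean and y_sd), and only then drawn on log scales. It assumes mean - sd stays positive; otherwise the lower end of a bar cannot be shown on a log axis.

```r
library(ggplot2)

# Simulated data with three x levels and two zeros in y (assumption)
set.seed(3)
dfr <- data.frame(x = rep(c(1, 10, 100), each = 20))
dfr$y <- dfr$x * runif(60, min = 0.5, max = 1.5)
dfr$y[c(1, 21)] <- 0

# Summary statistics on the raw scale, so the zeros are handled correctly
stats <- aggregate(y ~ x, data = dfr,
                   FUN = function(v) c(mean = mean(v), sd = sd(v)))
stats <- do.call(data.frame, stats)  # flatten the matrix column
names(stats) <- c("x", "y_mean", "y_sd")

p <- ggplot() +
  geom_point(data = dfr, aes(x, y), alpha = 0.4) +
  geom_point(data = stats, aes(x, y_mean), colour = "blue", size = 3) +
  geom_errorbar(data = stats,
                aes(x = x, ymin = y_mean - y_sd, ymax = y_mean + y_sd),
                width = 0.1) +
  scale_x_log10() +
  scale_y_log10()  # the zero observations are dropped with a warning
p
```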
