GGplot second y axis without the transformation of y axis - r

Does anyone know how to do this
set.seed(101)
x <- 1:10
y <- rnorm(10)
## second data set on a very different scale
z <- runif(10, min=1000, max=10000)
par(mar = c(5, 4, 4, 4) + 0.3) # Leave space for z axis
plot(x, y) # first plot
par(new = TRUE)
plot(x, z, type = "l", axes = FALSE, bty = "n", xlab = "", ylab = "")
axis(side=4, at = pretty(range(z)))
mtext("z", side=4, line=3)
but using ggplot.
In ggplot you can only create a secondary axis with sec_axis() or dup_axis(), and both work through a transformation of the y axis. What about a completely independent second y axis that applies only to the z variable, while the primary y axis applies to the y variable?

ggplot2::sec_axis provides the only mechanism for a second axis, and it took a lot of convincing to get that into the codebase. You are responsible for coming up with the transformation. The transform must be one-to-one (monotonic), and a linear rescaling is the simple case; if either axis needs to be non-linear (e.g., exponential, logarithmic, etc.), then your math skills will be put to the test.
If you can use scales, then this process becomes trivial:
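library(ggplot2)
dat <- data.frame(x, y, z)
ggplot(dat, aes(x, y)) +
  geom_point() +
  # draw z after mapping it from range(z) onto range(y)
  geom_line(
    aes(y = zmod),
    data = ~ transform(., zmod = scales::rescale(z, range(y), range(z)))
  ) +
  scale_y_continuous(
    # label the right-hand axis by mapping y positions back to the z scale
    sec.axis = sec_axis(~ scales::rescale(., range(dat$z), range(dat$y)),
                        breaks = c(2000, 4000, 6000, 8000))
  )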
Unless I've missed something (I just checked ggplot2-3.3.5's NEWS.md file), this has not changed.
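For anyone who would rather not depend on scales, here is a minimal sketch of the same linear rescaling written by hand. The helper names z_to_y and y_to_z are made up for illustration; the data are the x, y, z from the question.

```r
set.seed(101)
y <- rnorm(10)
z <- runif(10, min = 1000, max = 10000)

# Slope of the linear map that sends range(z) onto range(y)
slope <- diff(range(y)) / diff(range(z))

# Hypothetical helpers: forward map (z scale -> y scale) and its inverse
z_to_y <- function(v) (v - min(z)) * slope + min(y)
y_to_z <- function(v) (v - min(y)) / slope + min(z)

# The rescaled z spans the same range as y, and the inverse recovers z
range(z_to_y(z))
all.equal(y_to_z(z_to_y(z)), z)
```

The forward map would go in the layer, e.g. geom_line(aes(y = z_to_y(z))), and the inverse in the axis, e.g. sec_axis(~ y_to_z(.)), so that the right-hand labels read in the original z units.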

Related

Plotting a Function With Noise

I always wondered how such pictures are made:
I am working with the R programming language. I would like to plot a parabola with "random noise" added to the parabola. I tried something like this:
x = 1:100
y = x^2
z = y + rnorm(1, 100,100)
plot(x,z)
But this is still producing a parabola without "noise".
Can someone please show me how I can add "noise" to a parabola (or any function) in R?
Thanks!
In this case you need to generate 100 random values; otherwise you will be adding the same amount of noise to every point (and thus no visible noise): z = y + rnorm(100, 100, 100)
x = 1:100
y = x^2
z = y + rnorm(length(y), 100,100)
plot(x,z)
In your code you add the same value to all your points so it just shifts your curve up by that constant. Instead you need to generate a vector of random noise the same length as your y variable. Also you probably want to set the mean = 0 for the rnorm() noise so that it's truly random noise around the true value not systematically 100 units larger.
To get something very similar to your example, you can overplot the second vector with noise using lines() and add a legend with the code below.
x = 1:100
y1 = x^2
y2 = y1 + rnorm(100, 0, 500)
plot(x, y1, type = "l", ylab = "y")
lines(x,y2,type = "l", col = "red")
legend(
x = "top",
legend = c("y1", "y2"),
col = c("black", "red"),
lwd = 1,
bty = "n",
horiz = T
)
Created on 2022-11-08 with reprex v2.0.2
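To make the difference between the two calls concrete, here is a small sketch (the seed and the variable names shifted and noisy are arbitrary choices for illustration). A single draw from rnorm() is recycled across all 100 points, so the curve only moves up or down; one draw per point gives actual scatter.

```r
set.seed(1)
x <- 1:100
y <- x^2

# One random value, recycled for every point: a constant vertical shift
shifted <- y + rnorm(1, mean = 0, sd = 100)

# One random value per point: real noise around the curve
noisy <- y + rnorm(length(y), mean = 0, sd = 100)

# Spread of the deviations from the true curve
sd(shifted - y)  # effectively zero, up to floating point rounding
sd(noisy - y)    # on the order of the sd passed to rnorm()
```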

Changing axis displays for a graph using plot3D in R?

I'm hoping to keep in the image below the ticks on the vertical z axis, but remove ticks and numbers from the x and y axes. I would like to be able to label my x and y axes with a label for each condition in my matrix, but have not figured out how to do this with text3D. For some reason (because I'm on a mac?) I can't download axes3D, which is one potential solution I've seen in other responses.
Here is my code:
x = c(0,1)
y = c(0,1)
zval = c(104.1861, 108.529, 110.3675, 110.4112)
z = matrix (zval, nrow=2, ncol=2, byrow=TRUE)
hist3D(x,y,z, zlim=c(101,111), colvar = NULL, d=2, col = "lightblue", NAcol = "white", breaks = NULL, colkey = NULL, theta=-60, phi=20, nticks=10, axes=TRUE, ticktype="detailed", space=0.5, lighting=TRUE, light="diffuse", shade=.5, ltheta = 50, bty = "g")
My output

Ultimately, I'd like something more along the lines of this:
I'm very new to R.
stackoverflow.com/questions/26794236/ggplot2-3d-bar-plot
^ this seems like it might be what I need, but I couldn't replicate the code without an error. When I tried to run this piece I got an error because my x and z (in this case) axes aren't numerical:
cloud(y~x+z, d, panel.3d.cloud=panel.3dbars, col.facet='grey', xbase=0.4, ybase=0.4, scales=list(arrows=FALSE, col=1), par.settings = list(axis.line = list(col = "transparent")))
Maybe this might be helpful (with the caveat that 3D plots can sometimes make interpretation more challenging).
First, I recreated a data frame d based on something similar to what you started with:
x = c(0, 0, 1, 1)
y = c(0, 1, 0, 1)
z = c(104.1861, 108.529, 110.3675, 110.4112)
d <- data.frame(
x = factor(as.logical(x)),
y = factor(as.logical(y)),
z = z
)
Note that for x and y I converted the 0 and 1 to FALSE and TRUE with as.logical, then made them factors.
Then for the plot:
library(latticeExtra)
cloud(z ~ x + y, data = d, panel.3d.cloud=panel.3dbars, col.facet='grey',
xbase=0.4, ybase=0.4, scales=list(arrows=FALSE, col=1),
par.settings = list(axis.line = list(col = "transparent")))
You will want the formula as z ~ x + y where z is a numeric response.
Edit: If you wish to customize the axis labels, you can set the factor labels as follows (for example):
d <- data.frame(
x = factor(as.logical(x), labels = c("Hi", "Lo")),
y = factor(as.logical(y), labels = c("Label1", "Label2")),
z = z
)
Plot
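If it is unclear which label ends up on which level, the mapping can be checked without plotting. A short sketch, using the same x as above (f and g are just illustrative names):

```r
x <- c(0, 0, 1, 1)

# Without labels, the levels are the sorted logical values
f <- factor(as.logical(x))
levels(f)

# labels are matched to the levels in order: FALSE -> "Hi", TRUE -> "Lo"
g <- factor(as.logical(x), labels = c("Hi", "Lo"))
as.character(g)
```

So the first label always goes to FALSE (the original 0) and the second to TRUE (the original 1).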

r : ecdf over histogram

In R, with ecdf I can plot an empirical cumulative distribution function
plot(ecdf(mydata))
and with hist I can plot a histogram of my data
hist(mydata)
How can I plot the histogram and the ecdf in the same plot?
EDIT
I am trying to make something like this:
https://mathematica.stackexchange.com/questions/18723/how-do-i-overlay-a-histogram-with-a-plot-of-cdf
Also a bit late, here's another solution that extends @Christoph's solution with a second y-axis.
par(mar = c(5,5,2,5))
set.seed(15)
dt <- rnorm(500, 50, 10)
h <- hist(
dt,
breaks = seq(0, 100, 1),
xlim = c(0,100))
par(new = T)
ec <- ecdf(dt)
plot(x = h$mids, y=ec(h$mids)*max(h$counts), col = rgb(0,0,0,alpha=0), axes=F, xlab=NA, ylab=NA)
lines(x = h$mids, y=ec(h$mids)*max(h$counts), col ='red')
axis(4, at=seq(from = 0, to = max(h$counts), length.out = 11), labels=seq(0, 1, 0.1), col = 'red', col.axis = 'red')
mtext(side = 4, line = 3, 'Cumulative Density', col = 'red')
The trick is the following: You don't add a line to your plot, but plot another plot on top, that's why we need par(new = T). Then you have to add the y-axis later on (otherwise it will be plotted over the y-axis on the left).
Credits go here (@tim_yates' answer) and there.
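The arithmetic behind the right-hand axis can be checked without drawing anything. This is only a sketch of the scaling used above, with hist(..., plot = FALSE) to get the counts; the names scaled, ticks_at and tick_labels are illustrative.

```r
set.seed(15)
dt <- rnorm(500, 50, 10)

# Same bins as above, but computed without plotting
h <- hist(dt, breaks = seq(0, 100, 1), plot = FALSE)
ec <- ecdf(dt)

# ECDF values (0..1) stretched onto the count scale (0..max count)
scaled <- ec(h$mids) * max(h$counts)

# Tick positions on the count scale and the ECDF values they stand for
ticks_at <- seq(from = 0, to = max(h$counts), length.out = 11)
tick_labels <- seq(0, 1, 0.1)

# Dividing a tick position by the max count gives back its label
all.equal(ticks_at / max(h$counts), tick_labels)
```

In other words, the red axis is just the count axis divided by max(h$counts), which is why the labels 0 to 1 can be placed at evenly spaced count positions.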
There are two ways to go about this. One is to ignore the different scales and use relative frequency in your histogram. This results in a harder to read histogram. The second way is to alter the scale of one or the other element.
I suspect this question will soon become interesting to you, particularly @hadley's answer.
ggplot2 single scale
Here is a solution in ggplot2. I am not sure you will be satisfied with the outcome though because the CDF and histograms (count or relative) are on quite different visual scales. Note this solution has the data in a dataframe called mydata with the desired variable in x.
library(ggplot2)
set.seed(27272)
mydata <- data.frame(x= rexp(333, rate=4) + rnorm(333))
ggplot(mydata, aes(x)) +
stat_ecdf(color="red") +
geom_bar(aes(y = (..count..)/sum(..count..)))
base R multi scale
Here I will rescale the empirical CDF so that instead of a max value of 1, its maximum value is whatever bin has the highest relative frequency.
h <- hist(mydata$x, freq=F)
ec <- ecdf(mydata$x)
lines(x = knots(ec),
y=(1:length(mydata$x))/length(mydata$x) * max(h$density),
col ='red')
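One caveat with knots(ec): it returns the unique data values, so its length no longer matches 1:length(x) if the data contain ties. A hedged variant, assuming you are happy to evaluate the ECDF at the sorted data instead (the two zeros are appended here only to force a tie):

```r
set.seed(27272)
x <- c(rexp(333, rate = 4) + rnorm(333), 0, 0)  # two tied values on purpose

ec <- ecdf(x)
xs <- sort(x)

# Ties collapse into a single knot, so the lengths differ
length(knots(ec))
length(xs)

# Evaluate the ECDF at every sorted point and stretch it to the density scale
h <- hist(x, plot = FALSE)
scaled <- ec(xs) * max(h$density)
```

After hist(x, freq = FALSE), the overlay would then be lines(xs, scaled, col = "red").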
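You can try a ggplot approach with a second axis:
library(dplyr)   # tibble(), %>%, bind_cols() and mutate() are all available via dplyr
library(ggplot2)
set.seed(15)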
a <- rnorm(500, 50, 10)
# calculate ecdf with binsize 30
binsize=30
df <- tibble(x=seq(min(a), max(a), diff(range(a))/binsize)) %>%
bind_cols(Ecdf=with(.,ecdf(a)(x))) %>%
mutate(Ecdf_scaled=Ecdf*max(a))
# plot
ggplot() +
geom_histogram(aes(a), bins = binsize) +
geom_line(data = df, aes(x=x, y=Ecdf_scaled), color=2, size = 2) +
scale_y_continuous(name = "Density",sec.axis = sec_axis(trans = ~./max(a), name = "Ecdf"))
Edit
Since the scaling was wrong, I added a second solution that calculates everything in advance:
binsize=30
a_range= floor(range(a)) +c(0,1)
b <- seq(a_range[1], a_range[2], round(diff(a_range)/binsize)) %>% floor()
df_hist <- tibble(a) %>%
mutate(gr = cut(a,b, labels = floor(b[-1]), include.lowest = T, right = T)) %>%
count(gr) %>%
mutate(gr = as.character(gr) %>% as.numeric())
# calculate ecdf with binsize 30
df <- tibble(x=b) %>%
bind_cols(Ecdf=with(.,ecdf(a)(x))) %>%
mutate(Ecdf_scaled=Ecdf*max(df_hist$n))
ggplot(df_hist, aes(gr, n)) +
geom_col(width = 2, color = "white") +
geom_line(data = df, aes(x=x, y=Ecdf*max(df_hist$n)), color=2, size = 2) +
scale_y_continuous(name = "Density",sec.axis = sec_axis(trans = ~./max(df_hist$n), name = "Ecdf"))
As already pointed out, this is problematic because the plots you want to merge have such different y-scales. You can try
set.seed(15)
mydata<-runif(50)
hist(mydata, freq=F)
lines(ecdf(mydata))
to get
Although a bit late... Another version which is working with preset bins:
set.seed(15)
dt <- rnorm(500, 50, 10)
h <- hist(
dt,
breaks = seq(0, 100, 1),
xlim = c(0,100))
ec <- ecdf(dt)
lines(x = h$mids, y=ec(h$mids)*max(h$counts), col ='red')
lines(x = c(0,100), y=c(1,1)*max(h$counts), col ='red', lty = 3) # indicates 100%
x90 <- h$mids[which.min(abs(ec(h$mids) - 0.9))] # bin midpoint where the ECDF is closest to 90%
lines(x = c(x90, x90), y = c(0, max(h$counts)), col ='black', lty = 3) # indicates where 90% is reached
(Only the second y-axis is not working yet...)
In addition to the previous answers, I wanted to have ggplot do the tedious calculation (in contrast to @Roman's solution, which was kindly updated upon my request), i.e., calculate and draw the histogram and calculate and overlay the ECDF. I came up with the following (shown here with simulated example data):
library(ggplot2)
set.seed(15)
a <- rnorm(500, 50, 10) # example data, only for illustration
# 1. Prepare the plot
plot <- ggplot(data.frame(a), aes(a)) + geom_histogram(bins = 30)
# 2. Get the max value of Y axis as calculated in the previous step
maxPlotY <- max(ggplot_build(plot)$data[[1]]$y)
# 3. Overlay scaled ECDF and add secondary axis
plot +
  stat_ecdf(aes(y = after_stat(y) * maxPlotY)) +
  scale_y_continuous(name = "Density", sec.axis = sec_axis(~ . / maxPlotY, name = "ECDF"))
This way you don't need to calculate everything beforehand and feed the results to ggplot. Just sit back and let it do everything for you!

Plotting two Poisson processes on one plot

I have two Poisson processes:
n <- 100
x <- seq(0, 10, length = 1000)
y1 <- cumsum(rpois(1000, 1 / n))
y2 <- -cumsum(rpois(1000, 1 / n))
I would like to plot them in one plot and expect that y1 lies above x-axis and y2 lies below x-axis. I tried the following code:
plot(x, y1)
par(new = TRUE)
plot(x, y2, col = "red",
axes = FALSE,
xlab = '', ylab = '',
xlim = c(0, 10), ylim = c(min(y2), max(y1)))
but it did not work. Can someone please tell me how to fix this? (I am working with R for my code)
Many thanks in advance
How about
plot(x,y1, ylim=range(y1,y2), type="l")
lines(x, y2, col="red")
I would suggest trying to avoid multiple calls to plot with par(new=TRUE). That is usually very messy. Here we use lines() to add to an existing plot. The only catch is that the x and y limits won't change based on the new data, so we use ylim in the first plot() call to set a range appropriate for all the data.
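As a quick check of what range(y1, y2) gives here: it pools both vectors, so the lower limit comes from y2 and the upper limit from y1. A small sketch with the processes from the question (the seed is arbitrary):

```r
set.seed(42)
n <- 100
y1 <- cumsum(rpois(1000, 1 / n))   # non-decreasing, starts at or above 0
y2 <- -cumsum(rpois(1000, 1 / n))  # non-increasing, starts at or below 0

# range() accepts several vectors and returns the overall min and max
ylim <- range(y1, y2)
ylim
```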
Or, if you don't want to worry about limits (as MrFlick mentioned) or about the number of lines, you could also tidy up your data using melt and plot it with ggplot:
df <- data.frame(x, y1, y2)
library(reshape2)
library(ggplot2)
mdf <- melt(df, "x")
ggplot(mdf, aes(x, value, color = variable)) +
geom_line()
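In case it helps to see what the reshaping does: melt(df, "x") stacks y1 and y2 into one value column and records the source column in variable. Below is a rough base R equivalent of that long layout, assuming you just want to inspect the shape without loading reshape2 (the name long is illustrative).

```r
set.seed(1)
x <- seq(0, 10, length = 1000)
y1 <- cumsum(rpois(1000, 1 / 100))
y2 <- -cumsum(rpois(1000, 1 / 100))

# Long layout built by hand: one row per (x, series) pair
long <- data.frame(
  x = rep(x, times = 2),
  variable = factor(rep(c("y1", "y2"), each = length(x))),
  value = c(y1, y2)
)

str(long)
```

With the data in this shape, a single geom_line() with color = variable draws both series, and the y limits are computed from the pooled value column.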

Start plotting at y-axis in R

I'm drawing a ROC curve and R is putting too much space between the curve and the plot border. I want values at x = 0 to touch the y-axis and points at y = 1 to touch the upper border of the plot.
This image shows exactly how I want it:
http://en.wikipedia.org/wiki/File:Roccurves.png
Anyone got any idea?
For base graphics, use the axis style parameters, xaxs and yaxs, to constrain the plotting limits to the range of the data.
dfr <- data.frame(x = 0:1, y = 0:1)
par(xaxs = "i", yaxs = "i")
with(dfr, plot(x, y))
For lattice, you use the xlim and ylim parameters.
xyplot(y ~ x, dfr, xlim = range(dfr$x), ylim = range(dfr$y))
For ggplot2, use coord_cartesian.
ggplot(dfr, aes(x, y)) +
geom_point() +
  coord_cartesian(xlim = range(dfr$x), ylim = range(dfr$y), expand = FALSE) # expand = FALSE drops the default padding
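For the base graphics case, the effect of the axis style can be seen in par("usr"), which holds the plotting region limits. This sketch draws to a null PDF device so that no file is written; usr_r and usr_i are illustrative names.

```r
pdf(NULL)  # graphics device that writes no file

# Default "regular" style: the data range is extended at each end
par(xaxs = "r", yaxs = "r")
plot(0:1, 0:1)
usr_r <- par("usr")

# "Internal" style: the limits are exactly the data range
par(xaxs = "i", yaxs = "i")
plot(0:1, 0:1)
usr_i <- par("usr")

dev.off()

usr_r  # x and y limits padded beyond 0 and 1
usr_i  # x and y limits exactly 0 and 1
```

With the regular style the range is extended by 4 percent at each end, which is the gap between the curve and the border that the question is about.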

Resources