Using user-defined functions within the "curve" function in R graphics

I need to produce normal density plots with different total areas (with the areas summing to 1). Using the following function, I can specify lambda, which gives the relative area:
sdnorm <- function(x, mean=0, sd=1, lambda=1){lambda*dnorm(x, mean=mean, sd=sd)}
I then want to plot the function using different parameters. Using ggplot2, this code works:
require(ggplot2)
qplot(x, geom="blank") + stat_function(fun=sdnorm,args=list(mean=8,sd=2,lambda=0.7)) +
stat_function(fun=sdnorm,args=list(mean=18,sd=4,lambda=0.30))
but I really want to do this in base R graphics, for which I think I need to use the "curve" function. However, I am struggling to get this to work.

If you take a look at the help file for ?curve, you'll see that the first argument can be a number of different things:
The name of a function, or a call or an expression written as a function of x which will evaluate to an object of the same length as x.
This means you can specify the first argument as either a function name or an expression, so you could just do:
curve(sdnorm)
to get a plot of the function with its default arguments. Otherwise, to recreate your ggplot2 representation you would want to do:
curve(sdnorm(x, mean=8,sd=2,lambda=0.7), from = 0, to = 30)
curve(sdnorm(x, mean=18,sd=4,lambda=0.30), add = TRUE)
The result is a single plot with both scaled density curves drawn on the same axes.
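As a quick sanity check on the approach above, the following sketch confirms numerically that lambda sets the area under each curve, then overlays the two components together with their sum. The integration limits (mean plus or minus 10 sd) and the helper name mixture are illustrative assumptions, not part of the original answer.

```r
# Scaled normal density from the question: lambda is the area under the curve.
sdnorm <- function(x, mean = 0, sd = 1, lambda = 1) {
  lambda * dnorm(x, mean = mean, sd = sd)
}

# Integrate each component over mean +/- 10 sd (assumed wide enough to
# capture essentially all of the mass).
area1 <- integrate(sdnorm, lower = 8 - 20, upper = 8 + 20,
                   mean = 8, sd = 2, lambda = 0.7)$value
area2 <- integrate(sdnorm, lower = 18 - 40, upper = 18 + 40,
                   mean = 18, sd = 4, lambda = 0.3)$value

# Hypothetical helper: the sum of the two components.
mixture <- function(x) {
  sdnorm(x, mean = 8, sd = 2, lambda = 0.7) +
    sdnorm(x, mean = 18, sd = 4, lambda = 0.3)
}

# Draw the sum first so the y-axis is tall enough for every curve.
curve(mixture(x), from = 0, to = 30, lty = 2, ylab = "density")
curve(sdnorm(x, mean = 8, sd = 2, lambda = 0.7), add = TRUE)
curve(sdnorm(x, mean = 18, sd = 4, lambda = 0.3), add = TRUE)
```

Passing mixture(x) as an expression works for the same reason sdnorm(x, ...) does: curve evaluates any expression written as a function of x.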

You can do the following in base R:
x <- seq(0, 50, 1)
plot(x, sdnorm(x, mean = 8, sd = 2, lambda = 0.7), type = 'l', ylab = 'y')
lines(x, sdnorm(x, mean = 18, sd = 4, lambda = 0.30))
EDIT I added ylab = 'y' and updated the picture to have the y-axis re-labeled.
This should get you started.

Related

Surface of bivariate discontinuous function

I have a bivariate step function that I want to create a surface of. The function looks essentially as follows:
df<-data.frame(a = rnorm(100, 0, 10), b = rnorm(100, 0, 10))
f<-function(x,y){
mean(df$a * x >= df$b * y)
}
When I use plot3d of the rgl package, I always receive an error message like
Error in dim(zvals) <- dim(xvals) :
dims [product 10201] do not match the length of object [1]
What is the problem here? Is there any alternative how to 3d-plot my function?
The problem is the definition of f. plot3d(f) is going to pass in vectors for x and y, and your function is going to take the mean of everything and return a single value.
The simplest way to fix this is to call the Vectorize function, which wraps f in loops to compute it separately for each x, y pair. For example, with your definition of f as in the question,
plot3d(Vectorize(f), xlim = c(-2,2), ylim = c(-2, 2))
produces a surface plot of f over the square from -2 to 2 in both x and y.
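To see the difference without rgl, here is a minimal sketch that compares the raw f with its Vectorize wrapper on a small grid. The seed, the grid values, and the use of base graphics persp in place of plot3d are assumptions made only to keep the example dependency-free.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(a = rnorm(100, 0, 10), b = rnorm(100, 0, 10))
f <- function(x, y) mean(df$a * x >= df$b * y)

xs <- c(-2, -1, 1, 2)
ys <- c(-2, -1, 1, 2)

# xs and ys are recycled against the columns of df, and mean() collapses
# everything to a single number.
collapsed <- f(xs, ys)

# Vectorize() wraps f so it is evaluated once per (x, y) pair.
vf <- Vectorize(f)
pointwise <- vf(xs, ys)

# outer() needs a vectorized function to fill the whole grid.
z <- outer(xs, ys, vf)
persp(xs, ys, z, theta = 30, phi = 30, zlab = "f(x, y)")
```

Here collapsed has length 1, which is the mismatch the error message complains about, while pointwise has one value per pair and z is a full 4 x 4 matrix.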

Creating a beta distribution Q-Q plot

My task is to create 100 random generated numbers from beta distribution and compare that random variable with beta distribution using quantile plot.
This is my attempt:
library(MASS)
library(qualityTools)
Random_Numbers_Beta <- rbeta(100, 1, 1)
qqPlot(Random_Numbers_Beta, "beta", list(shape = 1, rate = 1))
Unfortunately something is wrong. This is an error which occurs:
Error in (function (x, densfun, start, ...) :
'start' must be a named list
Can something be done with that issue?
First, you have to specify that list(shape = 1, rate = 1) is the start parameter; right now this list is being treated as a value for the confbounds parameter. Second, it's actually not shape and rate, but shape1 and shape2, as in, e.g., ?dbeta.
qqPlot(Random_Numbers_Beta, "beta", start = list(shape1 = 1, shape2 = 1))
Again inspecting ?qqPlot you may see that ... is for "further graphical parameters: (see par)." Hence, you may modify the plot the way you like; e.g., adding col = 'red'.
Also notice that Beta(1,1) is simply the uniform distribution on [0,1] and, hence, its quantile function is the identity function. That is, qbeta(x, 1, 1) == x for any x in [0,1]. So, you may also simply work directly with
x <- seq(0, 1, length = 500)
plot(quantile(Random_Numbers_Beta, x), x)
abline(a = 0, b = 1, col = 'red')
if you don't need the confidence bounds.
One can notice, however, that the two plots are a little different. Given your task, it would seem that you need the second one.
In the first one, it looks like qqPlot fits a beta distribution for your data and uses its quantiles, which apparently isn't exactly the identity function. That is, it doesn't use the exact knowledge about the parameters. The second plot uses this knowledge.
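The identity claim and the "exact knowledge" version of the plot can be checked with base R alone. This is a minimal sketch: the seed and the use of ppoints for the plotting positions are assumptions, not something the answer prescribes.

```r
# Beta(1, 1) is Uniform(0, 1), so its quantile function is the identity.
p <- seq(0, 1, by = 0.1)
identity_holds <- isTRUE(all.equal(qbeta(p, shape1 = 1, shape2 = 1), p))

set.seed(1)  # arbitrary seed for reproducibility
sample_beta <- rbeta(100, shape1 = 1, shape2 = 1)

# Theoretical quantiles at the usual plotting positions, using the known
# parameters instead of fitted ones.
probs <- ppoints(length(sample_beta))
theoretical <- qbeta(probs, shape1 = 1, shape2 = 1)

plot(theoretical, sort(sample_beta),
     xlab = "Theoretical Beta(1, 1) quantiles", ylab = "Sample quantiles")
abline(a = 0, b = 1, col = "red")
```

For other shape parameters only the two qbeta arguments change; the rest of the sketch stays the same.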

R-package beeswarm generates same x-coordinates

I am working on a script where I need to calculate the coordinates for a beeswarm plot without immediately plotting. When I use beeswarm, I get x-coordinates that aren't swarmed and are more or less the same value (Plot 1 in the code below).
But if I generate the same plot again, it swarms correctly (Plot 2).
And if I call dev.off() first, I again get no swarming (Plot 3).
The code I used:
library(beeswarm)
n <- 250
df = data.frame(x = floor(runif(n, 0, 5)),
y = rnorm(n = n, mean = 500, sd = 100))
#Plot 1:
A = with(df, beeswarm(y ~ x, do.plot = F))
plot(x = A$x, y=A$y)
#Plot 2:
A = with(df, beeswarm(y ~ x, do.plot = F))
plot(x = A$x, y=A$y)
dev.off()
#Plot 3:
A = with(df, beeswarm(y ~ x, do.plot = F))
plot(x = A$x, y=A$y)
It seems to me like beeswarm uses something like the current plot parameters (or however it is called) to do the swarming and therefore chokes when a plot isn't showing. I have tried to play around with beeswarm parameters such as spacing, breaks, corral, corralWidth, priority, and xlim, but it does not make a difference. FYI: If do.plot is set to TRUE the x-coordinates are calculated correctly, but this is not helpful as I don't want to plot immediately.
Any tips or comments are greatly appreciated!
You're right; beeswarm uses the current plot parameters to calculate the amount of space to leave between points. It seems that setting "do.plot=FALSE" does not do what one would expect, and I'm not sure why I included this parameter.
If you want to control the parameters manually, you could use the functions swarmx or swarmy instead. These functions must be applied to each group separately, e.g.
dfsplitswarmed <- by(df, df$x, function(aa) swarmx(aa$x, aa$y, xsize = 0.075, ysize = 7.5, cex = 1, log = ""))
dfswarmed <- do.call(rbind, dfsplitswarmed)
plot(dfswarmed)
In this case, I set the xsize and ysize values based on what the function would default to for this particular data set. If you can find a set of xsize/ysize values that work for your data, this approach might work for you.
Otherwise, perhaps a simpler approach would be to leave do.plot=TRUE, and then discard the plots.
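One way to "discard the plots" is to draw onto a throwaway graphics device. The sketch below assumes that a null PDF device (pdf(NULL), which writes no file) is an acceptable stand-in for the real device; since the swarming depends on the current plot parameters, the coordinates will reflect that device's default size rather than the size of the final plot. The variable name swarm and the requireNamespace guard are illustrative additions.

```r
n <- 250
df <- data.frame(x = floor(runif(n, 0, 5)),
                 y = rnorm(n = n, mean = 500, sd = 100))

swarm <- NULL
if (requireNamespace("beeswarm", quietly = TRUE)) {
  pdf(NULL)                                      # null device: nothing is written to disk
  swarm <- beeswarm::beeswarm(y ~ x, data = df)  # do.plot left at its default (TRUE)
  invisible(dev.off())                           # discard the throwaway plot
}

# Later, on the device that should actually show the result:
# plot(swarm$x, swarm$y)
```

If the final plot has a very different size or aspect ratio, the spacing may look off, in which case the swarmx/swarmy route with explicit xsize and ysize gives more control.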

Basic Calculations with stat_functions -- Plotting hazard functions

I am currently trying to plot some density distributions functions with R's ggplot2. I have the following code:
f <- stat_function(fun="dweibull",
args=list("shape"=1),
"x" = c(0,10))
stat_F <- stat_function(fun="pweibull",
args=list("shape"=1),
"x" = c(0,10))
S <- function() 1 - stat_F
h <- function() f / S
wei_h <- ggplot(data.frame(x=c(0,10))) +
stat_function(fun=h) +
...
Basically I want to plot hazard functions based on a Weibull distribution with varying parameters, that is, h(x) = f(x) / S(x) with S(x) = 1 - F(x).
The above code gives me this error:
Computation failed in stat_function():
unused argument (x_trans)
I also tried to directly use
S <- 1 - stat_function(fun="pweibull", ...)
instead of the above "workaround" with the custom function construction. This threw another error, since I was trying to do arithmetic on a non-numeric object:
non-numeric argument for binary operator
I get that error, but I have no idea for a solution.
I have done some research, but without success. I feel like this should be straightforward. Also, I would like to do it "manually" as much as possible, but if there is no simple way to do this, then a packaged solution is just fine as well.
Thanks in advance for any suggestions!
PS: I basically want to recreate the graph you can find in Kiefer, 1988 on page 10 of the linked PDF file.
Three comments:
1. stat_function is a statistic layer for ggplot2. You cannot divide two stat_function expressions by each other or otherwise use them in mathematical expressions, as in S <- 1 - stat_function(fun="pweibull", ...); that is a fundamental misunderstanding of what stat_function is. It always needs to be added to a ggplot2 plot, as in the example below.
2. The fun argument for stat_function takes a function as an argument, not a string. You can define functions on the fly if you need ones that don't exist already.
3. You need to set up an aesthetic mapping, via the aes function.
This code works:
library(ggplot2)
args = list("shape" = 1.2)
ggplot(data.frame(x = seq(0, 10, length.out = 100)), aes(x)) +
stat_function(fun = dweibull, args = args, color = "red") +
stat_function(fun = function(...){1-pweibull(...)}, args = args, color = "green") +
stat_function(fun = function(...){dweibull(...)/(1-pweibull(...))},
args = args, color = "blue")
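For readers who want the "manual" route in base graphics, the hazard can also be wrapped in an ordinary function and drawn with curve. This is a sketch under stated assumptions: the helper names weibull_hazard and weibull_hazard_closed, the shape values, and the plotting range are all illustrative. It uses lower.tail = FALSE for the survival function, which is equivalent to 1 - pweibull(...) but avoids cancellation in the far tail.

```r
# Hazard = density / survival.
weibull_hazard <- function(time, shape, scale = 1) {
  dweibull(time, shape = shape, scale = scale) /
    pweibull(time, shape = shape, scale = scale, lower.tail = FALSE)
}

# Closed form of the Weibull hazard, used here only as a cross-check.
weibull_hazard_closed <- function(time, shape, scale = 1) {
  (shape / scale) * (time / scale)^(shape - 1)
}

t_grid <- seq(0.1, 5, by = 0.1)
agree <- isTRUE(all.equal(weibull_hazard(t_grid, shape = 1.2),
                          weibull_hazard_closed(t_grid, shape = 1.2)))

# Decreasing, constant, and increasing hazards on one plot.
curve(weibull_hazard(x, shape = 0.8), from = 0.05, to = 3,
      ylim = c(0, 3), xlab = "t", ylab = "h(t)")
curve(weibull_hazard(x, shape = 1), add = TRUE, lty = 2)
curve(weibull_hazard(x, shape = 1.5), add = TRUE, lty = 3)
```

The same weibull_hazard function could be passed to stat_function via fun = weibull_hazard and args = list(shape = 1.2) if the ggplot2 version is preferred.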

Superpose a histogram and an xyplot

I'd like to superpose a histogram and an xyplot representing the cumulative distribution function using R's lattice package.
I've tried to accomplish this with custom panel functions, but can't seem to get it right--I'm getting hung up on one plot being univariate and one being bivariate I think.
Here's an example with the two plots I want stacked vertically:
library(lattice)
set.seed(1)
x <- rnorm(100, 0, 1)
discrete.cdf <- function(x, decreasing=FALSE){
x <- x[order(x,decreasing=decreasing)]
result <- data.frame(rank=1:length(x),x=x)
result$cdf <- result$rank/nrow(result)
return(result)
}
my.df <- discrete.cdf(x)
chart.hist <- histogram(~x, data=my.df, xlab="")
chart.cdf <- xyplot(100*cdf~x, data=my.df, type="s",
ylab="Cumulative Percent of Total")
graphics.off()
trellis.device(width = 6, height = 8)
print(chart.hist, split = c(1,1,1,2), more = TRUE)
print(chart.cdf, split = c(1,2,1,2))
I'd like these superposed in the same frame, rather than stacked.
The following code doesn't work, nor do any of the simple variations of it that I have tried:
xyplot(cdf~x,data=cdf,
panel=function(...){
panel.xyplot(...)
panel.histogram(~x)
})
You were on the right track with your custom panel function. The trick is passing the correct arguments to the panel.- functions. For panel.histogram, this means not passing a formula and supplying an appropriate value to the breaks argument:
EDIT Proper percent values on y-axis and type of plots
xyplot(100*cdf~x,data=my.df,
panel=function(...){
panel.histogram(..., breaks = do.breaks(range(x), nint = 8),
type = "percent")
panel.xyplot(..., type = "s")
})
This answer is just a placeholder until a better answer comes.
The hist() function from the graphics package has an option called add. The following does what you want in the "classical" way:
plot( my.df$x, my.df$cdf * 100, type= "l" )
hist( my.df$x, add= T )
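Note that hist(..., add = TRUE) draws counts, which line up with the percent axis here only because the sample has exactly 100 observations. A possible base-graphics variant that works for any sample size is sketched below; the sample size of 250 and the trick of overwriting the counts component with percentages are assumptions for illustration, not part of the original answer.

```r
set.seed(1)
x <- rnorm(250)  # deliberately not 100 observations

# Compute the histogram without drawing it, then convert counts to percent.
h <- hist(x, plot = FALSE)
h$counts <- 100 * h$counts / sum(h$counts)

# Empirical CDF in percent, evaluated at the sorted data.
cdf_pct <- 100 * seq_along(x) / length(x)

plot(h, ylim = c(0, 100), main = "", xlab = "x", ylab = "Percent of total")
lines(sort(x), cdf_pct, type = "s")
```

Both layers now share one percent scale, so the bars and the cumulative step line can be read off the same axis.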
