Distributions and densities in R [duplicate]

I'm doing some research on truncated distributions, specifically the truncated Pareto distribution. It has a known density function and probability (distribution) function, so one can derive the quantile function and, from that, a function that generates random numbers.
Now that I have these functions (say dtp(x, lower, upper, alpha) is my density function), how do I actually plot the density? I know there are commands like density() that use kernel estimation, but shouldn't it be possible to plot the density using the density function itself, together with random numbers following that distribution?

The standard way to plot is to have x values and y values, and then plot them. You have a function that converts x values to y values, which means that all you need to do is pick x values to plot and give them to your function, something like:
x = seq(0, 10, length.out = 100)
y = dtp(x = x)
plot(x, y, type = "l")
Note that I have no idea whether this is a reasonable domain for your density, if you have suitable default values for lower, upper, alpha or if you need to specify them, etc.
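Alternatively, some functions, like curve in base graphics, take a function and a domain directly. To supply additional arguments to your function, write the call as an expression in x, because curve forwards its extra arguments to the plotting routine rather than to your function.
curve(dtp, from = 0, to = 10, n = 101)
curve(dtp(x, alpha = 0.2), from = 0, to = 10, n = 101) # specifying alpha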
If you prefer ggplot, then stat_function is the function for you.
library(ggplot2)
ggplot(data.frame(x = c(0, 10)), aes(x = x)) +
stat_function(fun = dtp)
ggplot(data.frame(x = c(0, 10)), aes(x = x)) +
stat_function(fun = dtp, args = list(alpha = 0.2))
# passing alpha to dtp via args
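To make this concrete for the truncated Pareto case, here is a minimal sketch. The definitions of dtp, qtp and rtp below are assumptions (the question does not show its own implementations): they use the standard truncated Pareto density on [lower, upper] with shape alpha, and inverse-CDF sampling. The parameter values are arbitrary. The sketch overlays the exact density on a histogram of simulated draws, which is one way to check that the density and the random number generator agree.

```r
# Hypothetical implementations; the asker's own dtp() may differ.
dtp <- function(x, lower = 1, upper = 10, alpha = 2) {
  dens <- alpha * lower^alpha * x^(-alpha - 1) / (1 - (lower / upper)^alpha)
  ifelse(x >= lower & x <= upper, dens, 0)  # zero outside the support
}
qtp <- function(p, lower = 1, upper = 10, alpha = 2) {
  lower * (1 - p * (1 - (lower / upper)^alpha))^(-1 / alpha)
}
rtp <- function(n, lower = 1, upper = 10, alpha = 2) {
  qtp(runif(n), lower, upper, alpha)  # inverse-CDF sampling
}

set.seed(1)
draws <- rtp(10000)

# Histogram of the simulated values on the density scale ...
hist(draws, breaks = 50, freq = FALSE, main = "Truncated Pareto", xlab = "x")
# ... with the exact density drawn on top.
curve(dtp(x), from = 1, to = 10, n = 501, add = TRUE, col = "red", lwd = 2)
```

If the red curve tracks the tops of the histogram bars, the quantile-based generator is consistent with the density.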

Related

How to draw a regression formula in R? [duplicate]

What are the alternatives for drawing a simple curve for a function like
eq = function(x){x*x}
in R?
It sounds like such an obvious question, but I could only find these related questions on Stack Overflow, and they are all more specific:
Plot line function in R
Plotting functions on top of datapoints in R
How can I plot a function in R with complex numbers?
How to plot a simple piecewise linear function?
Draw more than one function curves in the same plot
I hope I didn't write a duplicate question.
I did some searching on the web, and these are some ways that I found:
The easiest way is using curve without a predefined function
curve(x^2, from=1, to=50, xlab="x", ylab="y")
You can also use curve when you have a predefined function
eq = function(x){x*x}
curve(eq, from=1, to=50, xlab="x", ylab="y")
If you want to use ggplot,
library("ggplot2")
eq = function(x){x*x}
ggplot(data.frame(x=c(1, 50)), aes(x=x)) +
stat_function(fun=eq)
You mean like this?
eq = function(x){x*x}
plot(eq(1:1000), type='l')
(Or whatever range of values is relevant to your function)
plot has a plot.function method
plot(eq, 1, 1000)
Or
curve(eq, 1, 1000)
Here is a lattice version:
library(lattice)
eq<-function(x) {x*x}
X<-1:1000
xyplot(eq(X)~X,type="l")
Lattice solution with additional settings which I needed:
library(lattice)
distribution<-function(x) {2^(-x*2)}
X<-seq(0,10,0.00001)
xyplot(distribution(X)~X,type="l", col = rgb(red = 255, green = 90, blue = 0, maxColorValue = 255), cex.lab = 3.5, cex.axis = 3.5, lwd=2 )
If you need your range of values for x plotted in increments different from 1, e.g. 0.00001 you can use:
X<-seq(0,10,0.00001)
You can change the colour of your line by defining a rgb value:
col = rgb(red = 255, green = 90, blue = 0, maxColorValue = 255)
You can change the width of the plotted line by setting:
lwd = 2
You can change the size of the labels by scaling them:
cex.lab = 3.5, cex.axis = 3.5
As sjdh also mentioned, ggplot2 comes to the rescue. A more intuitive way without making a dummy data set is to use xlim:
library(ggplot2)
eq <- function(x){sin(x)}
base <- ggplot() + xlim(0, 30)
base + geom_function(fun=eq)
Additionally, for a smoother graph we can set the number of points over which the graph is interpolated using n:
base + geom_function(fun=eq, n=10000)
Function containing parameters
I had a function (emax()) involving 3 parameters (a, b & h) whose line I wanted to plot:
emax = function(x, a, b, h){
(a * x^h)/(b + x^h)
}
curve(emax, from = 1, to = 40, n = 40, a = 1, b = 2, h = 3)
which failed with the error: Error in emax(x) : argument "a" is missing, with no default.
This is fixed by putting the named arguments inside the function call, using this syntax:
curve(emax(x, a = 1, b = 2, h = 3), from = 1, to = 40, n = 40)
At first glance this seems contrary to the documentation, which writes curve(expr, from, to, n, ...) rather than curve(expr(x, ...), from, to, n); however, the ... in curve is forwarded to the plotting functions, not to expr.
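An alternative to writing the call inline is to fix the parameters in a small wrapper function and hand that to curve. This is only a sketch; the wrapper name emax_fixed is made up for illustration.

```r
emax <- function(x, a, b, h) {
  (a * x^h) / (b + x^h)
}

# Hypothetical wrapper that fixes a, b and h, leaving x as the only argument.
emax_fixed <- function(x) emax(x, a = 1, b = 2, h = 3)

curve(emax_fixed, from = 1, to = 40, n = 40)
```

Because emax_fixed takes only x, it can also be passed to plot() or stat_function() without any extra arguments.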

How to plot the Standard Normal CDF in R?

As the title says, I'm trying to plot the CDF of a N(0,1) distribution between some values a, b. I.e. Phi_0,1 (a) to Phi_0,1 (b). For some reason I'm having issues finding information on how to do this.
You can use curve to do the plotting, pnorm is the normal probability (CDF) function:
curve(pnorm, from = -5, to = 2)
Adjust the from and to values as needed. Use dnorm if you want the density function (PDF) instead of the CDF. See the ?curve help page for a few additional arguments.
Or using ggplot2
library(ggplot2)
ggplot(data.frame(x = c(-5, 2)), aes(x = x)) +
stat_function(fun = pnorm)
More generally, you can generate the data yourself and use almost any plot function capable of drawing lines in a coordinate system.
x = seq(from = -5, to = 2, length.out = 1000)
y = pnorm(x)
plot(x, y, type = "l")
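For the interval in the question, a sketch along these lines draws the CDF between two endpoints and computes the probability mass between them as Phi(b) - Phi(a). The values a = -1 and b = 1 are placeholders chosen for illustration.

```r
a <- -1  # placeholder lower endpoint
b <- 1   # placeholder upper endpoint

# CDF of N(0, 1) restricted to [a, b]
curve(pnorm, from = a, to = b, xlab = "x", ylab = "Phi(x)")

# Probability that a standard normal variable falls in [a, b]
p_ab <- pnorm(b) - pnorm(a)
```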

How can I place multiple unrelated graphs on the same axes in ggplot2?

I am trying to recreate an image found in a textbook in R, the original of which was built in MATLAB:
I have generated each of the graphs separately, but what would be the best practice for combining them into an image like this in ggplot2?
Edit: Provided code used. This is just a transformation of normally distributed data.
library(ggplot2)
mean <- 6
sd <- 1
X <- rnorm(100000, mean = mean, sd = sd)
Y <- dnorm(X, mean = mean, sd = sd)
Y_p <- pnorm(X, mean = mean, sd = sd)
ch_vars <- function(X){
nu_vars <- c()
for (x in X){
nu_vars <- c(nu_vars, (1/(1 + exp(-x + 5))))
}
return(nu_vars)
}
nu_X <- ch_vars(X)
nu_Y <- ch_vars(Y)
data <- data.frame(X = X, Y = Y, Y_p = Y_p, nu_X = nu_X, nu_Y = nu_Y)
# Cumulative distribution
ggplot(data = data) +
geom_line(aes(x = X, y = Y_p))
# Distribution of initial data
ggplot(data = data, aes(x = X)) +
geom_histogram(aes(y = ..density..), bins = 25, fill = "red", color = "black")
# Distribution of transformed data
ggplot(data = data, aes(x = nu_X)) +
geom_histogram(aes(y = ..density..), bins = 25, fill = "green", color = "black")
In short, you can't, or rather, you shouldn't.
ggplot is a high-level plotting package. More than a system for drawing shapes and lines, it's fairly "opinionated" about how data should be represented, and one of its opinions is that a plot should express a clear relationship between its axes and marks (points, bars, lines, etc.). The axes essentially define a coordinate space, and the marks are then plotted onto the space in a straightforward and easily interpretable manner.
The plot you show breaks that relationship -- it's a set of essentially arbitrary histograms all drawn onto the same box, where the axis values become ambiguous. The x-axis represents the values of 1 histogram and the y-axis represents another (and thus neither axis represents the histograms' heights).
It is of course technically possible to force ggplot to render something like your example, but it would require pre-computing the histograms, normalizing their values and bin heights to a common coordinate space, converting these into suitable coordinates for use with geom_rect, and then re-labeling the plot axes. It would be a very large amount of manual effort and ultimately defeats the point of using a high-level plotting grammar like ggplot.
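To give a sense of the manual work involved, the sketch below shows only the first step: pre-computing one histogram with base R's hist() and rescaling its bar heights to [0, 1] so they could share a coordinate space with other marks. The data, the bin count, and the column names are assumptions made for illustration; the ggplot2 call at the end is skipped if the package is not installed.

```r
set.seed(42)
X <- rnorm(10000, mean = 6, sd = 1)  # stand-in for the question's data

# Pre-compute the histogram without drawing it.
h <- hist(X, breaks = 25, plot = FALSE)

# One rectangle per bin, heights rescaled so the tallest bar is 1.
rects <- data.frame(
  xmin = head(h$breaks, -1),
  xmax = tail(h$breaks, -1),
  ymin = 0,
  ymax = h$density / max(h$density)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(rects) +
    geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
              fill = "red", colour = "black")
  print(p)
}
```

Every further histogram would need the same treatment, plus its own placement and axis labels, which is the effort the answer warns about.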

Surface plot Q in R - comparable to surf() in MATLAB

I want to plot a matrix of z values with x rows and y columns as a surface similar to this graph from MATLAB.
Surface plot:
Code to generate matrix:
# Parameters
shape<-1.849241
scale<-38.87986
x<-seq(from = -241.440, to = 241.440, by = 0.240)# 2013 length
y<-seq(from = -241.440, to = 241.440, by = 0.240)
matrix_fun<-matrix(data = 0, nrow = length(x), ncol = length(y))
# Generate two dimensional travel distance probability density function
for (i in 1:length(x)) {
for (j in 1:length(y)){
dxy<-sqrt(x[i]^2+y[j]^2)
prob<-1/(scale^(shape)*gamma(shape))*dxy^(shape-1)*exp(-(dxy/scale))
matrix_fun[i,j]<-prob
}}
# Rescale 2-d pdf to sum to 1
a<-sum(matrix_fun)
matrix_scale<-matrix_fun/a
I am able to generate surface plots using a couple methods (persp(), persp3d(), surface3d()) but the colors aren't displaying the z values (the probabilities held within the matrix). The z values only seem to display as heights not as differentiated colors as in the MATLAB figure.
Example of graph code and graphs:
library(rgl)
persp3d(x=x, y=y, z=matrix_scale, color=rainbow(25, start=min(matrix_scale), end=max(matrix_scale)))
surface3d(x=x, y=y, z=matrix_scale, color=rainbow(25, start=min(matrix_scale), end=max(matrix_scale)))
persp(x=x, y=y, z=matrix_scale, theta=30, phi=30, col=rainbow(25, start=min(matrix_scale), end=max(matrix_scale)), border=NA)
Image of the last graph
Any other tips to recreate the image in R would be most appreciated (i.e. legend bar, axis tick marks, etc.)
So here's a ggplot solution which seems to come a little bit closer to the MATLAB plot
# Parameters
shape<-1.849241
scale<-38.87986
x<-seq(from = -241.440, to = 241.440, by = 2.40)
y<-seq(from = -241.440, to = 241.440, by = 2.40)
df <- expand.grid(x=x,y=y)
df$dxy <- with(df,sqrt(x^2+y^2))
df$prob <- dgamma(df$dxy,shape=shape,scale=scale)
df$prob <- df$prob/sum(df$prob)
library(ggplot2)
library(colorRamps) # for matlab.like(...)
library(scales) # for labels=scientific
ggplot(df, aes(x,y))+
geom_tile(aes(fill=prob))+
scale_fill_gradientn(colours=matlab.like(10), labels=scientific)
BTW: You can generate your data frame of probabilities much more efficiently using the built-in dgamma(...) function, rather than calculating it yourself.
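As a quick check of that remark, this sketch compares the hand-written formula from the question with dgamma() and builds the whole matrix with outer() instead of nested loops, on the coarser grid (step 2.40). The helper name radial_density and the test distance d are made up for illustration.

```r
shape <- 1.849241
scale <- 38.87986
x <- seq(from = -241.440, to = 241.440, by = 2.40)
y <- x

# Hypothetical helper: gamma density of the distance from the origin.
radial_density <- function(x, y) {
  dgamma(sqrt(x^2 + y^2), shape = shape, scale = scale)
}

# Vectorised replacement for the double for-loop.
matrix_fun <- outer(x, y, radial_density)
matrix_scale <- matrix_fun / sum(matrix_fun)  # rescale to sum to 1

# The question's formula, evaluated at a single distance for comparison.
d <- 50
manual <- 1 / (scale^shape * gamma(shape)) * d^(shape - 1) * exp(-d / scale)
```

Comparing manual with dgamma(d, shape = shape, scale = scale) via all.equal() confirms that the two expressions agree.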
In line with alexis_laz's comment, here is an example using filled.contour. You might want to increase your by to 2.40 since the finer granularity increases the time it takes to generate the plot by a lot but doesn't improve quality.
filled.contour(x = x, y = y, z = matrix_scale, color = terrain.colors)
# terrain.colors is in the base grDevices package
If you want something closer to your color scheme above, you can fiddle with the rainbow function:
filled.contour(x = x, y = y, z = matrix_scale,
color = (function(n, ...) rep(rev(rainbow(n/2, ...)[1:9]), each = 3)))
Finer granularity:
filled.contour(x = x, y = y, z = matrix_scale, nlevels = 150,
color = (function(n, ...)
rev(rep(rainbow(50, start = 0, end = 0.75, ...), each = 3))[5:150]))
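If a perspective plot is still wanted, base persp() colours facets rather than vertices, so the colour vector has to be derived from the z values at each facet's four corners. The sketch below follows the approach shown in the examples of ?persp; it uses a small synthetic surface (an assumption, to keep it fast) in place of matrix_scale, and the palette is an arbitrary choice.

```r
# Small synthetic surface standing in for the question's matrix_scale.
x <- seq(-10, 10, length.out = 50)
y <- x
z <- outer(x, y, function(x, y) exp(-(x^2 + y^2) / 20))

nrz <- nrow(z)
ncz <- ncol(z)

# Sum of z at the four corners of each facet: (nrz - 1) x (ncz - 1) values.
zfacet <- z[-1, -1] + z[-1, -ncz] + z[-nrz, -1] + z[-nrz, -ncz]

# Map facet heights onto a palette of 100 colours.
n_col <- 100
palette_fun <- colorRampPalette(c("blue", "cyan", "yellow", "red"))
facet_col <- palette_fun(n_col)[cut(zfacet, n_col)]

persp(x, y, z, col = facet_col, theta = 30, phi = 30, border = NA)
```

The original attempts passed the z range to rainbow()'s start and end arguments, which select hues rather than map values to colours; binning the facet heights with cut() is what ties colour to z here.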

How can I overlay 2 normal distribution curves on the same plot?

I'm brand new to R, and I'm facing a problem without much in the way of in-class resources. I need to do something that I'm sure is very simple. Can someone point me in the right direction? This is my task:
Let X denote the monthly return on Microsoft Stock and let Y denote
the monthly return on Starbucks stock. Assume that X ∼ N(0.05, (0.10)^2)
and Y ∼ N(0.025, (0.05)^2).
Using a grid of values between –0.25 and 0.35, plot the normal curves
for X and Y. Make sure that both normal curves are on the same plot.
I've only been able to plot a randomly generated normal distribution, but not both on the same plot, and not by specifying the mean and standard deviation. Big thanks in advance.
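Use the functions lines or points, e.g.
s <- seq(-.25, .35, 0.01)
mean1 <- 0.05; sd1 <- 0.10   # X, from the question
mean2 <- 0.025; sd2 <- 0.05  # Y, from the question
plot(s, dnorm(s, mean1, sd1), type = "l", ylim = c(0, 8))  # ylim leaves room for the taller curve
lines(s, dnorm(s, mean2, sd2), col = "red")
Also, check the function par (using ?par) for plotting options; common options include labels (xlab/ylab), plot limits (xlim/ylim), colors (col), etc.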
You have a couple of options
Using base R
You can use the plot.function method (which calls curve to plot a function). This is what is called if you call plot(functionname)
You will probably need to roll your own function so this will work. Also, you will need to set up the ylim so the whole range of both functions is shown.
# for example
fooX <- function(x) dnorm(x, mean = 0.05, sd = 0.1)
plot(fooX, from = -0.25, to = 0.35)
# I will leave the definition of fooY as an exercise.
fooY <- function(x) {
# fill this in as you think fit!
}
# see what it looks like
plot(fooY, from = -0.25, to = 0.35)
# now set appropriate ylim (left as an exercise)
# bonus marks if you work out a method that doesn't require this!
myYLim <- c(0, appropriateValue)
# now plot
plot(fooX, from = -0.25, to = 0.35, ylim = myYLim)
# add the second plot, (note add = TRUE)
plot(fooY, from = -0.25, to = 0.35, add = TRUE)
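One way to avoid choosing ylim by hand is to evaluate both densities on the grid first and let matplot() scale the axes to cover both. This is a sketch using the means and standard deviations from the question; it is not necessarily the method the answer has in mind.

```r
s <- seq(-0.25, 0.35, by = 0.01)

dens_x <- dnorm(s, mean = 0.05, sd = 0.10)   # X ~ N(0.05, 0.10^2)
dens_y <- dnorm(s, mean = 0.025, sd = 0.05)  # Y ~ N(0.025, 0.05^2)

# matplot() picks axis limits from all columns, so neither curve is clipped.
matplot(s, cbind(dens_x, dens_y), type = "l", lty = 1,
        col = c("black", "red"), xlab = "return", ylab = "density")
legend("topright", legend = c("X", "Y"), lty = 1, col = c("black", "red"))
```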
Using ggplot2
ggplot has a function stat_function that will impose a function on a plot. The examples in ?stat_function show how to add two Normal pdf functions with different means to the same plot.
As suggested by @mnel, we can use ggplot2 to plot several normal distributions in one plot. (This post is quite old, but I didn't find an easy answer to this question after searching other posts, so I would like to post one here.)
library(ggplot2)
base <- ggplot() + xlim(-10, 10)
base +
geom_function(aes(colour = "Original prior"), fun = dnorm, args = list(mean = 1, sd = 2)) +
geom_function(aes(colour = "Prior 1"), fun = dnorm, args = list(mean = 2, sd = 2)) +
geom_function(aes(colour = "Prior 2"), fun = dnorm, args = list(mean = 3, sd = 4)) +
geom_function(aes(colour = "Prior 3"), fun = dnorm, args = list(mean = 4, sd = 2))

Resources