I want to draw 100 samples for each of two classes from a bivariate normal distribution. For the first class, the mean is (0,0) and the covariance matrix is [(1,0),(0,1)]. For the second class, the mean is (5,0) and the covariance matrix is the same as for the first class. Finally, I would like to visualize all 200 instances in a single plot, with a different color for each class.
My problem is that when I generate this plot, I am not sure whether it actually contains 200 samples.
My approach:
a1 <- c(1,0)
a2 <- c(0,1)
M <- cbind(a1, a2)
x <- cov(M)
dev <- sd(x, na.rm = FALSE)
C0 <- sample(rnorm(100, mean=0, sd=dev), size=100, replace=T)
C1 <- sample(rnorm(100, mean=5, sd=dev), size=100, replace=T)
plot(C0,C1, col=c("red","blue"), main = '200 samples, with mean 0 and 5 and S.D=0.5')
legend("topright", 95, legend=c("C0", "C1"),
col=c("red", "blue"), lty=1:2, cex=0.8)
What corrections does my code need?
Aside from the plotting issue mentioned in the other answer, it seems from your description like you want to sample from two 2D multivariate normal distributions with different means.
If so, you can use the rmvnorm() function from the mvtnorm package, which samples directly from a multivariate normal distribution.
library(mvtnorm)
C0 <- rmvnorm(100, c(0,0), M) # 100 samples, means (0, 0), covariance mtx M
C1 <- rmvnorm(100, c(5,0), M)
Right now, x <- cov(M) takes the covariance of your covariance matrix. This doesn't make much sense unless I'm misunderstanding what you're trying to accomplish.
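To see what cov(M) actually computes, here is a quick base-R check using the M from the question. cov() treats each column as a variable and each row as an observation, so it returns the sample covariance of two 2-point columns, not M itself:

```r
a1 <- c(1, 0)
a2 <- c(0, 1)
M <- cbind(a1, a2)   # already the intended covariance matrix (2x2 identity)

# cov() treats columns as variables and rows as observations,
# so this is the sample covariance of two columns with 2 observations each
x <- cov(M)
x                    # 0.5 on the diagonal, -0.5 off the diagonal

# sd() then flattens those four numbers into a single value
sd(x)                # sqrt(1/3), about 0.577

# What rmvnorm() needs as its sigma argument is M itself
isTRUE(all.equal(M, diag(2), check.attributes = FALSE))  # TRUE
```

So the single "standard deviation" fed into rnorm() in the question comes from this 2x2 matrix rather than from the identity covariance that was intended.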
EDIT: This is the full code for what I think you're trying to accomplish:
library(mvtnorm)
a1 <- c(1, 0)
a2 <- c(0, 1)
M <- cbind(a1, a2)
C0 <- rmvnorm(100, c(0, 0), M)
C1 <- rmvnorm(100, c(5, 0), M)
plot(C0, col = "red", xlim = c(-5, 10), ylim = c(-5, 5), xlab = "X", ylab = "Y")
points(C1, col = "blue")
legend("topright", inset = .05, c("Class 1", "Class 2"), fill = c("red", "blue"))
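If installing mvtnorm is not an option, note that with the identity covariance used here the two coordinates are independent standard normals, so base R's rnorm() is enough. The sketch below only holds for this diagonal case (for a general covariance matrix, stick with rmvnorm()); it also stacks both classes into one matrix with a class index, which makes it easy to confirm that 200 points are plotted:

```r
set.seed(42)  # arbitrary seed, only for reproducibility
n <- 100

# With covariance diag(2) the two coordinates are independent N(0, 1),
# so each class is two rnorm() columns shifted by the class mean
C0 <- cbind(rnorm(n, mean = 0), rnorm(n, mean = 0))  # class 1, mean (0, 0)
C1 <- cbind(rnorm(n, mean = 5), rnorm(n, mean = 0))  # class 2, mean (5, 0)

# Stack both classes and keep a class index to pick the colours
all_pts <- rbind(C0, C1)
cls <- rep(1:2, each = n)
nrow(all_pts)  # 200

plot(all_pts, col = c("red", "blue")[cls], xlab = "X", ylab = "Y")
legend("topright", legend = c("Class 1", "Class 2"),
       col = c("red", "blue"), pch = 1)
```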
which outputs the plot
Your x and y axes demonstrate that you're plotting C1 against C0. That's why your y-axis has its midpoint at 5 and the x-axis has it at 0. What you've done is plot 100 points with their x-coordinate from C0 and y-coordinate from C1.
Short of counting them, proving that there are 100 points on the screen is difficult; I know of no way to access the data R used to draw your plot. One trick, however, is to call text(C0, C1, labels = 1:150) after your code. This writes the numbers 1 to 150 onto the plot, and because text() recycles the coordinates when there are more labels than points, labels 101 to 150 land on the first 50 points. If you had 200 points, each point would get exactly one label and the plot would stay tidy. Since you have only 100, the first 50 points are labelled twice and the plot becomes unreadable.
If we make a new plot and use text(C0, C1, labels = 1:100) instead, things are much clearer:
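You can also count the points from the data rather than from the picture: plot(C0, C1) draws one point for every index i where both C0[i] and C1[i] are available, so counting complete pairs gives the number of points drawn. A small sketch, using rnorm() stand-ins for the question's two vectors:

```r
set.seed(1)  # stand-ins for the question's C0 and C1
C0 <- rnorm(100, mean = 0, sd = 0.5)
C1 <- rnorm(100, mean = 5, sd = 0.5)

# plot(C0, C1) pairs C0[i] with C1[i]: one point per complete (non-NA) pair
n_points <- sum(complete.cases(C0, C1))
n_points  # 100, not 200
```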
What are the alternatives for drawing a simple curve for a function like
eq = function(x){x*x}
in R?
It sounds like such an obvious question, but I could only find these related questions on Stack Overflow, and they are all more specific:
Plot line function in R
Plotting functions on top of datapoints in R
How can I plot a function in R with complex numbers?
How to plot a simple piecewise linear function?
Draw more than one function curves in the same plot
I hope I didn't write a duplicate question.
I did some searching on the web, and these are some ways that I found:
The easiest way is to use curve() without a predefined function:
curve(x^2, from=1, to=50, xlab="x", ylab="y")
You can also use curve() when you have a predefined function:
eq = function(x){x*x}
curve(eq, from=1, to=50, xlab="x", ylab="y")
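A detail that is easy to miss: curve() evaluates the expression at n equally spaced points (n = 101 by default) and invisibly returns them as a list with components x and y, so you can inspect exactly what was drawn:

```r
eq <- function(x) { x * x }

# curve() draws the plot and invisibly returns the evaluated points
res <- curve(eq, from = 1, to = 50, xlab = "x", ylab = "y")
length(res$x)               # 101, the default n
all.equal(res$y, res$x^2)   # TRUE
```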
If you want to use ggplot2:
library("ggplot2")
eq = function(x){x*x}
ggplot(data.frame(x=c(1, 50)), aes(x=x)) +
stat_function(fun=eq)
You mean like this?
> eq = function(x){x*x}
> plot(eq(1:1000), type='l')
(Or whatever range of values is relevant to your function)
plot() has a plot.function method, so you can pass the function directly:
plot(eq, 1, 1000)
Or
curve(eq, 1, 1000)
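curve() also accepts add = TRUE, which draws onto the existing plot instead of starting a new one; that is a simple way to put several functions on the same axes. A sketch, where eq2 is just a hypothetical second function for comparison:

```r
eq  <- function(x) { x * x }
eq2 <- function(x) { 500 * x }  # hypothetical second function

plot(eq, 1, 1000, ylab = "y")   # uses the plot.function method
# add = TRUE overlays on the current plot; the evaluated points are returned invisibly
res2 <- curve(eq2, 1, 1000, add = TRUE, col = "blue", lty = 2)
legend("topleft", legend = c("x^2", "500x"),
       col = c("black", "blue"), lty = 1:2)
```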
Here is a lattice version:
library(lattice)
eq<-function(x) {x*x}
X<-1:1000
xyplot(eq(X)~X,type="l")
Lattice solution with additional settings which I needed:
library(lattice)
distribution<-function(x) {2^(-x*2)}
X<-seq(0,10,0.00001)
xyplot(distribution(X)~X,type="l", col = rgb(red = 255, green = 90, blue = 0, maxColorValue = 255), cex.lab = 3.5, cex.axis = 3.5, lwd=2 )
If you need x values in increments other than 1, e.g. 0.00001, you can use:
X<-seq(0,10,0.00001)
You can change the colour of your line by defining an RGB value:
col = rgb(red = 255, green = 90, blue = 0, maxColorValue = 255)
You can change the width of the plotted line by setting:
lwd = 2
You can change the size of the labels by scaling them:
cex.lab = 3.5, cex.axis = 3.5
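Note that seq(0, 10, 0.00001) creates about a million points, far more than a screen can resolve; length.out gives direct control over the number of points instead. For comparison, a base-graphics sketch with similar settings (the label scaling is reduced to 1.5 here so the labels fit inside the default margins):

```r
distribution <- function(x) { 2^(-x * 2) }

X <- seq(0, 10, length.out = 1000)  # 1000 evenly spaced x values
line_col <- rgb(red = 255, green = 90, blue = 0, maxColorValue = 255)

plot(X, distribution(X), type = "l", col = line_col, lwd = 2,
     cex.lab = 1.5, cex.axis = 1.5, xlab = "x", ylab = "distribution(x)")
```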
As sjdh also mentioned, ggplot2 comes to the rescue. A more intuitive way without making a dummy data set is to use xlim:
library(ggplot2)
eq <- function(x){sin(x)}
base <- ggplot() + xlim(0, 30)
base + geom_function(fun=eq)
Additionally, for a smoother graph, we can set the number of points at which the function is evaluated with n:
base + geom_function(fun=eq, n=10000)
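Base R's curve() has the same n argument (default 101). A sketch comparing a coarse and a fine evaluation grid for sin() on [0, 30]:

```r
coarse <- curve(sin, 0, 30, n = 20, col = "grey", lwd = 2)         # visibly jagged
fine   <- curve(sin, 0, 30, n = 10000, add = TRUE, col = "red")    # smooth
legend("bottomleft", legend = c("n = 20", "n = 10000"),
       col = c("grey", "red"), lwd = c(2, 1))
```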
Function containing parameters
I had a function (emax()) involving 3 parameters (a, b & h) whose line I wanted to plot:
emax = function(x, a, b, h){
(a * x^h)/(b + x^h)
}
curve(emax, from = 1, to = 40, n = 40, a = 1, b = 2, h = 3)
which failed with the error: Error in emax(x) : argument "a" is missing, with no default
This is fixed by putting the named arguments within the function using this syntax:
curve(emax(x, a = 1, b = 2, h = 3), from = 1, to = 40, n = 40)
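This is consistent with the documentation once you note that in curve(expr, from, to, n, ...) the ... arguments are passed on to the plotting functions, not to expr, so a function's own parameters have to be supplied inside expr itself.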
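Another option, if you prefer to pass a bare function name to curve(), is to fix the parameters in a small one-argument wrapper. A sketch; emax_fixed is a hypothetical name:

```r
emax <- function(x, a, b, h) {
  (a * x^h) / (b + x^h)
}

# One-argument wrapper with the parameters fixed, so curve() can call emax_fixed(x)
emax_fixed <- function(x) emax(x, a = 1, b = 2, h = 3)
r1 <- curve(emax_fixed, from = 1, to = 40, n = 40)

# Same values as supplying the arguments inside the expression
r2 <- curve(emax(x, a = 1, b = 2, h = 3), from = 1, to = 40, n = 40,
            add = TRUE, col = "red", lty = 2)
identical(r1$y, r2$y)  # TRUE
```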
I can create a lognormal probability plot using the probplot() function from the e1071 package. A problem arises when I try to add another set of lognormal data to the first plot: although I use par(new=T), the x-axes of the two plots are different and don't align.
Is there another way to go about this?
I tried using the points() function. However, it needs the x and y coordinates, and I don't know how to extract them from probplot().
# Program to plot random logn failure times with probability plot
library(e1071)
logn_prob_plot <- function() {
set.seed(1)
x<-rlnorm(10,1,1)
par(bty="l")
par(col.lab="white")
p<-probplot(x,qdist=qlnorm)
par(col.lab="black")
mtext(text="failure time", col="black",side=1,line=3,outer=F)
mtext(text="lognormal probability", col="black",side=2,line=3,outer=F)
set.seed(2)
y=rlnorm(10,2,3)
par(new=T)
par(col.lab="white")
probplot(y,qdist=qlnorm,xlab="fail time",ylab="lognormal probability")
par(col.lab="black")
mtext(text="failure time", col="black",side=1,line=3,outer=F)
mtext(text="lognormal probability", col="black",side=2,line=3,outer=F)
}
logn_prob_plot()
My expected result is two groups of data on the same probability plot with the same x and y axes. Instead, I get two different x-axes that are not aligned.
First, let's simulate the variables:
set.seed(1)
x<-rlnorm(10,1,1)
set.seed(2)
y=rlnorm(10,2,3)
The first probplot is:
p<-probplot(x,qdist=qlnorm, meanlog = 1, sdlog = 1)
which produces the output:
The second probplot is:
q <- probplot(y,qdist=qlnorm,meanlog = 2, sdlog = 3)
which produces the output:
Your best shot at merging them is to use the scale of the smaller one and discard some points:
p<-probplot(x,qdist=qlnorm, meanlog = 1, sdlog = 1)
points(sort(x), p[[1]](ppoints(length(x))), col = "red", pch = 19)
lines(q, col = "blue")
points(sort(y), q[[1]](ppoints(length(y))), col = "blue", pch = 19)
which gives:
The red line and points are from the distribution with meanlog = 1, sdlog = 1 and the
blue ones are from the one with meanlog = 2, sdlog = 3.
Be aware that, judging from the source code of probplot(), the reference line is computed as:
xl <- quantile(x, c(0.25, 0.75))
yl <- qdist(c(0.25, 0.75), ...)
slope <- diff(yl)/diff(xl)
so the slope of the line is determined only by the positions of the first and third quartiles, not by what happens elsewhere in the data.
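If you need both samples on genuinely shared axes, one option is to skip probplot() and draw the coordinates yourself, which also answers how to get the x and y values: probplot() plots the sorted data against quantiles of the plotting positions ppoints(length(x)). The sketch below deliberately swaps in a different layout from probplot(): a log-scaled time axis against standard-normal quantiles (classic lognormal probability paper, where lognormal data fall on a straight line), and least-squares reference lines instead of probplot()'s quartile line. pp_xy and add_sample are hypothetical helper names.

```r
set.seed(1)
x <- rlnorm(10, 1, 1)
set.seed(2)
y <- rlnorm(10, 2, 3)

# Sorted data and the standard-normal quantiles of the plotting positions.
# If v is lognormal, log(sort(v)) is roughly linear in these z values.
pp_xy <- function(v) {
  list(x = sort(v), z = qnorm(ppoints(length(v))))
}
px <- pp_xy(x)
py <- pp_xy(y)

# Probability gridlines/labels, placed at their normal quantiles
probs <- c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99)

# One empty frame spanning BOTH samples: log x axis, shared y range
plot(range(c(px$x, py$x)), range(qnorm(probs)), type = "n", log = "x",
     yaxt = "n", xlab = "failure time", ylab = "lognormal probability (%)")
axis(2, at = qnorm(probs), labels = 100 * probs, las = 1)
abline(h = qnorm(probs), col = "grey")

# Add one sample with a least-squares reference line fitted on the log scale
add_sample <- function(p, col) {
  points(p$x, p$z, col = col, pch = 19)
  fit <- lm(p$z ~ log(p$x))
  lines(p$x, fitted(fit), col = col)
}
add_sample(px, "red")
add_sample(py, "blue")
```

Because both samples are drawn into the same frame, there is no second call to plot() and therefore no second, misaligned x-axis.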
I am using the following code to create a standard normal distribution in R:
x <- seq(-4, 4, length=200)
y <- dnorm(x, mean=0, sd=1)
plot(x, y, type="l", lwd=2)
I need the x-axis to be labeled at the mean and at points three standard deviations above and below the mean. How can I add these labels?
The easiest (but not general) way is to restrict the limits of the x axis. Because this x is in standard units, every tick the default axis draws is already a number of standard deviations from the mean, with the mean itself labelled 0.
plot(x,y, type = "l", lwd = 2, xlim = c(-3.5,3.5))
Another option is to use more specific labels:
plot(x,y, type = "l", lwd = 2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
Using the code in this answer, you could skip creating x and just use curve() on the dnorm function:
curve(dnorm, -3.5, 3.5, lwd=2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
But this doesn't use the given code anymore.
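For a normal distribution with a mean and standard deviation other than 0 and 1, the same labelling works if the tick positions are computed from them. A sketch, with mu and sigma as assumed example values:

```r
mu <- 100     # example mean (assumed)
sigma <- 15   # example standard deviation (assumed)

x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 200)
y <- dnorm(x, mean = mu, sd = sigma)
plot(x, y, type = "l", lwd = 2, xaxt = "n", xlab = "", ylab = "density")

# Ticks at the mean and at 1, 2 and 3 standard deviations either side
ticks <- mu + (-3:3) * sigma
axis(1, at = ticks, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
```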
If you prefer the hard way, without R's built-in dnorm(), or you want to do this outside R, you can use the formula for the normal density directly:
x<-seq(-4,4,length=200)
s = 1
mu = 0
y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))
plot(x,y, type="l", lwd=2, col = "blue", xlim = c(-3.5,3.5))
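As a sanity check, the hand-written formula should agree with dnorm() up to floating-point rounding for any mean and standard deviation; the values of mu and s below are arbitrary:

```r
x <- seq(-4, 4, length.out = 200)
s <- 0.5   # arbitrary standard deviation
mu <- 2    # arbitrary mean
y <- (1 / (s * sqrt(2 * pi))) * exp(-((x - mu)^2) / (2 * s^2))
all.equal(y, dnorm(x, mean = mu, sd = s))  # TRUE
```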
An extremely inefficient and unusual, but beautiful solution, which works based on the ideas of Monte Carlo simulation, is this:
simulate many draws (or samples) from a given distribution (say the normal).
plot the density of these draws. The draws come from rnorm(), which takes arguments (A, B, C) and returns a vector of A samples from a normal distribution with mean B and standard deviation C; the density is estimated with density().
Thus to take a sample of size 50,000 from a standard normal (i.e., a normal with mean 0 and standard deviation 1) and plot its density, we do the following:
x = rnorm(50000,0,1)
plot(density(x))
As the number of draws grows, the estimated density gets closer and closer to the normal density. To illustrate this, the image below shows, from left to right and top to bottom, the estimates for 5,000, 50,000, 500,000 and 5 million samples.
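To compare the estimate with the exact curve directly, one can overlay dnorm() on the estimated density. A sketch (the seed is arbitrary):

```r
set.seed(123)                  # for reproducibility only
draws <- rnorm(50000, mean = 0, sd = 1)
d <- density(draws)            # kernel density estimate of the draws
plot(d, main = "50,000 draws vs. the N(0, 1) density")
curve(dnorm(x), add = TRUE, col = "red", lty = 2)  # exact density for comparison

# Largest vertical gap between the estimate and the true density on d's grid
max(abs(d$y - dnorm(d$x)))
```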
In the general case, for example Normal(2, 1):
f <- function(x) dnorm(x, 2, 1)
plot(f, -1, 5)
This is very general: f can be defined freely, with any parameters, for example:
f <- function(x) dbeta(x, 0.1, 0.1)
plot(f, 0, 1)
I particularly like lattice for this. It makes it easy to add graphical elements such as a shaded area under the curve, which you usually need for probability problems like finding P(a < X < b).
Please have a look:
library(lattice)
e4a <- seq(-4, 4, length = 10000) # x values for our normal curve
e4b <- dnorm(e4a, 0, 1)
xyplot(e4b ~ e4a, # Lattice xyplot
type = "l",
main = "Plot 2",
panel = function(x,y, ...){
panel.xyplot(x,y, ...)
panel.abline( v = c(0, 1, 1.5), lty = 2) # dashed vertical lines at z = 0, 1 and 1.5
xx <- c(1, x[x>=1 & x<=1.5], 1.5) # polygon outline of the area to shade
yy <- c(0, y[x>=1 & x<=1.5], 0)
panel.polygon(xx,yy, ..., col='red')
})
In this example I highlight the area between z = 1 and z = 1.5. You can easily change these limits to suit your problem.
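The same shaded region can be drawn in base graphics with polygon(), and the shaded area itself, P(1 < Z < 1.5), is just a difference of two pnorm() values. A sketch; z_lo and z_hi are illustrative names:

```r
z_lo <- 1
z_hi <- 1.5

curve(dnorm(x), from = -4, to = 4, lwd = 2, ylab = "density")
abline(v = c(0, z_lo, z_hi), lty = 2)

# Shade the area under the curve between z_lo and z_hi
xs <- seq(z_lo, z_hi, length.out = 200)
polygon(c(z_lo, xs, z_hi), c(0, dnorm(xs), 0), col = "red")

# The shaded area is P(z_lo < Z < z_hi)
p <- pnorm(z_hi) - pnorm(z_lo)
round(p, 4)  # 0.0918
```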
Axis labels are automatic.
This is how to write it in functions:
normalCriticalTest <- function(mu, s) {
x <- seq(mu - 4*s, mu + 4*s, length=200) # x extends 4 standard deviations either side of mu
y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2)) # y follows the formula of the normal density f(x)
plot(x,y, type="l", lwd=2, xlim = mu + c(-3.5,3.5)*s)
abline(v = mu + c(-1.96, 1.96)*s, col="red") # critical values: 2.5% of the area lies beyond each line
}
normalCriticalTest(0, 1) # draw a normal distribution with vertical lines.
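The 1.96 used for the critical lines is the 97.5% quantile of the standard normal, which is what leaves 2.5% of the area in each tail. Rather than hard-coding it, it can be computed with qnorm(); a short check, with alpha as an assumed two-sided significance level:

```r
alpha <- 0.05                 # two-sided significance level (assumed)
z <- qnorm(1 - alpha / 2)     # critical value, about 1.959964
round(z, 2)                   # 1.96

# Area in each tail beyond -z and z
pnorm(-z)                     # 0.025
1 - pnorm(z)                  # 0.025 up to rounding
```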