I would like to generate a random sequence of 3000 points that follows a normal distribution with mean c and standard deviation d. But I would like these 3000 points to lie in the range [a,b].
Can you tell me how to do it in R?
If I would like to plot this sequence, with the generated 3000 points on the Y-axis, how should I generate the points for the X-axis?
You can do this using standard R functions like this:
c <- 1
d <- 2
a <- -2
b <- 3.5
ll <- pnorm(a, c, d)
ul <- pnorm(b, c, d)
x <- qnorm( runif(3000, ll, ul), c, d )
hist(x)
range(x)
mean(x)
sd(x)
plot(x, type='l')
The pnorm function is used to find the limits to use for the uniform distribution; data is then generated from that uniform and transformed back to the normal scale with qnorm.
This is even simpler using the distr package:
library(distr)
N <- Norm(c,d)
N2 <- Truncate(N, lower=a, upper=b)
plot(N2)
x <- r(N2)(3000)
hist(x)
range(x)
mean(x)
sd(x)
plot(x, type='l')
Note that in both cases the mean is not c and the sd is not d. If you want the mean and sd of the resulting truncated data to be c and d, then the parent distribution (before truncating) needs different parameters (a higher sd, and a mean that depends on the truncation limits); finding those values would be a good homework problem for a math/stat theory course. If that is what you really need, then add a comment or edit the question to say so specifically.
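As a quick illustration of that caveat (not part of the original answer; it assumes the truncnorm package), you can compute the mean and sd of the truncated distribution directly and see that they differ from c and d:
library(truncnorm)
etruncnorm(a = -2, b = 3.5, mean = 1, sd = 2)        # mean of the truncated distribution (not 1)
sqrt(vtruncnorm(a = -2, b = 3.5, mean = 1, sd = 2))  # its standard deviation (not 2)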
If you want to generate the data from the untruncated normal, but only plot the data within the range [a,b] then just use the ylim argument to plot:
plot( rnorm(3000, c, d), ylim=c(a,b) )
Generating a random sequence of numbers from any probability distribution is very easy in R. To do this for the normal distribution specifically:
c = 1
d = 2
x <- rnorm(3000, c, d)
Clipping the values in x so that they're only within a given range is kind of a strange thing to want to do with a sample from the normal distribution. Maybe what you really want to do is sample a uniform distribution.
a = 0
b = 3
x2 <- runif(3000, a, b)
As for how to plot the distribution, I'm not sure I follow your question. You can plot a density estimate for the sample with this code:
plot(density(x))
But, if you want to plot this data as a scatter plot of some sort, you actually need to generate a second sample of numbers.
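For example (a sketch; x_other is just an illustrative name for a second, independent sample, not part of the original answer):
x_other <- rnorm(3000, c, d)  # a second, independent sample for the other axis
plot(x_other, x)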
If I would like to plot this sequence, with the generated 3000 points on the Y-axis, how should I generate the points for the X-axis?
If you just generate your points, like JoFrhwld said with
y <- rnorm(3000, 1, 2)
Then
plot(y)
will automatically plot them with their indices on the x-axis.
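If you want to make those indices explicit, this should be equivalent:
plot(seq_along(y), y)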
a = -2; b = 3
plot(dnorm, xlim = c(a, b))
I want to see how three variables x, y, and z respond to a function f using R.
I've searched for R solutions (e.g. rgl using 4d plots) but none seem to allow the input of a function as the fourth variable while allowing manipulation of x, y, and z across their full range of values.
# First I create three variables that each have a domain 0 to 4
x <- seq(0, 4, 0.01)
y <- seq(0, 4, 0.01)
z <- seq(0, 4, 0.01)
# Then I create a function from those three variables
f <- sqrt(x^2 + y^2 + z^2)
EDIT: I originally stated that I wanted x, y, and z to be seq(0, 4, 0.01) but in fact I only want them to range from 0 to 4, and do so independently of other variables. In other words, I want to plot the function across a range of values letting x move independently of y and z and so forth, rather than plotting a 3-D line. The result should be a 3-D surface.
I want to:
a) see how the function f responds to all possible combinations of x, y, and z across a range of x, y, and z values 0 to 4, and
b) find what maxima/minima exist especially when holding one variable constant.
This is rather a mathematical question. Unfortunately, our computer screens are not really made for 4D, and neither are our brains, so what you ask won't be possible as such. Indeed, you want to show a dense set of data (a cube between 0 and 4), and we cannot display what is "inside" the cube.
To come back to R, you can always display a slice of it, for example fixing z and plotting sqrt(x^2 + y^2 + z^2) over x and y. Here are two examples:
# Points where the function should be evaluated
x <- seq(0, 4, 0.01)
y <- seq(0, 4, 0.01)
z <- seq(0, 4, 0.01)
# Compute the distance from origin
distance <- function(x,y,z) {
sqrt(x^2 + y^2 + z^2)
}
# Matrix to store the results
slice <- matrix(0, nrow = length(x), ncol = length(y))
# Fill the matrix with a slice at z = 3
i <- 1
for (y_val in y) {
  slice[, i] <- distance(x, y_val, 3)
  i <- i + 1
}
# Plot with the plot3D library
require(plot3D)
persp3D(z = slice, theta = 100, phi = 50)
# Plot with the raster library
library(raster)
plot(raster(slice, xmn = min(x), xmx = max(x), ymn = min(y), ymx = max(y)))
If you change your z value, you will not really change the shape (it just becomes "flatter" for bigger z). Note that since the function is symmetric in x, y and z, the same plots are produced if you instead hold x or y constant.
For your last question about the maximum, you can re-use the slice matrix and do:
max_ind <- which(slice == max(slice), arr.ind = TRUE)
x[max_ind[,1]]
y[max_ind[,2]]
(see Get the row and column name of the minimum element of a matrix)
But again, with a little math we can see from your equation that the maximum will always be obtained by maximizing x, y and z. Indeed, the function simply measures the distance from the origin.
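As a quick check of that claim, using the distance() function defined above:
distance(4, 4, 4)  # 6.928203
4 * sqrt(3)        # the analytic maximum of the function over [0, 4]^3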
I have a 2-dimensional data set.
I use the R's smooth.spline function to smooth my points graph following an example in this article:
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.smooth.spline.html
So I get a spline graph similar to the green line in this picture.
I'd like to know the X values, where the first derivative of the smoothing spline equals zero (to determine exact minimum or maximum).
My problem is that my initial dataset (or a dataset that I could auto-generate) to feed into the predict() function does not contain such exact X values that correspond to the smoothing spline extrema.
How can I find such X values?
Here is the picture of the first derivative of the green spline line above
But the X coordinates of the extrema are still only approximate.
My approximate R script to generate the pictures looks like the following
sp1 <- smooth.spline(df)
pred.prime <- predict(sp1, deriv=1)
pred.second <- predict(sp1, deriv=2)
d1 <- data.frame(pred.prime)
d2 <- data.frame(pred.second)
dfMinimums <- d1[abs(d1$y) < 1e-4, c('x','y')]
I think that there are two problems here.
1. You are using the original x-values, and they are spaced too far apart, and
2. because of the wide spacing of the x's, your threshold for where you consider the derivative "close enough" to zero is too high.
Here is basically your code but with many more x values and requiring smaller derivatives. Since you do not provide any data, I made a coarse approximation to it that should suffice for illustration.
## Coarse approximation of your data
x = runif(300, 0,45000)
y = sin(x/5000) + sin(x/950)/4 + rnorm(300, 0,0.05)
df = data.frame(x,y)
sp1 <- smooth.spline(df)
Spline code
Sx = seq(0,45000,10)
pred.spline <- predict(sp1, Sx)
d0 <- data.frame(pred.spline)
pred.prime <- predict(sp1, Sx, deriv=1)
d1 <- data.frame(pred.prime)
Mins = which(abs(d1$y) < mean(abs(d1$y))/150)
plot(df, pch=20, col="navy")
lines(sp1, col="darkgreen")
points(d0[Mins,], pch=20, col="red")
The extrema look pretty good.
plot(d1, type="l")
points(d1[Mins,], pch=20, col="red")
The points identified look like zeros of the derivative.
You can use my R package SplinesUtils: https://github.com/ZheyuanLi/SplinesUtils, which can be installed by
devtools::install_github("ZheyuanLi/SplinesUtils")
The functions to be used are SmoothSplineAsPiecePoly and solve. I will just use the example from the package documentation.
library(SplinesUtils)
## a toy dataset
set.seed(0)
x <- 1:100 + runif(100, -0.1, 0.1)
y <- poly(x, 9) %*% rnorm(9)
y <- y + rnorm(length(y), 0, 0.2 * sd(y))
## fit a smoothing spline
sm <- smooth.spline(x, y)
## coerce "smooth.spline" object to "PiecePoly" object
oo <- SmoothSplineAsPiecePoly(sm)
## plot the spline
plot(oo)
## find all stationary / saddle points
xs <- solve(oo, deriv = 1)
#[1] 3.791103 15.957159 21.918534 23.034192 25.958486 39.799999 58.627431
#[8] 74.583000 87.049227 96.544430
## predict the "PiecePoly" at stationary / saddle points
ys <- predict(oo, xs)
#[1] -0.92224176 0.38751847 0.09951236 0.10764884 0.05960727 0.52068566
#[7] -0.51029209 0.15989592 -0.36464409 0.63471723
points(xs, ys, pch = 19)
One caveat I found in the @G5W implementation is that it sometimes returns multiple records clustered around an extremum instead of a single one. On the diagram they cannot be seen, since they effectively all fall on one point.
The following snippet from here filters them down to single extremum points, keeping the record with the minimum absolute value of the first derivative:
library(tidyverse)
df2 <- df %>%
  group_by(round(y, 4)) %>%
  filter(abs(d1) == min(abs(d1))) %>%
  ungroup() %>%
  select(-5)
I would like to match points in 3-dimensional space.
Therefore, I am using the Hungarian Method described in this question: Finding the best matching pairwise points from 2 vectors
Here is my example using R:
# packages
library(rgl)
library(clue)
library(plyr)
library(fields)
set.seed(1)
a <- c(rep(2,7), 3,4,5,6,3,4,5,6,7,7,7,7,7,7) # x values
b <- c(rep(3,7),3,3,3,3, 3,3,3,3,3,3,3,3,3,3) # y values
c <- c(seq(1,7),1,1,1,1,7,7,7,7,1,2,3,4,5,6) # z values
# transform the points
set.seed(2)
a1 <- a + seq(1,length(a))
b1 <- b + 8
c1 <- c + 9
# plot the data
plot3d(a,b,c, col="red", pch=16,size=10)
plot3d(a1,b1,c1, lwd=10, col="blue", pch=16,size=10, add=TRUE)
# run the Hungarian Method
A <- cbind(a,b,c)
B <- cbind(a1,b1,c1)
distances <- rdist(A,B) # calculate Euclidean Distance between points
min.dist <- solve_LSAP(distances) # minimizing the sum of distance
min.dist.num <- as.numeric(min.dist)
# plot the minimized lines between point sets
for (ii in 1:dim(B)[1]){
  D <- c(A[ii,1], B[min.dist.num[ii],1])
  R <- c(A[ii,2], B[min.dist.num[ii],2])
  W <- c(A[ii,3], B[min.dist.num[ii],3])
  segments3d(D, R, W, col=2, lwd=1)
}
# calculate the share of points that is matched correctly
sum(1:dim(B)[1]==min.dist.num)/dim(B)[1]* 100
The problem here is that only 5% of the points are matched correctly (see last line of the code). In my view, the main trouble is that the algorithm does not take the structure of the object (a square) into account.
Question: Is there any method that performs better for this sample data?
In my original data, the dimensional structure of the points is way more complicated. I have a cloud of data and within this cloud there are multiple subfigures.
I am seeking primarily for a solution in R, but other implementations (e.g. MATLAB, Excel, Java) are also welcome.
I could not find a viable solution to this problem (and I am a beginner in R).
I have an equation as shown below, where n and K are constants and a and b are the variables.
How do I generate a 2-dimensional plot for the above in R?
Thanks in advance.
factorialfunction <- function(a, b, n, K){
  K * (b^a) * ((2 + b)^(n + a))
}
Klist = c(1,5,10,50,100,200)
nlist = c(1,5,10,50,100,200)
# Note that the n and K values are recycled; make them whatever you wish, they are constants,
# while a and b take on any values, here 100 values between zero and one
res <- mapply(factorialfunction, a = seq(.01, 1, by = .01),
              b = seq(.01, 1, by = .01), n = rep(nlist, each = 100), K = rep(Klist, each = 100))
# Then you can plot this six times.
# Allow six plots on the panel if you want
par(mfrow = c(3, 2))
# Loop through the different plots
for (i in 1:6)
  plot(1:100, res[1:100 + (i-1)*100])
Note: in this code I chose a and b to be between zero and one. I am not familiar with this function, but it looks like some type of Beta.
You can generate more than 6 plots by changing Klist and nlist and your par and for loop parameters.
Here is what you get; note that this code can be customized to produce the plots for whatever values of n, K, a, and b you want.
Does R have a package for generating random numbers in multi-dimensional space? For example, suppose I want to generate 1000 points inside a cuboid or a sphere.
I have some functions for hypercube and n-sphere selection that generate data frames with Cartesian coordinates and guarantee a uniform distribution through the hypercube or n-sphere for an arbitrary number of dimensions:
GenerateCubiclePoints <- function(nrPoints, nrDim, center = rep(0, nrDim), l = 1){
  x <- matrix(runif(nrPoints * nrDim, -1, 1), ncol = nrDim)
  x <- as.data.frame(
    t(apply(x * (l/2), 1, '+', center))
  )
  names(x) <- make.names(seq_len(nrDim))
  x
}
This gives points in a cube/hypercube of nrDim dimensions around the given center, with l the length of one side.
For an n-sphere with nrDim dimensions, you can do something similar, where r is the radius :
GenerateSpherePoints <- function(nrPoints, nrDim, center = rep(0, nrDim), r = 1){
  # generate the polar coordinates!
  x <- matrix(runif(nrPoints * nrDim, -pi, pi), ncol = nrDim)
  x[, nrDim] <- x[, nrDim] / 2
  # recalculate them to cartesians
  sin.x <- sin(x)
  cos.x <- cos(x)
  cos.x[, nrDim] <- 1  # see the formula for n-spheres
  y <- sapply(1:nrDim, function(i){
    if (i == 1){
      cos.x[, 1]
    } else {
      cos.x[, i] * apply(sin.x[, 1:(i-1), drop = FALSE], 1, prod)
    }
  }) * sqrt(runif(nrPoints, 0, r^2))
  y <- as.data.frame(
    t(apply(y, 1, '+', center))
  )
  names(y) <- make.names(seq_len(nrDim))
  y
}
In 2 dimensions, these give the following, produced by this code:
T1 <- GenerateCubiclePoints(10000,2,c(4,3),5)
T2 <- GenerateSpherePoints(10000,2,c(-5,3),2)
op <- par(mfrow=c(1,2))
plot(T1)
plot(T2)
par(op)
Also check out the copula package. This will generate data within a cube/hypercube with uniform margins, but with correlation structures that you set. The generated variables can then be transformed to represent other shapes, but still with relationships other than independence.
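For instance, a minimal sketch with the copula package (the dimension 3 and the 0.5 correlation are just assumed example values):
library(copula)
cop <- normalCopula(0.5, dim = 3)  # exchangeable Gaussian copula
u <- rCopula(1000, cop)            # 1000 points in the unit cube with uniform margins
pairs(u)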
If you want more complex shapes but are happy with uniform and independent within the shape, then you can just do rejection sampling: generate data within a cube that contains your shape, test whether the points are within your shape, reject them if not, and keep doing this until there are enough points.
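A minimal sketch of that rejection-sampling idea, assuming the target shape is the unit ball in 3 dimensions:
n <- 1000
pts <- matrix(numeric(0), ncol = 3)
while (nrow(pts) < n) {
  cand <- matrix(runif(3 * n, -1, 1), ncol = 3)  # candidates inside the bounding cube
  keep <- rowSums(cand^2) <= 1                   # keep only points inside the unit ball
  pts <- rbind(pts, cand[keep, , drop = FALSE])
}
pts <- pts[1:n, ]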
A couple of years ago, I made a package called geozoo. It is available on CRAN.
install.packages("geozoo")
library(geozoo)
It has many different functions to produce objects in N-dimensions.
p = 4
n = 1000
# Cube with points on its face.
# A 3D version would be a box with solid walls and a hollow interior.
cube.face(p)
# Hollow sphere
sphere.hollow(p, n)
# Solid cube
cube.solid.random(p, n)
cube.solid.grid(p, 10) # evenly spaced points
# Solid Sphere
sphere.solid.random(p, n)
sphere.solid.grid(p, 10) # evenly spaced points
One of my favorite ones to watch animate is a cube with points along its edges, because it was one of the first objects that I made. It also gives you a sense of distance between vertices.
# Cube with points along its edges.
cube.dotline(4)
Also, check out the website: http://streaming.stat.iastate.edu/~dicook/geometric-data/. It contains pictures and downloadable data sets.
Hope it meets your needs!
Cuboid:
df <- data.frame(
x = runif(1000),
y = runif(1000),
z = runif(1000)
)
head(df)
x y z
1 0.7522104 0.579833314 0.7878651
2 0.2846864 0.520284731 0.8435828
3 0.2240340 0.001686003 0.2143208
4 0.4933712 0.250840233 0.4618258
5 0.6749785 0.298335804 0.4494820
6 0.7089414 0.141114804 0.3772317
Sphere:
df <- data.frame(
radius = runif(1000),
inclination = 2*pi*runif(1000),
azimuth = 2*pi*runif(1000)
)
head(df)
radius inclination azimuth
1 0.1233281 5.363530 1.747377
2 0.1872865 5.309806 4.933985
3 0.2371039 5.029894 6.160549
4 0.2438854 2.962975 2.862862
5 0.5300013 3.340892 1.647043
6 0.6972793 4.777056 2.381325
Note: edited to include code for sphere
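If you also want Cartesian coordinates, here is a sketch of the usual spherical-to-Cartesian conversion (assuming inclination is measured from the z-axis; note that drawing the radius uniformly does not make the points uniform over the ball's volume):
df$x <- df$radius * sin(df$inclination) * cos(df$azimuth)
df$y <- df$radius * sin(df$inclination) * sin(df$azimuth)
df$z <- df$radius * cos(df$inclination)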
Here is one way to do it.
Say we hope to generate a bunch of 3d points of the form y = (y_1, y_2, y_3)
Sample X from multivariate Gaussian with mean zero and covariance matrix R.
(x_1, x_2, x_3) ~ Multivariate_Gaussian(u = [0, 0, 0], R = [[r_11, r_12, r_13], [r_21, r_22, r_23], [r_31, r_32, r_33]])
You can find a function that generates multivariate Gaussian samples in an R package (for example, MASS::mvrnorm or mvtnorm::rmvnorm).
Take the Gaussian cdf of each covariate: (phi(x_1), phi(x_2), phi(x_3)). Here phi is the Gaussian cdf of our variables, i.e. phi(x_1) = Pr[x <= x_1]. By the probability integral transform, (phi(x_1), phi(x_2), phi(x_3)) = (u_1, u_2, u_3) will each be uniformly distributed on [0,1].
Then take the inverse cdf of each uniformly distributed marginal. In other words, take the inverse cdf of u_1, u_2, u_3:
(F^{-1}(u_1), F^{-1}(u_2), F^{-1}(u_3)) = (y_1, y_2, y_3), where F is the marginal cdf of the distribution you are trying to sample from.
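A minimal sketch of those three steps in R, assuming MASS::mvrnorm for the Gaussian step and an exponential marginal as an example target distribution:
library(MASS)
R <- matrix(c(1.0, 0.5, 0.3,
              0.5, 1.0, 0.4,
              0.3, 0.4, 1.0), nrow = 3)          # correlation matrix (assumed example values)
x <- mvrnorm(1000, mu = c(0, 0, 0), Sigma = R)   # step 1: multivariate Gaussian sample
u <- pnorm(x)                                    # step 2: Gaussian cdf gives uniform margins
y <- qexp(u, rate = 1)                           # step 3: inverse cdf of the target marginal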