Calculate the median of intervals of a vector in R

I have the following challenge:
Let X be a discrete random variable uniformly distributed on the interval [1, 10]. Let Y be
another random variable, where each sample of Y is the median of one
set of 10 samples of X. Generate 10^6 samples of Y.
I'm using R, with the following code:
x <- runif(10000000,1,10)
sample <- cut(x,breaks=10)
y = median(sample)

You could do:
mosaic::median(~x|sample)
or in base R:
tapply(x, sample, median)
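The two calls above give one median per interval produced by cut. If the goal is instead the challenge as quoted (each Y is the median of its own set of 10 draws of X), a minimal sketch could look like the following. It assumes that "discrete uniform on [1, 10]" means the integers 1 to 10 with equal probability; the names n_y, grp and x_sorted are made up for this illustration.

```r
set.seed(1)
n_y <- 1e6                                    # number of Y samples wanted

# Assumption: X takes the integer values 1..10 with equal probability
x <- sample(1:10, size = 10 * n_y, replace = TRUE)

# Consecutive runs of 10 draws form one set; sort the values within each set
grp <- rep(seq_len(n_y), each = 10)
x_sorted <- matrix(x[order(grp, x)], nrow = 10)   # one sorted set per column

# The median of 10 values is the mean of the 5th and 6th smallest
y <- (x_sorted[5, ] + x_sorted[6, ]) / 2
```

Sorting once with order avoids calling median a million times; apply(matrix(x, nrow = 10), 2, median) should give the same values, only more slowly.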

Related

Generating bivariate data where x variable is uniformly distributed between 0 and 1 and Y is normally distributed with mean 1/x with some noise

I used x <- c(runif(100, 0, 1)) to generate 100 x's between 0 and 1.
Now for each of the x's I am trying to generate 10 y's with mean 1/x and variance of 1.
Preferably stored in a matrix and so if I was to plot the 1000 points on y and x, it would look like the graph y = 1/x + some error.
Any help would be greatly appreciated.
If you want the data in a matrix, then you can do
x <- runif(100, 0, 1)
y <- sapply(x, function(m) rnorm(10, 1/m, 1))
This uses sapply to generate 10 normal values for each x value; the result is a 10 x 100 matrix with one column per x value.
If you want a single two-column matrix instead, then maybe
points <- do.call("rbind", lapply(x, function(m) cbind(x=m, y=rnorm(10, 1/m, 1))))
is what you want. You can plot that with
plot(y~x, points)
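As an alternative to looping over x, the same kind of data can be generated in one vectorised call, since rnorm accepts a vector of means. This is only a sketch: the names n_rep, x_long, y_long and xy are made up here, and the ylim value is an arbitrary choice to keep the plot readable when some x is close to 0.

```r
set.seed(1)
x <- runif(100, 0, 1)
n_rep <- 10                                   # y values per x

# Repeat each x n_rep times so x and y line up row by row
x_long <- rep(x, each = n_rep)

# Variance 1 corresponds to sd = 1; the mean differs per element
y_long <- rnorm(length(x_long), mean = 1 / x_long, sd = 1)

xy <- cbind(x = x_long, y = y_long)
dim(xy)                                       # 1000 rows, 2 columns
plot(xy, ylim = c(0, 50))
```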

How to randomly generate (x, y) points following a linear equation?

I have an equation y=9x+6. I want to extract 10 random points from this function. How should I proceed?
Generate 10 random x-values, in this example uniformly distributed (function runif), and then calculate the corresponding y-values following your equation.
You can control the x-range by setting different min and max parameters to function runif.
x <- runif(10, min = 0, max = 1)
y <- 9*x+6
plot(x,y)

R boxplot with already computed mean, confidence intervals and min max

I am trying to generate a boxplot in R using already computed confidence intervals, minima and maxima. For times 1,2,3,4,5 (x-axis), I have MN, an array of 5 elements, each giving the mean at one time point. I also have CI1, CI2, MINIM, and MAXM, each an array of 5 elements, one per time step, representing the upper CI, lower CI, minimum and maximum.
I want to generate 5 box plots, one at each time step.
I have tried the usual boxplot function, but I couldn't get it to work with already computed CIs, min and max.
It would be great if the method worked with the normal plot functions, though ggplot would be fine too.
Since you have not posted data, I will use the builtin iris dataset, keeping the first 4 columns.
data(iris)
iris2 <- iris[-5]
The function boxplot computes the statistics it uses and then calls bxp to do the drawing, passing it those computed values.
If you want a different set of statistics, you will have to compute them and pass them to bxp yourself.
I am assuming that by CI you mean normal 95% confidence intervals for the mean. For that you need to compute the standard errors and the mean values first.
se <- apply(iris2, 2, sd) / sqrt(nrow(iris2))  # standard error of each column mean
mn <- colMeans(iris2)
ci1 <- mn - qnorm(0.975)*se  # lower 95% limit
ci2 <- mn + qnorm(0.975)*se  # upper 95% limit
minm <- apply(iris2, 2, min)
maxm <- apply(iris2, 2, max)
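Now have boxplot create the data structure used by bxp: a list whose stats component is a matrix with one column per box.
bp <- boxplot(iris2, plot = FALSE)
And fill the matrix with the values computed earlier, one row each for the lower whisker, lower hinge, middle line, upper hinge and upper whisker.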
bp$stats <- matrix(c(
minm,
ci1,
mn,
ci2,
maxm
), nrow = 5, byrow = TRUE)
Finally, plot it.
bxp(bp)
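If the summaries already exist as five vectors, as in the question, the list for bxp can also be built by hand without calling boxplot first. The sketch below uses made-up numbers and hypothetical names (ci_lo, ci_hi, z); it assumes each column of stats is ordered from minimum to maximum.

```r
# Hypothetical precomputed summaries for 5 time points (made-up numbers)
minim <- c(1.0, 1.5, 2.0, 2.2, 3.0)
ci_lo <- c(2.0, 2.4, 3.1, 3.5, 4.2)
mn    <- c(2.5, 3.0, 3.8, 4.1, 5.0)
ci_hi <- c(3.0, 3.6, 4.5, 4.7, 5.8)
maxm  <- c(4.0, 4.8, 5.5, 6.0, 7.1)

z <- list(
  stats = rbind(minim, ci_lo, mn, ci_hi, maxm),  # 5 rows, one column per box
  n     = rep(1, 5),            # group sizes, used when varwidth = TRUE
  conf  = rbind(ci_lo, ci_hi),  # notch limits, used when notch = TRUE
  out   = numeric(0),           # no outlier points to draw
  group = numeric(0),
  names = as.character(1:5)     # axis labels: the time steps
)
bxp(z, xlab = "time", ylab = "value")
```

Here the box spans the confidence interval, the middle line marks the mean and the whiskers reach the minimum and maximum, so the plot should be labelled accordingly to avoid it being read as a standard quartile boxplot.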

Extract approximate probability density function (pdf) in R from random sampling

I have n > 2 independent continuous random variables (RVs). For example, say I have 4 uniform RVs with different upper and lower bounds:
W~U[-1,5], X~U[0,1], Y~U[0,2], Z~U[0.5,2]
I am trying to find the approximate PDF of the sum of these RVs, i.e. of T=W+X+Y+Z. As I don't need a closed-form solution, I have sampled 1 million points from each of them to get 1 million samples of T. Is it possible in R to get the approximate PDF, or a way to approximate probabilities such as P(t<T), from the samples I have drawn? For example, is there an easy way to calculate P(0.5<T) in R? My priority is to get the probability first, even if getting the density function is not possible.
Thanks
Consider the ecdf function:
set.seed(123)
W <- runif(1e6, -1, 5)
X <- runif(1e6, 0, 1)
Y <- runif(1e6, 0, 2)
Z <- runif(1e6, 0.5, 2)
T <- Reduce(`+`, list(W, X, Y, Z))
cdfT <- ecdf(T)
1 - cdfT(0.5) # Pr(T > 0.5)
# [1] 0.997589
See How to calculate cumulative distribution in R? for more details.
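For the density itself, one option is a kernel density estimate from base R's density function, turned into a callable function with approxfun. This is a rough sketch rather than a definitive answer: the names t_sum, pdf_t and cdf_t are made up here, and the default bandwidth smooths the estimate slightly near the ends of the support.

```r
set.seed(123)
n <- 1e6
t_sum <- runif(n, -1, 5) + runif(n, 0, 1) + runif(n, 0, 2) + runif(n, 0.5, 2)

# Kernel density estimate on an equally spaced grid (512 points by default)
d <- density(t_sum)

# Interpolate between grid points; treat the density as 0 outside the grid
pdf_t <- approxfun(d$x, d$y, yleft = 0, yright = 0)
pdf_t(c(0, 5, 9))             # approximate density at a few values of T

# Interval probabilities come straight from the empirical CDF
cdf_t <- ecdf(t_sum)
cdf_t(6) - cdf_t(4)           # estimate of P(4 < T <= 6)
```

Plotting d with plot(d) gives a quick visual check of the estimated density.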

How to predict using a locally smoothed mean?

(Statistics beginner here.)
I have some training data (x,y), and wish to make predictions for new data x_new.
Now let's assume I have the data for the plot below, but I do not know how y is computed. So I would like to use the data I have to calculate, for any given x, the local mean of y, as this seems like the best guess I can make.
install.packages("gplots")
library("gplots")
x <- abs(rnorm(500))
y <- rnorm(500, mean=2*x, sd=2+2*x)
bandplot(x,y)
Is there an R function to predict y for a given x, using the locally smoothed mean (here shown in red thanks to the function bandplot), or something similar?
wapply from gplots returns the locally smoothed mean as a table for x and y.
x <- 1:1000
y <- rnorm(1000, mean=1, sd=1 + x/1000 )
wapply(x,y,mean)
To predict, one would need, I guess, to find the closest x in the table returned by wapply, then read off the local mean of y at that point.
For a value a, the closest x is given by the index:
index = which(abs(wapply(x,y,mean)$x-a)==min(abs(wapply(x,y,mean)$x-a)))
then the prediction should be:
pred = wapply(x,y,mean)$y[index]
So in one line:
locally_smoothed_mean_prediction = function(a) wapply(x,y,mean)$y[which(abs(wapply(x,y,mean)$x-a)==min(abs(wapply(x,y,mean)$x-a)))]
> locally_smoothed_mean_prediction(600)
[1] 1.055642
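For "something similar" that needs no extra package, base R's loess fits a local regression and has a predict method. Note that this is a different smoother from the one bandplot draws, so the numbers will not match wapply exactly; the span value and the names fit, x_new and pred are choices made for this sketch.

```r
set.seed(1)
x <- abs(rnorm(500))
y <- rnorm(500, mean = 2 * x, sd = 2 + 2 * x)

# Local regression fit; span controls how wide the smoothing window is
fit <- loess(y ~ x, span = 0.75)

# Predicted local mean of y at new x values inside the range of the training x
x_new <- c(0.5, 1, 1.5)
pred <- predict(fit, newdata = data.frame(x = x_new))
pred
```

By default loess does not extrapolate, so new x values outside the observed range come back as NA.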

Resources