Hey, I want to generate 100 decimal numbers between 10 and 50 with a mean of 32.2.
I can use this to generate numbers in the wanted range, but then I don't get the mean:
runif(100, min=10, max=50)
Or I could use this, but then I don't get the range:
rnorm(100,mean=32.2,sd=10)
How can I combine those two or can I use another function?
I have tried to use this approach:
R - random distribution with predefined min, max, mean, and sd values
But I don't get the exact mean I want (31.7 in my example try):
n <- 100
y <- rgbeta(n, mean = 32.2, var = 200, min = 10, max = 50)
Edit: OK, I have lowered the var and the mean gets close to 32.2, but I still want some values near the min and max of the range...
In order to get random numbers between 10 and 50 with a (true) mean of 32.2, you would need a density function that would fulfill those properties.
A uniform distribution with a min of 10 and a max of 50 (runif) will never deliver you that mean, as the true mean is 30 for that distribution.
The normal distribution has a range from -infinity to infinity, independent of its mean, so rnorm will return numbers greater than 50 and smaller than 10.
You could use a truncated normal distribution, for example rnormTrunc from the EnvStats package,
rnormTrunc(n = 100, mean = 32.2, sd = 1, min = 10, max = 50)
if that distribution would be okay. If you need a different distribution, things will get a little more complicated.
Edit: feel free to ask if you need the math behind that, but depending on what your density function should look like it will get very complicated
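A truncated normal can also be sampled in base R by inverse-CDF sampling, roughly as sketched below. The helper name rtrunc_norm is made up for this illustration (it is not the rnormTrunc function used above), and sd = 10 is an assumed value chosen so that the truncation actually matters. Note that truncating asymmetrically around 32.2 pulls the mean away from the mean argument, so the realised mean will generally not be exactly 32.2.

```r
# Hypothetical helper: inverse-CDF sampling from a normal truncated to [min, max]
rtrunc_norm <- function(n, mean, sd, min, max) {
  p_lo <- pnorm(min, mean, sd)   # probability mass below the lower bound
  p_hi <- pnorm(max, mean, sd)   # probability mass below the upper bound
  u <- runif(n, p_lo, p_hi)      # uniform draws restricted to the allowed mass
  qnorm(u, mean, sd)             # map the draws back to the normal scale
}

set.seed(1)
y <- rtrunc_norm(100, mean = 32.2, sd = 10, min = 10, max = 50)
range(y)  # stays inside [10, 50]
mean(y)   # near 32.2, but not exactly
```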
This isn't perfect, but maybe it's a start. I can't get the range to work out perfectly, so I just played with the "max" until I got an output I was happy with. There is probably a more solid mathematical way to do this. The result is uniform-adjacent... at best...
rand_unif_constrained <- function(num, min, max, mean) {
vec <- runif(num, min, max)
vec / sum(vec) * mean*num
}
set.seed(35)
test <- rand_unif_constrained(100, 10, 40, 32.2) # play with max until the max output is less than 50
mean(test)
#> [1] 32.2
min(test)
#> [1] 12.48274
max(test)
#> [1] 48.345
hist(test)
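Another option, offered here as an assumption rather than as part of the answer above, is a beta distribution rescaled to [10, 50]. Its population mean is 32.2 when shape1 / (shape1 + shape2) equals (32.2 - 10) / (50 - 10). The concentration k = 2 is an arbitrary choice for this sketch; smaller values push more mass toward the limits. The sample mean will still scatter around 32.2 rather than hit it exactly.

```r
lo <- 10; hi <- 50; target <- 32.2
m <- (target - lo) / (hi - lo)   # target mean on the 0-1 scale
k <- 2                           # assumed concentration: shape1 + shape2

set.seed(35)
y <- lo + (hi - lo) * rbeta(100, shape1 = m * k, shape2 = (1 - m) * k)
range(y)  # always within [10, 50]
mean(y)   # varies around 32.2 from sample to sample
```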
Related
I have to make a normal distribution from a set of pre-established data, henceforth xvec. So I know I need to use dnorm(xvec, meanvec, sdvec). But what values should I use for mean and sd? Can I always use meanvec = mean(xvec) and sdvec = sd(xvec)? Is that a reasonable approach? Or is it preferable to keep the default values of mean = 0 and sd = 1?
I'm asking this because I looked at some examples and the values for mean and sd were always chosen beforehand. For example, this one, from https://www.tutorialspoint.com/r/r_normal_distribution.htm:
# Create a sequence of numbers between -10 and 10 incrementing by 0.1.
x <- seq(-10, 10, by = .1)
# Choose the mean as 2.5 and standard deviation as 0.5.
y <- dnorm(x, mean = 2.5, sd = 0.5)
Why did the author choose mean = 2.5 and sd = 0.5, given that
> mean(x)
[1] 5.105265e-16
> sd(x)
[1] 5.816786?
I have a data set and one of its columns contains random numbers ranging from 300 to 400. I'm trying to find what proportion of this column is between 320 and 350 using R. To my understanding, I need to standardize this data and create a bell curve first. I have the mean and standard deviation, but when I compute (X - mean)/SD and plot a histogram of this column, it's still not a bell curve.
This is the code I tried.
myData$C1 <- (myData$C1 - C1_mean) / C1_SD
If you are simply counting the number of observations in that range, there's no need to do any standardization and you may directly use
mean(myData$C1 >= 320 & myData$C1 <= 350)
As for the standardization, it definitely doesn't create any "bell curves": it only shifts the distribution (centering) and rescales the data (dividing by the standard deviation). Other than that, the shape itself of the density function remains the same.
For instance,
x <- c(rnorm(100, mean = 300, sd = 20), rnorm(100, mean = 400, sd = 20))
mean(x >= 320 & x <= 350)
# [1] 0.065
hist(x)
hist((x - mean(x)) / sd(x))
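One way to check numerically that standardization leaves the shape alone is to compare a shape statistic before and after. The sketch below uses a hand-written sample skewness (the helper skew is made up here) on deliberately skewed data, which are an assumption for illustration and not the data from the question.

```r
# Sample skewness, written out by hand to avoid extra packages
skew <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5

set.seed(42)
x <- 300 + rexp(1000, rate = 1 / 50)   # clearly right-skewed, not bell-shaped
z <- (x - mean(x)) / sd(x)             # standardized copy

c(mean = mean(z), sd = sd(z))          # about 0 and 1 after standardizing
all.equal(skew(x), skew(z))            # TRUE: the skewness is unchanged
```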
I suspect that what you are looking for is an estimate of the true, unobserved proportion. The standardization procedure then would be applicable if you had to use tabulated values of the standard normal distribution function. However, in R we may do that without anything like that. In particular,
pnorm(350, mean = mean(x), sd = sd(x)) - pnorm(320, mean = mean(x), sd = sd(x))
# [1] 0.2091931
That's the probability P(320 <= X <= 350), where X is normally distributed with mean mean(x) and standard deviation sd(x). The figure is quite different from that above since we misspecified the underlying distribution by assuming it to be normal; it actually is a mixture of two normal distributions.
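To connect this with the table-based procedure: standardizing the limits and using the standard normal distribution function gives the same number as passing mean and sd to pnorm directly. A small sketch, using simulated data like the example above (the seed is an arbitrary choice):

```r
set.seed(1)
x <- c(rnorm(100, mean = 300, sd = 20), rnorm(100, mean = 400, sd = 20))
m <- mean(x)
s <- sd(x)

# Table-style calculation: standardize the limits, then use the standard normal CDF
p_table <- pnorm((350 - m) / s) - pnorm((320 - m) / s)
# Direct calculation with the mean and sd arguments
p_direct <- pnorm(350, mean = m, sd = s) - pnorm(320, mean = m, sd = s)

all.equal(p_table, p_direct)  # TRUE
```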
I want to do the following:
Draw 50 numbers from a normal distribution with mean = 10 and standard deviation = 20, and repeat this 100 times.
For each draw I want to compute its standard deviation and arithmetic mean.
At the end I want to create a vector of length 100 containing the absolute value of the difference between the standard deviation and the arithmetic mean, i.e. a vector x with x[i] = |a - b|, where a is the standard deviation of the 50 numbers in the i-th draw and b is their mean.
What I did:
Creating the 100 draws from the normal distribution above:
replicate(100, rnorm(50, 10, 20), simplify = FALSE)
Now I have a problem. I know that I can use the functions "mean" and "sd" to compute the arithmetic mean and standard deviation, but I have to define the numbers that I drew as vectors. What I mean:
Numbers drawn in the first draw - vector 1
Numbers drawn in the second draw - vector 2
And so on.
Then I can compute their arithmetic mean and standard deviation.
Then we can compute |a - b| (defined above), and at the end I will create the vector with x[i] = |a - b|.
I have an idea but I don't know how to write it.
This is a matter of assigning the result of replicate to a variable (of class "list", since simplify = FALSE) and then sapply the mean and sd functions.
set.seed(1234) # Make the results reproducible
repl <- replicate(100, rnorm(50, 10, 20), simplify = FALSE)
mu <- sapply(repl, mean)
s <- sapply(repl, sd)
D <- abs(s - mu)
head(D)
#[1] 16.761930 7.953432 6.833691 12.491605 5.490149 6.850794
A one-liner could be
D2 <- sapply(repl, function(x) abs(sd(x) - mean(x)))
identical(D, D2)
#[1] TRUE
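If the list is not needed for anything else, a matrix-based variant is also possible, since replicate simplifies to a matrix by default. This is only a sketch; the name D3 is made up here, and with the same seed its values should match D up to floating-point rounding.

```r
set.seed(1234)                           # same seed as above
m <- replicate(100, rnorm(50, 10, 20))   # 50 x 100 matrix, one draw per column
D3 <- abs(apply(m, 2, sd) - colMeans(m)) # |sd - mean| for each column
length(D3)
head(D3)
```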
I have the following function:
library(dplyr) # provides between()
samp315<-function(n=30, desmean=86, distance=3.4995) {
x = seq(from = 0, to = 100, by = 0.1)
samp<-0
while (!between(mean(samp),desmean-distance,desmean+distance)) samp<-sample(x,n,replace=TRUE)
samp
}
percent <- samp315()
So basically I want to generate 30 numbers within 0-100 that have a mean of 86 +/- 3.4995. However, whenever I run the last line it either runs forever or, when I am lucky, generates a list of the desired results. Any idea how I could change the function to improve it?
As suggested by Parfait in the comments, you're using a randomization strategy that gives a low probability of providing the condition you're interested in. Did no other answer to this question help you out?
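To get a feeling for how low that probability is, here is a rough estimate based on a normal approximation to the mean of 30 uniform draws from the 0-100 grid. The approximation is crude this far out in the tail, so read the result as an order of magnitude only.

```r
n <- 30
x <- seq(from = 0, to = 100, by = 0.1)
mu    <- mean(x)                   # expected value of a single draw (50)
sigma <- sqrt(mean((x - mu)^2))    # population sd of the grid, about 28.9
se    <- sigma / sqrt(n)           # sd of the mean of n draws

# Normal approximation to P(86 - 3.4995 <= mean <= 86 + 3.4995),
# computed from upper tails for numerical accuracy
p_hit <- pnorm(86 - 3.4995, mu, se, lower.tail = FALSE) -
         pnorm(86 + 3.4995, mu, se, lower.tail = FALSE)
p_hit       # far below one in a million
1 / p_hit   # rough expected number of while-loop iterations
```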
Some other possible strategies for you to try out.
n = 30
# Using truncated normal
library(truncnorm)
x = round(rtruncnorm(n, a = -0.0495, b = 100.0495, mean = 85, sd = 3.5*2), 1)
# Using beta
sig = 3
x = round(100*rbeta(n, (0.85)*sig, (1-0.85)*sig), 1)
The round(..., 1) is meant to align with your vector x. Both methods would produce very few values far from 85. It's a trade-off you have to consider: if you want a mean in 85 +/- 3.5, then you can't have too many values below 10, for example, so you have to lower the probability of such values being selected. With your function, once it finishes, you'll probably find that values closer to 85 are more represented.
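If the mean really has to land inside the window, the retry idea from the question can be kept and only the proposal distribution swapped. The sketch below does that with the beta draw; samp_beta and max_tries are names invented for this illustration, and the beta is centred on desmean (86) rather than the 85 used above.

```r
# Hypothetical variant of samp315: same acceptance rule, better proposal
samp_beta <- function(n = 30, desmean = 86, distance = 3.4995,
                      sig = 3, max_tries = 1000) {
  m <- desmean / 100                        # target mean on the 0-1 scale
  for (i in seq_len(max_tries)) {
    samp <- round(100 * rbeta(n, m * sig, (1 - m) * sig), 1)
    # accept as soon as the sample mean falls inside the window
    if (abs(mean(samp) - desmean) <= distance) return(samp)
  }
  stop("no acceptable sample found in ", max_tries, " tries")
}

set.seed(315)
percent <- samp_beta()
mean(percent)    # inside 86 +/- 3.4995 by construction
range(percent)   # inside [0, 100]
```

The max_tries guard makes the function fail loudly instead of looping forever if the proposal is a poor match for the window.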
I am a beginner with R and want to calculate the SD of values in another dataframe several times within limits of values in a dataframe.
Imagine I have a dataframe looking like this.
peak <- c("max", "max", "max")
value <- c(42, 105, 170)
minbefore<- c(20, 50, 115)
minafter <- c(50, 115, 180)
extrema <- data.frame(peak, value, minbefore, minafter)
I now want to calculate the SD of the values in another dataframe em$Position within the limits of extrema$minbefore and extrema$minafter for each row of the dataframe extrema.
My idea was something like this
extrema$SD <- sd(em$Position[em$Position>extrema$minbefore & em$Position<extrema$minafter])
Then I get the following error message: longer object length is not a multiple of shorter object length
This makes sense to me, because I assume that R tries to insert the whole vectors extrema$minbefore and extrema$minafter at the same time, which obviously makes no sense.
What would be the right way to do it?
Thanks in advance.
Dominik.
You can use the apply function to do this:
# dummy data
em <- data.frame(Position = unlist(as.integer(runif(n = 30, min = 20, max = 190))))
# function to calculate sd
extrema$SD <- apply(extrema[,c('minbefore','minafter')], 1, function(x){
return( sd(em[(em$Position > x[1]) & (em$Position < x[2]),'Position']))
})
print(extrema)
  peak value minbefore minafter        SD
1  max    42        20       50  5.966574
2  max   105        50      115  19.07878
3  max   170       115      180 18.407426
Explanation:
We traverse through each row of extrema and get the min and max values.
Using min, max values, we subset the em$Position and calculate the sd.
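The same row-by-row idea can also be written with mapply, which walks over the two limit columns in parallel. This is a sketch under the same dummy-data assumption as above; sd_between is a made-up helper name and the seed is arbitrary.

```r
extrema <- data.frame(peak = c("max", "max", "max"),
                      value = c(42, 105, 170),
                      minbefore = c(20, 50, 115),
                      minafter = c(50, 115, 180))

set.seed(1)
em <- data.frame(Position = as.integer(runif(n = 30, min = 20, max = 190)))  # dummy data

# Hypothetical helper: sd of the positions strictly between lo and hi
sd_between <- function(lo, hi) {
  sd(em$Position[em$Position > lo & em$Position < hi])
}

# One call per row: (minbefore[1], minafter[1]), (minbefore[2], minafter[2]), ...
extrema$SD <- mapply(sd_between, extrema$minbefore, extrema$minafter)
extrema
```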