regarding the usage of runif function - r

I once saw the following R code,
x<-runif(3,max=c(10,20,30))
If min is not set, what is the lower bound for the generated random variable? Also, when max is set up this way, my understanding is that it will iterate over the three values given in c(), one for each generated variable. Is that right?

If you look at the ?runif help page, you'll see the default for min= is 0.
If you specify multiple values for max, the values are recycled, so the first value comes from unif(0,10), the second from unif(0,20) and the third from unif(0,30), and that pattern repeats for as many values as you request. If you only request one value
runif(1, max=c(10,20,30))
that would be the same as
runif(1, max=10)
This is noted in the help page under the Value section
The numerical arguments other than n are recycled to the length of the result.
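A minimal sketch of that recycling, assuming only base R. The seed value is arbitrary and is only there so that paired calls consume the same random stream; the variable names are illustrative.

```r
# Same seed, so both calls draw the same underlying uniform numbers
set.seed(42)
a <- runif(3, max = c(10, 20, 30))

set.seed(42)
b <- runif(3) * c(10, 20, 30)   # scale standard uniforms by each max

# Each draw was scaled by its own max, so the two vectors should agree
isTRUE(all.equal(a, b))

# Requesting a single value only ever uses the first max
set.seed(42)
one_vec <- runif(1, max = c(10, 20, 30))
set.seed(42)
one_scalar <- runif(1, max = 10)
isTRUE(all.equal(one_vec, one_scalar))
```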

Per the documentation for this function (https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/Uniform), min takes on the value 0 unless explicitly passed.
And yes, that is correct: the function will iterate over the values given in c(), one for each generated value. If max has fewer values than n (e.g. you're generating 3 random variables and set max=c(1,2)), the values of max are recycled, so the third variable reuses the first max value (1 here); it does not fall back to the default max of 1. An example showing how it iterates over c():
x<-runif(3,max=c(1,20, 7000000))
x
[1] 0.622216 7.463306 809194.417205
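A small sketch of what happens when max has fewer values than n, assuming base R only. The names maxes, n and used_max are illustrative, and rep_len() is used here only to make the recycled values visible.

```r
maxes <- c(1, 2)
n <- 3

# rep_len() shows the max each draw uses after recycling
used_max <- rep_len(maxes, n)
used_max
# [1] 1 2 1

set.seed(1)  # arbitrary seed, only for reproducibility
draws <- runif(n, max = maxes)

# Every draw stays below the recycled max for its position
all(draws <= used_max)
```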

Related

How to save values in Vector using R

I am supposed to find the mean and standard deviation at each given sample size (N), using the "FOR LOOP". I started writing the code as below, I am required to save all the means into vector "p". How do I save all the means into one vector?
sample.sizes =c(3,10,50,100,500,1000)
mean.sds = numeric(0)
for ( N in sample.sizes ){
x <- rnorm(3,mean=0,sd=1)
mean.sds[i]
}
mean(x)
Actually, you are doing several things wrong.
You use the variable N in the for loop, but you are not using it anywhere inside the loop.
for (N in some_vector) means N takes each value of the vector one by one. So N in sample.sizes will first take 3, then 10, then 50, and so on.
Now where does i come into the picture? It is never defined.
You calculate x in each iteration, but always with a sample size of 3 rather than N.
x holds 3 values. In the next line you apparently intend to store these in the ith element of mean.sds, but i is unknown, nothing is assigned, and three values cannot be stored in a single element anyway.
Do you want this?
sample.sizes =c(3,10,50,100,500,1000)
mean.sds = numeric(0)
for ( i in seq_along(sample.sizes )){
x <- rnorm(sample.sizes[i], mean=0, sd=1)
mean.sds[i] <- mean(x)
}
mean.sds
[1] 0.6085489531 -0.1547286299 0.0052106559 -0.0452804986 -0.0374094936 0.0005667246
I replaced N with i in seq_along(sample.sizes), which iterates over the indices of that vector: six in this example.
I passed the ith element of sample.sizes as the first argument of rnorm to generate that many random values.
I stored those random values in the vector x, calculated its mean (one value only) and stored it in the ith element of your initially empty vector.
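Since the question asks for both the mean and the standard deviation, the same loop can be extended along these lines. This is only a sketch: the vector name p comes from the question, sds is a made-up name, and the seed is arbitrary.

```r
sample.sizes <- c(3, 10, 50, 100, 500, 1000)

# Preallocate one slot per sample size instead of growing the vectors
p   <- numeric(length(sample.sizes))  # means
sds <- numeric(length(sample.sizes))  # standard deviations

set.seed(123)  # arbitrary seed, only for reproducibility
for (i in seq_along(sample.sizes)) {
  x <- rnorm(sample.sizes[i], mean = 0, sd = 1)
  p[i]   <- mean(x)  # one mean per sample size
  sds[i] <- sd(x)    # one standard deviation per sample size
}

p
sds
```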

get location of row with median value in R data frame

I am a bit stuck with this basic problem, but I cannot find a solution.
I have two data frames (dummies below):
x<- data.frame("Col1"=c(1,2,3,4), "Col2"=c(3,3,6,3))
y<- data.frame("ColA"=c(0,0,9,4), "ColB"=c(5,3,20,3))
I need to use the location of the median value of one column in df x to then retrieve a value from df y. For this, I am trying to get the row number of the median value in e.g. x$Col1 to then retrieve the value using something like y[,"ColB"][row.number]
Is there an elegant way/function for doing this? Solutions might need to account for two cases: when the sample has an even number of values, and when it has an odd number (when the count is even, the median might be a value that is not found in the sample, because it is calculated as the mean of the two middle values).
The problem is a little underspecified.
What should happen when the median isn't in the data?
What should happen if the median appears in the data multiple times?
Here's a solution which takes the (absolute) difference between each value and the median, then returns the index of the first row for which that difference vector achieves its minimum.
with(x, which.min(abs(Col1 - median(Col1))))
# [1] 2
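To complete the lookup from the question, the returned index can be used directly on y. A short sketch using the dummy data frames from the question; row.number is the name the question itself suggests.

```r
x <- data.frame(Col1 = c(1, 2, 3, 4), Col2 = c(3, 3, 6, 3))
y <- data.frame(ColA = c(0, 0, 9, 4), ColB = c(5, 3, 20, 3))

# Row of x whose Col1 is closest to the median (the first one on ties)
row.number <- with(x, which.min(abs(Col1 - median(Col1))))

# Use that row index to pull the matching value from y
y[row.number, "ColB"]
# [1] 3
```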
The quantile function with type = 1 (i.e. no averaging) may also be of interest, depending on your desired behavior. It returns the lower of the two "sides" of the median, while the which.min method above can depend on the ordering of your data.
quantile(x$Col1, .5, type = 1)
# 50%
# 2
An option using quantile is
with(x, which(Col1 == quantile(Col1, .5, type = 1)))
# [1] 2
This could possibly return multiple row-numbers.
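For instance, Col2 of the example data contains its type-1 median three times, so the same expression returns three row numbers. A sketch on the question's dummy data; med and rows are illustrative names.

```r
x <- data.frame(Col1 = c(1, 2, 3, 4), Col2 = c(3, 3, 6, 3))

# Type-1 median of Col2 is an actual data value (no averaging)
med <- unname(quantile(x$Col2, 0.5, type = 1))
med
# [1] 3

# All rows that hold that value
rows <- which(x$Col2 == med)
rows
# [1] 1 2 4
```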
Edit:
If you want it to only return the first match, you could modify it as shown below
with(x, which.min(Col1 != quantile(Col1, .5, type = 1)))
Here, something like y$ColB[which(x$Col1 == round(median(x$Col1)))] would do the trick.
The problem is x has an even number of rows, so the median 2.5 is not an integer. In this case you have to choose between 2 or 3.
Note: The above works for your example, not for general cases (e.g. c(-2L,2L) or with non-integer values). For the more general case see @IceCreamToucan's solution.

R Compare each data value of a column to rest of the values in the column?

I would like to create a function that looks at a column of values, takes each value individually, and assesses which of the other data points is closest to it.
I'm guessing it could be done by checking the length of the data frame and making a list of that length in steps of 1, then using that list to reference which cell is being analysed against the rest of the column, though I don't know how to implement that.
eg.
data:
20
17
29
33
1) is closest to 2)
2) is closest to 1)
3) is closest to 4)
4) is closest to 3)
I found this example which tests for similarity, but I'd like to know which element it assigns to.
x=c(1:100)
your.number=5.43
which(abs(x-your.number)==min(abs(x-your.number)))
Also, if you know how I could do this, could you explain the parts of the code and what they mean?
I wrote a quick function that does the same thing as the code you provided.
The code you provided takes the absolute value of the difference between your number and each value in the vector, and compares that to the minimum of those differences. This is essentially what the which.min function that I use below does, except that which.min returns only the first index when there are ties. I go through my steps below. Hope this helps.
Make up some data
a = 1:100
yourNumber = 6
Where Num is your number, and x is a vector
getClosest=function(x, Num){
return(which.min(abs(x-Num)))
}
Then if you run this command, it should return the index for the value of the vector that corresponds to the closest value to your specified number.
getClosest(x=a, Num=yourNumber)
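For the original question (for each value, which other value in the same column is closest), one possible sketch is to build the matrix of pairwise distances and ignore the diagonal. This is an alternative approach rather than part of the function above; v, d and nearest are illustrative names.

```r
v <- c(20, 17, 29, 33)

# Pairwise absolute differences: d[i, j] = |v[i] - v[j]|
d <- abs(outer(v, v, "-"))

# A value must not be matched with itself, so mask the diagonal
diag(d) <- Inf

# For each row, the index of the closest other value (first on ties)
nearest <- apply(d, 1, which.min)
nearest
# [1] 2 1 4 3
```

This reproduces the pairing in the question: 1) with 2), 2) with 1), 3) with 4), and 4) with 3). The distance matrix grows with the square of the column length, so this is intended for small to moderate inputs.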

Difference between mean(c(1,2,21)) and mean(1,2,21)

What's the difference between these two?
mean(c(1,2,21))
and
mean(1,2,21)
The answers are different, but what's the meaning of each one?
mean(c(1,2,21))
#[1] 8
This passes a vector of three elements to the mean function and the mean value of these three elements is calculated.
mean(1,2,21)
#[1] 1
This passes 1 as the first argument, 2 as the second argument and 21 as the third argument to the mean function. mean passes these arguments to mean.default. In help("mean.default") you can find the arguments of this function:
x: The object you want the mean for.
trim: the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
na.rm: a logical value indicating whether NA values should be stripped before the computation proceeds. (Since you pass a numeric value, it is coerced to logical automatically).
So you calculate this:
mean.default(1, 0.5, TRUE)
[1] 1
With mean(c(1,2,21)), R takes the mean of the vector consisting of 1, 2 and 21. The second call, mean(1,2,21), is equivalent to mean(1, trim=2, na.rm=21), so R takes the mean of the single number 1. The value 2 is passed to trim, which controls the fraction (0 to 0.5) of observations to be trimmed from each end of the vector before the mean is computed, and the value 21 is passed to na.rm, which should be TRUE or FALSE. As you can see, 2 and 21 without c() do not contribute to the result here.
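A short sketch of that argument matching in base R. The last call is an added illustration that trim does have an effect once x holds more than one value.

```r
mean(c(1, 2, 21))               # 8: mean of a three-element vector
mean(1, 2, 21)                  # 1: only the first argument is data
mean(1, trim = 2, na.rm = 21)   # 1: the same call with the arguments named

# trim of 0.5 or more makes mean() return the median of x
mean(c(1, 2, 21), trim = 0.5)   # 2
```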

What does rnorm in R return when the sd argument contains a vector?

What does the following code do:
rnorm(10, mean=2, sd=1:10)
The first number is from N(2,1)
The second number is from N(2,2)
The third number is from N(2,3)
etc...?
The first argument tells R how many random variates you want returned. In this case, it will give you back 10 values. Those values will be drawn from normal distributions with mean equal to 2. In addition, all 10 values will be drawn from distributions with different standard deviations, the first with SD=1, the second 2, ..., the 10th SD=10. Perhaps the thing to understand is that R, by its nature, is vectorized. That is, there is no such thing as a scalar, only a vector of length=1. (I recognize that that doesn't make a lot of sense within pure math, but it does in computer science.) As a result, arguments are often 'recycled' so that they will all match the length of the longest vector, i.e., you end up with a vector of 10 means, each equal to 2, to match your vector of 10 SDs. HTH.
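One way to see the recycling is to rebuild the same draws by hand from standard normal values. This is a sketch in base R; the seed is arbitrary and the names z and manual are illustrative.

```r
set.seed(2024)
draws <- rnorm(10, mean = 2, sd = 1:10)

set.seed(2024)
z <- rnorm(10)                      # standard normal draws, same stream
manual <- rep(2, 10) + (1:10) * z   # recycled mean plus each sd times z

# The i-th draw uses mean 2 and sd i, so the two vectors should agree
isTRUE(all.equal(draws, manual))
```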
