confused in understanding an rbinom output - r

p1 <- c(.25,.025,.025,.1,.2,.4)
T <- sample(1:6,size=N,replace=TRUE, prob=someprobabilityvector)
Y <- rbinom(N,1,p1[c(T)])
Hi folks, I am new to R and programming in general and need some help understanding something basic. Could someone explain to me what is happening in vector Y above? I figured out what p1[c(T)] does, but I have no idea what vector Y is doing. All help is appreciated in advance.

The first line of your code creates a vector of six probabilities:
p1 <- c(.25,.025,.025,.1,.2,.4)
In the second line, you randomly choose N values from the numbers one to six (with replacement). The probability for each value is specified in someprobabilityvector. Hence, the function will return a vector of length N containing values between 1 and 6.
T <- sample(1:6,size=N,replace=TRUE, prob=someprobabilityvector)
In the third line, N random numbers are generated from a binomial distribution with one trial and probabilities specified in p1[c(T)]. c(T) is the same as T: the vector containing values from 1 to 6. This vector is used to index the vector p1. Hence, p1[c(T)] returns a vector of N values taken from p1, and the i-th draw uses the i-th of these probabilities.
Y <- rbinom(N,1,p1[c(T)])
Since the specified binomial distribution has one trial only, the vector Y will contain zeroes and ones.
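The three steps can be run end to end as a minimal sketch. The values of N and someprobabilityvector are not given in the question, so the ones below are assumptions chosen only for illustration, and the index vector is named idx instead of T to avoid masking the shorthand for TRUE.

```r
set.seed(42)

N <- 10                                  # assumed sample size
p1 <- c(.25, .025, .025, .1, .2, .4)     # success probabilities from the question
someprobabilityvector <- rep(1 / 6, 6)   # assumed: each of 1..6 equally likely

# Step 2: draw N indices between 1 and 6
idx <- sample(1:6, size = N, replace = TRUE, prob = someprobabilityvector)

# Step 3: look up one success probability per draw, then flip one coin per entry
probs <- p1[idx]                         # same result as p1[c(idx)]
Y <- rbinom(N, 1, probs)                 # N Bernoulli draws, each 0 or 1

# rbinom is vectorised over prob: probability 0 always gives 0, probability 1 always gives 1
edge <- rbinom(2, 1, c(0, 1))
```

Each element of Y is a single Bernoulli trial whose success probability is the matching element of probs.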

Related

Number from sample to be drawn from a Poisson distribution with upper/lower bounds

Working in R, I need to create a vector of length n with the values randomly drawn from a Poisson distribution with lambda=1, but with a lower bound of 2 and upper bound of 6 (i.e. all numbers will be either 2,3,4,5, or 6).
I am unsure how to do this. I tried creating a for loop that would replace any values outside that range with values inside the range:
set.seed(123)
n<-25 #example length
example<-rpois(n,1)
test<-example #redundant - only duplicating to compare with original *example* values
for (i in 1:length(n)){
if (test[i]<2||test[i]>6){
test[i]<-rpois(1,1)
}
}
But this didn't seem to work (still getting 0's and 1, etc, in test). Any ideas would be greatly appreciated!
Here is one way to generate n numbers from a Poisson distribution and replace every number that falls outside the range with a random number inside the range.
n<-25 #example length
example<-rpois(n,1)
inds <- example < 2 | example > 6
example[inds] <- sample(2:6, sum(inds), replace = TRUE)
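Two notes on the code in this thread, followed by a hedged alternative. In the question's loop, n is a single number, so length(n) is 1 and the loop body runs only for i = 1; the replacement draw rpois(1,1) can also fall outside the range again. The answer's replacement step draws uniformly from 2:6, which keeps the values in range but does not preserve the Poisson shape inside it. If the goal is a Poisson(1) distribution conditioned on lying in 2..6, one possible sketch (an assumption about the intent, not the answer's method) is to sample the allowed values with weights given by dpois:

```r
set.seed(123)

n <- 25                                   # example length, as in the question
loop_iterations <- length(n)              # 1, which is why the loop ran only once

support <- 2:6                            # allowed values
weights <- dpois(support, lambda = 1)     # Poisson(1) probabilities on the allowed values

# sample() rescales the weights itself, so they do not need to sum to 1
truncated <- sample(support, size = n, replace = TRUE, prob = weights)
```

With lambda = 1 most of the mass within 2..6 sits at 2, so the result is dominated by small values, unlike the uniform replacement.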

R Repeat function from first subvector of vector until total vector length reached

I have a vector epsilon of length N. I am applying the function bw.CDF.pi(x, pilot="UCV") from the sROC package to compute bandwidths for cdf Kernel estimation.
My goal is to repeat this bandwidth function for every subvector of epsilon, starting from the first value. In other words, I would like to apply the function to the first value in epsilon, then to the first two values, then to the first three values, continuing until the function is applied to the whole vector epsilon. In the end I want to have N bandwidth values.
How can I accomplish this?
Apparently the function bw.CDF.pi needs a vector of at least 2 elements to run. If you want to run it for the first 2 elements of a vector, then the first 3, etc., you can do the following. Note that the example data is taken from the help page for the function.
library(sROC)
set.seed(100)
n <- 200
x <- c(rnorm(n/2, mean=-2, sd=1), rnorm(n/2, mean=3, sd=0.8))
lapply(seq_along(x)[-1], function(m) bw.CDF.pi(x[seq_len(m)], pilot="UCV"))
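The lapply call above returns a list. If a plain numeric vector is wanted, the same growing-prefix pattern can be written with vapply. The sketch below uses sd as a stand-in for bw.CDF.pi so that it runs without the sROC package; the names epsilon, stat_fun and bandwidths are assumptions for illustration only.

```r
set.seed(100)

epsilon <- rnorm(10)                      # assumed example data

# Stand-in for function(v) bw.CDF.pi(v, pilot = "UCV"); any function returning one number works
stat_fun <- function(v) sd(v)

# Prefix lengths 2, 3, ..., N (the first element alone is skipped)
prefix_lengths <- seq_along(epsilon)[-1]

# Apply the function to epsilon[1:2], epsilon[1:3], ..., epsilon[1:N]
bandwidths <- vapply(prefix_lengths,
                     function(m) stat_fun(epsilon[seq_len(m)]),
                     numeric(1))
```

This yields N - 1 values rather than N, because the first prefix of length one is skipped.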

R: Putting Variables in order by a different variable

I have been set another programming task, most of which I have done, so here is a quick run-through. I had to take n samples from a multivariate normal distribution of dimension p (call it X) and put them into a matrix (Matx). For each row, the first two values are summed along with a value randomly drawn from the standard normal distribution (call the resulting vector Y). Then we had to sort Y numerically and split it into H groups. I found the mean of each row in the matrix, and now I have to order them by the Y group they are associated with. I've struggled a fair bit and have now hit a brick wall. I understand this is quite confusing; if anyone could help, it would be greatly appreciated!
Task: Return the p x H matrix which has in the first column the mean of the observations in the first group and in the Hth column the mean of the observations in the Hth group.
Code:
library('MASS')
x<-mvrnorm(36,0,1)
Matx<-matrix(c(x), ncol=6, byrow=TRUE)
v<-rnorm(6)
y1<-sum(x[1:2],v[1])
y2<-sum(x[7:8],v[2])
y3<-sum(x[13:14],v[3])
y4<-sum(x[19:20],v[4])
y5<-sum(x[25:26],v[5])
y6<-sum(x[31:32],v[6])
y<-c(y1,y2,y3,y4,y5,y6)
out<-order(y)
h1<-c(out[1:2])
h2<-c(out[3:4])
h3<-c(out[5:6])
x1<-c(x[1:6])
x2<-c(x[7:12])
x3<-c(x[13:18])
x4<-c(x[19:24])
x5<-c(x[25:30])
x6<-c(x[31:36])
mx1<-mean(x1)
mx2<-mean(x2)
mx3<-mean(x3)
mx4<-mean(x4)
mx5<-mean(x5)
mx6<-mean(x6)
d<-c(mx1,mx2,mx3,mx4,mx5,mx6)[order(out)]
d
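One possible way to build the p x H matrix described in the task is sketched below. It is an interpretation of the task, not a confirmed solution: it assumes n is divisible by H, takes "mean of the observations" to be the column-wise mean over the rows in each group, and draws X with base rnorm, which corresponds to a multivariate normal with zero mean and identity covariance. The names X, group and group_means are illustrative.

```r
set.seed(1)

n <- 6; p <- 6; H <- 3                    # sizes matching the question's example

# n observations of dimension p (rows are observations)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# y_i = first two coordinates of row i plus standard normal noise
y <- X[, 1] + X[, 2] + rnorm(n)

# Group label 1..H by rank of y: the smallest n/H values form group 1, and so on
group <- ceiling(rank(y, ties.method = "first") / (n / H))

# Column h holds the mean vector (length p) of the observations in group h
group_means <- sapply(seq_len(H),
                      function(h) colMeans(X[group == h, , drop = FALSE]))
```

Here group_means has p rows and H columns, with groups ordered by increasing y.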

tapply, plotting, length doesn't match

I am trying to generate a plot from a dataset of 2 columns - the first column contains distances and the second contains correlations of something measured at those distances.
Now there are multiple entries with the same distance but different correlation values. I want to take the average of these entries and plot distance versus correlation. So, this is what I did (the dataset is called correlationtable):
bins <- sort(unique(correlationtable[,1]))
corr <- tapply(correlationtable[,2],correlationtable[,1],mean)
plot(bins,corr,type = 'l')
However, this gives me the error that lengths of bins and corr don't match.
I cannot figure out what I am doing wrong.
I tried it with some random data and it worked every time for me. To track down the error, you would need to supply us with the concrete example that did not work for you.
However, to answer the question, here is an alternative way to do the same thing:
corr <- tapply(correlationtable[,2],correlationtable[,1],mean)
bins <- as.numeric(names(corr))
plot(bins,corr,type = 'l')
This uses the fact that tapply returns a names attribute, which is converted to numeric and used as the distance. It is therefore guaranteed to have the same length as corr.
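A small made-up table shows the approach in action. The data values are assumptions for illustration; only the tapply and names steps come from the answer.

```r
# Hypothetical data: column 1 is distance, column 2 is correlation
correlationtable <- cbind(distance    = c(1, 1, 2, 2, 3),
                          correlation = c(0.9, 0.7, 0.5, 0.3, 0.1))

# Mean correlation per distinct distance
corr <- tapply(correlationtable[, 2], correlationtable[, 1], mean)

# The group labels are stored as names, so bins always lines up with corr
bins <- as.numeric(names(corr))
```

Because bins is derived from corr itself, the two can be passed to plot(bins, corr, type = 'l') without a length mismatch.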

lchoose function in R

My understanding of the lchoose function in R is simply that lchoose(a,b) = log(choose(a,b)).
However, I found that:
temp <- 7.9999993
k <- 8
choose(temp,k)
[1] 0
lchoose(temp,k)
[1] 0
log(choose(temp,k))
[1] -Inf
So lchoose is not log of the choose function output.
Why is this happening?
In the discrete case (i.e. integer n), choose(n,k) computes the number of distinct k-element subsets of a set of n elements, so if k > n, you are counting subsets that have more elements than the set itself. Since there are no such subsets, the answer is zero.
In general, for an n which is a real number, the function can still be computed. However, it still has to have the same meaning over discrete values, so for k>n the function has a value of zero. If you look at the definition of the binomial function with real n (see here) you'll see that the answer will be zero, but I tried to explain it, hopefully, in an intuitive manner.
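The definition with real n mentioned above is the falling product n(n-1)...(n-k+1)/k!. The sketch below checks it against choose for values that are clearly integer or clearly non-integer; it deliberately avoids values within rounding distance of an integer, such as the 7.9999993 in the question. The helper name falling_product_choose is hypothetical.

```r
# Generalised binomial coefficient computed straight from the definition
falling_product_choose <- function(n, k) {
  prod(n - seq_len(k) + 1) / factorial(k)
}

# Integer n with k > n: the product contains a zero factor, so the result is 0
int_case <- falling_product_choose(5, 8)

# Clearly non-integer n: 7.5 * 6.5 * 5.5 / 3! = 44.6875
real_case <- falling_product_choose(7.5, 3)

# For ordinary integer arguments, lchoose matches log(choose(...))
log_agrees <- isTRUE(all.equal(lchoose(50, 3), log(choose(50, 3))))
```

For ordinary arguments the two functions agree, which suggests the discrepancy in the question is specific to an n that lies very close to an integer.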

Resources