Programming include inequality (R) - r

This is what I did: given h = 5, I get ARL = 2.5.
C=c(1,5,8,9,7,8,1,2,5,5)
h=5
n=10
ooc=C>h
lenOOC = sum(ooc)
lenOOC
>4
p=lenOOC/n
p
>0.4
ARL=1/p
ARL
>2.5
But now I need the reverse: given the value of ARL, compute the value of h.
For example, with the same data set C:
ARL=1.428571
n=10
lenOOC=n/ARL
lenOOC
>7...
How do I proceed to get the value of h, given ooc=C>h?
I hope it is clear what I'm trying to ask.
Thanks for helping.

Since lenOOC=n/ARL, you want h so that lenOOC = sum(C>h); that is, you want a number h such that exactly lenOOC of the values are strictly bigger than h. Equivalently, n - lenOOC of the values are less than or equal to h, so you want the (n - lenOOC)-th smallest value. Just sort C in increasing order and pick the element at position n - lenOOC.
C=c(1,5,8,9,7,8,1,2,5,5)
ARL=1.428571
n=10
lenOOC=n/ARL
sort(C)[n - round(lenOOC)]
[1] 2
So h = 2.
But notice that this will not work for all values of ARL. What if we tried ARL = 10/9 = 1.111111?
This would give
ARL=1.111111
lenOOC=n/ARL
sort(C)[n - round(lenOOC)]
[1] 1
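But h=1 would actually give ARL = 1.25, because C contains two 1s and so only 8 values are strictly greater than 1. No value of h gives ARL=1.111111. The above method only works for values of ARL that are actually achievable with the data.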
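The sort-and-pick method, together with the check for unreachable ARL values, can be sketched as one small function. The name find_h is hypothetical, and the sketch assumes h is chosen from the values in C; it returns NA when ties make the requested count impossible.

```r
# Hypothetical helper: find h such that sum(C > h) equals round(n / ARL).
find_h <- function(C, ARL) {
  n <- length(C)
  target <- round(n / ARL)        # number of values that must exceed h
  if (target < 0 || target > n - 1) {
    return(NA_real_)              # this sketch only picks h from the values in C
  }
  h <- sort(C)[n - target]        # the (n - target)-th smallest value
  if (sum(C > h) != target) {
    return(NA_real_)              # ties in C make this ARL unreachable
  }
  h
}

C <- c(1, 5, 8, 9, 7, 8, 1, 2, 5, 5)
find_h(C, 1.428571)   # 2
find_h(C, 10 / 9)     # NA
```

Returning NA instead of a nearby h makes the unreachable case explicit rather than silently giving a different ARL.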

Related

Why am I getting NAs in this calculation in R?

While working on an Rcpp program, I used the sample() function, which gave me the following error: "NAs not allowed in probability." I traced this issue to the fact that the probability vector I used had NA values in it. I have no idea how. Below is some R code that captures the errors:
n.0=20
n.1=20
n.reps=1
beta0.vals=rep(seq(-.3,.1,,n.0),n.reps)
beta1.vals=rep(seq(-7,0,,n.1),n.reps)
beta.grd=as.matrix(expand.grid(beta0.vals,beta1.vals))
n.rnd=200
beta.rnd.grd=cbind(runif(n.rnd,min(beta0.vals),max(beta0.vals)),runif(n.rnd,min(beta1.vals),max(beta1.vals)))
beta.grd=rbind(beta.grd,beta.rnd.grd)
N = 22670
count = 0
for(i in 1:dim(beta.grd)[1]){ # iterate through 600 possible beta values in beta grid
beta.ind = 0 # indicator for current pair of beta values
for(j in 1:N){ # iterate through all possible Nsums
logit = beta.grd[i,1]/N*(j - .1*N)^2 + beta.grd[i,2];
phi01 = exp(logit)/(1 + exp(logit))
if(is.na(phi01)){
count = count + 1
}
}
}
cat("Total number of invalid probabilities: ", count)
Here, $\beta_0 \in (-0.3, 0.1), \beta_1 \in (-7, 0), N = 22670, N_\text{sum} \in (1, N)$. Note that $N$ and $N_\text{sum}$ are integers, whereas the beta values may not be.
Since mathematically $\phi_{01} \in (0,1)$, I'm assuming that the NAs arise because R does not handle extremely small values well. I am also getting an overwhelming number of NA values, more NAs than actual numbers. Why would I be getting NAs in this code?
Include print(logit) next to count = count + 1 and you will find lots of values with logit > 1000. exp(1000) is Inf, so you divide Inf by Inf, which gives NaN, and is.na() is TRUE for NaN:
> exp(500)
[1] 1.403592e+217
> Inf/Inf
[1] NaN
> is.na(NaN)
[1] TRUE
So your problem is not numbers that are too small but numbers that are too large: exp(x) overflows to Inf once x is larger than roughly 709:
> exp(709)
[1] 8.218407e+307
> exp(710)
[1] Inf
Bernhard's answer correctly identifies the problem:
If logit is large, exp(logit) = Inf.
Here is a solution:
for(i in 1:dim(beta.grd)[1]){ # iterate through 600 possible beta values in beta grid
beta.ind = 0 # indicator for current pair of beta values
for(j in 1:N){ # iterate through all possible Nsums
logit = beta.grd[i,1]/N*(j - .1*N)^2 + beta.grd[i,2];
## This one isn't great because exp(logit) can be very large
# phi01 = exp(logit)/(1 + exp(logit))
## So, we say instead
## phi01 = 1 / ( 1 + exp(-logit) )
phi01 = plogis(logit)
if(is.na(phi01)){
count = count + 1
}
}
}
cat("Total number of invalid probabilities: ", count)
# Total number of invalid probabilities: 0
We can use the more stable 1 / (1 + exp(-logit))
(to convince yourself that it is the same quantity, multiply the numerator and denominator of your expression by exp(-logit)),
and luckily, either way, R has a built-in function plogis() that calculates these probabilities quickly and accurately.
You can see from the help file (?plogis) that this function evaluates the expression I gave, but you can also double-check to assure yourself:
x = rnorm(1000)
y = 1 / (1 + exp(-x))
z = plogis(x)
all.equal(y, z)
[1] TRUE
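To see the overflow and the fix side by side, here is a minimal sketch on three hand-picked logit values (the values are illustrative, not taken from the question's grid):

```r
logit <- c(-800, 0, 800)

# Naive form: exp(800) overflows to Inf, so the last entry is Inf / Inf.
naive <- exp(logit) / (1 + exp(logit))

# plogis() evaluates the same logistic function without overflowing.
stable <- plogis(logit)

is.na(naive)   # FALSE FALSE  TRUE
stable         # 0.0 0.5 1.0
```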

Summatory vector elements

I have a named numeric vector of probabilities, like this:
[image: vector elements]
As you can see, the elements of this vector sum to 1. I have to generate a random number between 0 and 1 and get the element of the vector whose cumulative range contains that random number. For example:
Random number generated: 0.01
I will get the water element because water is between 0.09 and 0.11. I attach a graphical example:
[image: example]
I don't know how to get the element that corresponds to this random number.
I am not going to type in all of your data, so I will use a comparable, small example:
set.seed(2017)
probabilidad = runif(5)
probabilidad= probabilidad/sum(probabilidad)
names(probabilidad) = LETTERS[1:5]
probabilidad
         A          B          C          D          E
0.30918062 0.17969799 0.15695684 0.09655216 0.25761238
sum(probabilidad)
[1] 1
We can use cumsum to set up a vector to make the choices you want. But cumsum will give the upper bounds for the regions and we want the lower bounds, so we adjust the output a little.
Test = c(0, cumsum(probabilidad)[-length(probabilidad)])
names(Test) = names(probabilidad)
Test
        A         B         C         D         E
0.0000000 0.2533938 0.5129561 0.5922145 0.8222417
Now you can easily test random numbers against the distribution.
(Selector = runif(1))
[1] 0.5190959
names(probabilidad)[max(which(Selector > Test))]
[1] "C"
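As a possible alternative to max(which(...)), base R's findInterval() does the same lookup against the lower bounds in one call. The sketch below uses a made-up probability vector so the result is easy to check by hand; pick is a hypothetical helper name.

```r
probs <- c(A = 0.1, B = 0.2, C = 0.3, D = 0.4)   # made-up probabilities

# Lower bound of each element's slice of [0, 1): 0, 0.1, 0.3, 0.6
lower <- c(0, cumsum(probs)[-length(probs)])

# findInterval() returns the index of the last lower bound that is <= u.
pick <- function(u) names(probs)[findInterval(u, lower)]

pick(0.05)   # "A"
pick(0.35)   # "C"
pick(0.95)   # "D"
```

Unlike the strict comparison Selector > Test, findInterval() treats a draw of exactly 0 as belonging to the first element. If all that is needed is a single random draw, sample(names(probs), 1, prob = probs) does the draw and the lookup in one step.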

Can't break out of while loop in R

The purpose of my code is to find the number of people for which the probability that at least 2 of them have the same birthday is 50%.
source('colMatches.r')
all_npeople = 1:300
days = 1:365
ntrials = 1000
sizematch = 2
N = length(all_npeople)
counter = 1
pmean = rep(0,N)
while (pmean[counter] <= 0.5)
{
npeople = all_npeople[counter]
x = matrix(sample(days, npeople*ntrials, replace=TRUE),nrow=npeople,
ncol=ntrials)
w = colMatches(x, sizematch)
pmean[counter] = mean(w)
counter = counter + 1
}
s3 = toString(pmean[counter])
s2 = toString(counter)
s1 = "The smallest value of n for which the probability of a match is at least 0.5 is equal to "
s4 = " (the test p value is "
s5 = "). This means when you have "
s6 = " people in a room the probability that two of them have the same birthday is 50%."
paste(s1, s2, s4, s3, s5, s2, s6, sep="")
When I run that code I get "The smallest value of n for which the probability of a match is at least 0.5 is equal to 301 (the test p value is NA). This means when you have 301 people in a room the probability that two of them have the same birthday is 50%." So the while statement isn't working properly for some reason. It's cycling all the way through all_npeople even though it should stop when pmean[counter] is no longer less than or equal to 0.5.
I know that pmean is updating correctly though because when I test it afterwards pmean[50] = 0.971. So that list is indeed correct but the while loop still won't end.
*colMatches is a function that determines whether a column has a certain number of matches based on sizematch. In this case it looks at the matrix x and returns 1 for every column that has at least 2 equal values and 0 for every column with no matches.
I admire your attempt to program this question, but the beauty of R is most of this work is done for you:
qbirthday(prob = 0.5, classes = 365, coincident = 2)
#answer is 23 people.
You may also be interested in:
pbirthday(n, classes = 365, coincident = 2)
If the purpose of the code is only to find the number of people at which the probability that at least two of them share a birthday rises above 0.5, it can be written in a much simpler way:
# note that probability below is probability of NOT having same birthday
probability <- 1
people <- 1
days <- 365
while(probability >= 0.5){
people <- people + 1
probability <- probability * (days + 1 - people) / days
}
print(people)
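As for why the loop in the question never stops: counter is incremented at the end of the body, so the condition pmean[counter] <= 0.5 always inspects the next entry of pmean, which has not been filled in yet and is still 0. A minimal sketch of a simulation loop that tests the value it has just computed follows. has_match is a hypothetical stand-in for colMatches(x, 2), which is not shown in the question, and the seed is arbitrary.

```r
set.seed(1)                      # arbitrary seed so the run is repeatable
days <- 1:365
ntrials <- 1000

# Stand-in for colMatches(x, 2): 1 if a column contains a repeated value, else 0.
has_match <- function(x) as.numeric(apply(x, 2, anyDuplicated) > 0)

npeople <- 0
p <- 0
while (p < 0.5 && npeople < 300) {
  npeople <- npeople + 1
  x <- matrix(sample(days, npeople * ntrials, replace = TRUE),
              nrow = npeople, ncol = ntrials)
  p <- mean(has_match(x))        # estimate for the current group size
}
npeople                          # simulated answer; the exact answer is 23
```

Because p is a Monte Carlo estimate from 1000 trials, the loop may stop a person or two away from 23 depending on the seed.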

R rounding off numbers like 8.829847e-07

How can I round a number like 0.0000234889 (or in the form 8.829847e-07) to a power of ten, either below or above (whichever I choose), i.e. here 0.00001 or 0.0001?
I tried round(...., digits=-100000) but it returns NaN.
Example: round(2e-07, digits=6) gives 0, while I would like one function that gives 1e-06 and another that gives 1e-07.
# Is this what you're looking for?
# find the nearest power of ten for some number
x <- 0.0000234889 # Set test input value
y <- log10(x) # What is the fractional base ten logarithm?
yy <- round(y) # What is the nearest whole number base ten log?
xx <- 10 ^ yy # What integer power of ten is nearest the input?
print(xx)
# [1] 1e-05
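Since the question asks for the power of ten below or above rather than the nearest one, the same log10 idea can be used with floor() and ceiling() instead of round(). The function names below are hypothetical, and the sketch assumes x is strictly positive.

```r
# Largest power of ten that is <= x (assumes x > 0)
pow10_below <- function(x) 10^floor(log10(x))

# Smallest power of ten that is >= x (assumes x > 0)
pow10_above <- function(x) 10^ceiling(log10(x))

x <- c(0.0000234889, 2e-07)
pow10_below(x)   # 1e-05 1e-07
pow10_above(x)   # 1e-04 1e-06
```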
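The digits argument to round() is the number of decimal places to keep (negative values round to tens, hundreds and so on), so on its own it cannot snap a number to a power of ten. If you want your number to show up in scientific notation with an exponent n, just do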
round(value, 10^n)
However, this will only get you what you want up to a point. For example, you can do round(0.0000234889, 10^6) but you still get 2.34889e-05. (Notice that an exponent of 6 was specified but you got 5.)
Use options("scipen" = ) like this:
num <- 0.0000234889
> num
[1] 2.34889e-05
options("scipen" = 10)
options()$scipen
> num
[1] 0.0000234889
This will change the global option for the session. Read the documentation here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/options.html

Combinations of characters (Math)

If I have 37 different characters and I need to create all possible words from them with a maximum length of 15, what will be the total number of words?
Suppose I have X=2 characters and a maximum length of Y=2; then the possible outcomes are:
A
AA
B
BB
AB
BA
So the total number of outcomes is Z=6.
Now if X=37 and Y=15, what will be the value of Z?
It's sum(37^i) for i=1..15. For your example: 2^1 + 2^2 = 6.
In addition to Bartosz's answer, a simple formula for the sum is
n = ((1-X^(Y+1)) / (1-X)) - 1
Another simple formula for the summation is
x * (x^y - 1)
-------------
x - 1
http://www.wolframalpha.com/input/?i=sum+x%5Ei+for+i+1+to+y+
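Both the direct sum and the closed form are easy to check numerically. The sketch below is in R; the function names are hypothetical. For X = 37 and Y = 15 the result is far above 2^53, so the double-precision value is an approximation rather than an exact integer.

```r
# Direct sum: X^1 + X^2 + ... + X^Y
count_words <- function(X, Y) sum(X^seq_len(Y))

# Closed form of the same geometric sum (requires X != 1)
count_words_closed <- function(X, Y) X * (X^Y - 1) / (X - 1)

count_words(2, 2)            # 6, matching the small example
count_words(37, 15)          # approximate: too large to be exact in a double
count_words_closed(37, 15)   # agrees with the direct sum up to rounding
```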
