Storing output of pbinom (with multiple probabilities of success) in R

I am using pbinom in R to determine the p-values for multiple outcome values and probabilities of success:
1 - pbinom(1:2, 21, c(0.02, 0.05))
1:2 represents the observed counts, 21 represents the sample size, and 0.02 and 0.05 represent the probabilities of success. However, the output of the above command is:
[1] 0.06534884 0.08491751
These values represent the probabilities of:
1 - pbinom(1, 21, 0.02) & 1 - pbinom(2, 21, 0.05)
respectively.
I wish to obtain the outputs of: 1 - pbinom(1:2, 21, 0.02) and 1 - pbinom(1:2, 21, 0.05)
such that I obtain the output:
[1] 0.065348840 0.008125299 ## p-values for 1 - pbinom(1:2, 21, 0.02)
[1] 0.28302816 0.08491751 ## p-values for 1 - pbinom(1:2, 21, 0.05)
My actual data set is very lengthy, so I can't type code for every probability of success.
I also tried this using a for loop:
output = c()
for (i in 1:2) {
  output[i] = (1 - pbinom(i, 21, c(0.02, 0.05)))
}
But I get the following warning messages:
1: In output[i] = (1 - pbinom(i, 21, c(0.02, 0.05))) :
number of items to replace is not a multiple of replacement length
2: In output[i] = (1 - pbinom(i, 21, c(0.02, 0.05))) :
number of items to replace is not a multiple of replacement length
I realize this question may be difficult to interpret, but any help will be greatly appreciated.
Thank you.

Using sapply:
t(sapply(c(0.02, 0.05), function(x) 1 - pbinom(1:2, 21, x)))
#            [,1]        [,2]
# [1,] 0.06534884 0.008125299
# [2,] 0.28302816 0.084917514
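A small usage note (an addition to the answer; the labels are illustrative): naming the rows and columns makes the result self-describing.
res <- t(sapply(c(0.02, 0.05), function(x) 1 - pbinom(1:2, 21, x)))
rownames(res) <- c("p=0.02", "p=0.05")  # one row per probability of success
colnames(res) <- c("x>1", "x>2")        # one column per observed count
res
#               x>1         x>2
# p=0.02 0.06534884 0.008125299
# p=0.05 0.28302816 0.084917514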

Hi, you can try this:
matrix(1 - pbinom(c(1:2, 1:2), size = 21, prob = rep(c(0.02, 0.05), each = 2)), ncol = 2, byrow = TRUE)
PS: the warning means that the replacement vector (length 2, one value per probability) does not match the length of output[i], which is a single element, so R cannot recycle it evenly.
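For completeness, here is a minimal sketch (not from the original answers) of a corrected version of the loop in the question: loop over the probabilities and store a full row of results per iteration, so each assignment has a matching length.
probs <- c(0.02, 0.05)
output <- matrix(NA_real_, nrow = length(probs), ncol = 2)
for (i in seq_along(probs)) {
  # one row per probability, one column per observed count
  output[i, ] <- 1 - pbinom(1:2, 21, probs[i])
}
output
#            [,1]        [,2]
# [1,] 0.06534884 0.008125299
# [2,] 0.28302816 0.084917514
The same grid can also be computed in one call with outer(1:2, probs, function(q, p) 1 - pbinom(q, 21, p)), with counts as rows and probabilities as columns.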

Related

How to find F value that corresponds to a specific p value, e.g. p=0.05 in R?

I want to find the F value (for a one-tailed, right-tailed test) that corresponds to p = 0.05, given two pre-specified degrees of freedom (1 and 29 below). I do this by trial and error:
# F value                 p value
1 - pf(4.75, 1, 29)     # 0.03756451
1 - pf(4.15, 1, 29)     # 0.05085273
1 - pf(4.18295, 1, 29)  # 0.05000037
1 - pf(4.18297, 1, 29)  # 0.04999985
1 - pf(4.18296, 1, 29)  # 0.05000011
So, I want to obtain F=4.18296 without trial and error. Any idea?
There are two ways to achieve this result; both use the quantile function qf:
qf(1 - 0.05, 1, 29) or qf(0.05, 1, 29, lower.tail = FALSE)
qf(1 - 0.05, 1, 29)
# [1] 4.182964
qf(0.05, 1, 29, lower.tail = FALSE)
# [1] 4.182964
1 - pf(4.182964, 1, 29)
# [1] 0.05000001
The first option accounts for the fact that lower.tail defaults to TRUE, so we pass 1 - 0.05.
For the second option, we request P[X > x] directly with lower.tail = FALSE.
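As an aside (a sketch, not part of the original answer), the same value can also be recovered numerically with uniroot, which automates the trial-and-error search from the question:
# solve 1 - pf(x, 1, 29) = 0.05 for x; the search interval is an assumption
f <- function(x) 1 - pf(x, 1, 29) - 0.05
uniroot(f, interval = c(0.1, 100), tol = 1e-9)$root
# [1] 4.182964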

Calculate quantiles in R without interpolation - round up or down to actual value

It's my understanding that when calculating quantiles in R, the entire dataset is scanned and the value for each quantile is determined.
If you ask for the 0.8 quantile, for example, R will give you the value that would occur at that quantile. Even if no such value exists in the data, R will still return one, using linear interpolation.
However, what if one wishes to calculate a quantile and then round up or down to the nearest actual value in the data?
For example, if the 0.8 quantile gives a value of 53 when the real dataset only has a 50 and a 54, how could one get R to return either of those values?
Try this:
# dummy data
x <- c(1, 1, 1, 1, 10, 20, 30, 30, 40, 50, 55, 70, 80)
# get the quantile at 0.8
q <- quantile(x, 0.8)
q
# 80%
#  53
# closest match - "round up"
min(x[x >= q])
# [1] 55
# closest match - "round down"
max(x[x <= q])
# [1] 50
There are many quantile estimation methods implemented in R's quantile function. You can choose which one to use via the type argument, as documented at https://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html.
x <- c(1, 1, 1, 1, 10, 20, 30, 30, 40, 50, 55, 70, 80)
quantile(x, c(.8)) # default, type = 7
# 80%
# 53
quantile(x, c(.8), FALSE, TRUE, 7) # equivalent to the previous invocation
# 80%
# 53
quantile(x, c(.8), FALSE, TRUE, 3) # type = 3, nearest sample
# 80%
# 50
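A further option (an addition, not from the original answers) is to take whichever actual observation is closest to the interpolated quantile, instead of always rounding in one direction:
x <- c(1, 1, 1, 1, 10, 20, 30, 30, 40, 50, 55, 70, 80)
q <- quantile(x, 0.8)      # interpolated value, 53
x[which.min(abs(x - q))]   # nearest actual observation: |55 - 53| < |50 - 53|
# [1] 55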

view values used by function boot to bootstrap estimates

I have written the code below to obtain a bootstrap estimate of a mean. My objective is to view the numbers selected from the data set by the function boot in the boot package, ideally in the order they are selected.
The data set only contains three numbers: 1, 10, and 100 and I am only using two bootstrap samples.
The estimated mean is 23.5 and the R code below indicates that the six numbers included one '1', four '10' and one '100'. However, there are 30 possible combinations of those numbers that would have resulted in a mean of 23.5.
Is there a way for me to determine which of those 30 possible combinations is the combination that actually appeared in the two bootstrap samples?
library(boot)
set.seed(1234)
dat <- c(1, 10, 100)
av <- function(dat, i) { sum(dat[i])/length(dat[i]) }
av.boot <- boot(dat, av, R = 2)
av.boot
#
# ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
# Call:
# boot(data = dat, statistic = av, R = 2)
#
#
# Bootstrap Statistics :
#     original  bias    std. error
# t1*       37   -13.5    19.09188
#
mean(dat) + -13.5
# [1] 23.5
# The two samples must have contained one '1', four '10' and one '100',
# but there are 30 possibilities.
# Which of these 30 possible sequences actual occurred?
# This code shows there must have been one '1', four '10' and one '100'
# and shows the 30 possible combinations
my.combos <- expand.grid(V1 = c(1, 10, 100),
                         V2 = c(1, 10, 100),
                         V3 = c(1, 10, 100),
                         V4 = c(1, 10, 100),
                         V5 = c(1, 10, 100),
                         V6 = c(1, 10, 100))
my.means <- apply(my.combos, 1, function(x) ((x[1] + x[2] + x[3]) / 3 + (x[4] + x[5] + x[6]) / 3) / 2)
possible.samples <- my.combos[my.means == 23.5, ]
dim(possible.samples)
n.1 <- rowSums(possible.samples == 1)
n.10 <- rowSums(possible.samples == 10)
n.100 <- rowSums(possible.samples == 100)
n.1[1]
n.10[1]
n.100[1]
length(unique(n.1)) == 1
length(unique(n.10)) == 1
length(unique(n.100)) == 1
I think you can determine the numbers sampled, and the order in which they are sampled, with the code below. You have to extract the function ordinary.array from the boot package (boot:::ordinary.array) and paste it into your R code. Then specify the values for n, R and strata, where n is the number of observations in the data set and R is the number of bootstrap replicates you want.
I do not know how general this approach is, but it worked with a couple of simple examples I tried, including the example below.
library(boot)
set.seed(1234)
dat <- c(1, 10, 100, 1000)
av <- function(dat, i) { sum(dat[i])/length(dat[i]) }
av.boot <- boot(dat, av, R = 3)
av.boot
#
# ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
# Call:
# boot(data = dat, statistic = av, R = 3)
#
#
# Bootstrap Statistics :
#     original  bias    std. error
# t1*   277.75  -127.5    132.2405
#
#
mean(dat) + -127.5
# [1] 150.25
# boot:::ordinary.array
ordinary.array <- function(n, R, strata)
{
    inds <- as.integer(names(table(strata)))
    if (length(inds) == 1L) {
        # single stratum: draw n * R indices with replacement
        output <- sample.int(n, n * R, replace = TRUE)
        dim(output) <- c(R, n)
    }
    else {
        # multiple strata: resample within each stratum separately
        # (bsample is internal to boot; use boot:::bsample if pasting this out)
        output <- matrix(as.integer(0L), R, n)
        for (is in inds) {
            gp <- seq_len(n)[strata == is]
            output[, gp] <- if (length(gp) == 1)
                rep(gp, R)
            else bsample(gp, R * length(gp))
        }
    }
    output
}
# I think the function ordinary.array determines which elements
# of the data are sampled in each of the R samples
set.seed(1234)
ordinary.array(n = 4, R = 3, strata = 1)
#      [,1] [,2] [,3] [,4]
# [1,]    1    3    1    3
# [2,]    3    4    1    3
# [3,]    3    3    3    3
#
# which equals:
((1+100+1+100) / 4 + (100+1000+1+100) / 4 + (100+100+100+100) / 4) / 3
# [1] 150.25
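A simpler route (a suggestion beyond the original answer): boot stores the random seed it used, and the package's boot.array() function reconstructs the resampling indices from a fitted boot object directly:
library(boot)
set.seed(1234)
dat <- c(1, 10, 100, 1000)
av <- function(dat, i) sum(dat[i]) / length(dat[i])
av.boot <- boot(dat, av, R = 3)
# indices = TRUE returns the R x n matrix of sampled positions,
# which should match the ordinary.array() output above
boot.array(av.boot, indices = TRUE)
#      [,1] [,2] [,3] [,4]
# [1,]    1    3    1    3
# [2,]    3    4    1    3
# [3,]    3    3    3    3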

rounding in R cut function

Does anyone know how R chooses the number of significant digits in the cut function?
y<-c(61, 64, 64, 65, 66)
table(cut(y, breaks=c(60.555, 67.123, 75.055)))
produces the result
(60.6,67.1] (67.1,75.1]
          5           0
but
table(cut(y, breaks=c(60.958, 67.958, 74.958)))
produces the result
(61,68] (68,75]
      5       0
I would prefer that R use the exact boundaries I provide in the cut function, but it seems to be rounding them, and it's not clear how it chooses the precision of the rounding, as the examples above show. Is it possible to force R to use my exact boundaries?
How about using nchar to find the number of digits per cut? Here are three examples.
> y <- c(61, 64, 64, 65, 66)
> breaks1 <- c(60.555, 67.123, 75.055)
> table(cut(y, breaks = breaks1, dig.lab = min(nchar(breaks1))))
## (60.555,67.123] (67.123,75.055]
##               5               0
> breaks2 <- c(60.5, 67.1, 75.4)
> table(cut(y, breaks = breaks2, dig.lab = min(nchar(breaks2))))
## (60.5,67.1] (67.1,75.4]
##           5           0
> breaks3 <- c(60, 67, 75)
> table(cut(y, breaks = breaks3, dig.lab = min(nchar(breaks3))))
## (60,67] (67,75]
##       5       0
Note that min is used only to avoid the warnings that would occur if the numbers of digits in the breaks vector were not all identical (dig.lab expects a single value).
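For reference (an addition based on the documented cut() interface): labels are printed with dig.lab significant digits, which defaults to 3, so the rounding in the question can also be avoided simply by raising dig.lab:
y <- c(61, 64, 64, 65, 66)
# dig.lab = 6 keeps all six significant digits of the supplied breaks
table(cut(y, breaks = c(60.555, 67.123, 75.055), dig.lab = 6))
## (60.555,67.123] (67.123,75.055]
##               5               0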

Plot randomness

I am looking for help with generating this kind of plot from a sequence of ones and zeros in R. I am using it as one of a battery of tests to investigate whether a sequence is random (by looking for patterns in the noise).
Note: This is not homework!
E.g.,
> y <- rnorm(3000, 1, 2)
> plot(y)
> plot(y ~ y)
My data is in this form:
> str(hv10k)
 num [1:100000] 0 1 1 1 0 0 1 0 0 0 ...
Update:
Following @Roman Luštrik's suggestions, this is what I've got so far:
[plots omitted: an approx. 700 coin-toss sequence and a 100,000 coin-toss sequence]
One way would be
side <- 100
my.zero <- matrix(sample(c(0, 1), side^2, replace = TRUE), side)
image(my.zero)
EDIT
You can play with the prob argument in sample.
side <- 100
# weights passed to prob need not sum to 1; sample normalizes them internally
my.zero <- matrix(sample(c(0, 1), side^2, replace = TRUE, prob = c(0.8, 2)), side)
image(my.zero)
EDIT 2
y <- rnorm(10000, 1, 2)
y <- matrix(ifelse(y > 0, 1, 0), ncol = 100)
image(y, col = c("white", "black"))
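Applied to the asker's own 0/1 vector, the approach is just a reshape (a sketch; hv10k is simulated here as a stand-in for the real data):
# stand-in for hv10k: num [1:100000] of 0s and 1s
hv10k <- sample(c(0, 1), 1e5, replace = TRUE)
m <- matrix(hv10k, ncol = 250)  # 400 x 250 grid; any divisor of 1e5 works
image(m, col = c("white", "black"), axes = FALSE)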
