How to calculate 95% confidence interval for a proportion in R? - r

Assume I own a factory that produces 150 screws a day and there is a 22% error rate. Now I am going to estimate how many screws are faulty each day for a year (365 days) with
rbinom(n = 365, size = 150, prob = 0.22)
which generates 365 values in this way
45 31 35 31 34 37 33 41 37 37 26 32 37 38 39 35 44 36 25 27 32 25 30 33 25 37 36 31 32 32 43 42 32 33 33 38 26 24 ...................
Now, for each of the values generated, I am supposed to calculate a 95% confidence interval for the proportion of faulty screws on that day.
I am not sure how to do this. Are there any built-in functions for this (I am not supposed to use any packages), or should I write a new function?

If the number of trials per day is large enough and the probability of failure is not too extreme, you can use the normal approximation (see https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval).
# number of failures, for each of the 365 days
f <- rbinom(365, size = 150, prob = 0.22)
# failure rates
p <- f/150
# upper and lower 95% confidence limits for the failure rate, for each day
p + 1.96*sqrt((p*(1-p)/150))
p - 1.96*sqrt((p*(1-p)/150))
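
If the normal approximation is in doubt, base R (the stats package, loaded by default) also offers binom.test, whose confidence interval is the exact Clopper-Pearson interval. The following is only a sketch of how it could be applied to every day; the seed and the names n and exact_ci are illustrative and not part of the original answer.

```r
set.seed(123)                      # illustrative seed, for reproducibility only
n <- 150                           # screws produced per day
f <- rbinom(365, size = n, prob = 0.22)

# Exact (Clopper-Pearson) 95% interval for each day's failure proportion
exact_ci <- t(sapply(f, function(x) binom.test(x, n, conf.level = 0.95)$conf.int))
colnames(exact_ci) <- c("lower", "upper")

head(cbind(p = f / n, exact_ci))
```

The exact interval is usually a little wider than the normal-approximation interval, which is the price of its guaranteed coverage.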

Related

Generate random and unique probabilities based on the data in a column rstudio

I have a database with different ages (16-85). I'm trying to assign a random probability to each age, subject to the following conditions:
The random probability should be between 20 and 100
The probability should be higher for the lowest and highest ages
For example:
I have this data frame
Age  Probability
16   85
18   80
25   70
30   50
35   40
65   60
75   65
85   70
I'd really appreciate if you can help me with this.
Thank you so much!
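
One possible way to meet both conditions, sketched under assumptions: take a U-shaped baseline that is lowest at the middle of the age range and highest at both ends, then add uniform noise small enough that the result stays between 20 and 100. The baseline range (30 to 90) and the noise width (plus or minus 10) are arbitrary choices made for this illustration.

```r
set.seed(1)                                   # illustrative seed
ages <- c(16, 18, 25, 30, 35, 65, 75, 85)

mid  <- mean(range(ages))                     # centre of the age range
half <- diff(range(ages)) / 2                 # half-width of the age range
u    <- ((ages - mid) / half)^2               # 0 at the centre, 1 at both extremes

base <- 30 + 60 * u                           # U-shaped baseline between 30 and 90
prob <- base + runif(length(ages), -10, 10)   # noise keeps values within 20..100

data.frame(Age = ages, Probability = round(prob, 1))
```

Because the noise is continuous, the generated values are distinct for practical purposes, and the extremes receive higher values on average than the middle ages.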

fit a normal distribution to grouped data, giving expected frequencies

I have a frequency distribution of observations, grouped into counts within class intervals.
I want to fit a normal (or other continuous) distribution, and find the expected frequencies in each interval according to that distribution.
For example, suppose the following, where I want to calculate another column, expected, giving the expected number of soldiers with chest circumferences in the interval given by chest, where the intervals are assumed to be centered on the nominal value. E.g., chest = 35 represents 34.5 <= y < 35.5. One analysis I've seen gives the expected frequency in this cell as 72.5 vs. the observed 81.
> data(ChestSizes, package="HistData")
>
> ChestSizes
chest count
1 33 3
2 34 18
3 35 81
4 36 185
5 37 420
6 38 749
7 39 1073
8 40 1079
9 41 934
10 42 658
11 43 370
12 44 92
13 45 50
14 46 21
15 47 4
16 48 1
>
> # ungroup to a vector of values
> chests <- vcdExtra::expand.dft(ChestSizes, freq="count")
There are quite a number of variations of this question, most of which relate to plotting the normal density on top of a histogram, scaled to represent counts not density. But none explicitly show the calculation of the expected frequencies. One close question is R: add normal fits to grouped histograms in ggplot2
I can perfectly well do the standard plot (below), but for other things, like a Chi-square test or a vcd::rootogram plot, I need the expected frequencies in the same class intervals.
library(ggplot2)

bw <- 1                        # bin width
n_obs <- nrow(chests)          # total number of observations
xbar <- mean(chests$chest)
std <- sd(chests$chest)
# histogram of counts, with the normal density rescaled to the count scale
plt <-
  ggplot(chests, aes(chest)) +
  geom_histogram(color = "black", fill = "lightblue", binwidth = bw) +
  stat_function(fun = function(x)
                  dnorm(x, mean = xbar, sd = std) * bw * n_obs,
                color = "darkred", size = 1)
plt
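Here is how you could calculate the expected frequencies for each group, assuming normality. The cell boundaries are 32.5, 33.5, ..., 48.5, and each expected count is the normal probability of the cell multiplied by the total count.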
xbar <- with(ChestSizes, weighted.mean(chest, count))
sdx <- with(ChestSizes, sd(rep(chest, count)))
transform(ChestSizes, Expected = diff(pnorm(c(32, chest) + .5, xbar, sdx)) * sum(count))
chest count Expected
1 33 3 4.7600583
2 34 18 20.8822328
3 35 81 72.5129162
4 36 185 199.3338028
5 37 420 433.8292832
6 38 749 747.5926687
7 39 1073 1020.1058521
8 40 1079 1102.2356155
9 41 934 943.0970605
10 42 658 638.9745241
11 43 370 342.7971793
12 44 92 145.6089948
13 45 50 48.9662992
14 46 21 13.0351612
15 47 4 2.7465640
16 48 1 0.4579888
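
Since the question mentions a Chi-square test, here is a rough sketch of how the expected frequencies could feed a Pearson goodness-of-fit statistic. The data frame is rebuilt from the counts listed in the question so the block runs without HistData. The degrees of freedom (cells minus 1, minus 2 estimated parameters) are a textbook approximation, and the sparse tail cells (expected counts below 5) would normally be pooled first; both points are assumptions of this sketch, not part of the original answer.

```r
# Counts copied from the ChestSizes listing in the question
ChestSizes <- data.frame(
  chest = 33:48,
  count = c(3, 18, 81, 185, 420, 749, 1073, 1079,
            934, 658, 370, 92, 50, 21, 4, 1)
)

xbar <- with(ChestSizes, weighted.mean(chest, count))
sdx  <- with(ChestSizes, sd(rep(chest, count)))

# Cell boundaries 32.5, 33.5, ..., 48.5 and the expected count per cell
breaks   <- c(32, ChestSizes$chest) + 0.5
expected <- diff(pnorm(breaks, xbar, sdx)) * sum(ChestSizes$count)

# Pearson chi-square statistic and an approximate p-value
x2      <- sum((ChestSizes$count - expected)^2 / expected)
df      <- nrow(ChestSizes) - 1 - 2
p_value <- pchisq(x2, df, lower.tail = FALSE)
c(X2 = x2, df = df, p = p_value)
```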

evaluating neural network performance

I trained my neural network with a sigmoid activation function, so the predicted values lie in the range [0,1). However, the real data, to which a z-score transformation has been applied, ranges well beyond [0,1). In this case, what would be the appropriate way to evaluate my model? Should I also rescale the original test data to the same range and then evaluate with a criterion such as the mean squared forecast error?
> real_predicted_neural
predicted real
1 1.909219e-07 -3.57877473
2 4.161819e-08 -2.28704595
3 1.754706e-11 -1.08509429
4 1.149891e-13 -0.46573114
5 7.777560e-02 0.42381300
6 4.173448e-07 -0.44060297
7 1.119703e-01 0.21075550
8 8.682557e-01 -0.01292402
9 4.736056e-08 -0.29830701
10 7.506821e-08 -1.20302227
11 7.341235e-01 -0.03986571
12 7.501776e-05 -0.94315815
13 1.145697e-04 0.49730175
14 2.214929e-13 0.04252241
15 4.597199e-01 -0.38539901
16 2.324931e-03 -0.74468628
17 4.366025e-06 -0.77037244
18 1.394450e-06 0.16679048
19 5.869884e-11 -0.75876486
20 1.817941e-04 0.04303387
21 7.060773e-04 0.06099372
22 8.267170e-06 -1.21687318
23 9.388680e-02 0.61135319
24 1.099290e-01 0.55715201
25 9.757236e-01 -0.33480226
26 9.544055e-01 0.09061006
27 7.322074e-07 0.09290822
28 1.014327e-06 -0.61658893
29 7.848382e-08 -0.78739456
30 1.791908e-04 -0.44073540
31 1.357918e-03 -0.22099008
32 5.192233e-06 -0.32744703
33 2.624779e-06 -0.37644068
34 6.414216e-02 -0.36947939
35 1.388143e-06 -0.00994845
36 3.010872e-05 -0.05984833
37 9.873201e-03 -0.21815268
38 3.896163e-04 -0.24009094
39 2.718760e-02 0.33383333
40 1.025650e-02 0.09779867
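
One commonly used option, sketched here under assumptions, is to min-max scale the target to the unit interval using the training set's minimum and maximum, train against the scaled target, and then either evaluate on the scaled values or map the predictions back to the original scale. All data below is simulated, and the "predictions" are a noisy stand-in for real network output; to_unit and from_unit are hypothetical helper names.

```r
set.seed(1)                                  # illustrative seed
y_train <- rnorm(100, mean = 10, sd = 3)     # simulated training targets
y_test  <- rnorm(40,  mean = 10, sd = 3)     # simulated test targets

# Scaling parameters come from the training set only
y_min <- min(y_train)
y_max <- max(y_train)
to_unit   <- function(y) (y - y_min) / (y_max - y_min)
from_unit <- function(s) s * (y_max - y_min) + y_min

# Stand-in for sigmoid-output predictions, clipped to the unit interval
pred_unit <- pmin(pmax(to_unit(y_test) + rnorm(40, sd = 0.05), 0), 1)

mse_scaled <- mean((pred_unit - to_unit(y_test))^2)     # error on the scaled scale
mse_orig   <- mean((from_unit(pred_unit) - y_test)^2)   # error on the original scale
c(scaled = mse_scaled, original = mse_orig)
```

The two errors differ only by the constant factor (y_max - y_min)^2, so either works for comparing models as long as predictions and real values are on the same scale.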

How to draw boxplot by creating age buckets from integer age entries

I have a data frame in R, an example of which is given below:
age number_of_visits
19 10
50 24
25 50
24 35
31 19
42 26
55 40
64 15
20 35
67 20
69 18
33 15
28 50
62 18
I need to create age bins such as 18 to 24, 25 to 39, 40 to 54, 55 to 65, and above 65, and then draw a boxplot of the number of visits for each of these age bins.
It would be helpful if anyone could provide code that I can use in RStudio.
Thank you!
You could do this with cut2 from the Hmisc package:
library(Hmisc)
# Toy Data
age <- rnorm(100, mean=45, sd=15)
number_of_visits <- rnorm(100, mean=20, sd=10)
# cut2 lets you set custom cutpoints
interval <- cut2(age, c(18,25,40,55,65))
boxplot(number_of_visits ~ interval)
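
If installing Hmisc is not an option, base R's cut can build the same kind of bins. The sketch below uses the data from the question; the data frame name df, the column age_bin, and the bin labels are illustrative choices, and the break at 66 assumes ages are whole numbers so that 65 falls in the "55-65" bin.

```r
df <- data.frame(
  age = c(19, 50, 25, 24, 31, 42, 55, 64, 20, 67, 69, 33, 28, 62),
  number_of_visits = c(10, 24, 50, 35, 19, 26, 40, 15, 35, 20, 18, 15, 50, 18)
)

# right = FALSE makes each bin include its lower bound: [18,25), [25,40), ...
df$age_bin <- cut(df$age,
                  breaks = c(18, 25, 40, 55, 66, Inf),
                  right  = FALSE,
                  labels = c("18-24", "25-39", "40-54", "55-65", "over 65"))

boxplot(number_of_visits ~ age_bin, data = df,
        xlab = "Age group", ylab = "Number of visits")
```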

How to make vector pick 3 possible values depending on last value in R

In R, I am trying to create a vector of 12 numbers. The first number is 40. From there, each step can go down by 1 with probability 0.5, stay the same with probability 0.2, or go up by 1 with probability 0.3, so each value depends on the previous one. For example, a possible vector could be:
40 39 38 37 37 38 37 36 37 38 39 39
I have tried several different methods but I am unable to get any to work. My latest attempt was:
xx=c()
num <- 60
for (i in 1:12){
xx[i] <- sample(x=c(num,num+5,num+10),size=1,prob = probs)}
Thanks for your help.
Set your initial value and add the difference based on the probabilities you specified:
x <- 40
for (i in 2:12){
x[i] <- x[i-1]+sample(c(-1, 0, 1), size=1, prob=c(.5, .2, .3))
}
x
[1] 40 41 41 40 41 40 39 39 40 39 38 38
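
The same random walk can also be written without a loop: draw all 11 steps at once and accumulate them with cumsum. This is just an alternative sketch; the seed and the name steps are illustrative.

```r
set.seed(42)   # illustrative seed, for reproducibility only

# 11 independent steps of -1, 0 or +1 with the stated probabilities
steps <- sample(c(-1, 0, 1), size = 11, replace = TRUE, prob = c(0.5, 0.2, 0.3))

# start at 40 and add the running total of the steps
x <- cumsum(c(40, steps))
x
```

Note that replace = TRUE is required here, because more values are drawn than there are distinct step sizes.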

Resources