Translate this R geometric problem using numpy random geometric

How can I translate this geometric distribution problem to numpy?
Products produced by a machine have a 3% defective rate.
What is the probability that the first defective occurs in the fifth item inspected?
P(X = 5) = P(first 4 non-defective) P(5th defective) = (0.97^4)(0.03)
In R:
> dgeom(x = 4, prob = .03)
[1] 0.02655878
The convention in R is to record X as the number of failures that occur
before the first success.
Is my numpy code OK?
result = np.random.geometric(p=0.03, size=1000)
print(result);
result = (result == 5).sum() / 1000.
print(result * 1000,"%");
I get 17 % as a result with numpy. Is that OK? It seems wrong because there is only a 3% defect rate.
This is the numpy result array:
[ 31 20 37 9 47 31 22 7 44 15 52 15 4 14 36 45 26 27
9 48 30 5 7 17 7 24 121 22 23 49 2 26 25 8 4 5
3 27 70 71 3 1 19 22 103 18 14 20 34 45 8 169 11 63
29 71 30 79 75 19 56 9 5 8 15 44 8 12 40 29 46 2
144 69 65 1 4 90 20 187 100 52 46 76 3 105 12 110 31 3
113 18 6 15 127 22 6 7 3 18 123 41 69 104 13 18 2 8
52 35 54 27 74 22 31 27 3 15 21 26 13 3 32 10 131 20
I guess that 31 is the number of integrity checks before a failure, and likewise 20, 37, etc.

This is what I would do:
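import numpy as np
np.random.seed(1)
# Each row is 5 inspected items; 1 marks a defective item (3% rate)
tests = np.random.choice([0, 1], size=(1000, 5), p=[0.97, 0.03])
# Fraction of rows whose first defective item is the fifth one
((np.argmax(tests, axis=1) == 4) & (tests[:, 4] == 1)).mean()
# should land near 0.97**4 * 0.03 ≈ 0.0266, up to sampling noise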
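
A note on the question's own numbers: the printed 17 comes from multiplying the fraction by 1000 instead of 100, so it is a count of 17 hits out of 1000 draws, i.e. about 1.7 %. Also, np.random.geometric returns the number of trials up to and including the first success (values start at 1), so comparing with 5 corresponds to R's dgeom(x = 4, ...). The sketch below checks this; the seed and the sample size are arbitrary choices made for illustration.

```python
import numpy as np

p = 0.03

# Exact probability that the first defective item is the 5th inspected:
# four non-defective items followed by one defective item.
exact = (1 - p) ** 4 * p  # the same value R gives for dgeom(x = 4, prob = .03)

# Simulation: numpy counts trials (including the success), so test for == 5.
rng = np.random.default_rng(0)           # seed chosen arbitrarily
draws = rng.geometric(p, size=1_000_000)
estimate = np.mean(draws == 5)

print(f"exact    = {exact:.8f}")
print(f"estimate = {estimate:.4f}")
```

With only 1000 draws, as in the question, the estimate is noisy, which is why a count such as 17 instead of the expected 26 or 27 is not surprising.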

Related

removing outliers in a vector

The aim is to remove outliers in a vector.
x <- sort(as.vector(datasets::islands))
x = 12 13 13 13 14 14 15 15 16 16 16 19 21 23 25 26 29 29 30 30
32 33 36 40 42 43 43 44 49 58 73 82 82 84 89 183 184 227 280 306
840 2968 3745 5500 6795 9390 11506 16988
So far I have used:
x_rm_out <- x[!x %in% boxplot.stats(x, coef = .05, do.conf = TRUE, do.out = TRUE)$out]
Result:
[1] 12 13 13 13 14 14 15 15 16 16 16 19 21 23 25 26 29 29 30 30 32 33 36 40 42 43 43 44 49 58 73
[32] 82 82 84 89 183 184
Is there a way to remove 183 & 184 from vector (x)?
Finding Outliers
A very easy way to find outliers is with the rstatix package, then filter them out with dplyr:
# Load library:
library(rstatix)
library(dplyr)
# Make x into dataframe:
x <- data.frame(x)
# Identify outliers:
x %>%
identify_outliers()
You should get an output like this now:
x is.outlier is.extreme
1 840 TRUE TRUE
2 2968 TRUE TRUE
3 3745 TRUE TRUE
4 5500 TRUE TRUE
5 6795 TRUE TRUE
6 9390 TRUE TRUE
7 11506 TRUE TRUE
8 16988 TRUE TRUE
Creating Dataframe Without Them
Now you have to filter out the data, which you can then turn into a new dataframe (< 840). You may also remove them with your previously established criterion (< 183) if you desire:
# Filter outliers and create new file:
x2 <- x %>%
filter(x < 183)
x2
Entering x2 then gives this output without the outliers:
x
1 12
2 13
3 13
4 13
5 14
6 14
7 15
8 15
9 16
10 16
11 16
12 19
13 21
14 23
15 25
16 26
17 29
18 29
19 30
20 30
21 32
22 33
23 36
24 40
25 42
26 43
27 43
28 44
29 49
30 58
31 73
32 82
33 82
34 84
35 89
To supplement Shawn's answer, you can also use the rstatix::is_outlier() function on numeric vectors.
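
For readers who want the same kind of filter in numpy, here is a minimal sketch of the common 1.5 × IQR rule. It is an assumption-laden illustration, not a port of boxplot.stats: quartiles come from np.percentile with its default linear interpolation, whereas boxplot.stats uses hinges, so fences can differ slightly on other data. The multiplier 1.5 is the conventional default, not the coef used in the question.

```python
import numpy as np

# Island areas as listed in the question (already sorted)
x = np.array([12, 13, 13, 13, 14, 14, 15, 15, 16, 16, 16, 19, 21, 23, 25, 26,
              29, 29, 30, 30, 32, 33, 36, 40, 42, 43, 43, 44, 49, 58, 73, 82,
              82, 84, 89, 183, 184, 227, 280, 306, 840, 2968, 3745, 5500,
              6795, 9390, 11506, 16988])

q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # 1.5 is an assumed multiplier

inside = (x >= lower) & (x <= upper)
kept = x[inside]       # values within the fences
removed = x[~inside]   # values flagged as outliers

print(removed)
```

Under this rule 183 and 184 are kept; dropping them as well requires a stricter cutoff chosen by hand, as in the filter(x < 183) step above.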

R plot numbers of factor levels having n, n+1, .... counts

I have a very large dataset (> 200000 lines) with 6 variables (only the first two shown)
>head(gt7)
ChromKey POS
1 2447 25
2 2447 183
3 26341 75
4 26341 2213
5 26341 2617
6 54011 1868
I have converted the ChromKey variable to a factor variable made up of > 55000 levels.
> gt7[1] <- lapply(gt7[1], factor)
> is.factor(gt7$ChromKey)
[1] TRUE
I can further make a table with counts of ChromKey levels
> table(gt7$ChromKey)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
88 88 44 33 11 11 33 22 121 11 22 11 11 11 22 11 33
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
22 22 44 55 22 11 22 66 11 11 11 22 11 11 11 187 77
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
77 11 44 11 11 11 11 11 11 22 66 11 22 11 44 22 22
... output cropped
Which I can save in table format
> table <- table(gt7$ChromKey)
> head(table)
1 2 3 4 5 6
88 88 44 33 11 11
I would like to know whether it is possible to get a table (and histogram) of the number of levels that have each specific count. From the example above, I would expect
88 44 33 11
2 1 1 2
I would very much appreciate any hint.
We can apply table again to the output to get the frequency count of the frequencies:
table(table(gt7$ChromKey))
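
The same "count of counts" idea can be sketched in numpy by calling np.unique twice with return_counts=True. This is only an illustration: the keys array holds the six ChromKey values from head(gt7), and the variable names are made up for the example.

```python
import numpy as np

# ChromKey column from head(gt7) in the question
keys = np.array([2447, 2447, 26341, 26341, 26341, 54011])

# First pass: how many rows each key has (like table(gt7$ChromKey))
_, rows_per_key = np.unique(keys, return_counts=True)

# Second pass: how many keys share each row count (like table(table(...)))
sizes, n_keys = np.unique(rows_per_key, return_counts=True)

for size, n in zip(sizes, n_keys):
    print(f"{n} key(s) appear {size} time(s)")
```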

Keeping combinations that contain specific range of values

I want to keep only the combinations that contain 8 values from 1:30, 1 or 2 values from 31:60, and 3 values from 61:70.
I have the following combinations:
15 6 10 26 7 27 19 51 54 61 64 69 70
# do not keep this b/c there are 4 values from 61:70
23 2 7 29 3 17 4 20 60 56 61 66 68 # keep this one
17 30 24 3 25 5 15 11 43 49 66 67 68 # keep this one
25 13 14 9 29 16 15 4 56 63 66 67 70
# do not keep this b/c there are 4 values from 61:70
14 24 3 17 11 15 27 25 31 59 62 65 69
20 28 8 24 1 18 25 3 44 45 69 61 70
... (32 in total)
How can I do this?
edit.
I am not sure how you want to "keep" the required combinations, but to find the combinations you are looking for you can do something like
v <- c(15,6,10,26,7,27,19,51,54,61,64,69,70)
# TRUE only when all three range counts meet the requirement
sum(v >= 1 & v <= 30) == 8 &&
  sum(v >= 31 & v <= 60) %in% c(1L, 2L) &&
  sum(v >= 61 & v <= 70) == 3
Thanks to @thelatemail for pointing out that the second condition should accept multiple values.
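
To apply the same test to many combinations at once and actually keep the matching rows, one option is a vectorised check. The sketch below uses numpy and assumes the combinations are stored one per row in a 2-D array; only the first four rows from the question are included.

```python
import numpy as np

# First four combinations from the question, one per row
combos = np.array([
    [15, 6, 10, 26, 7, 27, 19, 51, 54, 61, 64, 69, 70],
    [23, 2, 7, 29, 3, 17, 4, 20, 60, 56, 61, 66, 68],
    [17, 30, 24, 3, 25, 5, 15, 11, 43, 49, 66, 67, 68],
    [25, 13, 14, 9, 29, 16, 15, 4, 56, 63, 66, 67, 70],
])

# Count, per row, how many values fall in each range
low = ((combos >= 1) & (combos <= 30)).sum(axis=1)
mid = ((combos >= 31) & (combos <= 60)).sum(axis=1)
high = ((combos >= 61) & (combos <= 70)).sum(axis=1)

keep = (low == 8) & np.isin(mid, [1, 2]) & (high == 3)
kept = combos[keep]  # rows that satisfy all three conditions

print(keep)
```

The mask agrees with the comments in the question: the first and fourth rows are dropped because they contain four values from 61:70.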

Distance Matrix from table in R

Good evening,
I need to solve a location problem in R and I'm stuck in one of the first steps.
From a .txt file I need to create a distance matrix using the Euclidean distance.
datos <- file.choose()
servidores <- read.table(datos)
servidores
From which I obtain the following information:
X50 shows the total number of servers.
X5 shows the number of hubs required.
X120 shows the total capacity.
The first column shows the x position of each node.
The second column shows the y position of each node.
The third column shows the requirements of the node.
X50 X5 X120
1 2 62 3
2 80 25 14
3 36 88 1
4 57 23 14
5 33 17 19
6 76 43 2
7 77 85 14
8 94 6 6
9 89 11 7
10 59 72 6
11 39 82 10
12 87 24 18
13 44 76 3
14 2 83 6
15 19 43 20
16 5 27 4
17 58 72 14
18 14 50 11
19 43 18 19
20 87 7 15
21 11 56 15
22 31 16 4
23 51 94 13
24 55 13 13
25 84 57 5
26 12 2 16
27 53 33 3
28 53 10 7
29 33 32 14
30 69 67 17
31 43 5 3
32 10 75 3
33 8 26 12
34 3 1 14
35 96 22 20
36 6 48 13
37 59 22 10
38 66 69 9
39 22 50 6
40 75 21 18
41 4 81 7
42 41 97 20
43 92 34 9
44 12 64 1
45 60 84 8
46 35 100 5
47 38 2 1
48 9 9 7
49 54 59 9
50 1 58 2
I tried to use the dist() function:
distance_matrix <-dist(servidores,method = "euclidean",diag = TRUE,upper = TRUE)
but since x and y are in different columns, I am not sure what to do to get a 50x50 matrix with all the distances.
Does anybody know how I could create such a matrix?
Many thanks in advance.
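
The key point is that only the two coordinate columns should enter the distance computation; the third column (requirements) is not a coordinate. In R that would mean passing just the first two columns to dist(). The computation itself can be sketched with numpy broadcasting as below, using the first three nodes from the table and assuming the first two columns are x and y as described.

```python
import numpy as np

# x and y of the first three nodes from the table; the third column is left out
xy = np.array([[2, 62],
               [80, 25],
               [36, 88]], dtype=float)

# Pairwise differences via broadcasting: shape (n, n, 2)
diff = xy[:, None, :] - xy[None, :, :]

# Euclidean distance between every pair of nodes: shape (n, n)
dist = np.sqrt((diff ** 2).sum(axis=-1))

print(dist.shape)
```

With all 50 rows in xy, the same two lines produce the full 50x50 symmetric matrix with zeros on the diagonal.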

Creating a sequence in R [duplicate]

Let's say I created two vectors like this:
Ncla = 10
CC.1 = seq(2,((Ncla *Ncla)-Ncla),(Ncla+1))
CC.2 = seq(Ncla,((Ncla *Ncla)-Ncla),(Ncla))
and I tried to create the following sequence:
#[1] 2 3 4 5 6 7 8 9 10 13 14 15 16 17 18 19 20 24 25 26
# 27 28 29 30 35 36 37 38 39 40 46 47 48 49 50 57 58 59 60 68 69 70 79 80 90
using the statement:
for(i in 1:(Ncla-1)) A.1[i]={c(seq(CC.1[i],CC.2[i],length = 1))}
but it doesn't work.
Any help is greatly appreciated.
Try
unlist(Map(seq, CC.1, CC.2))
# [1] 2 3 4 5 6 7 8 9 10 13 14 15 16 17 18 19 20 24 25 26 27 28 29 30 35
#[26] 36 37 38 39 40 46 47 48 49 50 57 58 59 60 68 69 70 79 80 90
Or
unlist(sapply(seq_along(CC.1), function(i) seq(CC.1[i], CC.2[i])))
Or
A.1 <- list()
for(i in seq_along(CC.1)) A.1[[i]] <- seq(CC.1[i], CC.2[i])
unlist(A.1)
# [1] 2 3 4 5 6 7 8 9 10 13 14 15 16 17 18 19 20 24 25 26 27 28 29 30 35
#[26] 36 37 38 39 40 46 47 48 49 50 57 58 59 60 68 69 70 79 80 90
test<-NULL
for(i in 1:(Ncla-1)) {
A.1=c(seq(CC.1[i],CC.2[i],1))
test<-c(test,A.1)
}
test
Your mistake: You were not saving your results.
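
For comparison, the same construction in numpy: build one range per (from, to) pair and concatenate them. This is a sketch; the lowercase names are chosen for the example, and the + 1 on each stop value is there because np.arange excludes its endpoint while R's seq includes it.

```python
import numpy as np

ncla = 10
# Counterparts of CC.1 and CC.2
cc1 = np.arange(2, ncla * ncla - ncla + 1, ncla + 1)   # 2, 13, 24, ..., 90
cc2 = np.arange(ncla, ncla * ncla - ncla + 1, ncla)    # 10, 20, 30, ..., 90

# One inclusive integer range per pair, joined into a single vector
result = np.concatenate([np.arange(a, b + 1) for a, b in zip(cc1, cc2)])

print(result)
```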
