Good evening,
I need to solve a location problem in R and I'm stuck on one of the first steps.
From a .txt file I need to create a distance matrix using the Euclidean method.
datos <- file.choose()
servidores <- read.table(datos)
servidores
This gives me the data shown below. The first line of the file was read as the column names:
X50 is the total number of servers.
X5 is the number of hubs required.
X120 is the total capacity.
The first column holds the x coordinate of each node.
The second column holds the y coordinate of each node.
The third column holds the requirement of the node.
X50 X5 X120
1 2 62 3
2 80 25 14
3 36 88 1
4 57 23 14
5 33 17 19
6 76 43 2
7 77 85 14
8 94 6 6
9 89 11 7
10 59 72 6
11 39 82 10
12 87 24 18
13 44 76 3
14 2 83 6
15 19 43 20
16 5 27 4
17 58 72 14
18 14 50 11
19 43 18 19
20 87 7 15
21 11 56 15
22 31 16 4
23 51 94 13
24 55 13 13
25 84 57 5
26 12 2 16
27 53 33 3
28 53 10 7
29 33 32 14
30 69 67 17
31 43 5 3
32 10 75 3
33 8 26 12
34 3 1 14
35 96 22 20
36 6 48 13
37 59 22 10
38 66 69 9
39 22 50 6
40 75 21 18
41 4 81 7
42 41 97 20
43 92 34 9
44 12 64 1
45 60 84 8
46 35 100 5
47 38 2 1
48 9 9 7
49 54 59 9
50 1 58 2
I tried to use the dist() function:
distance_matrix <- dist(servidores, method = "euclidean", diag = TRUE, upper = TRUE)
but since x and y are in different columns I am not sure how to get a 50x50 matrix with all the pairwise distances.
Does anybody know how I could create such a matrix?
Many thanks in advance.
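One possible approach, sketched under the assumption that the first two columns (named X50 and X5 by read.table) are the x and y coordinates: pass only those two columns to dist() and expand the result with as.matrix(). The small data frame below reuses the first four rows of the posted data; the names coords, x and y are made up for the illustration.

```r
# First four rows of the posted data, with the column names read.table produced.
servidores <- data.frame(
  X50  = c(2, 80, 36, 57),   # x coordinate
  X5   = c(62, 25, 88, 23),  # y coordinate
  X120 = c(3, 14, 1, 14)     # node requirement (not a coordinate)
)

# Keep only the coordinate columns so the requirement does not enter the distance.
coords <- servidores[, 1:2]
names(coords) <- c("x", "y")   # hypothetical, more readable names

# dist() returns a compact "dist" object; as.matrix() expands it to n x n.
distance_matrix <- as.matrix(dist(coords, method = "euclidean"))
dim(distance_matrix)
```

With the full 50-row data frame the same two lines should give the 50x50 matrix; the third column would be used later as the demand, not as part of the distance.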
Let's say this is my dataframe:
df <- data.frame(replicate(10,sample(0:50,20,rep=TRUE)))
S X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 3 26 39 25 24 4 46 42 8 42
2 40 6 50 50 22 2 40 24 26 17
3 32 45 18 7 19 6 33 12 0 13
4 3 45 43 32 16 33 25 18 35 45
5 7 36 2 25 16 20 24 14 27 29
6 45 4 12 13 50 35 38 1 27 34
7 18 43 38 16 34 18 19 45 4 34
8 18 9 33 38 18 13 23 44 41 4
9 28 34 6 3 14 11 47 4 21 50
10 6 48 42 46 48 42 12 33 1 32
11 28 20 37 2 26 33 5 2 22 27
12 40 30 41 45 28 6 5 46 21 46
13 1 47 46 37 0 3 11 45 12 11
14 20 0 9 38 42 15 44 1 2 45
15 49 29 25 41 38 26 20 34 50 0
16 2 5 47 6 36 34 28 36 32 38
17 15 22 50 13 26 9 37 40 41 23
18 44 27 47 37 26 34 31 36 44 12
19 47 41 19 2 50 44 48 36 34 38
20 25 31 28 34 8 19 3 13 14 23
I need to exclude subjects ('S') with values higher than 30 in 8 or more of the columns X1:X10. That is, exclude only those subjects that have a value above 30 at least 8 times (e.g. Subject 19). I was thinking that maybe the 'ifelse' function could be useful, but I really don't know how to implement it.
Any help is highly appreciated! Thanks a lot!
df[-which(apply(df, 1, function(x) sum(x > 30) >= 8)), ]
To illustrate how (and that) this works consider this dataframe:
set.seed(1111)
df <- data.frame(replicate(5,sample(0:50,5,rep=TRUE)))
df
X1 X2 X3 X4 X5
1 23 49 0 8 8
2 21 44 38 46 21
3 46 5 38 28 42
4 6 27 32 45 50
5 37 7 44 39 3
Here the second row has values > 20 in more than 4 columns. To remove that row, you subtract (-) from df those rows in which the number of columns with values greater than 20 is greater than 4:
df[-which(apply(df, 1, function(x) sum(x > 20) > 4)),]
X1 X2 X3 X4 X5
1 23 49 0 8 8
3 46 5 38 28 42
4 6 27 32 45 50
5 37 7 44 39 3
Et voilà, the second row has been removed.
You can try subset + rowSums as below:
subset(df, !(rowSums(df > 30) >= 8))
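To make the "8 or more" threshold concrete, here is a small sketch on a made-up three-row data frame (the values and the names above_30, kept and none_match are invented for the illustration). It also shows one caveat with the -which(...) idiom: when no row matches, which() returns integer(0) and the negative index selects nothing at all.

```r
# Three hypothetical subjects with 10 measurements each.
df <- as.data.frame(rbind(
  c(31, 40, 50, 35, 45, 33, 48, 36, 10, 5),  # 8 values above 30: excluded
  c(31, 40, 50, 35, 45, 33, 48, 10, 10, 5),  # 7 values above 30: kept
  c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)           # 0 values above 30: kept
))

# Number of columns above 30 for each row.
above_30 <- rowSums(df > 30)

# Logical indexing keeps the rows with fewer than 8 such columns.
kept <- df[above_30 < 8, ]
nrow(kept)

# Caveat: with no matching row, -which() drops every row instead of none.
none_match <- which(above_30 > 100)
nrow(df[-none_match, ])
```

Logical indexing (or subset) avoids that edge case, which is why it may be the safer choice when the filter can come back empty.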
I have a very large dataset (> 200000 lines) with 6 variables (only the first two shown)
>head(gt7)
ChromKey POS
1 2447 25
2 2447 183
3 26341 75
4 26341 2213
5 26341 2617
6 54011 1868
I have converted the ChromKey variable to a factor with more than 55000 levels.
> gt7[1] <- lapply(gt7[1], factor)
> is.factor(gt7$ChromKey)
[1] TRUE
I can further make a table with counts of ChromKey levels
> table(gt7$ChromKey)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
88 88 44 33 11 11 33 22 121 11 22 11 11 11 22 11 33
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
22 22 44 55 22 11 22 66 11 11 11 22 11 11 11 187 77
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
77 11 44 11 11 11 11 11 11 22 66 11 22 11 44 22 22
... output cropped
Which I can save in table format
> table <- table(gt7$ChromKey)
> head(table)
1 2 3 4 5 6
88 88 44 33 11 11
I would like to know whether it is possible to get a table (and a histogram) of the number of levels that have each specific count. From the example above, I would expect
88 44 33 11
2 1 1 2
I would very much appreciate any hint.
We can apply table again on the output to get the frequency count of the frequency
table(table(gt7$ChromKey))
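As a quick check of this idea, the sketch below rebuilds only the six levels shown in head(table) from the question (the vector chrom_key and the other names are made up for the illustration) and applies table() twice.

```r
# Rebuild a factor whose first six levels have the counts shown in the question.
chrom_key <- factor(rep(1:6, times = c(88, 88, 44, 33, 11, 11)))

# First table(): how many rows each level has.
counts <- table(chrom_key)

# Second table(): how many levels share each count.
count_of_counts <- table(counts)
count_of_counts
```

The result is sorted by count (11, 33, 44, 88) rather than in the order written in the question. For the plot, barplot(count_of_counts) would likely be closer to what is wanted than hist(), since the counts are already tabulated.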
How can I translate this geometric distribution problem to numpy?
Products produced by a machine have a 3% defective rate.
What is the probability that the first defective occurs in the fifth item inspected?
P(X = 5) = P(first 4 non-defective) P(5th defective) = (0.97^4)(0.03)
In R:
> dgeom(x = 4, prob = .03)
[1] 0.02655878
The convention in R is to record X as the number of failures that occur
before the first success.
Is my numpy code OK?
result = np.random.geometric(p=0.03, size=1000)
print(result);
result = (result == 5).sum() / 1000.
print(result * 1000,"%");
I get 17 % as a result with numpy. Is that OK? It seems wrong because the defect rate is only 3%.
This is the numpy result array:
""" [ 31 20 37 9 47 31 22 7 44 15 52 15 4 14 36 45 26 27
9 48 30 5 7 17 7 24 121 22 23 49 2 26 25 8 4 5
3 27 70 71 3 1 19 22 103 18 14 20 34 45 8 169 11 63
29 71 30 79 75 19 56 9 5 8 15 44 8 12 40 29 46 2
144 69 65 1 4 90 20 187 100 52 46 76 3 105 12 110 31 3
113 18 6 15 127 22 6 7 3 18 123 41 69 104 13 18 2 8
52 35 54 27 74 22 31 27 3 15 21 26 13 3 32 10 131 20
I guess that 31 is the number of integrity checks before a failure, and likewise 20, 37, etc.
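This is what I would do: simulate 1000 runs of 5 inspections each and count the runs in which the first defective item is the fifth one.
import numpy as np
np.random.seed(1)
# 1 marks a defective item; the stated defect rate is 3%
tests = np.random.choice([0, 1], size=(1000, 5), p=[0.97, 0.03])
((np.argmax(tests, axis=1) == 4) & (tests[:, 4] == 1)).mean()
# an estimate of 0.97**4 * 0.03 ≈ 0.0266; it is noisy with only 1000 runs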
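Two details may explain the confusing "17 %". First, np.random.geometric returns the number of trials up to and including the first success, so testing result == 5 is the right event in numpy, while R's dgeom() needs x = 4. Second, the question's print multiplies the fraction by 1000 rather than 100, so the 17 is a count out of 1000 draws, i.e. 1.7%. As a cross-check in R, here is a small sketch that compares the exact value with a simulation (n_sim is an arbitrary choice for the illustration):

```r
p <- 0.03

# Exact probability that the first defective item is the 5th one inspected.
# dgeom() counts failures before the first success, hence x = 4.
exact <- dgeom(4, prob = p)
by_hand <- (1 - p)^4 * p

# Simulation: rgeom() also returns the number of failures before the first success.
set.seed(1)
n_sim <- 1e5   # number of simulated runs (arbitrary choice)
simulated <- mean(rgeom(n_sim, prob = p) == 4)

c(exact = exact, by_hand = by_hand, simulated = simulated)
```

With only 1000 draws, as in the question, the expected count is about 27, so an observed 17 is on the low side but not impossible.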
I have this dataframe and I want to plot with ggplot on x axis the result_df50$id column and on the y axis the columns result_df50$Sens and result_df50$Spec.
Also I want result_df50$Sens and result_df50$Spec to be displayed in different colors. The legend should also show the different colors of the columns.
> result_df50
Acc Sens Spec id
1 12 51 15 1
2 24 78 28 2
3 31 86 32 3
4 78 23 90 4
5 49 43 56 5
6 25 82 33 6
7 6 87 8 7
8 60 33 61 8
9 54 4 66 9
10 5 54 9 10
11 1 53 4 11
12 2 59 7 12
13 4 73 3 13
14 48 41 55 14
15 30 72 39 15
16 57 10 67 16
17 80 31 91 17
18 30 65 36 18
19 58 45 61 19
20 12 50 19 20
21 39 47 46 21
22 38 49 45 22
23 3 69 5 23
24 68 24 76 24
25 35 64 42 25
So far I tried this and I am happy with it.
ggplot(data = result_df50) +
geom_line(data= result_df50, aes(x = result_df50$id, y = result_df50$Spec), colour = "blue") +
geom_line(data= result_df50, aes(x = result_df50$id, y = result_df50$Sens), colour = "red") +
labs(x="Number of iterations")
Now I just want to add the legend with the colors of each line. I tried fill, but R gives a warning and ignores this unknown aesthetics: fill....
How can I do this?
This is because your dataset has the wrong format (wide). You'll have to convert it into long format to make it work as follows:
result_df50 <- read.table(text="Acc Sens Spec id
1 12 51 15 1
2 24 78 28 2
3 31 86 32 3
4 78 23 90 4
5 49 43 56 5
6 25 82 33 6
7 6 87 8 7
8 60 33 61 8
9 54 4 66 9
10 5 54 9 10
11 1 53 4 11
12 2 59 7 12
13 4 73 3 13
14 48 41 55 14
15 30 72 39 15
16 57 10 67 16
17 80 31 91 17
18 30 65 36 18
19 58 45 61 19
20 12 50 19 20
21 39 47 46 21
22 38 49 45 22
23 3 69 5 23
24 68 24 76 24
25 35 64 42 25")
# conversion to long format
library(reshape2)
library(ggplot2)
result_df50 <- melt(result_df50, id.vars=c("Acc", "id"))
head(result_df50)
# Acc id variable value
# 1 12 1 Sens 51
# 2 24 2 Sens 78
# 3 31 3 Sens 86
# 4 78 4 Sens 23
# 5 49 5 Sens 43
# 6 25 6 Sens 82
# your plot
ggplot(data = result_df50, aes(x = id, y =value , color=variable)) +
geom_line() +
labs(x="Number of iterations")+
scale_color_manual(values=c("red", "blue")) # in case you want to keep your colors
Is this what you want?
Let's say, I created two vectors like:
Ncla = 10
CC.1 = seq(2,((Ncla *Ncla)-Ncla),(Ncla+1))
CC.2 = seq(Ncla,((Ncla *Ncla)-Ncla),(Ncla))
and, I tried to create the following sequence:
#[1] 2 3 4 5 6 7 8 9 10 13 14 15 16 17 18 19 20 24 25 26
# 27 28 29 30 35 36 37 38 39 40 46 47 48 49 50 57 58 59 60 68 69 70 79 80 90
using the statement:
for(i in 1:(Ncla-1)) A.1[i]={c(seq(CC.1[i],CC.2[i],length = 1))}
but it doesn't work.
Any help is greatly appreciated.
Try
unlist(Map(seq, CC.1, CC.2))
# [1] 2 3 4 5 6 7 8 9 10 13 14 15 16 17 18 19 20 24 25 26 27 28 29 30 35
#[26] 36 37 38 39 40 46 47 48 49 50 57 58 59 60 68 69 70 79 80 90
Or
unlist(sapply(seq_along(CC.1), function(i) seq(CC.1[i], CC.2[i])))
Or
A.1 <- list()
for(i in seq_along(CC.1)) A.1[[i]] <- seq(CC.1[i], CC.2[i])
unlist(A.1)
# [1] 2 3 4 5 6 7 8 9 10 13 14 15 16 17 18 19 20 24 25 26 27 28 29 30 35
#[26] 36 37 38 39 40 46 47 48 49 50 57 58 59 60 68 69 70 79 80 90
test <- NULL
for (i in 1:(Ncla - 1)) {
  A.1 <- seq(CC.1[i], CC.2[i], 1)
  test <- c(test, A.1)
}
test
Your mistake: You were not saving your results.
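Another option, offered as a sketch rather than as part of the answers above: sequence() can build all the runs in one vectorised call. This assumes R 4.0.0 or later, where sequence() accepts a from argument; run_lengths and result are names made up for the illustration.

```r
Ncla <- 10
CC.1 <- seq(2, (Ncla * Ncla) - Ncla, Ncla + 1)
CC.2 <- seq(Ncla, (Ncla * Ncla) - Ncla, Ncla)

# Length of each run, then all runs generated in one vectorised call.
run_lengths <- CC.2 - CC.1 + 1
result <- sequence(run_lengths, from = CC.1)

# Compare with the Map-based answer.
all(result == unlist(Map(seq, CC.1, CC.2)))
```

This avoids growing a vector inside a loop, which tends to matter only when Ncla is large.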