Extracting columns from data according to column names - r

I used list.files to read many files from a folder named "matrices":
data <- list.files(path="/home/rania/Downloads/matrices/", full.names=TRUE, pattern="\\.csv") %>% lapply(read.csv, header=TRUE, sep=",")
When I print data, the result is as follows:
> data
[[1]]
X25 X14 X14.1
1 4 145 58
2 3 4 5
3 10 11 12
4 14 12 17
5 12 2 8
6 8 2 8
7 7 47 55
8 15 12 18
9 17 12 18
10 14 22 47
11 17 12 75
12 23 77 14
13 12 23 17
[[2]]
X58 X14 X87
1 69 14 20
2 6 8 95
3 13 14 15
4 10 11 14
5 4 1 1
6 7 6 7
7 66 75 8
8 19 13 17
9 89 55 23
10 45 58 58
11 12 32 74
12 12 74 25
13 23 12 28
[[3]]
X63 X21 X87
1 87 23 88
2 20 12 47
3 16 17 18
4 15 10 17
5 8 4 8
6 5 2 6
7 74 85 66
8 19 15 13
9 47 22 54
10 47 77 12
11 15 17 85
12 12 33 45
13 23 47 58
[[4]]
X10 X23 X15
1 11 23 23
2 12 87 17
3 19 12 11
4 19 66 12
5 2 12 77
6 7 88 12
7 45 95 32
8 17 21 78
9 19 12 11
10 12 13 77
11 21 47 13
12 74 12 25
13 1 52 7
I need to extract data from each of those files, in one iteration, according to column names. How can I do that? Thanks!

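We can use Map, which applies `[` to each data frame in the list together with the corresponding column name, and then cbind the results: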
do.call(cbind, Map(`[`, data, c("X25", "X14", "X21", "X23")))
# X25 X14 X21 X23
#1 4 14 23 23
#2 3 8 12 87
#3 10 14 17 12
#4 14 11 10 66
#5 12 1 4 12
#6 8 6 2 88
#7 7 75 85 95
#8 15 13 15 21
#9 17 55 22 12
#10 14 58 77 13
#11 17 32 17 47
#12 23 74 33 12
#13 12 12 47 52
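The call above pairs the i-th data frame with the i-th column name, so the vector of names should have one entry per list element. A minimal sketch of the same idea on two small made-up data frames (the names dfs, wanted, picked and result are illustrative, not from the question) might look like this:

```r
# Two small hypothetical data frames standing in for the files read with read.csv
dfs <- list(
  data.frame(X25 = c(4, 3), X14 = c(145, 4)),
  data.frame(X58 = c(69, 6), X14 = c(14, 8))
)

# One column name per data frame, in the same order as the list
wanted <- c("X25", "X14")

# Map calls `[`(dfs[[i]], wanted[i]) for each i, giving one-column data frames
picked <- Map(`[`, dfs, wanted)

# Bind the one-column data frames side by side into a single data frame
result <- do.call(cbind, picked)
result
```

Note that "X14" is taken from the second data frame here, not the first, because the selection is positional.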

Thanks @RoyalTS, it works:
> do.call(cbind, Map(`[`, data, c("X25", "X14", "X21", "X23")))
X25 X14 X21 X23
1 4 14 23 23
2 3 8 12 87
3 10 14 17 12
4 14 11 10 66
5 12 1 4 12
6 8 6 2 88
7 7 75 85 95
8 15 13 15 21
9 17 55 22 12
10 14 58 77 13
11 17 32 17 47
12 23 74 33 12
13 12 12 47 52

Related

Generating a vector with n repetitions of x, then y, then z, with a fixed upper bound

I am trying to create a vector where I have 3 repetitions of the number 1, then 3 repetitions of the number 2, and so on up to, for instance, 3 repetitions of the number 36.
c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5...)
I have tried the following use of rep() but got the following error:
Error in rep(3, seq(1:36)) : argument 'times' incorrect
What formulation do I need to use to properly generate the vector I want?
sort(rep(1:36, 3))
Or, even better, as @Wimpel mentioned in the comments, use the each argument of the rep function.
rep(1:36, each = 3)
output
# [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17 17 18 18 18 19 19 19 20 20 20 21 21 21 22
# [65] 22 22 23 23 23 24 24 24 25 25 25 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34 34 34 35 35 35 36 36 36
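As a small sketch of why the original attempt failed: rep(3, seq(1:36)) passes a length-36 times vector for a length-1 x, whereas a vector-valued times needs one count per element of x. The variable names below are illustrative only.

```r
# each repeats every element in place; times repeats the whole vector
by_each  <- rep(1:3, each = 2)    # 1 1 2 2 3 3
by_times <- rep(1:3, times = 2)   # 1 2 3 1 2 3

# A vector-valued times must have the same length as x: one count per element
per_element <- rep(1:3, times = c(3, 3, 3))   # 1 1 1 2 2 2 3 3 3
```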
This one should work, though it is probably not the most elegant.
reps = c()
n = 36
for(i in 1:n){
reps = append(reps, rep(i, 3))
}
reps
Alternatively, use the rep function properly (see the documentation, ?rep, for the each argument):
rep(1:36,each = 3)
The rep approach is preferable (see the existing answers).
Here are some other options:
> kronecker(1:36, rep(1, 3))
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
[26] 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17
[51] 17 18 18 18 19 19 19 20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25
[76] 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34
[101] 34 34 35 35 35 36 36 36
> c(outer(rep(1, 3), 1:36))
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
[26] 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17
[51] 17 18 18 18 19 19 19 20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25
[76] 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34
[101] 34 34 35 35 35 36 36 36

Order Levels (format: numbers) in a factor from low to high in RStudio

I have a problem in R. I have created a factor (called reference), but the levels are not in the right order; I want them ordered from low to high (1, 2, 3, ..., 30). I tried reference <- relevel(reference, 1), but then the order is still not right. Is there a way to change the order as I want?
reference
[1] 5 5 1 5 5 5 1 1 1 1 1 11 1 1 1 5 1 5 1 2 1 1 1 1 2 1 1 1 3 1 2 1 2 15 2 2 2 15
[39] 16 3 2 2 4 2 16 23 2 14 2 4 2 3 2 14 4 24 2 2 2 2 3 4 3 3 3 3 25 3 2 3 3 3 3 3 25 3
[77] 3 3 3 1 3 15 3 3 3 3 3 1 1 3 8 4 4 4 4 8 4 4 4 4 4 4 4 4 4 4 8 4 4 4 4 8 4 4
[115] 4 4 15 8 4 16 8 16 14 14 5 5 5 5 7 5 16 5 14 16 14 14 5 5 5 5 14 5 3 5 7 8 4 7 5 5 6 4
[153] 4 15 15 15 6 4 6 14 4 14 15 6 4 11 6 28 16 6 16 15 9 14 6 14 15 6 16 14 7 14 16 16 16 16 7 7 14 16
[191] 16 7 15 7 4 15 7 15 14 15 15 9 14 7 16 15 15 15 16 14 8 8 9 4 8 8 10 8 4 7 8 4 8 4 8 8 8 8
[229] 8 8 9 8 8 4 8 8 14 8 8 8 29 14 29 9 29 14 9 16 29 10 29 14 16 9 9 29 29 29 9 29 16 4 4 9 15 29
[267] 9 23 29 9 10 4 10 10 10 10 10 10 10 10 10 10 14 5 10 10 15 11 10 11 10 10 11 11 11 15 4 10 15 10 10 11 11 10
[305] 10 10 11 11 10 11 11 10 11 11 11 10 11 10 8 11 10 10 11 11 10 10 11 11 10 11 14 16 7 15 12 14 14 15 15 14 14 14
[343] 12 17 11 2 15 16 7 16 15 15 15 14 17 28 5 7 17 16 11 13 13 11 13 13 13 13 16 13 13 13 11 13 15 13 13 13 11 13
[381] 13 13 10 13 13 13 13 13 13 10 11 14 15 14 4 14 14 15 8 14 4 4 14 14 14 15 15 14 4 4 14 4 4 4 14 7 8 14
[419] 11 11 15 15 16 3 5 11 15 14 15 15 4 3 15 4 15 15 15 14 14 15 16 15 14 15 11 15 15 15 15 16 7 4 16 16 16 14
[457] 15 14 16 16 16 16 15 12 4 4 4 14 16 16 15 15 14 26 16 26 14 16 4 13 17 21 17 21 17 17 17 17 17 20 17 21 17 17
[495] 18 17 18 17 18 21 17 21 20 18 21 18 18 20 17 17 17 20 17 17 18 18 18 20 18 18 21 18 21 17 17 17 18 18 21 18 17 18
[533] 17 21 17 20 18 22 18 20 19 18 18 19 4 19 18 19 19 15 1 19 17 7 3 20 17 19 19 19 20 18 19 19 19 19 20 18 15 14
[571] 21 20 20 20 20 22 20 20 19 20 20 20 20 20 22 20 21 18 20 20 21 21 17 18 21 20 20 18 20 20 21 17 21 21 22 21 20 20
[609] 21 21 21 21 17 21 18 21 17 18 17 20 20 18 20 20 18 21 21 21 20 17 21 22 22 22 22 22 22 22 21 22 22 22 21 22 22 21
[647] 22 22 21 29 21 22 22 22 22 22 22 20 18 22 8 15 4 4 15 4 4 15 15 15 4 4 4 15 4 15 15 23 4 23 4 2 8 23
[685] 4 23 10 2 4 7 4 7 18 24 15 15 26 11 15 15 4 24 7 15 15 5 24 15 4 1 4 8 24 23 23 6 4 3 23 4 5 1
[723] 1 1 1 2 1 1 1 1 1 1 1 1 3 2 1 5 1 2 2 1 1 1 20 2 3 25 2 1 3 15 15 15 14 14 14 5 14 15
[761] 4 18 14 8 26 4 15 20 10 16 8 4 15 15 16 4 23 18 15 4 27 27 27 27 27 27 27 27 27 27 27 27 27 27 27 27 27 27
[799] 27 27 27 27 27 27 27 27 27 27 27 27 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28
[837] 28 28 28 28 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 30 30 30 30
[875] 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
Levels: 26 28 27 29 3 30 4 5 6 7 8 9 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25
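You can convert the factor to numbers (going through character, so that the labels rather than the internal integer codes are converted) and then back to a factor: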
reference <- factor(as.numeric(as.character(reference)))
Or, if you already know the range of the levels:
reference <- factor(reference, 1:30)
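To see why the detour through as.character matters, here is a small sketch on a made-up factor (f, values and f_sorted are hypothetical names, not from the question):

```r
# A small hypothetical factor whose levels are not in numeric order
f <- factor(c("10", "2", "1", "2"), levels = c("2", "10", "1"))
levels(f)                                # "2" "10" "1"

# as.numeric(f) would return the internal codes (2 1 3 1), not the labels,
# so convert through character first
values <- as.numeric(as.character(f))    # 10 2 1 2

# Rebuilding the factor from numbers sorts the levels numerically
f_sorted <- factor(values)
levels(f_sorted)                         # "1" "2" "10"
```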

Remove rows by multiple logical conditions (rstudio)

Let's say this is my dataframe:
df <- data.frame(replicate(10,sample(0:50,20,rep=TRUE)))
S X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 3 26 39 25 24 4 46 42 8 42
2 40 6 50 50 22 2 40 24 26 17
3 32 45 18 7 19 6 33 12 0 13
4 3 45 43 32 16 33 25 18 35 45
5 7 36 2 25 16 20 24 14 27 29
6 45 4 12 13 50 35 38 1 27 34
7 18 43 38 16 34 18 19 45 4 34
8 18 9 33 38 18 13 23 44 41 4
9 28 34 6 3 14 11 47 4 21 50
10 6 48 42 46 48 42 12 33 1 32
11 28 20 37 2 26 33 5 2 22 27
12 40 30 41 45 28 6 5 46 21 46
13 1 47 46 37 0 3 11 45 12 11
14 20 0 9 38 42 15 44 1 2 45
15 49 29 25 41 38 26 20 34 50 0
16 2 5 47 6 36 34 28 36 32 38
17 15 22 50 13 26 9 37 40 41 23
18 44 27 47 37 26 34 31 36 44 12
19 47 41 19 2 50 44 48 36 34 38
20 25 31 28 34 8 19 3 13 14 23
I need to exclude subjects ('S') with values higher than 30 in 8 or more columns (X1:X10). That is, only exclude those who have values above 30 in at least 8 columns (e.g. Subject 19). I was thinking that maybe the 'ifelse' function could be useful, but I really don't know how to implement it.
Any help is highly appreciated! Thanks a lot!
df[-which(apply(df, 1, function(x) sum(x > 30) >= 8)), ]
To illustrate how (and that) this works consider this dataframe:
set.seed(1111)
df <- data.frame(replicate(5,sample(0:50,5,rep=TRUE)))
df
X1 X2 X3 X4 X5
1 23 49 0 8 8
2 21 44 38 46 21
3 46 5 38 28 42
4 6 27 32 45 50
5 37 7 44 39 3
Here the second row has values > 20 in all 5 columns, i.e. in more than 4 columns. To remove that row, drop from df (via negative indexing) the rows in which the number of columns with values greater than 20 exceeds 4:
df[-which(apply(df, 1, function(x) sum(x > 20) > 4)),]
X1 X2 X3 X4 X5
1 23 49 0 8 8
3 46 5 38 28 42
4 6 27 32 45 50
5 37 7 44 39 3
Et voilà, the second row has been removed.
You can try subset + rowSums like below
subset(df, !(rowSums(df > 30) >= 8))
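A hedged sketch of the row-counting idea on a tiny hand-built data frame (toy, n_high and the cutoffs are made up for illustration). It also shows one thing to watch for with negative indexing through which(): if no row meets the condition, -which(...) selects zero rows, while logical indexing keeps them all.

```r
# Hypothetical 3-row data frame; only row 2 exceeds 30 in every column
toy <- data.frame(X1 = c(5, 40, 31), X2 = c(50, 45, 2), X3 = c(1, 35, 33))

# Count, per row, how many values are above 30
n_high <- rowSums(toy > 30)          # 1 3 2

# Keep rows with fewer than 3 such values (logical indexing)
kept <- toy[n_high < 3, ]

# With a cutoff no row reaches, logical indexing keeps everything,
# whereas -which(...) indexes with an empty vector and returns zero rows
all_kept  <- toy[n_high < 4, ]
none_kept <- toy[-which(n_high >= 4), ]
```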

Distance Matrix from table in R

Good evening,
I need to solve a location problem in R and I'm stuck in one of the first steps.
From a .txt file I need to create a distance matrix using the euclidean method.
datos <- file.choose()
servidores <- read.table(datos)
servidores
From which I obtain the following information:
X50 shows the total number of servers.
X5 the number of hubs required.
X120 the total capacity.
The first column shows the distance of x.
The second column shows the distance of y.
The third column shows the requirements of the node.
X50 X5 X120
1 2 62 3
2 80 25 14
3 36 88 1
4 57 23 14
5 33 17 19
6 76 43 2
7 77 85 14
8 94 6 6
9 89 11 7
10 59 72 6
11 39 82 10
12 87 24 18
13 44 76 3
14 2 83 6
15 19 43 20
16 5 27 4
17 58 72 14
18 14 50 11
19 43 18 19
20 87 7 15
21 11 56 15
22 31 16 4
23 51 94 13
24 55 13 13
25 84 57 5
26 12 2 16
27 53 33 3
28 53 10 7
29 33 32 14
30 69 67 17
31 43 5 3
32 10 75 3
33 8 26 12
34 3 1 14
35 96 22 20
36 6 48 13
37 59 22 10
38 66 69 9
39 22 50 6
40 75 21 18
41 4 81 7
42 41 97 20
43 92 34 9
44 12 64 1
45 60 84 8
46 35 100 5
47 38 2 1
48 9 9 7
49 54 59 9
50 1 58 2
I tried to use the dist() function:
distance_matrix <-dist(servidores,method = "euclidean",diag = TRUE,upper = TRUE)
but since x and y are in different columns, I am not sure what to do to get a 50x50 matrix with all the distances.
Does anybody know how I could create such a matrix?
Many thanks in advance.
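One possible approach, offered as a sketch rather than a definitive answer: dist() computes distances between the rows of its input, so passing only the two coordinate columns gives the Euclidean distance between every pair of nodes, and as.matrix() expands the result into a full square matrix. The example below uses the first three rows of the table; the column names x, y and demand are assumptions made for readability.

```r
# First three nodes from the table above, with illustrative column names
nodes <- data.frame(x = c(2, 80, 36), y = c(62, 25, 88), demand = c(3, 14, 1))

# Use only the coordinate columns; the demand column is not part of the distance
d <- dist(nodes[, c("x", "y")], method = "euclidean")

# Convert the dist object into a full symmetric matrix with a zero diagonal
distance_matrix <- as.matrix(d)
distance_matrix
```

Applied to the data in the question, the equivalent call would presumably be as.matrix(dist(servidores[, 1:2])), which should yield a 50x50 matrix if servidores holds the 50 rows shown.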

split a list and increment for loop by 10

How can I split a list in R?
I want to split it in a cumulative (incremental) manner.
For example:
x <- 1:50
n <- 5
spt <- split(x,cut(x,quantile(x,(0:n)/n), include.lowest=TRUE, labels=FALSE))
we get
$`1`
[1] 1 2 3 4 5 6 7 8 9 10
$`2`
[1] 11 12 13 14 15 16 17 18 19 20
$`3`
[1] 21 22 23 24 25 26 27 28 29 30
$`4`
[1] 31 32 33 34 35 36 37 38 39 40
$`5`
[1] 41 42 43 44 45 46 47 48 49 50
I don't want this output. I want the output like below,
$`1`
[1] 1 2 3 4 5 6 7 8 9 10
$`2`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$`3`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
$`4`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
$`5`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
any idea?
I would also like to know how to increment a for loop by 10 in R.
Thanks.
We can use seq
lapply(seq(10,50, by=10), function(i) x[1:i])
Or, as @RichardScriven mentioned in the comments, seq(10, 50, by = 10) can be replaced by 1:5 * 10L.
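For the second part of the question, a for loop can step by 10 simply by iterating over seq(10, 50, by = 10). The sketch below shows that, plus one alternative that builds the cumulative pieces from the original split() result with Reduce(); the names prefixes and cumulative are illustrative.

```r
x <- 1:50

# A for loop that advances in steps of 10: iterate over seq(10, 50, by = 10)
prefixes <- list()
for (i in seq(10, 50, by = 10)) {
  prefixes[[length(prefixes) + 1]] <- x[1:i]   # first i elements of x
}

# The same cumulative pieces, accumulated from the split() result in the question
n <- 5
spt <- split(x, cut(x, quantile(x, (0:n) / n), include.lowest = TRUE, labels = FALSE))
cumulative <- Reduce(c, spt, accumulate = TRUE)
```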

Resources